In this assignment, you will gain experience answering questions using statistical methods on spatial data. Specifically, you will:
Government officials in the state of Wisconsin want to understand certain demographic and housing data across the state. They want to study three variables: number of vacant housing units, median age, and number of owner-occupied housing units. They want to make predictions about the future, and they have recruited you to assist them in their analyses.
Using tidycensus, retrieve data at the tract level for the state of Wisconsin in the year 2021 for these census variables:
B25002_003
B01002_001
B07013_002
Then, you will isolate data for two individual census tracts. You can use the code below:
library(tidycensus)
library(sf)
library(dplyr)
## retrieve census data
wi_tracts_2021 <- get_acs(geography = _____,
state = _____,
year = _____,
variables = c(_____ = "_____",
_____ = "_____",
_____ = "_____"),
geometry = TRUE,
output = "wide")
## isolate two tracts
tract_a <- _____(_____, _____ == "Census Tract 12, Eau Claire County, Wisconsin")
tract_b <- _____(_____, _____ == "Census Tract 8.01, Eau Claire County, Wisconsin")
To which variable does each variable code correspond?
What are the assumptions of using z-scores?
Setting aside the assumption of statistical independence, the use of z-scores is appropriate for only one of these three variables. Which one is it? Describe your methods for determining this, and include any graphical results used.
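One common way to judge whether a variable is roughly normal is to look at a histogram and a normal Q-Q plot. The sketch below is only an illustration on simulated data: `symmetric_var` and `skewed_var` are made-up vectors, not the Wisconsin tract data, and the mean-versus-median gap is a crude companion check rather than a formal test.

```r
# Simulated values for illustration only; NOT the Wisconsin tract data.
set.seed(42)
symmetric_var <- rnorm(500, mean = 50, sd = 10)  # roughly bell-shaped
skewed_var    <- rexp(500, rate = 1 / 50)        # strongly right-skewed

# Histograms and normal Q-Q plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
hist(symmetric_var, main = "Symmetric: histogram")
qqnorm(symmetric_var, main = "Symmetric: Q-Q plot")
qqline(symmetric_var)
hist(skewed_var, main = "Skewed: histogram")
qqnorm(skewed_var, main = "Skewed: Q-Q plot")
qqline(skewed_var)
par(op)

# Crude numeric companion: distance between mean and median, in SD units.
# A value near zero is consistent with a symmetric distribution.
gap_symmetric <- abs(mean(symmetric_var) - median(symmetric_var)) / sd(symmetric_var)
gap_skewed    <- abs(mean(skewed_var) - median(skewed_var)) / sd(skewed_var)
```

In a Q-Q plot, points that follow the reference line closely suggest approximate normality, while a curved pattern suggests skew. With real ACS data you may also need `na.rm = TRUE` in `mean()`, `median()`, and `sd()`, since some tracts can have missing estimates.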
For the variable which is appropriate for z-scores, calculate the z-score for tract_a. Interpret this back to the original data. What does it mean?
For the variable which is appropriate for z-scores, calculate the z-score for tract_b. Interpret this back to the original data. What does it mean?
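As a reminder of the mechanics, a z-score is an observation minus the mean, divided by the standard deviation. The sketch below uses a small made-up vector (`values`) and a made-up observation (`target`); neither comes from the census data, so you will need to substitute your own column and tract value.

```r
# Hypothetical numbers for illustration only; NOT the census data.
values <- c(28, 35, 41, 39, 52, 44, 37)  # stand-in for one variable across all tracts
target <- 52                             # stand-in for a single tract's value

# z = (observation - mean) / standard deviation
z <- (target - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)
z
```

A positive z-score means the observation lies above the mean, a negative one means it lies below, and the magnitude gives the distance in standard deviations.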
For the variable which is appropriate for z-scores, what is the probability of exceeding a value of 60?
For the variable which is appropriate for z-scores, what is the probability of finding a value less than 25?
For the variable which is appropriate for z-scores, what is the probability of finding a value between 30 and 40?
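For probabilities like those in the last three questions, base R's `pnorm()` returns the area under a normal curve. The sketch below assumes a hypothetical variable with mean 100 and standard deviation 15 (`mu` and `sigma` are made-up values, as are the thresholds), so it shows the pattern only; replace them with the mean and standard deviation you computed.

```r
# Hypothetical parameters for illustration only; NOT estimated from the census data.
mu    <- 100
sigma <- 15

# P(X > 130): upper-tail area
p_above <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)

# P(X < 85): lower-tail area (the default)
p_below <- pnorm(85, mean = mu, sd = sigma)

# P(85 < X < 115): difference of two lower-tail areas
p_between <- pnorm(115, mean = mu, sd = sigma) - pnorm(85, mean = mu, sd = sigma)

c(above = p_above, below = p_below, between = p_between)
```

These probabilities are only as trustworthy as the normality assumption, which is why the earlier questions ask you to check it first.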