Exam 4 overview

Exam structure

  • 18 multiple choice (1 pt. each)
  • 4 TRUE/FALSE (0.5 pt. each)
  • 5 numerical answer (1 pt. each)

Conceptual question topics

Correlation

  • Be able to differentiate between assumptions for Pearson’s r and Spearman’s ρ.
  • Be able to look at a correlogram and note which relationships are:
    • Significant or insignificant
    • Strong or weak
    • Positive or negative
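
As a rough illustration of the Pearson/Spearman distinction, the sketch below uses the built-in mtcars data. The variable pair (mpg and disp) and the Shapiro-Wilk normality check are assumptions chosen for illustration only. Pearson's r is usually described as assuming a linear relationship between roughly normally distributed variables, while Spearman's ρ works on ranks and only needs a monotonic relationship.

```r
# Illustrative sketch: compare Pearson's r and Spearman's rho on built-in data.
x <- mtcars$disp
y <- mtcars$mpg

# Shapiro-Wilk test as one informal normality check for each variable
# (a small p-value suggests a departure from normality).
shapiro_x <- shapiro.test(x)
shapiro_y <- shapiro.test(y)

r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")

# Spearman's rho is Pearson's r computed on the ranks of the data.
r_on_ranks <- cor(rank(x), rank(y), method = "pearson")

print(c(pearson = r_pearson, spearman = r_spearman, on_ranks = r_on_ranks))
print(c(p_disp = shapiro_x$p.value, p_mpg = shapiro_y$p.value))
```

If either variable looks clearly non-normal, or the scatterplot is curved but monotonic, Spearman's ρ is generally the safer choice.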

Regression

  • Be able to explain the various assumptions of OLS
  • Given a regression equation, be able to state by how many units y changes for a one-unit change in x
  • Given a regression equation, be able to explain the meaning of the y-intercept.
  • Be able to explain the modifiable areal unit problem (MAUP) and be familiar with examples of how it relates to quantitative methods
  • Know the ways in which performance is measured in a regression model:
    • Coefficient of determination
    • F-test and p-value
    • AIC
  • Given a regression result, be able to explain which variables are significant at a given p-value.
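
The regression items above can be sketched with the built-in cars data. The model (dist regressed on speed) and the 0.05 threshold are assumptions made for illustration; any dataset and threshold could be substituted.

```r
# Illustrative sketch: simple OLS regression on the built-in cars data.
model <- lm(dist ~ speed, data = cars)

# Intercept: predicted dist when speed is 0.
# Slope: change in dist (in its own units) per one-unit increase in speed.
coefs <- coef(model)
print(coefs)

model_summary <- summary(model)

# Coefficient of determination (share of variance in dist explained).
r_squared <- model_summary$r.squared

# Overall F-test: recompute the model p-value from the F statistic.
f_stat  <- model_summary$fstatistic
model_p <- unname(pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"],
                     lower.tail = FALSE))

# AIC: only meaningful when compared across models fit to the same data
# (lower is better).
model_aic <- AIC(model)

# Coefficients significant at a chosen threshold (here p < 0.05).
p_values    <- coef(model_summary)[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]

print(c(r_squared = r_squared, model_p = model_p, aic = model_aic))
print(significant)
```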

Spatial autocorrelation

  • Know the difference between positive and negative spatial autocorrelation
  • Given a map of local spatial autocorrelation results, be able to explain where the following are present:
    • Significant HIGH-HIGH spatial autocorrelation
    • Significant LOW-LOW spatial autocorrelation
    • Insignificant spatial autocorrelation
  • Be able to explain the various neighboring schemes used in computing spatial autocorrelation and how many neighbors each would produce given an example:
    • Queen’s, rook’s, and bishop’s contiguity
    • k-nearest neighbors (knn)
    • Distance band
  • Given a set of results for Moran’s I, be able to explain at which bandwidths there is significant spatial autocorrelation.
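
To make the contiguity schemes concrete, here is a minimal base R sketch that counts neighbors for one cell of a regular n x n grid. The helper count_neighbors is hypothetical (written for this example, not taken from any package), and real analyses would normally build neighbor lists with a spatial package instead.

```r
# Hypothetical helper: count the neighbors of cell (row, col) in an
# n x n grid under queen's, rook's, or bishop's contiguity.
count_neighbors <- function(row, col, n,
                            scheme = c("queen", "rook", "bishop")) {
  scheme <- match.arg(scheme)

  # All eight surrounding offsets (the cell itself is excluded).
  offsets <- expand.grid(dr = -1:1, dc = -1:1)
  offsets <- offsets[!(offsets$dr == 0 & offsets$dc == 0), ]

  # Edge-sharing offsets move in exactly one direction;
  # the remaining four offsets touch only at a corner.
  shares_edge <- abs(offsets$dr) + abs(offsets$dc) == 1

  keep <- switch(scheme,
    queen  = rep(TRUE, nrow(offsets)),  # edges and corners
    rook   = shares_edge,               # edges only
    bishop = !shares_edge               # corners only
  )

  # Drop candidate neighbors that fall outside the grid.
  rr <- row + offsets$dr[keep]
  cc <- col + offsets$dc[keep]
  sum(rr >= 1 & rr <= n & cc >= 1 & cc <= n)
}

# Center cell of a 3 x 3 grid.
center <- sapply(c("queen", "rook", "bishop"),
                 function(s) count_neighbors(2, 2, 3, s))
# Corner cell of the same grid.
corner <- sapply(c("queen", "rook", "bishop"),
                 function(s) count_neighbors(1, 1, 3, s))
print(rbind(center, corner))
```

By contrast, knn gives every unit exactly k neighbors, and a distance band gives a count that depends on how many units fall within the chosen distance.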

Computation questions

Be prepared to make the following computations in R:

  • Correlation analysis using the cor function and the appropriate method.
  • A correlation test using the cor.test function and the appropriate method.
  • Regression analysis using the lm function with appropriate dependent and independent variables.
  • A calculation of variance inflation factor (VIF) using the vif function from the car package.
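
A minimal sketch of the correlation test and VIF steps, using the swiss dataset. The variables chosen and the helper manual_vif are assumptions made for illustration. The helper computes VIF by hand as 1 / (1 - R²) from regressing each predictor on the others, and car::vif is called only if the car package happens to be installed.

```r
# Correlation test between two swiss variables (choice is illustrative).
ct <- cor.test(swiss$Fertility, swiss$Education, method = "pearson")
print(ct$estimate)  # correlation coefficient
print(ct$p.value)   # p-value of the test

# Regression with several predictors.
model <- lm(Fertility ~ Agriculture + Education + Catholic, data = swiss)

# Hypothetical helper: VIF for each predictor, computed by hand.
manual_vif <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  })
}

vifs <- manual_vif(model)
print(vifs)

# Same quantity from the car package, if it is available.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::vif(model))
}
```

For a model with only numeric predictors like this one, the hand-computed values should agree with car::vif.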

These computations will be conducted on the datasets below. Feel free to start exploring them before the exam!

  • swiss

  • OrchardSprays

  • cars (note this is different from mtcars!)

Practice questions

  1. Given the following correlogram, state the following:

    1. The strongest relationship
    2. The weakest relationship
    3. The shape of the distribution of the variable drat
    4. The nature of the relationship (strong/weak/positive/negative) between mpg and disp

  2. Answer the questions associated with the following regression results:

    1. Which variables are significant at p < 0.10 in the model?
    2. What percentage of the variance in the dependent variable (DV) is explained by the model?
    3. What is the direction of influence for the variables gear and cyl on the DV?
    4. Is the model strong or weak?
## 
## Call:
## lm(formula = mpg ~ wt + disp + cyl + drat + gear + carb, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8332 -1.5559 -0.2125  1.3036  5.5991 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 29.376002  11.069779   2.654   0.0136 *
## wt          -2.436813   1.403502  -1.736   0.0948 .
## disp        -0.002092   0.013411  -0.156   0.8773  
## cyl         -0.860657   0.865913  -0.994   0.3298  
## drat         1.104622   1.578304   0.700   0.4905  
## gear         0.813645   1.354587   0.601   0.5535  
## carb        -0.928269   0.634316  -1.463   0.1558  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.611 on 25 degrees of freedom
## Multiple R-squared:  0.8487, Adjusted R-squared:  0.8124 
## F-statistic: 23.37 on 6 and 25 DF,  p-value: 0.000000004056
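
The same information can be pulled out of a fitted model programmatically rather than read off the printed table. The sketch below refits the model from the call shown above on the built-in mtcars data; the variable names alpha and significant are just illustrative.

```r
# Refit the model from the printed call above.
model <- lm(mpg ~ wt + disp + cyl + drat + gear + carb, data = mtcars)
model_summary <- summary(model)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(model_summary)
p_values   <- coef_table[, "Pr(>|t|)"]

# Predictors with p-values below a chosen threshold (intercept excluded).
alpha <- 0.10
significant <- setdiff(names(p_values)[p_values < alpha], "(Intercept)")
print(significant)

# Sign of each estimate gives the direction of influence on mpg.
print(sign(coef_table[, "Estimate"]))

# Coefficient of determination and its adjusted version.
r2     <- model_summary$r.squared
adj_r2 <- model_summary$adj.r.squared
print(c(r_squared = r2, adj_r_squared = adj_r2))
```
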
  3. Answer the questions below given the following Moran’s I results:

    1. At which bandwidths (numbers of neighbors) are results significant at p < 0.05?
    2. Would you say there is significant spatial autocorrelation in this dataset?
## Reading layer `wi_hazards' from data source 
##   `https://gitlab.com/mhaffner/data/-/raw/master/wi_hazards.geojson' 
##   using driver `GeoJSON'
## Simple feature collection with 72 features and 16 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -92.88811 ymin: 42.49198 xmax: -86.80542 ymax: 47.08062
## Geodetic CRS:  NAD83
##   neighbors   morans_i    variance      p.value
## 1         3 0.27272946 0.019071164 0.0189062555
## 2         6 0.17419275 0.010496077 0.0330502777
## 3         9 0.17134996 0.007075910 0.0137465239
## 4        12 0.10285532 0.005377042 0.0553848442
## 5        15 0.12067017 0.004210056 0.0189088584
## 6        18 0.09224795 0.003453924 0.0352025700
## 7        21 0.10947707 0.002916663 0.0110711746
## 8        24 0.14712254 0.002516736 0.0006558609
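
A table like this can be filtered in R instead of scanned by eye. The sketch below retypes the values printed above into a data frame. The z-score step assumes the p-values come from a one-sided normal test with expected value -1 / (n - 1) and n = 72 (the feature count reported above); that is an assumption about how the table was produced, not something stated here.

```r
# Values retyped from the printed Moran's I table above.
results <- data.frame(
  neighbors = c(3, 6, 9, 12, 15, 18, 21, 24),
  morans_i  = c(0.27272946, 0.17419275, 0.17134996, 0.10285532,
                0.12067017, 0.09224795, 0.10947707, 0.14712254),
  variance  = c(0.019071164, 0.010496077, 0.007075910, 0.005377042,
                0.004210056, 0.003453924, 0.002916663, 0.002516736),
  p.value   = c(0.0189062555, 0.0330502777, 0.0137465239, 0.0553848442,
                0.0189088584, 0.0352025700, 0.0110711746, 0.0006558609)
)

# Rows significant at p < 0.05.
sig <- results[results$p.value < 0.05, ]
print(sig)

# Row with the smallest p-value.
most_sig <- results$neighbors[which.min(results$p.value)]
print(most_sig)

# Assumed relationship: z = (I - E[I]) / sqrt(Var[I]), one-sided p-value.
n <- 72
expected_i <- -1 / (n - 1)
z <- (results$morans_i - expected_i) / sqrt(results$variance)
p_check <- pnorm(z, lower.tail = FALSE)
print(round(cbind(z = z, p_check = p_check, p_table = results$p.value), 4))
```
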
  4. Given the following regression equation, a one-unit increase in $x_3$ would change y by how many units?

    $Y = 13.57 + 27.03x_1 - 0.04x_2 - 14.14x_3$

  5. Answer the questions below given the following Moran’s I results:

    1. At which bandwidths are results significant at p < 0.05?
    2. At which bandwidth is the Moran’s I result most significant?
    3. Would you say there is significant spatial autocorrelation in this dataset?
##   neighbors      morans_i    variance   p.value
## 1         2  0.0009689922 0.010172580 0.4406772
## 2         4 -0.0675651868 0.005595502 0.7626809
## 3         6 -0.0561486963 0.003768838 0.7533869
## 4         8 -0.0814570120 0.002864538 0.8959480
## 5        10 -0.0767146129 0.002243023 0.9069843
## 6        12 -0.0271975455 0.001841091 0.6200487
## 7        14 -0.0250369521 0.001554033 0.6094290
## 8        16 -0.0319312309 0.001341049 0.6869930
  6. Given the following correlogram, answer the questions below.

    1. What is the strength and direction of the relationship between percent_black and percent_latinx?
    2. What is the correlation coefficient of the relationship between percent_hs_grad and percent_blue_collar?
    3. Which variable pairs are significantly correlated?
    4. Would the assumptions for Pearson’s r between percent_college and another variable hold? Why or why not?

  7. Given the following regression model output, answer the questions below.

    1. Which variable has the strongest effect on the dependent variable?
    2. Which variables are significant at p < 0.01?
    3. A 10-unit increase in percent_white would change the dependent variable by how many units?
    4. Is the model significant?
    5. How would you assess the assumption of normality of residuals?
    6. How would you assess the assumption of independence if these were spatial data?
## 
## Call:
## lm(formula = percent_college ~ percent_black + percent_white + 
##     percent_latinx + percent_hs_grad + percent_blue_collar)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.53518 -0.21972 -0.02336  0.22737  0.61165 
## 
## Coefficients:
##                     Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)          0.45489    0.02864  15.881 < 0.0000000000000002 ***
## percent_black        0.03679    0.02781   1.323              0.18913    
## percent_white       -0.02736    0.02752  -0.994              0.32276    
## percent_latinx       0.04790    0.03083   1.554              0.12358    
## percent_hs_grad     -0.01342    0.02907  -0.462              0.64538    
## percent_blue_collar -0.07976    0.03006  -2.654              0.00935 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2798 on 94 degrees of freedom
## Multiple R-squared:  0.1287, Adjusted R-squared:  0.08237 
## F-statistic: 2.777 on 5 and 94 DF,  p-value: 0.02197
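
One common way to check the normality of residuals is sketched below. The data behind the output above are not included here, so the sketch uses the built-in swiss data as a stand-in; the model formula is an assumption chosen only to have something to fit.

```r
# Stand-in model on built-in data (not the model shown above).
model <- lm(Fertility ~ Agriculture + Education + Catholic, data = swiss)
res   <- residuals(model)

# Shapiro-Wilk test: a small p-value suggests the residuals
# depart from a normal distribution.
sw <- shapiro.test(res)
print(sw)

# Visual checks: Q-Q plot and histogram of residuals.
# Only drawn in an interactive session.
if (interactive()) {
  qqnorm(res)
  qqline(res)
  hist(res, main = "Residuals", xlab = "Residual")
}
```
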
  8. Given the following Local Moran’s I result, answer the following questions.

    1. Does the dataset exhibit more positive or negative spatial autocorrelation? How can you tell?
    2. Where is the most significant local spatial autocorrelation?
    3. What is the nature of local spatial autocorrelation in the northernmost zone?

  9. Using the built-in R dataset USArrests, conduct a correlation test with the variables Murder and UrbanPop. Answer the following questions:

    1. Which variable should be y and which should be x?
    2. What is the appropriate method to use?
    3. What is the correlation coefficient?
    4. What is the p-value?
  10. Using the dataset Boston from the MASS package, conduct a regression analysis with crim as the dependent variable and rm, chas, indus, zn, medv, and nox as the independent variables.

    1. What is the p-value of the model?
    2. What is the most significant variable in the model?
    3. What is the coefficient of determination?
    4. What is the direction of influence for the variable chas?
    5. What is the direction of influence for the variable medv?
    6. Which variables possess significant multicollinearity (if any)?
    7. Which variables are not significant in the model?
    8. Are the residuals normally distributed?