Lab 8

Overview

In this assignment you will:

  • Create a scatterplot with a trend line
  • Compute correlations
  • Determine whether Pearson’s r or Spearman’s \(\rho\) is more appropriate based on a dataset’s characteristics

Correlation

You are tasked with evaluating the association between the number of vacant units (VACANT) and the number of burglaries (BREACH) in census blocks in the city of New Haven, Connecticut. Fill in the following lines of code to retrieve the data:

library(_____)

blocks <- _____("https://gitlab.com/mhaffner/data/-/raw/master/nh_blocks.geojson")

  1. State the null and alternative hypotheses (1 point).

  2. Create a scatterplot of the data. What is the direction of the relationship? Do you perceive it to be strong or weak based on this visual assessment (1 point)?

  3. Assess the assumptions of Pearson’s r and Spearman’s \(\rho\). Explain your reasoning for each, and use graphical depictions where appropriate. State which method is more appropriate for the problem at hand (3 points).

  4. Complete the appropriate correlation test and report each of the following (2 points):

     1. The correlation coefficient (i.e., the value of r or \(\rho\))
     2. The p-value
     3. The test statistic
     4. The conclusion (assuming a 95% confidence level and a two-tailed test)

  5. Reason about whether or not the modifiable areal unit problem (MAUP) affected the results. If so, state how (1 point).
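
The general workflow in the questions above (scatterplot with a trend line, assumption checks, then a correlation test) can be sketched on simulated data. This is only an illustration under stated assumptions: the variables x and y are made-up counts, not the New Haven blocks, and the Shapiro-Wilk test is just one possible way to examine normality. Only base R functions are used.

```r
# Hypothetical, simulated count data (NOT the lab dataset)
set.seed(42)
n <- 200
x <- rpois(n, lambda = 3)            # stand-in for one count variable
y <- rpois(n, lambda = 1 + 0.5 * x)  # second count, positively related to x

# Scatterplot with a least-squares trend line
plot(x, y, xlab = "x (simulated count)", ylab = "y (simulated count)")
abline(lm(y ~ x), col = "red")

# Histograms and Shapiro-Wilk tests are one way to check the
# normality assumption that matters for Pearson's r
hist(x)
hist(y)
shapiro_x <- shapiro.test(x)
shapiro_y <- shapiro.test(y)

# Correlation tests (two-sided by default).
# exact = FALSE avoids the ties warning for Spearman on count data.
pearson  <- cor.test(x, y, method = "pearson")
spearman <- cor.test(x, y, method = "spearman", exact = FALSE)

# Each result stores the pieces a report typically needs
pearson$estimate    # correlation coefficient (r)
pearson$statistic   # test statistic (t)
pearson$p.value     # p-value
spearman$estimate   # correlation coefficient (rho)
spearman$statistic  # test statistic (S)
spearman$p.value    # p-value
```

The same calls apply to any two numeric columns of a data frame, e.g. cor.test(df$a, df$b, method = "spearman"), where df, a, and b are placeholder names.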

Questions for further exploration (optional)

Pick seven variables of interest and create a new sf dataset from the original using select (filter subsets rows, whereas select subsets columns). Then, create (a) a correlation matrix and (b) a correlation or pair plot using a package such as GGally, corrplot, or ggcorrplot. Note: you will have to remove the geometry column in order for these packages to work as intended, e.g.,

blocks.df <- st_set_geometry(blocks, NULL)
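
Once the geometry is dropped, the result is an ordinary data frame, so a correlation matrix can be built with base R alone. The sketch below uses a small simulated data frame with hypothetical columns a, b, and c rather than the lab data, and base pairs() as a plain stand-in for the plotting packages named above.

```r
# Hypothetical data frame standing in for a geometry-free table
set.seed(1)
df <- data.frame(a = rnorm(100))
df$b <- df$a + rnorm(100, sd = 0.5)   # positively related to a
df$c <- -df$a + rnorm(100, sd = 2)    # weakly, negatively related to a

# (a) Correlation matrix; method can be "pearson" or "spearman"
cor_mat <- cor(df, method = "spearman")
print(round(cor_mat, 2))

# (b) A basic pair plot of every variable against every other
pairs(df)
```

With real data, use = "pairwise.complete.obs" in cor() is one option for handling missing values.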