EDS 223: assignment 2 rubric

Overview

Present-day environmental justice may reflect legacies of injustice in the past. The United States has a long history of racial segregation which is still visible. During the 1930’s the Home Owners’ Loan Corporation (HOLC), as part of the New Deal, rated neighborhoods based on their perceived safety for real estate investment. Their ranking system, (A (green), B (blue), C (yellow), D (red)) was then used to block access to loans for home ownership. Colloquially known as “redlining”, this practice has had widely-documented consequences not only for community wealth, but also health.¹ Redlined neighborhoods have less greenery² and are hotter than other neighborhoods.³

Check out coverage by the New York Times.

A recent study found that redlining has not only affected the environments communities are exposed to, it has also shaped our observations of biodiversity.⁴ Community or citizen science, whereby individuals share observations of species, is generating an enormous volume of data. Ellis-Soto and co-authors found that redlined neighborhoods remain the most undersampled areas across 195 US cities. This gap is highly concerning, because conservation decisions are made based on these data.

Check out coverage by EOS.

Data

EJScreen

We will be working with data from the United States Environmental Protection Agency’s EJScreen: Environmental Justice Screening and Mapping Tool.

According to the US EPA website:

This screening tool and data may be of interest to community residents or other stakeholders as they search for environmental or demographic information. It can also support a wide range of research and policy goals. The public has used EJScreen in many different locations and in many different ways.

EPA is sharing EJScreen with the public:
- to be more transparent about how we consider environmental justice in our work,
- to assist our stakeholders in making informed decisions about pursuing environmental justice and,
- to create a common starting point between the agency and the public when looking at issues related to environmental justice.

EJScreen provides on environmental and demographic information for the US at the Census tract and block group levels. You will be working with block group data that has been downloaded from the EPA site. To understand the associated data columns, you will need to explore the Technical Documentation and column description spreadsheet available in the data folder. I also encourage you to explore the limitations and caveats of the data.

Mapping Inequality

A team of researchers, led by the Digital Scholarship Lab at the University of Richmond have digitized maps and information from the HOLC as part of the Mapping Inequality project.

We will be working with maps of HOLC grade designations for Los Angeles. Information on the data can be found here.⁵

Biodiversity observations

The Global Biodiversity Information Facility is the largest aggregator of biodiversity observations in the world. Observations typically include a location and date that a species was observed.

We will be working observations of birds from 2021 onward.

Assignment

Investigate the legacy of redlining in current environmental (in)justice

library(tidyverse)
library(sf)
library(tmap)
rm(list = ls())

Read in EJScreen data and filter to Los Angeles County (5 points)

# read in EJScreen data
ejscreen <- st_read("../data/EJSCREEN_2023_BG_StatePct_with_AS_CNMI_GU_VI.gdb/") 

# filter to LA county
LA_ejscreen <- ejscreen %>%
  filter(CNTY_NAME %in% c("Los Angeles County"))

Make a map of wastewater discharge by census block groups. Indicate which census block groups are above the 95th percentile of wastewater discharge by adding a centroid. (10 points)

# create centroids of LA EJScreen data
# filter to census block groups > 95th wastewater discharge
LA_ejscreen_centroids <- st_centroid(LA_ejscreen) %>%
  filter(P_PWDIS > 95)

# make map of wastewater discharge
map1 <- tm_shape(LA_ejscreen) +
  tm_fill(fill = "PWDIS", title = "Wastewater discharge") +
  tm_graticules()

# add centroids to map
map1 +
  tm_shape(LA_ejscreen_centroids) +
  tm_dots()

Find the percent of census block groups that have:
- less than 5% of the population is considered low income (5 points)

LA_ejscreen %>%
  filter(LOWINCPCT < 0.05) %>%
  nrow()/nrow(LA_ejscreen)*100

# note: some solutions filtered to census block groups that had data
# this is not necessarily incorrect, but has a different interpretation
# (e.g. for census block groups with data, what percent have populations where < 5% are considered low income)

has_data <- LA_ejscreen %>%
  filter(!is.na(LOWINCPCT))

LA_ejscreen %>%
  filter(LOWINCPCT < 0.05) %>%
  nrow()/nrow(has_data)*100

Find the percent of census block groups that are:
- above the 80th percentile for Particulate Matter 2.5 AND
- above the 80th percentile for Superfund proximity (10 points)

LA_ejscreen %>% 
  filter(P_PM25 > 80) %>%
  filter(P_PNPL > 80) %>%
  nrow()/nrow(LA_ejscreen)*100

# note: some solutions filtered to census block groups that had data
# this is not necessarily incorrect, but has a different interpretation

has_data <- LA_ejscreen %>%
  filter(!is.na(P_PM25)) %>%
  filter(!is.na(P_PNPL))

LA_ejscreen %>% 
  filter(P_PM25 > 80) %>%
  filter(P_PNPL > 80) %>%
  nrow()/nrow(has_data)*100

Import redlining information for Los Angeles.

LA_redlining <- st_read("https://dsl.richmond.edu/panorama/redlining/static/downloads/geojson/CALosAngeles1939.geojson") %>% 
  st_make_valid()

Make a map of historical redlining boundaries, colored by HOLC grade. (5 points)

tm_shape(LA_redlining) +
  tm_fill(fill = "holc_grade", title = "HOLC grade") +
  tm_graticules()

Find the number of census block groups that fall within areas with HOLC grades hint: make sure the CRS match (15 points)

The original prompt was vague, so multiple topological relationships could be used. Below I show a few different approaches. Different topological relationships will give different answers – they are not necessarily incorrect, they simply have different interpretations.
- “intersects” returns all census block groups that overlap redlining areas (3951 CBGs).
- “within” returns only census block groups that fall inside of redlining areas (887 CBGs)

# first transform data to same CRS
LA_ejscreen <- st_transform(LA_ejscreen, crs= st_crs(LA_redlining))

# map both datasets to get a sense of their spatial pattern
plot(LA_ejscreen["ID"])
plot(LA_redlining["holc_grade"])

Approach 1

# try finding the CBGs intersecting redlining data by spatial subsetting
# the bracket operator automatically looks for "intersects"

redlining_cbgs1 <- LA_ejscreen[LA_redlining,]

# the number of census block groups
print(paste("approach 1:", nrow(redlining_cbgs1)))

Approach 2

# try using the st_intersects functions

sel_sgbp <- st_intersects(x = LA_ejscreen, y = LA_redlining) # returns binary predicate list
sel_logical = lengths(sel_sgbp) > 0 # create logical of which items do intersect
redlining_cbgs2 <- LA_ejscreen[sel_logical, ] # filter census block groups based on logicals

# the number of census block groups
print(paste("approach 2:", nrow(redlining_cbgs2)))

Approach 3

# try using st_filter with st_intersects
# in this case we need to define which topological relationship we want to use
# in this case we'll try "intersects"

redlining_cbgs3 <- LA_ejscreen %>%
  st_filter(y = LA_redlining, .predicate = st_intersects)

# the number of census block groups
print(paste("approach 3:", nrow(redlining_cbgs3)))

Approach 4

# approach 4
# try using st_filter with st_within

redlining_cbgs4 <- LA_ejscreen %>%
  st_filter(y = LA_redlining, .predicate = st_within)

# the number of census block groups
print(paste("approach 4:", nrow(redlining_cbgs4)))

Compare approaches 3 & 4

# test whether approaches 3 & 4 give the same result
# if the results don't match, compare answers

if(nrow(redlining_cbgs3) == nrow(redlining_cbgs4)){
  print("approaches 3 & 4 give the same result")
}else{
  print("approaches 3 & 4 don't give the same result")
  if(nrow(redlining_cbgs3) > nrow(redlining_cbgs4)){
      print("approach 3 returns more CBGS")
  }else{
    print("approach 4 returns more CBGS")
  }
}

# plot results from approaches 3 & 4 to investigate the difference
plot(redlining_cbgs3["ID"])
plot(redlining_cbgs4["ID"])

Approach 5

# try using st_join with st_intersects
# setting left = FALSE returns an inner join
redlining_cbgs5 <- st_join(x = LA_ejscreen, y = LA_redlining, join = st_intersects, left = FALSE) 

# we would expect the answers from this approach to match approach 3
if(nrow(redlining_cbgs3) == nrow(redlining_cbgs5)){
  print("approaches 3 & 5 give the same result")
}else{
  print("approaches 3 & 5 don't give the same result")
  if(nrow(redlining_cbgs3) > nrow(redlining_cbgs5)){
      print("approach 3 returns more CBGS")
  }else{
    print("approach 5 returns more CBGS")
  }
}

# let's try to figure out why 5 & 4 don't match

# make a map of approach 5
plot(redlining_cbgs5["ID"])
# the map looks normal... so what's going on?

# let's check the number of unique CBGs
length(unique(redlining_cbgs5$ID))

# does this match the answer from approach 4?
if(nrow(redlining_cbgs3) == length(unique(redlining_cbgs5$ID))){
  print("match!")
}else{
  print("doesn't match")
}

Approach 6

# try using st_join with st_intersects
# this time, let's set left = TRUE which returns a left join

redlining_cbgs6 <- st_join(x = LA_ejscreen, y = LA_redlining, join = st_intersects, left = TRUE) 

nrow(redlining_cbgs6)

# do the number of rows match the answer from approach 3?
if(nrow(redlining_cbgs3) == nrow(redlining_cbgs6)){
  print("match!")
}else{
  print("doesn't match")
}

# from approach 5, we figured out that maybe it's an issue with duplicate rows
length(unique(redlining_cbgs6$ID))

# does this match the answer from approach 5?
if(length(unique(redlining_cbgs6$ID)) == length(unique(redlining_cbgs5$ID))){
  print("match!")
}else{
  print("doesn't match")
}

# hmmm... what's going on?
# remember that left joins keep all rows in the "x" dataset
# we know that not all CBGs are in redlining zones, so maybe it's an issue with NAs
# let's filter out all rows without HOLC grades
redlining_cbgs6_filter <- redlining_cbgs6 %>%
  filter(!is.na(holc_grade))

nrow(redlining_cbgs6_filter)

# do the number of rows match the answer from approach 3?
if(nrow(redlining_cbgs3) == nrow(redlining_cbgs6_filter)){
  print("match!")
}else{
  print("doesn't match")
}

# but remember, the issue with the duplicates...
length(unique(redlining_cbgs6_filter$ID))

# do the number of unique IDs match the answer from approach 3?
if(nrow(redlining_cbgs3) == length(unique(redlining_cbgs6_filter$ID))){
  print("match!")
}else{
  print("doesn't match")
}

Approach 7

# try using st_join with st_within
redlining_cbgs7 <- st_join(x = LA_ejscreen, y = LA_redlining, join = st_within, left = FALSE)

nrow(redlining_cbgs7)

# we would expect this to match the answer from approach 4
if(nrow(redlining_cbgs7) == nrow(redlining_cbgs4)){
  print("match!")
}else{
  print("doesn't match")
}

There was some confusion in interpreting the question, and some folks thought it was asking for a summary of the number of CBGs within each HOLC grade. These answers will receive full credit, but we’ll walk through it to show what’s different.

Approach 8

# spatially join
redlining_cbgs8 <- st_join(LA_ejscreen, LA_redlining, join = st_intersects, left = FALSE)

# this number is again higher because CBGs may intersect multiple areas with HOLC grades
nrow(redlining_cbgs8)

# summarize the number of CBGS within each HOLC grade
# note: this will double count some CBGs within HOLC grades
redlining_cbgs8 %>% 
  group_by(holc_grade) %>% 
  count()

Summarize current conditions based on EJScreen data within historical redlining categories using the mean of the following variables:
-% low income.
- percentile for particulate Matter 2.5.
- percentile for low life expectancy.
- percentile for air toxics cancer risk (20 points)

LA_ejscreen <- st_transform(LA_ejscreen, crs= st_crs(LA_redlining))

LA <- st_intersection(LA_redlining, LA_ejscreen)

LA %>%
  group_by(holc_grade) %>%
  summarise(lowincpct = mean(LOWINCPCT, na.rm = TRUE),
            pm25 = mean(P_PM25, na.rm = TRUE),
            lifeexppct = mean(P_LIFEEXPPCT, na.rm = TRUE),
            cancer = mean(P_CANCER, na.rm = TRUE))

Investigate the legacy of redlining in biodiversity observations

For bird observations from 2022 that fall within neighborhoods with HOLC grads, find the percent of observations within each redlining categories and plot results. hint: make sure that the bird observations have the same CRS as redlining data. (20 points)

# read in bird data
gbif <- st_read("../data/gbif-birds-LA/") %>%
  filter(year == 2022) # filter to 2022

# transform data to match CRS
gbif <- st_transform(gbif, crs = st_crs(LA_redlining))

In the previous section we saw that topological relationships can give different answers. Let’s see if the same happens in this case.

Approach 1

# in the previous section we saw that topological relationships can give different answers
# let's see if the same happens in this case

# try st_join with st_intersects
gbif1 <- st_join(x = gbif, y = LA_redlining, join = st_intersects, left = FALSE)

gbif_summary1 <- gbif1 %>% 
  st_set_geometry(NULL) %>%
  group_by(holc_grade) %>%                  
  summarise(count = n()) %>%      
  mutate(percentage = (count / sum(count))*100 )

ggplot(data = gbif_summary1) +
  geom_bar(aes(x = holc_grade, y = percentage), stat = "identity") +
  labs(x = "HOLC grade", y = "Percentage of observations")

Approach 2

# approach 2
# try st_join with st_within
gbif2 <- st_join(x = gbif, y = LA_redlining, join = st_within, left = FALSE)

gbif_summary2 <- gbif2 %>% 
  st_set_geometry(NULL) %>%
  group_by(holc_grade) %>%                  
  summarise(count = n()) %>%      
  mutate(percentage = (count / sum(count))*100 )

ggplot(data = gbif_summary2) +
  geom_bar(aes(x = holc_grade, y = percentage), stat = "identity") +
  labs(x = "HOLC grade", y = "Percentage of observations")

Compare approaches 1 & 2

# the plots look similar, but let's test to see if the answers are the same

test_gbif <- left_join(gbif_summary1, gbif_summary2, by = "holc_grade") %>%
  mutate(difference = count.x - count.y)

sum(test_gbif$difference)

if(sum(test_gbif$difference) == 0){
  print("match!")
}else{
  print("doesn't match")
}

Now let’s look at an approach that didn’t work!

Approach 3

# find GBIF points that intersect with redlining zones
gbif_redlining <- gbif[LA_redlining,] 

# find GBIF points that intersect with each grade
gbif_a <- gbif_redlining[LA_redlining$holc_grade == "A",]
gbif_b <- gbif_redlining[LA_redlining$holc_grade == "B",]
gbif_c <- gbif_redlining[LA_redlining$holc_grade == "C",]
gbif_d <- gbif_redlining[LA_redlining$holc_grade == "D",]

# store results in data frame
gbif_summary3 <- data.frame("holc_grade" = c("A", "B", "C", "D"),
                            "count" = c(nrow(gbif_a),
                                        nrow(gbif_b),
                                        nrow(gbif_c),
                                        nrow(gbif_d)),
                           "percentage" = c(nrow(gbif_a)/nrow(gbif_redlining)*100,
                                            nrow(gbif_b)/nrow(gbif_redlining)*100,
                                            nrow(gbif_c)/nrow(gbif_redlining)*100,
                                            nrow(gbif_d)/nrow(gbif_redlining)*100))

ggplot(data = gbif_summary3) +
  geom_bar(aes(x = holc_grade, y = percentage), stat = "identity") +
  labs(x = "HOLC grade", y = "Percentage of observations")

# the plot looks different, but let's test 
test_gbif2 <- left_join(gbif_summary3, gbif_summary2, by = "holc_grade") %>%
  mutate(difference = count.x - count.y)

# interestingly the sum of the difference in the counts sums to zero
# (i.e. the total count of points is the same, they're just assigned to different grades)
sum(test_gbif2$difference)

# but the answers are different
min(test_gbif2$difference)

if(min(test_gbif2$difference) == 0){
  print("match!")
}else{
  print("doesn't match")
}

# Let's test what's going on
# make a map of grade A zones and points intersected with A
ggplot() +
  geom_sf(data = subset(LA_redlining, holc_grade == "A")) +
  geom_sf(data = gbif_a)
# points aren't in A grades!

# let's try again filtering sf object to A first
LA_redlining_a <- LA_redlining %>%
  filter(holc_grade == "A")
# and now intersect points
gbif_a <- gbif_redlining[LA_redlining_a,]

# make map again
ggplot() +
  geom_sf(data = subset(LA_redlining, holc_grade == "A")) +
  geom_sf(data = gbif_a)
# looks good!

Gee, G. C. (2008). A multilevel analysis of the relationship between institutional and individual racial discrimination and health status. American journal of public health, 98(Supplement_1), S48-S56.↩︎
Nardone, A., Rudolph, K. E., Morello-Frosch, R., & Casey, J. A. (2021). Redlines and greenspace: the relationship between historical redlining and 2010 greenspace across the United States. Environmental health perspectives, 129(1), 017006.↩︎
Hoffman, J. S., Shandas, V., & Pendleton, N. (2020). The effects of historical housing policies on resident exposure to intra-urban heat: a study of 108 US urban areas. Climate, 8(1), 12.↩︎
Ellis-Soto, D., Chapman, M., & Locke, D. H. (2023). Historical redlining is associated with increasing geographical disparities in bird biodiversity sampling in the United States. Nature Human Behaviour, 1-9.↩︎
Robert K. Nelson, LaDale Winling, Richard Marciano, Nathan Connolly, et al., “Mapping Inequality,” American Panorama, ed. Robert K. Nelson and Edward L. Ayers, accessed October 17, 2023, https://dsl.richmond.edu/panorama/redlining/↩︎

EDS 223: assignment 2 rubric

Ruth Oliver

2023-10-31

Overview

Data

EJScreen

Mapping Inequality

Biodiversity observations

Assignment

Investigate the legacy of redlining in current environmental (in)justice

Approach 1

Approach 2

Approach 3

Approach 4

Compare approaches 3 & 4

Approach 5

Approach 6

Approach 7

Approach 8

Investigate the legacy of redlining in biodiversity observations

Approach 1

Approach 2

Compare approaches 1 & 2

Approach 3