The primary motivation behind this package is to make data preparation steps easier for redistricting simulation methods within R. This vignette covers a few key tasks, primarily building a block level dataset of population data, subsetting by spatial relationship, and then running a basic simulation. This is shown in the context of dividing North Rockland Central School District in NY into 7 wards at the block level.

First, we want to build a block level dataset. The school district intersects two counties, though almost all of the population in the district comes from Rockland County. create_block_table allows you to build block level datasets with the primary variables needed for redistricting purposes - total population by race and voting age population (VAP) by race.

blocksRockland <- create_block_table(state = 'NY', county = 'Rockland')
blocksOrange <- create_block_table(state = 'NY', county = 'Orange')

blocks <- bind_rows(blocksRockland, blocksOrange)

Next, we need the shape for North Rockland, which can be obtained from the R package tigris as below. The same idea holds for having target areas, such as counties or legislative districts, though they are less likely to directly use the block level data.

school <- school_districts(state = 'NY') %>% filter(str_detect(NAME, 'North Rockland'))

The immediate and common difficulty here is that we have nearly 15,000 blocks, but the target region, in this case the school district outlined in red is significantly smaller than that.

blocks %>% ggplot() + 
  geom_sf() +
  geom_sf(data = school, fill = NA, color = 'red') +
  theme_void()

As a first pass, we can use the geo_filter function, which wraps sf’s st_intersects and filters only to those that intersect.

blocks <- blocks %>% geo_filter(to = school)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar

This drops us down to 852 blocks and is a conservative filtering, as you only need to intersect at a single point.

blocks %>% mutate(id = row_number()) %>% 
  ggplot() + geom_sf() +
  geom_sf(data = school, fill = NA, color = 'red') 

Yet, we probably want to go further than that, getting rid of the various external pieces. We can use geo_trim to do just that. First, we want to check what would be thrown away at the default area threshold of 1%. Below, I’ve first checked what would be trimmed away, by setting bool = TRUE and plotting it.

blocks$trim <- blocks %>% geo_trim(to = school, bool = TRUE)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
blocks %>% ggplot() + geom_sf(aes(fill = trim)) + 
  geom_sf(data = school, fill = NA, lwd = 1.5)

To me, it looks like we are subsetting correctly with this threshold, so we actually trim away this time.

blocks <- blocks %>% geo_trim(to = school)
## although coordinates are longitude/latitude, st_intersection assumes that they are planar
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

And indeed, just plotting the block geographies looks like the school district from before.

blocks %>% ggplot() +
  geom_sf() +
  theme_void()

Very often, at this step, we want to consider including information about other geographies, particularly towns, villages, or counties. In the package, for illustration purposes, I’ve included a small towns dataset from the Rockland County GIS Office.

data("towns")

blocks %>% ggplot() +
  geom_sf() +
  theme_void() +
  geom_sf(data = towns, aes(fill = as.character(ID)))

From this, we can then try to match our blocks to towns. I use the default setting, which relies on st_centerish, which comes from st_centroid when the centroid is within the shape and otherwise relies on st_point_on_surface to ensure that we are only matching by points within a shape.

matched <- geo_match(from = blocks, to = towns)

Now, I’ve used the default tiebreaker setting, which assigns all blocks to a town, even if they do not overlap.

blocks %>% 
  ggplot() +
  geom_sf(aes(fill = as.character(matched))) +
  theme_void() +
  labs(fill = 'Match')

So, we want to make two particular edits to the outcome. First, we create a fake Orange County Town for the blocks that come from Orange County, though there are only a few dozen people who live in those blocks.

blocks <- blocks %>% mutate(TownID = matched) %>% 
  mutate(TownID = ifelse(County != '087', 8, TownID)) 

Second, we can see that one block in 7 was matched to 1 because it didn’t properly intersect the towns and thus went for the closest town by distance between their centroids. The same issue happened for two blocks in Di

Now, to figure out what’s going on and try to clean it up, we can build an adjacency graph for each block by town and see which pieces are discontinuous.

adj <- redist.adjacency(shp = blocks)

comp <- check_contiguity(adjacency = adj, group = blocks$TownID)

which(comp$component > 1)
## [1] 406 571 581 585

Then, using that information, we can figure out three of these need to be renamed.

blocks$TownID[406] <- 7
blocks$TownID[581] <- 2
blocks$TownID[585] <- 4

Now all towns are completely connected or contiguous.

comp <- check_contiguity(adjacency = adj, group = blocks$TownID)

which(comp$component > 1)
## integer(0)

So, finally, we have the data in a simulation-ready state! We can now use the adjacency list created above with redist.adjacency to run a simulation using redist.smc. See the redist package for more information about what’s going on here.

sims005 <- redist.smc(adjobj = adj, 
                   popvec = blocks$Total, 
                   popcons = 0.005, 
                   nsims = 500,
                   ndists = 7, 
                   counties = blocks$TownNumber,
                   silent = TRUE)

cds <- as.matrix(sims005) %>% unique(MARGIN  = 2)

par <- redist.parity(district_membership = cds, population = blocks$Total)

comp <- redist.compactness(shp = blocks, district_membership = cds, adjacency = adj, measure = 'EdgesRemoved')

comp_m <- comp %>% group_by(nloop) %>% summarize(mean = mean(EdgesRemoved))
## `summarise()` ungrouping output (override with `.groups` argument)
pick <- tibble(parity = par) %>% bind_cols(comp_m) %>% slice_max(order_by = mean, n = 1) %>% pull(nloop)

In short, the above uses a Sequential Monte Carlo algorithm to draw 500 compact districts that try to preserve towns. From those, I pick a map that is on average, pretty compact.

Using that choice of districts, we can then summarize the block level data, using block2prec, but in this case it is summarizing to a ward level.

smry <- block2prec(blocks, matches = cds[,pick])

Then we have the basic information we want and we can look at the VAP data to see that we have one majority minority Hispanic ward and one potential coalition ward.

smry %>% select(matches_id, starts_with('VAP'))
## # A tibble: 7 x 6
##   matches_id   VAP VAPWhite VAPBlack VAPHisp VAPOther
##        <dbl> <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
## 1          1  5214     4480      131     472      131
## 2          2  5104     4086      209     539      270
## 3          3  5192     2319      977    1519      377
## 4          4  5326     3372      573    1043      338
## 5          5  5231     3211      455    1314      251
## 6          6  4838      769      362    3572      135
## 7          7  5218     1940      539    2521      218

And finally, we can also use block2prec to join the geometries of the wards to plot:

block2prec(blocks, matches = cds[,pick], geometry = TRUE) %>% 
  ggplot() + 
  geom_sf() + 
  theme_void() +
  geom_sf_label(aes(label = matches_id))