diff --git a/README.md b/README.md index c902cb2..d66e304 100644 --- a/README.md +++ b/README.md @@ -41,7 +41,7 @@ for Rhode Island. ** Many of maup's functions behave badly in geographic projections (i.e., lat/long coordinates), which are the default for shapefiles from the U.S. Census bureau. In order to find an appropriate CRS for a particular shapefile, consult the database -at [https://epsg.org](https://epsg.org).** +at [https://epsg.org](https://epsg.org). ** ```python @@ -54,7 +54,7 @@ at [https://epsg.org](https://epsg.org).** ``` -### Assigning precincts to districts +## Assigning precincts to districts The `assign` function in `maup` takes two sets of geometries called `sources` and `targets` and returns a pandas `Series`. The Series maps each geometry in @@ -83,7 +83,7 @@ As an aside, you can use that `precinct_to_district_assignment` object to create [gerrychain](https://gerrychain.readthedocs.io/en/latest/) `Partition` representing this districting plan. -### Aggregating block data to precincts +## Aggregating block data to precincts Precinct shapefiles usually come with election data, but not demographic data. In order to study their demographics, we need to aggregate demographic data from @@ -108,15 +108,13 @@ operation: ``` If you want to move data from one set of geometries to another but your source -and target geometries do not nest neatly (e.g., have overlaps), see +geometries do not nest cleanly into your target geometries, see [Prorating data when units do not nest neatly](#prorating-data-when-units-do-not-nest-neatly). -### Disaggregating data from precincts down to blocks +## Disaggregating data from precincts down to blocks It's common to have data at a coarser scale that you want to attach to -finer-scaled geometries. Usually this happens when vote totals for a certain -election are only reported at the county level, and we want to attach that data -to precinct geometries. +finer-scale geometries. 
For instance, this may happen when vote totals for a certain election are only reported at the county level, and we want to attach that data to precinct geometries. Let's say we want to prorate the vote totals in the columns `"PRES16D"`, `"PRES16R"` from our `precincts` GeoDataFrame down to our `blocks` GeoDataFrame. @@ -176,7 +174,7 @@ proration process due to the zero (or NaN) values for the weights corresponding the blocks in those precincts. If it is crucial to keep vote totals perfectly accurate, these votes will need to be assigned to the new units manually. -### Prorating data when units do not nest neatly +## Prorating data when units do not nest neatly Suppose you have a shapefile of precincts with some election results data and you want to join that data onto a different, more recent precincts shapefile. @@ -184,11 +182,10 @@ The two sets of precincts will have overlaps, and will not nest neatly like the blocks and precincts did in the above examples. (Not that blocks and precincts always nest neatly---in fact, they usually don't!) -In most cases, election data should be prorated from each old precincts to the new +In most cases, election data should be prorated from each old precinct to the new precincts with weights proportional to the population of the intersections between the old precinct and each new precinct. The most straightforward way to accomplish -this is to first disaggregate the data from the old precincts to Census blocks as in -the example above, and then reaggregate from blocks to the new precincts. +this is to first disaggregate the data from the old precincts to Census blocks as in the example above, and then reaggregate from blocks to the new precincts. ```python >>> old_precincts = precincts @@ -218,7 +215,7 @@ the example above, and then reaggregate from blocks to the new precincts. ``` As a sanity check, let's make sure that no votes were lost in either step. 
-Total votes in the old precincts: +Total votes in the old precincts, blocks, and new precincts: ```python >>> old_precincts[election_columns].sum() SEN18D 23401 @@ -239,9 +236,7 @@ dtype: float64 Oh no - what happened??? All votes were successfully disaggregated to blocks, but a significant percentage were lost when reaggregating to new precincts. -It turns out that when blocks were assigned to both old and new precincts, many blocks -were not assigned to any precincts. We can count how many blocks were unassigned in each -case: +It turns out that when blocks were assigned to both old and new precincts, many blocks were not assigned to any precincts. We can count how many blocks were unassigned in each case: ```python print(len(blocks)) @@ -253,7 +248,7 @@ print(blocks_to_new_precincts_assignment.isna().sum()) ``` So, out of 3,014 total Census blocks, 884 were not assigned to any old precinct and -1,227 were not assigned to any new precinct. If we plot the shapefiles, we can see why: +1,227 were not assigned to any new precinct. If we plot the GeoDataFrames, we can see why: ```python >>> blocks.plot() ``` @@ -273,8 +268,7 @@ So, out of 3,014 total Census blocks, 884 were not assigned to any old precinct ![Providence new precincts](./examples/Providence_new_precincts_plot.png) The boundaries of the regions covered by these shapefiles are substantially -different---and that doesn't even get into the possibility that the precinct shapefiles -may have gaps between precinct polygons that some blocks may fall into. +different---and that doesn't even get into the possibility that the precinct shapefiles may have gaps between precinct polygons that some blocks may fall into. Once we know to look for this issue, we can see that it affected the previous example as well: @@ -299,7 +293,7 @@ moving data around between shapefiles; see below for details about how maup can help with this. 
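The unassigned-block check above can be reproduced on toy data with plain pandas. This is a minimal sketch, not `maup`'s own code: the vote values and the `assignment` Series are hypothetical stand-ins for a prorated block column and for the output of `maup.assign`, with `NaN` marking a block that fell outside every precinct.

```python
import pandas as pd

# Hypothetical toy data: five blocks, each carrying some prorated votes.
block_votes = pd.Series([10.0, 20.0, 5.0, 15.0, 50.0], name="SEN18D")

# Hypothetical block-to-precinct assignment; NaN marks a block that was
# not assigned to any precinct (e.g., it lies outside the precinct extent).
assignment = pd.Series([0, 0, None, 1, None], dtype="float64")

# Count the unassigned blocks and total the votes they carry.
unassigned = assignment.isna()
n_unassigned = int(unassigned.sum())
votes_lost = float(block_votes[unassigned].sum())

# groupby drops NaN keys by default, so votes on unassigned blocks
# silently vanish from the reaggregated totals.
reaggregated = block_votes.groupby(assignment).sum()

print(n_unassigned)
print(votes_lost)
print(reaggregated.sum(), "of", block_votes.sum())
```

Comparing the reaggregated total against the original total, as in the last line, is a cheap way to detect this kind of loss before relying on the result.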
-### Progress bars +## Progress bars For long-running operations, the user might want to see a progress bar to estimate how much longer a task will take (and whether to abandon it altogether). @@ -326,72 +320,156 @@ set `maup.progress.enabled = True`: ``` -### Fixing topological issues, overlaps, and gaps +## Fixing topological issues, overlaps, and gaps Precinct shapefiles are often created by stitching together collections of precinct geometries sourced from different counties or different years. As a result, the shapefile often has gaps or overlaps between precincts where the -different sources disagree about the boundaries. These gaps and overlaps pose -problems when you are interested in working with the adjacency graph of the -precincts, and not just in mapping the precincts. This adjacency information is -especially important when studying redistricting, because districts are almost -always expected to be contiguous. +different sources disagree about the boundaries. (And by "often," we mean "for almost every shapefile that isn't produced by the U.S. Census Bureau.") +As we saw in the examples above, these issues can pose problems when moving data between shapefiles. -`maup` provides functions for closing gaps and resolving overlaps in a -collection of geometries. As an example, we'll apply both functions to these -geometries, which have both an overlap and a gap: +Even when working with a single shapefile, gaps and overlaps may cause problems if you are interested in working with the adjacency graph of the precincts. +This adjacency information is especially important when studying redistricting, because districts are almost always expected to be contiguous. -![Four polygons with a gap and some overlaps](./examples/plot.png) +Before doing anything else, it is wise to understand the current status of a shapefile with regard to topological issues. `maup` provides a `doctor` function to diagnose gaps, overlaps, and invalid geometries. 
If a shapefile has none of these issues, `maup.doctor` returns a value of `True`; otherwise it returns `False` along with a brief summary of the problems that it found. -Usually the gaps and overlaps in real shapefiles are tiny and easy to miss, but -this exaggerated example will help illustrate the functionality. +The blocks shapefile, like most shapefiles from the Census Bureau, is clean: +```python +>>> maup.doctor(blocks) +True +``` + +The old precincts shapefile, however, has some minor issues: +```python +>>> maup.doctor(old_precincts) +There are 2 overlaps. +There are 3 holes. +False +``` + +As of version 2.0.0, `maup` provides two repair functions with a variety of options for fixing these issues: + +1. `quick_repair` is the new name for the `autorepair` function from version 1.x (and `autorepair` still works as a synonym). This function makes fairly simplistic repairs to gaps and overlaps: + * Any polygon $Q$ created by the overlapping intersection of two geometries $P_1$ and $P_2$ is removed from both polygons and reassigned to the one with which it shares the greatest perimeter. + * Any polygon $Q$ representing a gap between geometries $P_1,\ldots, P_n$ is assigned to the one with which it shares the greatest perimeter. + + This function is probably sufficient when gaps and overlaps are all very small in area relative to the areas of the geometries, **AND** when the repaired file will only be used for operations like aggregating and prorating data. But it should **NOT** be relied upon when it is important for the repaired file to accurately represent adjacency relations between neighboring geometries, such as when a precinct shapefile is used as a basis for creating districting plans with contiguous districts. 
+ + For instance, when a gap adjoins many geometries (which happens frequently along county boundaries in precinct shapefiles!), whichever geometry the gap is adjoined to becomes "adjacent" to **all** the other geometries adjoining the gap, which can lead to the creation of discontiguous districts in plans based on the repaired shapefile. + +2. `smart_repair` is a more sophisticated repair function designed to reproduce the "true" adjacency relations between geometries as accurately as possible. In the case of gaps that adjoin several geometries, this is accomplished by an algorithm that divides the gap into pieces to be assigned to different geometries instead of assigning the entire gap to a single geometry. -First, we'll use `shapely` to create the polygons from scratch: + In addition to repairing gaps and overlaps, `smart_repair` includes two optional features: + * In many cases, the shapefile geometries are intended to nest cleanly into some larger units; e.g., in many states, precincts should nest cleanly into counties. `smart_repair` allows the user to optionally specify a second shapefile---e.g., a shapefile of county boundaries within a state---and then performs the repair process so that the repaired geometries nest cleanly into the units in the second shapefile. + * Whether as a result of inaccurate boundaries in the original map or as an artifact of the repair algorithm, it may happen that some units share boundaries with very short perimeter but should actually be considered "queen adjacent"---i.e., intersecting at only a single point---rather than "rook adjacent"---i.e., intersecting along a boundary of positive length. `smart_repair` includes an optional step in which all rook adjacencies of length below a user-specified parameter are converted to queen adjacencies. +`smart_repair` can accept either a GeoSeries or GeoDataFrame as input, and the output type will be the same as the input type. 
The input must be projected to a non-geographic coordinate reference system (CRS)---i.e., **not** lat/long coordinates---in order to have sufficient precision for the repair. One option is to reproject a GeoDataFrame called `gdf` to a suitable UTM (Universal Transverse Mercator) projection via + ```python -from shapely.geometry import Polygon -geometries = geopandas.GeoSeries([ - Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]), - Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]), - Polygon([(0, 2), (2, 2), (2, 4), (0, 4)]), - Polygon([(2, 1), (4, 1), (4, 4), (2, 4)]), -]) +gdf = gdf.to_crs(gdf.estimate_utm_crs()) ``` -Now we'll close the gap: + +At a minimum, all overlaps will be repaired in the output. Optional arguments include: + * `snapped` (default value `True`): If `True`, all polygon vertices are snapped to a grid of size no more than $10^{-10}$ times the maximum of width/height of the entire shapefile extent. **HIGHLY RECOMMENDED** to avoid topological exceptions due to rounding errors. + * `fill_gaps` (default value `True`): If `True`, all simply connected gaps with area less than `fill_gaps_threshold` times the largest area of all geometries adjoining the gap are filled. Default threshold is $0.1$; setting `fill_gaps_threshold = None` will fill all simply connected gaps. + * `nest_within_regions` (default value `None`): If `nest_within_regions` is a secondary GeoSeries or GeoDataFrame of region boundaries (e.g., counties within a state) then the repair will be performed so that repaired geometries nest cleanly into the region boundaries; specifically, each repaired geometry will be contained in the region with which the original geometry has the largest area of intersection. Note that the CRS for the region GeoSeries/GeoDataFrame must be the same as that for the primary input. 
+ * `min_rook_length` (default value `None`): If `min_rook_length` is given a numerical value, all rook adjacencies with length below this value will be replaced with queen adjacencies. Note that this is an absolute value and not a relative value, so make sure that the value provided is in the correct units with respect to the input GeoSeries/GeoDataFrame's CRS. + + +### Examples + +#### First, we'll use `shapely` and `geopandas` to create a GeoDataFrame of "toy precincts" from scratch. ```python -without_gaps = maup.close_gaps(geometries) +import random +import geopandas +import maup +from shapely.geometry import Polygon + +random.seed(2023) # For reproducibility + +ppolys = [] +for i in range(4): + for j in range(4): + poly = Polygon( + [(0.5*i + 0.1*k, 0.5*j + (random.random() - 0.5)/12) for k in range(6)] + + [(0.5*(i+1) + (random.random() - 0.5)/12, 0.5*j + 0.1*k) for k in range(1,6)] + + [(0.5*(i+1) - 0.1*k, 0.5*(j+1) + (random.random() - 0.5)/12) for k in range(1,6)] + + [(0.5*i + (random.random() - 0.5)/12, 0.5*(j+1) - 0.1*k) for k in range(1,5)] + ) + ppolys.append(poly) + +toy_precincts_df = geopandas.GeoDataFrame(geometry = geopandas.GeoSeries(ppolys)) +toy_precincts_df.plot(cmap = "tab20", alpha=0.7) ``` -The `without_gaps` geometries look like this: +![toy_precincts](./examples/toy_precincts.png) + +Check for gaps and overlaps: +```python +>>> maup.doctor(toy_precincts_df) +There are 28 overlaps. +There are 23 holes. 
+False +``` +All the gaps between geometries in this example are below the default threshold, so a basic application of `smart_repair` will resolve all overlaps and fill all gaps: -![Four polygons with two overlapping](./examples/plot_without_gaps.png) +```python +toy_precincts_repaired_df = maup.smart_repair(toy_precincts_df) +toy_precincts_repaired_df.plot(cmap = "tab20", alpha=0.7) +``` -And then resolve the overlaps: +![toy_precincts_repaired](./examples/toy_precincts_repaired.png) +We can check that the repair succeeded: ```python -without_overlaps_or_gaps = maup.resolve_overlaps(without_gaps) +>>> maup.doctor(old_precincts) +True ``` -The `without_overlaps_or_gaps` geometries look like this: +Now suppose that the precincts are intended to nest cleanly into the following "toy counties:" -![Four squares](./examples/plot_without_gaps_or_overlaps.png) +```python +cpoly1 = Polygon([(0,0), (1,0), (1,1), (0,1)]) +cpoly2 = Polygon([(1,0), (2,0), (2,1), (1,1)]) +cpoly3 = Polygon([(0,1), (1,1), (1,2), (0,2)]) +cpoly4 = Polygon([(1,1), (2,1), (2,2), (1,2)]) -Alternatively, there is also a convenience `maup.autorepair()` function provided that -attempts to resolve topological issues as well as close gaps and resolve overlaps: +toy_counties_df = geopandas.GeoDataFrame(geometry = geopandas.GeoSeries([cpoly1, cpoly2, cpoly3, cpoly4])) + +toy_counties_df.plot(cmap='tab20') +``` +![toy_counties](./examples/toy_counties.png) +We can perform a "county-aware" repair as follows: ```python -without_overlaps_or_gaps = maup.autorepair(geometries) +toy_precincts_repaired_county_aware_df = maup.smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df) +toy_precincts_repaired_county_aware_df.plot(cmap = "tab20", alpha=0.7) ``` +![toy_precincts_repaired_county_aware](./examples/toy_precincts_repaired_county_aware.png) + +Next, suppose that we'd like to get rid of small rook adjacencies at corner points where 4 precincts meet. 
We might reasonably estimate that these all have length less than $0.1$, so we can accomplish this as follows: +```python +toy_precincts_repaired_county_aware_rook_to_queen_df = maup.smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df, min_rook_length = 0.1) +toy_precincts_repaired_county_aware_rook_to_queen_df.plot(cmap = "tab20", alpha=0.7) +``` +![toy_precincts_repaired_county_aware_rook_to_queen](./examples/toy_precincts_repaired_county_aware_rook_to_queen.png) + +The difference is hard to see, so let's zoom in on the gap between the 4 original precincts in the upper left-hand corner. + +Original precincts: + +![toy_precincts_corner](./examples/toy_precincts_corner.png) + +County-aware repair: + +![toy_precincts_corner_repaired](./examples/toy_precincts_corner_repaired.png) + +County-aware repair with rook adjacency converted to queen: + +![toy_precincts_corner_repaired_rook_to_queen](./examples/toy_precincts_corner_repaired_rook_to_queen.png) -The functions `resolve_overlaps`, `close_gaps`, and `autorepair` accept a -`relative_threshold` argument. This threshold controls how large of a gap or -overlap the function will attempt to fix. The default value of -`relative_threshold` is `0.1`, which means that the functions will leave alone -any gap/overlap whose area is more than 10% of the area of the geometries that -might absorb that gap/overlap. In the above example, we set -`relative_threshold=None` to ensure that no gaps or overlaps were ignored. ## Modifiable areal unit problem @@ -401,3 +479,6 @@ the same spatial data will look different depending on how you divide up the space. Since `maup` is all about changing the way your data is aggregated and partitioned, we have named it after the MAUP to encourage users to use the toolkit thoughtfully and responsibly. 
+ + + diff --git a/docs/conf.py b/docs/conf.py index 54f87d3..12e33d8 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -23,7 +23,7 @@ author = 'Max Hully, Max Fan' # The full version, including alpha/beta/rc tags -release = '1.1.0' +release = '2.0.0' # -- General configuration --------------------------------------------------- diff --git a/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.cpg b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.dbf b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.dbf new file mode 100644 index 0000000..139ebe9 Binary files /dev/null and b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.dbf differ diff --git a/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.prj b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.prj @@ -0,0 +1 @@ +PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shp b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shp new file mode 100644 index 0000000..0578b52 Binary files /dev/null and b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shp differ diff 
--git a/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shx b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shx new file mode 100644 index 0000000..adac2c1 Binary files /dev/null and b/examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shx differ diff --git a/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.cpg b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.dbf b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.dbf new file mode 100644 index 0000000..35ef81d Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.dbf differ diff --git a/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.prj b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.prj @@ -0,0 +1 @@ +PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shp 
b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shp new file mode 100644 index 0000000..7e78829 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shp differ diff --git a/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shx b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shx new file mode 100644 index 0000000..7c68ace Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shx differ diff --git a/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.cpg b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.dbf b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.dbf new file mode 100644 index 0000000..35ef81d Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.dbf differ diff --git a/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.prj b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.prj @@ -0,0 +1 @@ 
+PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shp b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shp new file mode 100644 index 0000000..42c1456 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shp differ diff --git a/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shx b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shx new file mode 100644 index 0000000..e70552b Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shx differ diff --git a/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.cpg b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.dbf b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.dbf new file mode 100644 index 0000000..c8cab30 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.dbf 
differ diff --git a/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.prj b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.prj @@ -0,0 +1 @@ +PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shp b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shp new file mode 100644 index 0000000..f36c7df Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shp differ diff --git a/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shx b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shx new file mode 100644 index 0000000..d378b2d Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shx differ diff --git a/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.cpg b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git 
a/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.dbf b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.dbf new file mode 100644 index 0000000..c8cab30 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.dbf differ diff --git a/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.prj b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.prj @@ -0,0 +1 @@ +PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shp b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shp new file mode 100644 index 0000000..d164c4d Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shp differ diff --git a/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shx b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shx new file mode 100644 index 0000000..6980186 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shx differ 
diff --git a/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.cpg b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.dbf b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.dbf new file mode 100644 index 0000000..e2563ed Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.dbf differ diff --git a/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.prj b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.prj @@ -0,0 +1 @@ +PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shp b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shp new file mode 100644 index 0000000..f355e45 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shp differ diff --git 
a/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shx b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shx new file mode 100644 index 0000000..98b53e3 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shx differ diff --git a/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.cpg b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.dbf b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.dbf new file mode 100644 index 0000000..e2563ed Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.dbf differ diff --git a/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.prj b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.prj @@ -0,0 +1 @@ +PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline 
at end of file diff --git a/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shp b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shp new file mode 100644 index 0000000..4651131 Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shp differ diff --git a/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shx b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shx new file mode 100644 index 0000000..0b7a57a Binary files /dev/null and b/examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shx differ diff --git a/examples/Shapefiles/bad_gap_region/bad_gap_region.cpg b/examples/Shapefiles/bad_gap_region/bad_gap_region.cpg new file mode 100644 index 0000000..cd89cb9 --- /dev/null +++ b/examples/Shapefiles/bad_gap_region/bad_gap_region.cpg @@ -0,0 +1 @@ +ISO-8859-1 \ No newline at end of file diff --git a/examples/Shapefiles/bad_gap_region/bad_gap_region.dbf b/examples/Shapefiles/bad_gap_region/bad_gap_region.dbf new file mode 100644 index 0000000..b6d6650 Binary files /dev/null and b/examples/Shapefiles/bad_gap_region/bad_gap_region.dbf differ diff --git a/examples/Shapefiles/bad_gap_region/bad_gap_region.prj b/examples/Shapefiles/bad_gap_region/bad_gap_region.prj new file mode 100644 index 0000000..471fc41 --- /dev/null +++ b/examples/Shapefiles/bad_gap_region/bad_gap_region.prj @@ -0,0 +1 @@ 
+PROJCS["NAD_1983_StatePlane_Colorado_Central_FIPS_0502_Feet",GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic"],PARAMETER["False_Easting",3000000.0],PARAMETER["False_Northing",1000000.0],PARAMETER["Central_Meridian",-105.5],PARAMETER["Standard_Parallel_1",39.75],PARAMETER["Standard_Parallel_2",38.45],PARAMETER["Latitude_Of_Origin",37.8333333333333],UNIT["US survey foot",0.304800609601219]] \ No newline at end of file diff --git a/examples/Shapefiles/bad_gap_region/bad_gap_region.shp b/examples/Shapefiles/bad_gap_region/bad_gap_region.shp new file mode 100644 index 0000000..d5cc0c4 Binary files /dev/null and b/examples/Shapefiles/bad_gap_region/bad_gap_region.shp differ diff --git a/examples/Shapefiles/bad_gap_region/bad_gap_region.shx b/examples/Shapefiles/bad_gap_region/bad_gap_region.shx new file mode 100644 index 0000000..a772190 Binary files /dev/null and b/examples/Shapefiles/bad_gap_region/bad_gap_region.shx differ diff --git a/examples/toy_counties.png b/examples/toy_counties.png new file mode 100644 index 0000000..c5d83a8 Binary files /dev/null and b/examples/toy_counties.png differ diff --git a/examples/toy_precincts.png b/examples/toy_precincts.png new file mode 100644 index 0000000..7e01ba8 Binary files /dev/null and b/examples/toy_precincts.png differ diff --git a/examples/toy_precincts_corner.png b/examples/toy_precincts_corner.png new file mode 100644 index 0000000..f5f8183 Binary files /dev/null and b/examples/toy_precincts_corner.png differ diff --git a/examples/toy_precincts_corner_repaired.png b/examples/toy_precincts_corner_repaired.png new file mode 100644 index 0000000..6e44c6d Binary files /dev/null and b/examples/toy_precincts_corner_repaired.png differ diff --git a/examples/toy_precincts_corner_repaired_rook_to_queen.png b/examples/toy_precincts_corner_repaired_rook_to_queen.png new file 
mode 100644 index 0000000..8738286 Binary files /dev/null and b/examples/toy_precincts_corner_repaired_rook_to_queen.png differ diff --git a/examples/toy_precincts_repaired.png b/examples/toy_precincts_repaired.png new file mode 100644 index 0000000..c41f8f3 Binary files /dev/null and b/examples/toy_precincts_repaired.png differ diff --git a/examples/toy_precincts_repaired_county_aware.png b/examples/toy_precincts_repaired_county_aware.png new file mode 100644 index 0000000..f1e7f17 Binary files /dev/null and b/examples/toy_precincts_repaired_county_aware.png differ diff --git a/examples/toy_precincts_repaired_county_aware_rook_to_queen.png b/examples/toy_precincts_repaired_county_aware_rook_to_queen.png new file mode 100644 index 0000000..3789848 Binary files /dev/null and b/examples/toy_precincts_repaired_county_aware_rook_to_queen.png differ diff --git a/maup/__init__.py b/maup/__init__.py index 626cdea..5daa866 100644 --- a/maup/__init__.py +++ b/maup/__init__.py @@ -3,11 +3,11 @@ from .assign import assign from .indexed_geometries import IndexedGeometries from .intersections import intersections, prorate -from .repair import close_gaps, autorepair, snap_to_grid, crop_to, doctor, resolve_overlaps +from .repair import close_gaps, resolve_overlaps, quick_repair, snap_to_grid, crop_to, expand_to, doctor +from .smart_repair import smart_repair from .normalize import normalize from .progress_bar import progress - # warn about https://github.com/geopandas/geopandas/issues/2199 if geopandas.options.use_pygeos: raise ImportError( @@ -16,7 +16,7 @@ "`geopandas.options.use_pygeos = False` before importing your shapefile." 
) -__version__ = "1.1.3" +__version__ = "2.0.0" __all__ = [ "adjacencies", "assign", @@ -24,11 +24,13 @@ "intersections", "prorate", "close_gaps", - "autorepair", "resolve_overlaps", + "quick_repair", "snap_to_grid", "crop_to", + "expand_to", "doctor", + "smart_repair", "normalize", "progress" -] \ No newline at end of file +] diff --git a/maup/adjacencies.py b/maup/adjacencies.py index 3836eb0..60ba85e 100644 --- a/maup/adjacencies.py +++ b/maup/adjacencies.py @@ -1,8 +1,7 @@ import warnings from geopandas import GeoSeries, GeoDataFrame - -from shapely import make_valid +from shapely import make_valid from .indexed_geometries import IndexedGeometries, get_geometries from .progress_bar import progress @@ -26,12 +25,13 @@ def iter_adjacencies(geometries): for j, inter in inters.items(): yield (i, j), inter + def adjacencies( geometries, adjacency_type="rook", output_type="geoseries", *, warn_for_overlaps=True, warn_for_islands=True ): - """Returns adjacencies between geometries. + """Returns adjacencies between geometries. The default return type is a `GeoSeries` with a `MultiIndex`, whose (i, j)th entry is the pairwise intersection between geometry `i` and geometry `j`. 
We ensure that @@ -48,20 +48,20 @@ def adjacencies( orig_crs = geometries.crs geometries = get_geometries(geometries) geometries = make_valid(geometries) - + adjs = list(iter_adjacencies(geometries)) if adjs: index, geoms = zip(*adjs) else: - index, geoms = [[],[]] - + index, geoms = [[], []] + if output_type == "geodataframe": - inters = GeoDataFrame({"neighbors" : index, "geometry" : geoms}) + inters = GeoDataFrame({"neighbors" : index, "geometry" : geoms}, crs = geometries.crs) else: - inters = GeoSeries(geoms, index=index) + inters = GeoSeries(geoms, index=index, crs=geometries.crs) if adjacency_type == "rook": - inters = inters[inters.length > 0].copy() + inters = inters[inters.length > 0] if warn_for_overlaps: overlaps = inters[inters.area > 0] diff --git a/maup/assign.py b/maup/assign.py index 3a63c76..968fae8 100644 --- a/maup/assign.py +++ b/maup/assign.py @@ -24,7 +24,7 @@ def assign(sources, targets): dtype="float" ) assignment.update(assignments_by_area) - + # TODO: add a warning here if there are still unassigned source geometries. return assignment.astype(targets.index.dtype, errors="ignore") diff --git a/maup/indexed_geometries.py b/maup/indexed_geometries.py index 8070076..9f02abc 100644 --- a/maup/indexed_geometries.py +++ b/maup/indexed_geometries.py @@ -1,8 +1,6 @@ import pandas import geopandas -# Added numpy import to handle output of STRtree query import numpy -import warnings from shapely.prepared import prep from shapely.strtree import STRtree @@ -21,8 +19,7 @@ def __init__(self, geometries): self.spatial_index = STRtree(self.geometries) self.index = self.geometries.index - - def query(self, geometry): + def query(self, geometry): # IMPORTANT: When "geometry" is multi-part, this query will return a # (2 x n) array instead of a (1 x n) array, so it's safest to flatten the query # output before proceeding. 
@@ -31,12 +28,12 @@ def query(self, geometry): relevant_geometries = self.geometries.iloc[relevant_indices] return relevant_geometries - def intersections(self, geometry): - relevant_geometries = self.query(geometry) + def intersections(self, geometry): + relevant_geometries = self.query(geometry) intersections = relevant_geometries.intersection(geometry) return intersections[-(intersections.is_empty | intersections.isna())] - def covered_by(self, container): + def covered_by(self, container): relevant_geometries = self.query(container) prepared_container = prep(container) @@ -46,7 +43,7 @@ def covered_by(self, container): selected_geometries = relevant_geometries.apply(prepared_container.covers) return relevant_geometries[selected_geometries] - def assign(self, targets): + def assign(self, targets): target_geometries = get_geometries(targets) groups = [ self.covered_by(container).apply(lambda x: container_index) @@ -54,29 +51,25 @@ def assign(self, targets): target_geometries.items(), len(target_geometries) ) ] + # New in pandas 2.1.2: Only concatenate Series of positive length + groups = [group for group in groups if len(group) > 0] if groups: - # New in pandas 2.1.2: Only concatenate Series of positive length - groups = [group for group in groups if len(group) > 0] - if groups: - groups_concat = pandas.concat(groups) - # New in pandas 2.1.2: No reindexing allowed with a non-unique Index, - # so we need to find and remove repetitions. (This only happens when the - # targets have overlaps and some source is completely covered by more - # than one target.) - # Any that get removed here will be randomly assigned to one of the - # covering units at the assign_by_area step ub maup.assign. 
- groups_concat_index_list = list(groups_concat.index) - seen = set() - bad_indices = list(set([x for x in groups_concat_index_list if x in seen or seen.add(x)])) - if len(bad_indices)>0: - groups_concat = groups_concat.drop(bad_indices) - return groups_concat.reindex(self.index) - else: - return geopandas.GeoSeries().reindex(self.index) + groups_concat = pandas.concat(groups) + # New in pandas 2.1.2: No reindexing allowed with a non-unique Index, + # so we need to find and remove repetitions. (This only happens when the + # targets have overlaps and some source is completely covered by more + # than one target.) + # Any that get removed here will be randomly assigned to one of the + # covering units at the assign_by_area step ub maup.assign. + groups_concat_index_list = list(groups_concat.index) + seen = set() + bad_indices = list(set([x for x in groups_concat_index_list if x in seen or seen.add(x)])) + if len(bad_indices) > 0: + groups_concat = groups_concat.drop(bad_indices) + return groups_concat.reindex(self.index) else: return geopandas.GeoSeries().reindex(self.index) - def enumerate_intersections(self, targets): target_geometries = get_geometries(targets) for i, target in progress(target_geometries.items(), len(target_geometries)): diff --git a/maup/intersections.py b/maup/intersections.py index 40fd87a..a9ec53f 100644 --- a/maup/intersections.py +++ b/maup/intersections.py @@ -7,35 +7,26 @@ @require_same_crs - def intersections(sources, targets, output_type="geoseries", area_cutoff=None): - """ - Computes all of the nonempty intersections between two sets of geometries. - - By default, the returned `~geopandas.GeoSeries` will have a MultiIndex, where the - geometry at index *(i, j)* is the intersection of ``sources[i]`` and ``targets[j]`` - (if it is not empty). - + """Computes all of the nonempty intersections between two sets of geometries. 
+ By default, the returned `~geopandas.GeoSeries` will have a MultiIndex, where the + geometry at index *(i, j)* is the intersection of ``sources[i]`` and ``targets[j]`` + (if it is not empty). If output_type == "geodataframe", the return type is a range-indexed GeoDataFrame - with "source" and "target" columns containing the indices i,j, respectively, for the - intersection of ``sources[i]`` and ``targets[j]``. - + with "source" and "target" columns containing the indices i,j, respectively, for the + intersection of ``sources[i]`` and ``targets[j]`` :param sources: geometries :type sources: :class:`~geopandas.GeoSeries` or :class:`~geopandas.GeoDataFrame` :param targets: geometries :type targets: :class:`~geopandas.GeoSeries` or :class:`~geopandas.GeoDataFrame` - :param output_type: type of output, "geoseries" or "geodataframe" - :type output_type: str + :rtype: :class:`~geopandas.GeoSeries` :param area_cutoff: (optional) if provided, only return intersections with area greater than ``area_cutoff`` :type area_cutoff: Number or None - :rtype: :class:`~geopandas.GeoSeries` or :class:`~geopandas.GeoDataFrame` """ - reindexed_sources = get_geometries_with_range_index(sources) reindexed_targets = get_geometries_with_range_index(targets) - spatially_indexed_sources = IndexedGeometries(reindexed_sources) records = [ @@ -45,11 +36,11 @@ def intersections(sources, targets, output_type="geoseries", area_cutoff=None): reindexed_targets ) ] - + df = GeoDataFrame.from_records(records, columns=["source", "target", "geometry"]) df = df.sort_values(by=["source", "target"]).reset_index(drop=True) df.crs = sources.crs - + geometries = df.set_index(["source", "target"]).geometry geometries.sort_index(inplace=True) geometries.crs = sources.crs diff --git a/maup/repair.py b/maup/repair.py index c5aa2d0..712514e 100644 --- a/maup/repair.py +++ b/maup/repair.py @@ -1,350 +1,392 @@ -import math -import pandas -import functools -import warnings - -from geopandas import GeoSeries, 
GeoDataFrame -from shapely.geometry import MultiPolygon, Polygon -from shapely.ops import unary_union -from shapely import make_valid - -from .adjacencies import adjacencies -from .assign import assign_to_max -from .crs import require_same_crs -from .indexed_geometries import get_geometries -from .intersections import intersections - - -""" -Some of these functions are based on the functions in Mary Barker's -check_shapefile_connectivity.py script in @gerrymandr/Preprocessing. -""" - -# IMPORTANT TO BE AWARE OF FOR FUTURE UPDATES: -# The old version of this file used buffer(0) to simplify geometries, but this only -# works properly for polygons. For 1-D objects such as LineStrings, it kills them off -# completely - and this resulted in some pretty disastrous choices in the -# absorb_by_shared_perimeter function when ALL the perimeters simplified to zero and -# the choice of which geometry to absorb into was essentially random! -# In this version, buffer(0) has been replaced by Shapely 2.0's make_valid function, -# which is MUCH better behaved - EXCEPT that when applied to a GeoSeries it apparently -# removes the CRS - which then creates problems for functions that use @require_same_crs. -# So here we need to be careful throughout to reassign the correct CRS to a GeoSeries -# after applying the make_valid function. 
- - -class AreaCroppingWarning(UserWarning): - pass - - -def holes_of_union(geometries): - """Returns any holes in the union of the given geometries.""" - geometries = get_geometries(geometries) - if not all( - isinstance(geometry, (Polygon, MultiPolygon)) for geometry in geometries - ): - raise TypeError(f"Must be a Polygon or MultiPolygon (got types {set([x.geom_type for x in geometries])})!") - - union = unary_union(geometries) - series = holes(union) - series.crs = geometries.crs - return series - - -def holes(geometry): - if isinstance(geometry, MultiPolygon): - return GeoSeries( - [ - Polygon(list(hole.coords)) - for polygon in geometry.geoms - for hole in polygon.interiors - ] - ) - elif isinstance(geometry, Polygon): - return GeoSeries([Polygon(list(hole.coords)) for hole in geometry.interiors]) - else: - raise TypeError("geometry must be a Polygon or MultiPolygon to have holes") - - - - -def close_gaps(geometries, relative_threshold=0.1): - """Closes gaps between geometries by assigning the hole to the polygon - that shares the most perimeter with the hole. - - If the area of the gap is greater than `relative_threshold` times the - area of the polygon, then the gap is left alone. The default value - of `relative_threshold` is 0.1. This is intended to preserve intentional - gaps while closing the tiny gaps that can occur as artifacts of - geospatial operations. Set `relative_threshold=None` to attempt close all - gaps. Due to floating point precision issues, all gaps may not be closed. - """ - geometries = get_geometries(geometries) - gaps = holes_of_union(geometries) - return absorb_by_shared_perimeter( - gaps, geometries, relative_threshold=relative_threshold - ) - - -def resolve_overlaps(geometries, relative_threshold=0.1): - """For any pair of overlapping geometries, assigns the overlapping area to the - geometry that shares the most perimeter with the overlap. Returns the GeoSeries - of geometries, which will have no overlaps. 
- - If the ratio of the overlap's area to either of the overlapping geometries' - areas is greater than `relative_threshold`, then the overlap is ignored. - The default `relative_threshold` is `0.1`. This default is chosen to include - tiny overlaps that can be safely auto-fixed while preserving major overlaps - that might indicate deeper issues and should be handled on a case-by-case - basis. Set `relative_threshold=None` to attempt to resolve all overlaps. Due - to floating point precision issues, all overlaps may not be resolved. - """ - geometries = get_geometries(geometries) - inters = adjacencies(geometries, warn_for_islands=False, warn_for_overlaps=False) - overlaps = inters[inters.area > 0].make_valid() - - if relative_threshold is not None: - left_areas, right_areas = split_by_level(geometries.area, overlaps.index) - under_threshold = ((overlaps.area / left_areas) < relative_threshold) & ( - (overlaps.area / right_areas) < relative_threshold - ) - overlaps = overlaps[under_threshold] - - if len(overlaps) == 0: - return geometries - - to_remove = GeoSeries( - pandas.concat([overlaps.droplevel(1), overlaps.droplevel(0)]), crs=overlaps.crs - ) - with_overlaps_removed = geometries.apply(lambda x: x.difference(unary_union(to_remove))) - - return absorb_by_shared_perimeter( - overlaps, with_overlaps_removed, relative_threshold=None - ) - -def autorepair(geometries, relative_threshold=0.1): - """ - Applies all the tricks in `maup.repair` with default args. Should work by default. - The default `relative_threshold` is `0.1`. This default is chosen to include - tiny overlaps that can be safely auto-fixed while preserving major overlaps - that might indicate deeper issues and should be handled on a case-by-case - basis. Set `relative_threshold=None` to attempt to resolve all overlaps. See - `resolve_overlaps()` and `close_gaps()` for more. 
- """ - orig_crs = geometries.crs - geometries = get_geometries(geometries) - - geometries = remove_repeated_vertices(geometries).make_valid() - geometries = resolve_overlaps(geometries, relative_threshold=relative_threshold).make_valid() - geometries = close_gaps(geometries, relative_threshold=relative_threshold).make_valid() - - return geometries - - -def remove_repeated_vertices(geometries): - """ - Removes repeated vertices. Vertices are considered to be repeated if they - appear consecutively, excluding the start and end points. - """ - return geometries.geometry.apply(lambda x: apply_func_to_polygon_parts(x, dedup_vertices)) - - -def snap_to_grid(geometries, n=-7): - """ - Snap the geometries to a grid by rounding to the nearest 10^n. Helps to - resolve floating point precision issues in shapefiles. - """ - func = functools.partial(snap_polygon_to_grid, n=n) - return geometries.geometry.apply(lambda x: apply_func_to_polygon_parts(x, func)) - - -@require_same_crs -def crop_to(source, target): - """ - Crops the source geometries to the target geometries. - """ - target_union = unary_union(get_geometries(target)) - cropped_geometries = get_geometries(source).apply(lambda x: x.intersection(target_union)) - - if (cropped_geometries.area == 0).any(): - warnings.warn("Some cropped geometries have zero area, likely due to\n"+ - "large differences in the union of the geometries in your\n"+ - "source and target shapefiles. This may become an issue\n"+ - "when maupping.\n", - AreaCroppingWarning - ) - - return cropped_geometries - -@require_same_crs -def expand_to(source, target): - """ - Expands the source geometries to the target geometries. 
- """ - geometries = get_geometries(source).make_valid() - - source_union = unary_union(geometries) - - leftover_geometries = get_geometries(target).apply(lambda x: x - source_union) - leftover_geometries = leftover_geometries[~leftover_geometries.is_empty].explode(index_parts=False) - - geometries = absorb_by_shared_perimeter( - leftover_geometries, get_geometries(source), relative_threshold=None - ) - - return geometries - - - -def doctor(source, target=None): - """ - Detects quality issues in a given set of source and target geometries. Quality - issues include overlaps, gaps, invalid geometries, repeated verticies, non-perfect - tiling, and not entirely overlapping source and targets. If `maup.doctor()` returns - `True`, votes should not be lost when prorating or assigning (beyond a few due to - rounding, etc.). Passing a target to doctor is optional. - """ - shapefiles = [source] - source_union = unary_union(get_geometries(source)) - - # Adding "health_check" variable to return instead of using assertions. - health_check = True - - if target is not None: - shapefiles.append(target) - - target_union = unary_union(get_geometries(target)) - sym_area = target_union.symmetric_difference(source_union).area - - if sym_area != 0: - print("The unions of target and source differ!") - health_check = False - - for shp in shapefiles: - if not shp.geometry.apply(lambda x: isinstance(x, (Polygon, MultiPolygon))).all(): - print("Some rows do not have geometries.") - health_check = False - - overlaps = count_overlaps(shp) - holes = len(holes_of_union(shp)) - - if overlaps != 0: - print("There are", overlaps, "overlaps.") - health_check = False - if holes != 0: - print("There are", holes, "holes.") - health_check = False - if not shp.is_valid.all(): - print("There are some invalid geometries.") - health_check = False - - return health_check - - -def count_overlaps(shp): - """ - Counts overlaps. Code is taken directly from the resolve_overlaps function in maup. 
- """ - inters = adjacencies(shp.geometry, warn_for_islands=False, warn_for_overlaps=False) - overlaps = inters[inters.area > 0].make_valid() - return len(overlaps) - - -def count_holes(shp): - gaps = holes_of_union(shp.geometry) - return(len(gaps)) - - -def apply_func_to_polygon_parts(shape, func): - if isinstance(shape, Polygon): - return func(shape) - elif isinstance(shape, MultiPolygon): - return MultiPolygon([func(poly) for poly in shape.geoms]) - else: - raise TypeError(f"Can only apply {func} to a Polygon or MultiPolygon (got {shape} with type {type(shape)})!") - - -def dedup_vertices(polygon): - if len(polygon.interiors) == 0: - deduped_vertices = [] - for c, p in enumerate(list(polygon.exterior.coords)): - if c == 0: - deduped_vertices.append(p) - elif p != deduped_vertices[-1]: - deduped_vertices.append(p) - return Polygon(deduped_vertices) - - else: - deduped_vertices_exterior = [] - for c, p in enumerate(list(polygon.exterior.coords)): - if c == 0: - deduped_vertices_exterior.append(p) - elif p != deduped_vertices_exterior[-1]: - deduped_vertices_exterior.append(p) - - deduped_vertices_interiors = [] - for interior_ring in polygon.interiors: - deduped_vertices_this_ring = [] - for c, p in enumerate(list(interior_ring.coords)): - if c == 0: - deduped_vertices_this_ring.append(p) - elif p != deduped_vertices_this_ring[-1]: - deduped_vertices_this_ring.append(p) - deduped_vertices_interiors.append(deduped_vertices_this_ring) - return Polygon(deduped_vertices_exterior, holes = deduped_vertices_interiors) - - -def snap_polygon_to_grid(polygon, n=-7): - if len(polygon.interiors) == 0: - return Polygon([(round(x, -n), round(y, -n)) for x, y in polygon.exterior.coords]) - else: - return Polygon([(round(x, -n), round(y, -n)) for x, y in polygon.exterior.coords], holes = [[(round(x, -n), round(y, -n)) for x, y in interior_ring.coords] for interior_ring in polygon.interiors]) - - -def split_by_level(series, multiindex): - return tuple( - 
multiindex.get_level_values(i).to_series(index=multiindex).map(series) - for i in range(multiindex.nlevels) - ) - - -@require_same_crs -def absorb_by_shared_perimeter(sources, targets, relative_threshold=None): - if len(sources) == 0: - return targets - - if len(targets) == 0: - raise IndexError("targets must be nonempty") - - inters = intersections(sources, targets, area_cutoff=None).make_valid() - - assignment = assign_to_max(inters.length) - - if relative_threshold is not None: - under_threshold = ( - sources.area / assignment.map(targets.area) - ) < relative_threshold - assignment = assignment[under_threshold] - - sources_to_absorb = GeoSeries( - sources.groupby(assignment).apply(unary_union), crs=sources.crs, - ) - - # Note that the following line produces a warning message when sources_to_absorb - # and targets have different indices: - - # "lib/python3.11/site-packages/geopandas/base.py:31: UserWarning: The indices of - # the two GeoSeries are different. - # warn("The indices of the two GeoSeries are different.") - - # This difference in indices is expected since not all target geometries may have sources - # to absorb, so it would be nice to remove this warning. - result = targets.union(sources_to_absorb) - - # The .union call only returns the targets who had a corresponding - # source to absorb. Now we fill in all of the unchanged targets. 
- result = result.reindex(targets.index) - did_not_absorb = result.isna() | result.is_empty - result.loc[did_not_absorb] = targets[did_not_absorb] - - return result +import functools +import warnings + +import pandas + +from geopandas import GeoSeries +from shapely.geometry import Polygon, MultiPolygon, LineString, MultiLineString +from shapely.ops import unary_union + +from .adjacencies import adjacencies +from .assign import assign_to_max +from .crs import require_same_crs +from .indexed_geometries import get_geometries +from .intersections import intersections + + +""" +Some of these functions are based on the functions in Mary Barker's +check_shapefile_connectivity.py script in @gerrymandr/Preprocessing. +""" + +# IMPORTANT TO BE AWARE OF FOR FUTURE UPDATES: +# A previous version of this file used buffer(0) to simplify geometries, but this only +# works properly for polygons. For 1-D objects such as LineStrings, it kills them off +# completely - and this resulted in some pretty disastrous choices in the +# absorb_by_shared_perimeter function when ALL the perimeters simplified to zero and +# the choice of which geometry to absorb into was essentially random! +# In this version, buffer(0) has been replaced by Shapely 2.0's make_valid function, +# which is MUCH better behaved - EXCEPT that when applied to a GeoSeries via +# geoseries = make_valid(geoseries) +# it apparently removes the CRS - which then creates problems for functions that use +# @require_same_crs. 
+# Note that it appears to work correctly if we use the format +# geoseries = geoseries.make_valid() + + +class AreaCroppingWarning(UserWarning): + pass + + +def holes_of_union(geometries): + """Returns any holes in the union of the given geometries.""" + geometries = get_geometries(geometries) + if not all( + isinstance(geometry, (Polygon, MultiPolygon)) for geometry in geometries + ): + raise TypeError(f"Must be a Polygon or MultiPolygon (got types {set([x.geom_type for x in geometries])})!") + + union = unary_union(geometries) + series = holes(union) + series.crs = geometries.crs + return series + + +def holes(geometry): + """Returns any holes in a Polygon or MultiPolygon.""" + if isinstance(geometry, MultiPolygon): + return GeoSeries( + [ + Polygon(list(hole.coords)) + for polygon in geometry.geoms + for hole in polygon.interiors + ] + ) + elif isinstance(geometry, Polygon): + return GeoSeries([Polygon(list(hole.coords)) for hole in geometry.interiors]) + else: + raise TypeError("geometry must be a Polygon or MultiPolygon to have holes") + + +def close_gaps(geometries, relative_threshold=0.1): + """Closes gaps between geometries by assigning the hole to the polygon + that shares the greatest perimeter with the hole. + + If the area of the gap is greater than `relative_threshold` times the + area of the polygon, then the gap is left alone. The default value + of `relative_threshold` is 0.1. This is intended to preserve intentional + gaps while closing the tiny gaps that can occur as artifacts of + geospatial operations. Set `relative_threshold=None` to attempt close all + gaps. Due to floating point precision issues, all gaps may not be closed. 
+ """ + geometries = get_geometries(geometries) + gaps = holes_of_union(geometries) + return absorb_by_shared_perimeter( + gaps, geometries, relative_threshold=relative_threshold + ) + + +def resolve_overlaps(geometries, relative_threshold=0.1): + """For any pair of overlapping geometries, assigns the overlapping area to the + geometry that shares the greatest perimeter with the overlap. Returns the GeoSeries + of geometries, which will have no overlaps. + + If the ratio of the overlap's area to either of the overlapping geometries' + areas is greater than `relative_threshold`, then the overlap is ignored. + The default `relative_threshold` is `0.1`. This default is chosen to include + tiny overlaps that can be safely auto-fixed while preserving major overlaps + that might indicate deeper issues and should be handled on a case-by-case + basis. Set `relative_threshold=None` to attempt to resolve all overlaps. Due + to floating point precision issues, all overlaps may not be resolved. + """ + geometries = get_geometries(geometries) + inters = adjacencies(geometries, warn_for_islands=False, warn_for_overlaps=False) + overlaps = inters[inters.area > 0].make_valid() + + if relative_threshold is not None: + left_areas, right_areas = split_by_level(geometries.area, overlaps.index) + under_threshold = ((overlaps.area / left_areas) < relative_threshold) & ( + (overlaps.area / right_areas) < relative_threshold + ) + overlaps = overlaps[under_threshold] + + if len(overlaps) == 0: + return geometries + + to_remove = GeoSeries( + pandas.concat([overlaps.droplevel(1), overlaps.droplevel(0)]), crs=overlaps.crs + ) + with_overlaps_removed = geometries.apply(lambda x: x.difference(unary_union(to_remove))) + + return absorb_by_shared_perimeter( + overlaps, with_overlaps_removed, relative_threshold=None + ) + + +def quick_repair(geometries, relative_threshold=0.1): + """ + New name for autorepair function from Maup 1.x. + Uses simplistic algorithms to repair most gaps and overlaps. 
+ + The default `relative_threshold` is `0.1`. This default is chosen to include + tiny overlaps that can be safely auto-fixed while preserving major overlaps + that might indicate deeper issues and should be handled on a case-by-case + basis. Set `relative_threshold=None` to attempt to resolve all overlaps. See + `resolve_overlaps()` and `close_gaps()` for more. + + For a more careful repair that takes adjacencies and higher-order overlaps + between geometries into account, consider using smart_repair instead. + """ + return autorepair(geometries, relative_threshold=relative_threshold) + + +def autorepair(geometries, relative_threshold=0.1): + """ + Uses simplistic algorithms to repair most gaps and overlaps. + + The default `relative_threshold` is `0.1`. This default is chosen to include + tiny overlaps that can be safely auto-fixed while preserving major overlaps + that might indicate deeper issues and should be handled on a case-by-case + basis. Set `relative_threshold=None` to attempt to resolve all overlaps. See + `resolve_overlaps()` and `close_gaps()` for more. + + For a more careful repair that takes adjacencies and higher-order overlaps + between geometries into account, consider using smart_repair instead. + """ + geometries = get_geometries(geometries) + + geometries = remove_repeated_vertices(geometries).make_valid() + geometries = resolve_overlaps(geometries, relative_threshold=relative_threshold).make_valid() + geometries = close_gaps(geometries, relative_threshold=relative_threshold).make_valid() + + return geometries + + +def remove_repeated_vertices(geometries): + """ + Removes repeated vertices. Vertices are considered to be repeated if they + appear consecutively, excluding the start and end points. + """ + return geometries.geometry.apply(lambda x: apply_func_to_polygon_parts(x, dedup_vertices)) + + +def snap_to_grid(geometries, n=-7): + """ + Snap the geometries to a grid by rounding to the nearest 10^n. 
Helps to + resolve floating point precision issues in shapefiles. + """ + func = functools.partial(snap_polygon_to_grid, n=n) + return geometries.geometry.apply(lambda x: apply_func_to_polygon_parts(x, func)) + + +@require_same_crs +def crop_to(source, target): + """ + Crops the source geometries to the target geometries. + """ + target_union = unary_union(get_geometries(target)) + cropped_geometries = get_geometries(source).apply(lambda x: x.intersection(target_union)) + + if (cropped_geometries.area == 0).any(): + warnings.warn("Some cropped geometries have zero area, likely due to\n" + + "large differences in the union of the geometries in your\n" + + "source and target shapefiles. This may become an issue\n" + + "when maupping.\n", + AreaCroppingWarning + ) + + return cropped_geometries + + +@require_same_crs +def expand_to(source, target): + """ + Expands the source geometries to the target geometries. + """ + geometries = get_geometries(source).make_valid() + + source_union = unary_union(geometries) + + leftover_geometries = get_geometries(target).apply(lambda x: x - source_union) + leftover_geometries = leftover_geometries[~leftover_geometries.is_empty].explode(index_parts=False) + + geometries = absorb_by_shared_perimeter( + leftover_geometries, get_geometries(source), relative_threshold=None + ) + + return geometries + + +def doctor(source, target=None, silent=False, accept_holes=False): + """ + Detects quality issues in a given set of source and target geometries. Quality + issues include overlaps, gaps, invalid geometries, non-perfect + tiling, and not entirely overlapping source and targets. If `maup.doctor()` returns + `True`, votes should not be lost when prorating or assigning (beyond a few due to + rounding, etc.). Passing a target to doctor is optional. + + If silent is True, then print outputs are suppressed. (Default is silent = False.) + + If accept_holes is True, then holes alone do not cause doctor to return a value of + False. 
(Default is accept_holes = False.) + """ + shapefiles = [source] + source_union = unary_union(get_geometries(source)) + + health_check = True + + if target is not None: + shapefiles.append(target) + + target_union = unary_union(get_geometries(target)) + sym_area = target_union.symmetric_difference(source_union).area + + if sym_area != 0: + if silent is False: + print("The unions of target and source differ!") + health_check = False + + for shp in shapefiles: + if not shp.geometry.apply(lambda x: isinstance(x, (Polygon, MultiPolygon))).all(): + if silent is False: + print("Some rows do not have geometries.") + health_check = False + + overlaps = count_overlaps(shp) + num_holes = len(holes_of_union(shp)) + + if overlaps != 0: + if silent is False: + print("There are", overlaps, "overlaps.") + health_check = False + if num_holes != 0: + if silent is False: + print("There are", num_holes, "holes.") + if accept_holes is False: + health_check = False + if not shp.is_valid.all(): + if silent is False: + print("There are some invalid geometries.") + health_check = False + + return health_check + + +def count_overlaps(shp): + """ + Counts overlaps between geometries. + Code is taken directly from the resolve_overlaps function in maup. + """ + inters = adjacencies(shp.geometry, warn_for_islands=False, warn_for_overlaps=False) + overlaps = inters[inters.area > 0].make_valid() + return len(overlaps) + + +def count_holes(shp): + """ + Counts gaps between geometries. 
+ """ + gaps = holes_of_union(shp.geometry) + return len(gaps) + + +def apply_func_to_polygon_parts(shape, func): + if isinstance(shape, Polygon): + return func(shape) + elif isinstance(shape, MultiPolygon): + return MultiPolygon([func(poly) for poly in shape.geoms]) + else: + raise TypeError(f"Can only apply {func} to a Polygon or MultiPolygon (got {shape} with type {type(shape)})!") + + +def dedup_vertices(polygon): + if len(polygon.interiors) == 0: + deduped_vertices = [] + for c, p in enumerate(list(polygon.exterior.coords)): + if c == 0: + deduped_vertices.append(p) + elif p != deduped_vertices[-1]: + deduped_vertices.append(p) + return Polygon(deduped_vertices) + + else: + deduped_vertices_exterior = [] + for c, p in enumerate(list(polygon.exterior.coords)): + if c == 0: + deduped_vertices_exterior.append(p) + elif p != deduped_vertices_exterior[-1]: + deduped_vertices_exterior.append(p) + + deduped_vertices_interiors = [] + for interior_ring in polygon.interiors: + deduped_vertices_this_ring = [] + for c, p in enumerate(list(interior_ring.coords)): + if c == 0: + deduped_vertices_this_ring.append(p) + elif p != deduped_vertices_this_ring[-1]: + deduped_vertices_this_ring.append(p) + deduped_vertices_interiors.append(deduped_vertices_this_ring) + return Polygon(deduped_vertices_exterior, holes=deduped_vertices_interiors) + + +def snap_polygon_to_grid(polygon, n=-7): + if len(polygon.interiors) == 0: + return Polygon([(round(x, -n), round(y, -n)) for x, y in polygon.exterior.coords]) + else: + return Polygon([(round(x, -n), round(y, -n)) for x, y in polygon.exterior.coords], holes=[[(round(x, -n), round(y, -n)) for x, y in interior_ring.coords] for interior_ring in polygon.interiors]) + + +def snap_multilinestring_to_grid(multilinestring, n=-7): + if multilinestring.geom_type == "LineString": + return LineString([(round(x, -n), round(y, -n)) for x, y in multilinestring.coords]) + elif multilinestring.geom_type == "MultiLineString": + return 
MultiLineString([LineString([(round(x, -n), round(y, -n)) for x, y in linestring.coords]) for linestring in multilinestring.geoms]) + + +def split_by_level(series, multiindex): + return tuple( + multiindex.get_level_values(i).to_series(index=multiindex).map(series) + for i in range(multiindex.nlevels) + ) + + +@require_same_crs +def absorb_by_shared_perimeter(sources, targets, relative_threshold=None): + if len(sources) == 0: + return targets + + if len(targets) == 0: + raise IndexError("targets must be nonempty") + + inters = intersections(sources, targets, area_cutoff=None).make_valid() + + assignment = assign_to_max(inters.length) + + if relative_threshold is not None: + under_threshold = ( + sources.area / assignment.map(targets.area) + ) < relative_threshold + assignment = assignment[under_threshold] + + sources_to_absorb = GeoSeries( + sources.groupby(assignment).apply(unary_union), crs=sources.crs, + ) + + # Note that the following line produces a warning message when sources_to_absorb + # and targets have different indices: + + # "lib/python3.11/site-packages/geopandas/base.py:31: UserWarning: The indices of + # the two GeoSeries are different. + # warn("The indices of the two GeoSeries are different.") + + # This difference in indices is expected since not all target geometries may have sources + # to absorb, so it would be nice to remove this warning. + result = targets.union(sources_to_absorb) + + # The .union call only returns the targets who had a corresponding + # source to absorb. Now we fill in all of the unchanged targets. 
+ result = result.reindex(targets.index) + did_not_absorb = result.isna() | result.is_empty + result.loc[did_not_absorb] = targets[did_not_absorb] + + return result diff --git a/maup/smart_repair.py b/maup/smart_repair.py new file mode 100644 index 0000000..6d06900 --- /dev/null +++ b/maup/smart_repair.py @@ -0,0 +1,1591 @@ +import math +import warnings +from collections import deque + +import numpy +import pandas +import shapely + +from geopandas import GeoSeries, GeoDataFrame +from shapely import make_valid, extract_unique_points +from shapely.strtree import STRtree +from shapely.ops import unary_union, polygonize, linemerge, nearest_points +from shapely.geometry import Polygon, MultiPolygon, Point, MultiPoint, LineString, MultiLineString +from shapely.geometry.polygon import orient +from tqdm import tqdm, TqdmWarning + +from .adjacencies import adjacencies +from .assign import assign +from .indexed_geometries import get_geometries +from .intersections import intersections +from .progress_bar import progress +from .repair import doctor, snap_to_grid + +warnings.filterwarnings('ignore', 'GeoSeries.isna', UserWarning) +warnings.filterwarnings("ignore", category=TqdmWarning) + +pandas.options.mode.chained_assignment = None + + +""" +Some of these functions are based on the functions in Mary Barker's +check_shapefile_connectivity.py script in @gerrymandr/Preprocessing. + +Updated functions for maup 2.0.0 were written by Jeanne Clelland. +""" + +######### +# MAIN REPAIR FUNCTION +######### + + +def smart_repair(geometries_df, snapped=True, fill_gaps=True, fill_gaps_threshold=0.1, + disconnection_threshold=0.0001, nest_within_regions=None, + min_rook_length=None): + """ + Repairs topology issues (overlaps, gaps, invalid polygons) in a geopandas + GeoDataFrame or GeoSeries, with an emphasis on preserving intended adjacency + relations between geometries as closely as possible. + + Specifically, the algorithm + (1) Applies shapely.make_valid to all polygon geometries. 
+ (2) If snapped = True (default), snaps all polygon vertices to a grid of size no + more than 10^(-10) times the max of width/height of the entire shapefile extent. + HIGHLY RECOMMENDED to avoid topological exceptions due to rounding errors. + (3) Resolves all overlaps. + (4) If fill_gaps = True (default), closes all simply connected gaps with area + less than fill_gaps_threshold times the largest area of all geometries adjoining + the gap. Default threshold is 10%; if fill_gaps_threshold = None then all + simply connected gaps will be filled. + (5) If nest_within_regions is a secondary shapefile of region boundaries (e.g., + counties in a state) then all of the above will be performed so that repaired + geometries nest cleanly into the region boundaries; each repaired geometry + will be contained in the region with which the original geometry has the largest + area of intersection. Default value is None. + (6) If min_rook_length is given a numerical value, replaces all rook adjacencies + with length below this value with queen adjacencies. Note that this is an + absolute value and not a relative value, so make sure that the value provided + is in the correct units with respect to the shapefile's CRS. + Default value is None. + (7) Sometimes the repair process creates tiny fragments that are disconnected from + the district that they are assigned to. A final cleanup step assigns any such + fragments to a neighboring geometry if their area is less than + disconnection_threshold times the area of the largest connected component of + their assigned geometry. Default threshold is 0.01%, and this seems to work + well in practice. + """ + + # Keep a copy of the original input for comparisons later! 
+ if isinstance(geometries_df, GeoSeries): + orig_input_type = "geoseries" + geometries_df = GeoDataFrame(geometry=geometries_df) + geometries0_df = geometries_df.copy() + elif isinstance(geometries_df, GeoDataFrame): + orig_input_type = "geodataframe" + geometries_df = geometries_df.copy() + geometries0_df = geometries_df.copy() + else: + raise TypeError("Input geometries must be in the form of a geopandas GeoSeries or GeoDataFrame.") + + # Ensure that geometries are 2-D and not 3-D: + for i in geometries_df.index: + geometries_df["geometry"][i] = shapely.wkb.loads( + shapely.wkb.dumps(geometries_df["geometry"][i], output_dimension=2)) + + # Ensure that crs is not geographic: + if geometries_df.crs is not None: + if geometries_df.crs.is_geographic: + raise Exception("Input geometries must be in a projected, non-geographic CRS. To project a GeoDataFrame 'gdf' to UTM, use 'gdf = gdf.to_crs(gdf.estimate_utm_crs())' ") + + # If nest_within_regions is not None, require it to have the same CRS as the main shapefile + # and set regions_df equal to a GeoDataFrame version. + # If nest_within_regions is None, set regions_df equal to None so we can use it as a parameter later. + if nest_within_regions is None: + regions_df = None + else: + if isinstance(nest_within_regions, GeoSeries): + regions_df = GeoDataFrame(geometry=nest_within_regions) + elif isinstance(nest_within_regions, GeoDataFrame): + regions_df = nest_within_regions.copy() + else: + raise TypeError("nest_within_regions must be a geopandas GeoSeries or GeoDataFrame.") + + if nest_within_regions.crs != geometries_df.crs: + raise Exception("nest_within_regions must be in the same CRS as the geometries being repaired.") + if doctor(nest_within_regions, silent=True, accept_holes=True) is False: + raise Exception("nest_within_regions must be topologically clean---i.e., all geometries must be valid and there must be no overlaps between geometries. Generally the best source for region shapefiles is the U.S.
Census Bureau.") + + # Before doing anything else, make sure all polygons are valid and remove any + # LineStrings and MultiLineStrings. + for i in geometries_df.index: + geometries_df["geometry"][i] = make_valid(geometries_df["geometry"][i]) + if geometries_df["geometry"][i].geom_type == "GeometryCollection": + geometries_df["geometry"][i] = unary_union([x for x in geometries_df["geometry"][i].geoms if x.geom_type in ("Polygon", "MultiPolygon")]) + + # If snapped is True, snap all polygon vertices to a grid of size no more than + # 10^(-10) times the max of width/height of the entire shapefile extent. + # (For instance, in Texas this would be less than 1/100th of an inch.) + # This avoids a rare "non-noded intersection" error due to a GEOS bug and leaves + # several orders of magnitude for additional intersection operations before hitting + # Python's precision limit of about 10^(-15). + if snapped: + # These bounds are in the form (xmin, ymin, xmax, ymax) + geometries_total_bounds = geometries_df.total_bounds + largest_bound = max(geometries_total_bounds[2] - geometries_total_bounds[0], geometries_total_bounds[3] - geometries_total_bounds[1]) + snap_magnitude = int(math.log10(largest_bound)) - 10 + geometries_df["geometry"] = snap_to_grid(geometries_df["geometry"], n=snap_magnitude) + if nest_within_regions is not None: + regions_df["geometry"] = snap_to_grid(regions_df["geometry"], n=snap_magnitude) + + # Snapping could possibly have created some invalid polygons, so do another round + # of validity checks - and do a validity check for regions as well, if applicable.
+ for i in geometries_df.index: + geometries_df["geometry"][i] = make_valid(geometries_df["geometry"][i]) + if geometries_df["geometry"][i].geom_type == "GeometryCollection": + geometries_df["geometry"][i] = unary_union([x for x in geometries_df["geometry"][i].geoms if x.geom_type in ("Polygon", "MultiPolygon")]) + if nest_within_regions is not None: + for i in regions_df.index: + regions_df["geometry"][i] = make_valid(regions_df["geometry"][i]) + if regions_df["geometry"][i].geom_type == "GeometryCollection": + regions_df["geometry"][i] = unary_union([x for x in regions_df["geometry"][i].geoms if x.geom_type in ("Polygon", "MultiPolygon")]) + print("Snapping all geometries to a grid with precision 10^(", snap_magnitude, ") to avoid GEOS errors.") + + # Construct data about overlaps of all orders, plus holes. + overlap_tower, holes_df = building_blocks(geometries_df, nest_within_regions=regions_df) + + # Use data from the overlap tower to rebuild geometries with no overlaps. + # If nest_within_regions is not None, resolve overlaps and fill holes (if applicable) + # for each region separately. + + if nest_within_regions is None: + print("Resolving overlaps...") + reconstructed_df = reconstruct_from_overlap_tower(geometries_df, overlap_tower) + + # Use data about the holes to fill holes if applicable. + if fill_gaps: + # First remove any holes above the relative area threshold (if any). + # Also remove any non-simply connected holes since our algorithm breaks + # down in that case, regardless of whether or not a relative area + # threshold has been set. 
+ holes_df, num_holes_dropped = drop_bad_holes(reconstructed_df, holes_df, fill_gaps_threshold=fill_gaps_threshold) + if num_holes_dropped > 0: + print(num_holes_dropped, "gaps will remain unfilled, because they either are not simply connected or exceed the area threshold.") + + print("Filling gaps...") + reconstructed_df = smart_close_gaps(reconstructed_df, holes_df) + + else: + if fill_gaps: + print("Resolving overlaps and filling gaps...") + else: + print("Resolving overlaps...") + + reconstructed_df = geometries_df.copy() + geometries_to_regions_assignment = assign(geometries_df.geometry, regions_df.geometry) + + for r_ind in nest_within_regions.index: + geometries_this_region_indices = [g_ind for g_ind in geometries_df.index if geometries_to_regions_assignment[g_ind] == r_ind] + geometries_this_region_df = geometries_df.loc[geometries_this_region_indices] + + overlap_tower_this_region = [] + for i in range(len(overlap_tower)): + overlap_tower_this_region.append(overlap_tower[i][overlap_tower[i]["region"] == r_ind]) + + reconstructed_this_region_df = reconstruct_from_overlap_tower(geometries_this_region_df, overlap_tower_this_region, nested=True) + + if fill_gaps: + holes_this_region_df = holes_df[holes_df["region"] == r_ind] + # First remove any holes above the relative area threshold (if any). + # Also remove any non-simply connected holes since our algorithm breaks + # down in that case, regardless of whether or not a relative area + # threshold has been set. 
+ holes_this_region_df, num_holes_dropped_this_region = drop_bad_holes(reconstructed_this_region_df, holes_this_region_df, fill_gaps_threshold=fill_gaps_threshold) + if num_holes_dropped_this_region > 0: + print(num_holes_dropped_this_region, "gaps in region", r_ind, "will remain unfilled, because they either are not simply connected or exceed the area threshold.") + + reconstructed_this_region_df = smart_close_gaps(reconstructed_this_region_df, holes_this_region_df) + + reconstructed_df["geometry"].loc[list(reconstructed_this_region_df.index)] = reconstructed_this_region_df["geometry"] + + # Check for geometries that have become (more) disconnected, generally with an extra + # component of negligible area. If any are found and the area is negligible, + # reassign to an adjacent geometry by shared perimeter. + # If the area is not negligible, leave it alone and report it so that the user + # can decide what to do about it. + + disconnected_df = reconstructed_df[reconstructed_df["geometry"].apply(lambda x: x.geom_type != "Polygon")] + + # This will include geometries that were disconnected in the original; need to + # filter by whether they got worse. + + if len(disconnected_df) > 0: + disconnected_poly_indices = [] + for ind in disconnected_df.index: + if num_components(reconstructed_df["geometry"][ind]) > num_components(geometries0_df["geometry"][ind]): + disconnected_poly_indices.append(ind) + + if len(disconnected_poly_indices) > 0: + # These are the ones (if any) that got worse. 
+ geometries = get_geometries(reconstructed_df) + spatial_index = STRtree(geometries) + index_by_iloc = dict((i, list(geometries.index)[i]) for i in range(len(geometries.index))) + + for g_ind in disconnected_poly_indices: + excess = num_components(reconstructed_df["geometry"][g_ind]) - num_components(geometries0_df["geometry"][g_ind]) + component_num_list = list(range(len(reconstructed_df["geometry"][g_ind].geoms))) + component_areas = [] + + for c_ind in range(len(reconstructed_df["geometry"][g_ind].geoms)): + component_areas.append((c_ind, reconstructed_df["geometry"][g_ind].geoms[c_ind].area)) + + component_areas_sorted = sorted(component_areas, key=lambda tup: tup[1]) + big_area = max([reconstructed_df["geometry"][g_ind].area, geometries0_df["geometry"][g_ind].area]) + + for i in range(excess): + # Check whether the ith smallest component has small enough area, and if + # so find a better polygon to add it to. + c_ind = component_areas_sorted[i][0] + this_fragment = reconstructed_df["geometry"][g_ind].geoms[c_ind] + if component_areas_sorted[i][1] < disconnection_threshold*big_area: + possible_intersect_integer_indices = [*set(numpy.ndarray.flatten(spatial_index.query(this_fragment)))] + possible_intersect_indices = [(index_by_iloc[k]) for k in possible_intersect_integer_indices] + + if nest_within_regions is not None: + # Restrict to geometries in the same region as this geometry + possible_intersect_indices = [ind for ind in possible_intersect_indices if geometries_to_regions_assignment[ind] == geometries_to_regions_assignment[g_ind]] + + shared_perimeters = [] + for g_ind2 in possible_intersect_indices: + if g_ind2 != g_ind and not (this_fragment.boundary).intersection(reconstructed_df["geometry"][g_ind2].boundary).is_empty: + shared_perimeters.append((g_ind2, (this_fragment.boundary).intersection(reconstructed_df["geometry"][g_ind2].boundary).length)) + + # If this is an isolated fragment and doesn't touch any other + # geometries, leave it alone; 
otherwise, choose a geometry to + # adjoin it to by largest shared perimeter. + if len(shared_perimeters) > 0: + component_num_list.remove(c_ind) # Tells us to take out this component later + max_shared_perim = sorted(shared_perimeters, key=lambda tup: tup[1])[-1] + poly_to_add_to = max_shared_perim[0] + reconstructed_df["geometry"][poly_to_add_to] = unary_union( + [reconstructed_df["geometry"][poly_to_add_to], this_fragment]) + + if len(component_num_list) == 1: + reconstructed_df["geometry"][g_ind] = reconstructed_df["geometry"][g_ind].geoms[component_num_list[0]] + elif len(component_num_list) > 1: + reconstructed_df["geometry"][g_ind] = MultiPolygon( + [reconstructed_df["geometry"][g_ind].geoms[c_ind] for c_ind in component_num_list]) + else: + print("WARNING: A component of the geometry at index", g_ind, "was badly disconnected and redistributed to other geometries!") + + # We should usually now be back to the correct number of components everywhere, but + # there may occasionally be exceptions, so check again and alert the user if not. + + disconnected_df_2 = reconstructed_df[reconstructed_df["geometry"].apply(lambda x: x.geom_type != "Polygon")] + if len(disconnected_df_2) > 0: + for ind in disconnected_df_2.index: + if num_components(reconstructed_df["geometry"][ind]) > num_components(geometries0_df["geometry"][ind]): + print("WARNING: A component of the geometry at index", ind, "may have been disconnected!") + + if min_rook_length is not None: + # Find all inter-polygon boundaries shorter than min_rook_length and replace them + # with queen adjacencies by manipulating coordinates of all surrounding polygons.
+ print("Converting small rook adjacencies to queen...") + reconstructed_df = small_rook_to_queen(reconstructed_df, min_rook_length) + + if orig_input_type == "geoseries": + return reconstructed_df.geometry + else: + return reconstructed_df + + +######### +# SUPPORTING FUNCTIONS +######### + +def num_components(geom): + """Counts the number of connected components of a shapely object.""" + if geom.is_empty: + return 0 + elif geom.geom_type in ("Polygon", "Point", "LineString"): + return 1 + elif geom.geom_type in ("MultiPolygon", "MultiLineString", "GeometryCollection"): + return len(geom.geoms) + + +def segments(curve): + """Extracts a list of the individual line segments from a LineString""" + return list(map(LineString, zip(curve.coords[:-1], curve.coords[1:]))) + + +def building_blocks(geometries_df, nest_within_regions=None): + """ + Partitions the extent of the input via all boundaries of all geometries + (and regions, if nest_within_regions is a shapefile of region boundaries); + associates to each polygon in the partition the set of polygons in the original + shapefile whose intersection created it, and organizes this data according to + order of the overlaps. (Order zero = hole) + """ + if isinstance(geometries_df, GeoDataFrame) is False: + raise TypeError("Primary input to building_blocks must be a GeoDataFrame.") + + geometries_df = geometries_df.copy() + if nest_within_regions is not None: + if isinstance(nest_within_regions, GeoDataFrame) is False: + raise TypeError("nest_within_regions must be either None or a GeoDataFrame.") + else: + regions_df = nest_within_regions.copy() + + # Make a list of all the boundaries of all the polygons. 
+ # This won't work properly with MultiPolygons, so explode first: + boundaries = [] + geometries_exploded_df = geometries_df.explode(index_parts=False).reset_index(drop=True) + for i in geometries_exploded_df.index: + boundaries.append(shapely.boundary(geometries_exploded_df["geometry"][i])) + + # Include region boundaries if applicable: + if nest_within_regions is not None: + regions_exploded_df = regions_df.explode(index_parts=False).reset_index(drop=True) + for i in regions_exploded_df.index: + boundaries.append(shapely.boundary(regions_exploded_df["geometry"][i])) + + boundaries_exploded = [] + for geom in boundaries: + if geom.geom_type == "LineString": + boundaries_exploded.append(geom) + elif geom.geom_type == "MultiLineString": + boundaries_exploded += list(geom.geoms) + boundaries_union = shapely.node(MultiLineString(boundaries_exploded)) + + # Create a geodataframe with all the pieces created by overlaps of all orders, + # together with a set for each piece consisting of the polygons that created the overlap. + pieces_df = GeoDataFrame(columns=["polygon indices"], + geometry=GeoSeries(list(polygonize(boundaries_union))), + crs=geometries_df.crs) + + for i in pieces_df.index: + pieces_df["polygon indices"][i] = set() + + # Add a column to indicate the region for each piece; if there are no regions the + # entries will remain as None. + pieces_df["region"] = None + + g_spatial_index = STRtree(geometries_df["geometry"]) + g_index_by_iloc = dict((i, list(geometries_df.index)[i]) for i in range(len(geometries_df))) + + # If region boundaries are included, also create an STRtree for the regions + # and assign the main geometries to regions by largest area overlap. 
+ if nest_within_regions is not None: + r_spatial_index = STRtree(regions_df["geometry"]) + r_index_by_iloc = dict((i, list(regions_df.index)[i]) for i in range(len(regions_df))) + geometries_to_regions_assignment = assign(geometries_df.geometry, regions_df.geometry) + + print("Identifying overlaps...") + for i in progress(pieces_df.index, len(pieces_df.index)): + # If region boundaries are included, identify the region for each piece. + # Note that "None" is a possibility, and that each piece will belong to a unique + # region because the regions shapefile MUST be clean. + if nest_within_regions is not None: + possible_region_integer_indices = [*set(numpy.ndarray.flatten(r_spatial_index.query(pieces_df["geometry"][i])))] + possible_region_indices = [r_index_by_iloc[k] for k in possible_region_integer_indices] + + for j in possible_region_indices: + if pieces_df["geometry"][i].representative_point().intersects(regions_df["geometry"][j]): + pieces_df["region"][i] = j + + # Now identify the set of geometries in the main geometry that each piece is + # contained in. If region boundaries are included, then while determining which + # geometries each piece is contained in, omit any geometries that are + # assigned to a region other than the one the piece is contained in. 
+ possible_geom_integer_indices = [*set(numpy.ndarray.flatten(g_spatial_index.query(pieces_df["geometry"][i])))] + possible_geom_indices = [g_index_by_iloc[k] for k in possible_geom_integer_indices] + + for j in possible_geom_indices: + if nest_within_regions is not None: + if pieces_df["geometry"][i].representative_point().intersects(geometries_df["geometry"][j]): + if geometries_to_regions_assignment[j] == pieces_df["region"][i]: + pieces_df["polygon indices"][i] = pieces_df["polygon indices"][i].union({j}) + else: + if pieces_df["geometry"][i].representative_point().intersects(geometries_df["geometry"][j]): + pieces_df["polygon indices"][i] = pieces_df["polygon indices"][i].union({j}) + + # Organize this info into separate GeoDataFrames for overlaps of all orders - including + # order zero, which corresponds to gaps. + # This will be easier if we temporarily add a column for overlap degree. + overlap_degree_list = [len(x) for x in pieces_df["polygon indices"]] + pieces_df["overlap degree"] = overlap_degree_list + + # Here are the gaps: + holes_df = (pieces_df[pieces_df["overlap degree"] == 0]).reset_index(drop=True) + + # If region boundaries are included, drop all the polygons that didn't fall into any + # region, and also take the (exploded) unary unions of all the gaps in each region, + # since some pieces of geometries from other regions may now be gaps that are adjacent + # to other gaps. 
+ if nest_within_regions is not None: + pieces_df = pieces_df[~pieces_df["region"].isna()].reset_index(drop=True) + holes_df = holes_df[~holes_df["region"].isna()].reset_index(drop=True) + + consolidated_holes_df = GeoDataFrame(columns=["polygon indices", "geometry", "region", "overlap degree"], + geometry="geometry", crs=holes_df.crs) + for r_ind in regions_df.index: + this_region_holes_df = holes_df[holes_df["region"] == r_ind] + this_region_consolidated_holes = GeoSeries([unary_union(this_region_holes_df["geometry"])]).explode(index_parts=False).reset_index(drop=True) + this_region_consolidated_holes_df = GeoDataFrame(geometry=this_region_consolidated_holes, crs=holes_df.crs) + + this_region_consolidated_holes_df.insert(0, "polygon indices", None) + for i in this_region_consolidated_holes_df.index: + this_region_consolidated_holes_df["polygon indices"][i] = set() + this_region_consolidated_holes_df.insert(2, "region", r_ind) + this_region_consolidated_holes_df.insert(2, "overlap degree", 0) + + consolidated_holes_df = pandas.concat([consolidated_holes_df, this_region_consolidated_holes_df]).reset_index(drop=True) + + holes_df = consolidated_holes_df + + # Here is a list of GeoDataFrames, one consisting of all overlaps of each order: + overlap_tower = [] + + for i in range(max(pieces_df["overlap degree"])): + overlap_tower.append(pieces_df[pieces_df["overlap degree"] == i+1]) + + # Drop unnecessary "overlap degree" column and reindex each GeoDataFrame: + for i in range(len(overlap_tower)): + del overlap_tower[i]["overlap degree"] + overlap_tower[i] = overlap_tower[i].reset_index(drop=True) + del holes_df["overlap degree"] + + # Drop the "polygon indices" column in the holes GeoDataFrame: + del holes_df["polygon indices"] + + return overlap_tower, holes_df + + +def reconstruct_from_overlap_tower(geometries_df, overlap_tower, nested=False): + """ + Rebuild the shapefile polygons with overlaps removed. 
+ """ + # Keep a copy of the original input for comparisons later! + geometries0_df = geometries_df.copy() + + geometries_df = geometries_df.copy() + overlap_tower = [df.copy() for df in overlap_tower] + + geometries_df["geometry"] = Polygon() + + max_overlap_level = len(overlap_tower) + + # Start by assigning all order 1 pieces to the polygon they came from: + for ind in overlap_tower[0].index: + this_poly_ind = list(overlap_tower[0]["polygon indices"][ind])[0] + this_piece = overlap_tower[0]["geometry"][ind] + geometries_df["geometry"][this_poly_ind] = unary_union([geometries_df["geometry"][this_poly_ind], this_piece]) + + # We will need to know which geometries were disconnected by removing + # overlaps, so add columns for numbers of components in the original and refined + # geometries to each dataframe for future use. + geometries_df["num components orig"] = 0 + geometries_df["num components refined"] = 0 + + for ind in geometries_df.index: + geometries_df["num components orig"][ind] = num_components(geometries0_df["geometry"][ind]) + geometries_df["num components refined"][ind] = num_components(geometries_df["geometry"][ind]) + + # Now, start with the order 2 overlaps and gradually add overlaps at successively + # higher orders until done. + + # First look for geometries at the top level that were disconnected + # by the refinement process, and give them first dibs at grabbing overlaps + # until they are reconnected or run out of overlaps to grab. + # Note that this doesn't always completely work; in rare cases a single overlap + # can disconnect more than one polygon, and only one of them gets to grab it back. + # This will be addressed at the end of the reconstruction process. 
+ + geometries_disconnected_df = geometries_df[geometries_df["num components refined"] > geometries_df["num components orig"]] + + for i in range(1, max_overlap_level): + overlaps_df = overlap_tower[i] + overlaps_df_unused_indices = overlaps_df.index.tolist() + + o_spatial_index = STRtree(overlaps_df["geometry"]) + o_index_by_iloc = dict((i, list(overlaps_df.index)[i]) for i in range(len(overlaps_df))) + + for g_ind in geometries_disconnected_df.index: + possible_overlap_integer_indices = [*set(numpy.ndarray.flatten(o_spatial_index.query(geometries_disconnected_df["geometry"][g_ind])))] + possible_overlap_indices_0 = [o_index_by_iloc[k] for k in possible_overlap_integer_indices] + possible_overlap_indices = list(set(possible_overlap_indices_0) & set(overlaps_df_unused_indices)) + + geom_finished = False + + for o_ind in possible_overlap_indices: + # If the corresponding overlap intersects this geometry (and was + # contained in it originally!), grab it. + if (geom_finished is False) and (g_ind in list(overlaps_df["polygon indices"][o_ind])) and (not geometries_disconnected_df["geometry"][g_ind].intersection(overlaps_df["geometry"][o_ind]).is_empty): + + if (geometries_disconnected_df["geometry"][g_ind].intersection(overlaps_df["geometry"][o_ind])).length > 0: + geometries_disconnected_df["geometry"][g_ind] = unary_union([ + geometries_disconnected_df["geometry"][g_ind], overlaps_df["geometry"][o_ind] + ]) + overlaps_df_unused_indices.remove(o_ind) + if num_components(geometries_disconnected_df["geometry"][g_ind]) == geometries_df["num components orig"][g_ind]: + geom_finished = True + + geometries_df["geometry"][g_ind] = geometries_disconnected_df["geometry"][g_ind] + + if geom_finished: + geometries_disconnected_df = geometries_disconnected_df.drop(g_ind) + + # That's all we can do for the disconnected geometries at this level. + # Go on to filling in the rest of the overlaps by greatest perimeter. 
+ g_spatial_index = STRtree(geometries_df["geometry"]) + g_index_by_iloc = dict((i, list(geometries_df.index)[i]) for i in range(len(geometries_df))) + + if nested is False: + print("Assigning order", i+1, "pieces...") + for o_ind in overlaps_df_unused_indices: + this_overlap = overlaps_df["geometry"][o_ind] + shared_perimeters = [] + possible_geom_integer_indices = [*set(numpy.ndarray.flatten(g_spatial_index.query(overlaps_df["geometry"][o_ind])))] + possible_geom_indices = [g_index_by_iloc[k] for k in possible_geom_integer_indices] + + for g_ind in possible_geom_indices: + if (g_ind in list(overlaps_df["polygon indices"][o_ind])) and not (this_overlap.boundary).intersection(geometries_df["geometry"][g_ind].boundary).is_empty: + shared_perimeters.append((g_ind, (this_overlap.boundary).intersection(geometries_df["geometry"][g_ind].boundary).length)) + + if len(shared_perimeters) > 0: + max_shared_perim = sorted(shared_perimeters, key=lambda tup: tup[1])[-1] + poly_to_add_to = max_shared_perim[0] + geometries_df["geometry"][poly_to_add_to] = unary_union( + [geometries_df["geometry"][poly_to_add_to], this_overlap]) + else: + # It seems like this should never happen, but it still seems to on + # very rare occasions. 
+ if nested is False: + print("Couldn't find a polygon to glue a component in the intersection of geometries", overlaps_df["polygon indices"][o_ind], "to") + + reconstructed_df = geometries_df + del reconstructed_df["num components orig"] + del reconstructed_df["num components refined"] + + return reconstructed_df + + +def drop_bad_holes(reconstructed_df, holes_df, fill_gaps_threshold): + """ Identify holes that won't be filled and drop them from holes_df """ + + holes_df = holes_df.copy() + + if fill_gaps_threshold is not None: + spatial_index = STRtree(reconstructed_df.geometry) + index_by_iloc = dict((i, list(reconstructed_df.index)[i]) for i in range(len(reconstructed_df.index))) + hole_indices_to_drop = [] + for h_ind in holes_df.index: + this_hole = holes_df["geometry"][h_ind] + if shapely.get_num_interior_rings(holes_df["geometry"][h_ind]) > 0: + hole_indices_to_drop.append(h_ind) + else: + possible_intersect_integer_indices = [*set(numpy.ndarray.flatten(spatial_index.query(this_hole)))] + possible_intersect_indices = [(index_by_iloc[k]) for k in possible_intersect_integer_indices] + actual_intersect_indices = [g_ind for g_ind in possible_intersect_indices if not this_hole.intersection(reconstructed_df["geometry"][g_ind]).is_empty] + if len(actual_intersect_indices) > 0: + max_geom_area = max(reconstructed_df["geometry"][g_ind].area for g_ind in actual_intersect_indices) + hole_area_ratio = this_hole.area/max_geom_area + if hole_area_ratio > fill_gaps_threshold: + hole_indices_to_drop.append(h_ind) + + else: + hole_indices_to_drop = [] + for h_ind in holes_df.index: + if shapely.get_num_interior_rings(holes_df["geometry"][h_ind]) > 0: + hole_indices_to_drop.append(h_ind) + + if len(hole_indices_to_drop) > 0: + holes_df = holes_df.drop(hole_indices_to_drop).reset_index(drop=True) + + return holes_df, len(hole_indices_to_drop) + + +def smart_close_gaps(geometries_df, holes_df): + """ + Fill simply connected gaps; general procedure is roughly as follows: + (1) 
Fill in gaps that only intersect one non-exterior geometry in the + obvious way. + (2) For remaining gaps, partially fill by "convexifying" boundaries with each + non-exterior geometry. This will have the effect of completely filling + gaps that only intersect 2 geometries and no exterior boundaries. + (3) For any gap that intersects 4 or more geometries nontrivially (including + exterior boundaries), find the non-adjacent pair with the shortest distance + between them and try to connect the pair by adding a "triangle" to each of the + non-exterior geometries in the pair. (Keep trying until this succeeds for + some pair.) This reduces the gap to 1 or 2 smaller gaps, each intersecting + strictly fewer geometries than the original. Put the smaller gaps back in the + queue for the next round. + (4) For any gap that intersects exactly 3 geometries (including exterior boundaries) + nontrivially, fill by a process that gives a portion of the gap to each of + the non-exterior geometries that it intersects. + """ + geometries_df = geometries_df.copy() + holes_df = holes_df.copy() + + # First step is to simplify gaps by convexifying the geometry boundaries: + geometries_df, holes_df = convexify_hole_boundaries(geometries_df, holes_df) + + # Now proceed with filling simplified gaps. 
+ if len(holes_df) > 0: + holes_to_process = deque(list(holes_df["geometry"])) + this_region = list(holes_df["region"])[0] # All holes in this dataframe should be from the same region + if this_region is None: + pbar = tqdm(desc="Gaps to fill", total=len(holes_to_process)) + else: + pbar = tqdm(desc=f"Gaps to fill in region {this_region}", total=len(holes_to_process)) + else: + holes_to_process = deque([]) + pbar = tqdm(desc="Gaps to fill", total=len(holes_to_process)) + + while len(holes_to_process) > 0: + pbar_increment = 1 + this_hole = holes_to_process.popleft() + this_hole_df = GeoDataFrame(geometry=GeoSeries([this_hole]), crs=holes_df.crs) + this_hole_boundaries_df = construct_hole_boundaries(geometries_df, this_hole_df) + + # Break into cases depending on how many target geometries intersect this gap + # and how many line segments the gap boundary consists of. + # After convexification, all gaps must have at least 3 boundaries (possibly + # including an exterior boundary). + + if len(set(this_hole_boundaries_df["target"]).difference({-1})) == 1: + # Attach the gap to the unique non-exterior geometry that it intersects: + poly_to_add_to = list(set(this_hole_boundaries_df["target"]).difference({-1}))[0] + geometries_df["geometry"][poly_to_add_to] = unary_union([geometries_df["geometry"][poly_to_add_to], this_hole]) + + elif len(segments(this_hole.boundary)) == 3: # If the hole is a simple triangle + if len(set(this_hole_boundaries_df["target"]).difference({-1})) == 3: + # Find the incenter of the triangle and use it to divide the triangle into + # 3 smaller triangles. (The incenter is more natural for this purpose than + # the centroid, especially for long skinny triangles.) 
+ this_hole_incenter = incenter(this_hole) + for thb_ind in this_hole_boundaries_df.index: + g_ind = this_hole_boundaries_df["target"][thb_ind] + this_segment = this_hole_boundaries_df["geometry"][thb_ind] + this_segment_poly_to_add = make_valid(Polygon([this_segment.boundary.geoms[0], this_segment.boundary.geoms[1], this_hole_incenter])) + geometries_df["geometry"][g_ind] = unary_union([geometries_df["geometry"][g_ind], this_segment_poly_to_add]) + + else: + # There are either 2 sides intersecting a common geometry or 1 + # side intersecting an exterior boundary. In this case join the entire + # triangle to the geometry that it shares the largest perimeter with. + touching_geoms = list(set(this_hole_boundaries_df["target"]).difference({-1})) + perim_1 = this_hole.intersection(geometries_df["geometry"][touching_geoms[0]]).length + perim_2 = this_hole.intersection(geometries_df["geometry"][touching_geoms[1]]).length + if perim_1 > perim_2: + poly_to_add_to = touching_geoms[0] + else: + poly_to_add_to = touching_geoms[1] + geometries_df["geometry"][poly_to_add_to] = unary_union([geometries_df["geometry"][poly_to_add_to], this_hole]) + + else: + this_hole_df = GeoDataFrame(geometry=GeoSeries([this_hole]), crs=holes_df.crs) + this_hole_boundaries_df = construct_hole_boundaries(geometries_df, this_hole_df) + + # If this_hole falls into one of the simple cases above, put it back + # in the queue. (Note that after convexification, + # this_hole_boundaries_df can only have length 2 if one of the + # boundaries is exterior and didn't get convexified.) 
+ if len(this_hole_boundaries_df) == 3: + # Put the gap boundaries and target geometries into oriented order: + this_hole_boundaries = [this_hole_boundaries_df["geometry"][0]] + target_geometries = [this_hole_boundaries_df["target"][0]] + + if this_hole_boundaries_df["geometry"][1].coords[0] == this_hole_boundaries_df["geometry"][0].coords[-1]: + this_hole_boundaries.append(this_hole_boundaries_df["geometry"][1]) + target_geometries.append(this_hole_boundaries_df["target"][1]) + this_hole_boundaries.append(this_hole_boundaries_df["geometry"][2]) + target_geometries.append(this_hole_boundaries_df["target"][2]) + elif this_hole_boundaries_df["geometry"][2].coords[0] == this_hole_boundaries_df["geometry"][0].coords[-1]: + this_hole_boundaries.append(this_hole_boundaries_df["geometry"][2]) + target_geometries.append(this_hole_boundaries_df["target"][2]) + this_hole_boundaries.append(this_hole_boundaries_df["geometry"][1]) + target_geometries.append(this_hole_boundaries_df["target"][1]) + + # If one of the boundaries is an exterior region boundary, find + # the shortest path between the vertex that isn't one of its + # endpoints and the nearest point in this boundary, and divide + # the hole between the other two adjacent geometries along this path. + # Otherwise, for each of the three boundary endpoints, construct + # the angle bisector of the two adjacent line segments and extend + # this line beyond the extent of the hole. Intersections of these + # 3 line segments will determine the endpoints of the new boundaries. 
+ + if -1 in target_geometries: + ext_boundary_position = target_geometries.index(-1) + # Cyclically permute so that the exterior boundary is in the + # 1st position: + this_hole_boundaries = this_hole_boundaries[ext_boundary_position:] + this_hole_boundaries[0:ext_boundary_position] + target_geometries = target_geometries[ext_boundary_position:] + target_geometries[0:ext_boundary_position] + + main_vertex = Point(this_hole_boundaries[2].coords[0]) + nearest_ext_boundary_point = nearest_points(main_vertex, extract_unique_points(this_hole_boundaries[0]))[1] + + ext_boundary_points = list(extract_unique_points(this_hole_boundaries[0]).geoms) + nearest_point_position = ext_boundary_points.index(nearest_ext_boundary_point) + + if nearest_point_position == 0: + # Add the entire hole to target_geometries[1]. + geometries_df["geometry"][target_geometries[1]] = unary_union([geometries_df["geometry"][target_geometries[1]], this_hole]) + + elif nearest_point_position == len(ext_boundary_points) - 1: + # Add the entire hole to target_geometries[2]. 
+ geometries_df["geometry"][target_geometries[2]] = unary_union([geometries_df["geometry"][target_geometries[2]], this_hole]) + + else: + this_hole_triangulation = triangulate_polygon(this_hole) + sp = LineString(shortest_path_in_polygon(this_hole, main_vertex, nearest_ext_boundary_point, full_triangulation=this_hole_triangulation)) + + poly1_to_add_boundary = unary_union([this_hole_boundaries[1], sp, LineString(ext_boundary_points[nearest_point_position:])]) + poly1_to_add = polygonize(poly1_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[1]] = unary_union([geometries_df["geometry"][target_geometries[1]], poly1_to_add]) + + poly2_to_add_boundary = unary_union([this_hole_boundaries[2], sp, LineString(ext_boundary_points[0:nearest_point_position+1])]) + poly2_to_add = polygonize(poly2_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[2]] = unary_union([geometries_df["geometry"][target_geometries[2]], poly2_to_add]) + + else: + max_line_length = this_hole.boundary.length/2 + vertices = [] + bisectors = [] + + for i in range(3): + this_vertex = numpy.array(this_hole_boundaries[i].coords[0]) + vertices.append(Point(this_hole_boundaries[i].coords[0])) + this_vec_1_raw = numpy.array(this_hole_boundaries[i].coords[1]) - this_vertex + this_vec_2_raw = numpy.array(this_hole_boundaries[i-1].coords[-2]) - this_vertex + this_unit_vec_1 = this_vec_1_raw/math.sqrt(this_vec_1_raw[0]**2 + this_vec_1_raw[1]**2) + this_unit_vec_2 = this_vec_2_raw/math.sqrt(this_vec_2_raw[0]**2 + this_vec_2_raw[1]**2) + this_bisector_vec_raw = this_unit_vec_1 + this_unit_vec_2 + this_bisector_unit_vec = this_bisector_vec_raw/math.sqrt(this_bisector_vec_raw[0]**2 + this_bisector_vec_raw[1]**2) + this_bisector = LineString([tuple(this_vertex), tuple(this_vertex + max_line_length*this_bisector_unit_vec)]) + bisectors.append(this_bisector) + + # Points of intersection of the bisectors: + i_points = [bisectors[0].intersection(bisectors[1]), 
bisectors[1].intersection(bisectors[2]), bisectors[2].intersection(bisectors[0])] + + # Note that these points could coincide - e.g., if the convexified + # hole is a triangle - and the rest of the construction would be very + # simple. + # Also - even though this is geometrically impossible(!), + # rounding errors can create a situation in which two + # of these points are equal but different from the 3rd. + # In this case, assume that the one that appears twice + # is actually the common value for all three. + + if i_points[0] == i_points[1] or i_points[0] == i_points[2]: + # Construct pieces to append to geometries and append them. + middle_point = i_points[0] + for i in range(3): + poly_to_add_boundary = unary_union([this_hole_boundaries[i], LineString([this_hole_boundaries[i].coords[-1], middle_point, this_hole_boundaries[i].coords[0]])]) + poly_to_add = polygonize(poly_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[i]] = unary_union([geometries_df["geometry"][target_geometries[i]], poly_to_add]) + + elif i_points[1] == i_points[2]: + # Construct pieces to append to geometries and append them. + middle_point = i_points[1] + for i in range(3): + poly_to_add_boundary = unary_union([this_hole_boundaries[i], LineString([this_hole_boundaries[i].coords[-1], middle_point, this_hole_boundaries[i].coords[0]])]) + poly_to_add = polygonize(poly_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[i]] = unary_union([geometries_df["geometry"][target_geometries[i]], poly_to_add]) + + else: + # In general, each bisector intersects the other two + # bisectors in distinct points. To accurately construct + # the path to the more distant one, we need to include + # the nearer one as an intermediate point. + # And we might as well go ahead and find the incenter of the + # triangle formed by the intersection points, and include it + # on the path to the more distant one so we can completely + # fill the hole without a separate step. 
+ middle_point = incenter(Polygon(i_points)) + + # The first bisector contains the 1st and 3rd intersection points. + if vertices[0].distance(i_points[0]) > vertices[0].distance(i_points[2]): + v0_to_i01_path = LineString([vertices[0], i_points[2], middle_point, i_points[0]]) + v0_to_i02_path = LineString([vertices[0], i_points[2]]) + else: + v0_to_i01_path = LineString([vertices[0], i_points[0]]) + v0_to_i02_path = LineString([vertices[0], i_points[0], middle_point, i_points[2]]) + + # The second bisector contains the 1st and 2nd intersection points. + if vertices[1].distance(i_points[0]) > vertices[1].distance(i_points[1]): + v1_to_i01_path = LineString([vertices[1], i_points[1], middle_point, i_points[0]]) + v1_to_i12_path = LineString([vertices[1], i_points[1]]) + else: + v1_to_i01_path = LineString([vertices[1], i_points[0]]) + v1_to_i12_path = LineString([vertices[1], i_points[0], middle_point, i_points[1]]) + + # The third bisector contains the 2nd and 3rd intersection points. + if vertices[2].distance(i_points[1]) > vertices[2].distance(i_points[2]): + v2_to_i12_path = LineString([vertices[2], i_points[2], middle_point, i_points[1]]) + v2_to_i02_path = LineString([vertices[2], i_points[2]]) + else: + v2_to_i12_path = LineString([vertices[2], i_points[1]]) + v2_to_i02_path = LineString([vertices[2], i_points[1], middle_point, i_points[2]]) + + # Construct and adjoin new polygon pieces one at a time. 
+ poly0_to_add_boundary = unary_union([this_hole_boundaries[0], v0_to_i01_path, v1_to_i01_path]) + poly0_to_add = polygonize(poly0_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[0]] = unary_union([geometries_df["geometry"][target_geometries[0]], poly0_to_add]) + + poly1_to_add_boundary = unary_union([this_hole_boundaries[1], v1_to_i12_path, v2_to_i12_path]) + poly1_to_add = polygonize(poly1_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[1]] = unary_union([geometries_df["geometry"][target_geometries[1]], poly1_to_add]) + + poly2_to_add_boundary = unary_union([this_hole_boundaries[2], v2_to_i02_path, v0_to_i02_path]) + poly2_to_add = polygonize(poly2_to_add_boundary)[0] + geometries_df["geometry"][target_geometries[2]] = unary_union([geometries_df["geometry"][target_geometries[2]], poly2_to_add]) + + else: # If len(this_hole_boundaries_df) >= 4 + this_hole_triangulation = triangulate_polygon(this_hole) + thb_distances = [] + + for i in this_hole_boundaries_df.index: + for j in this_hole_boundaries_df.index: + if j > i: + this_distance = this_hole_boundaries_df["geometry"][i].distance(this_hole_boundaries_df["geometry"][j]) + if this_distance != 0: + thb_distances.append((i, j, this_distance)) + + thb_distance_data_sorted = deque(sorted(thb_distances, key=lambda tup: tup[2])) + + found_triangles = False + while found_triangles is False and len(thb_distance_data_sorted) > 0: + boundary_distance_data = thb_distance_data_sorted.popleft() + boundaries_to_connect = (boundary_distance_data[0], boundary_distance_data[1]) + + nhb1 = this_hole_boundaries_df["geometry"][boundaries_to_connect[0]] + nhb2 = this_hole_boundaries_df["geometry"][boundaries_to_connect[1]] + geom1 = this_hole_boundaries_df["target"][boundaries_to_connect[0]] + geom2 = this_hole_boundaries_df["target"][boundaries_to_connect[1]] + + # Construct the shortest paths between + # (1) initial points of both boundaries; + # (2) terminal points of both boundaries. 
+ # These paths will intersect, possibly at a vertex or along entire + # hole boundary segments, but generically---and provably for + # at least one non-adjacent pair---at a single interior point of + # the hole. + # In the generic case, these paths together with the two + # hole boundaries will form a pair of "triangles" that each share a + # boundary of positive length with one of the two hole boundaries. + # Find the closest pair that satisfies this generic intersection + # condition and adjoin each of the triangles formed in this way + # to its adjacent geometry. This will create + # two smaller holes - which already have convexified geometry + # boundaries by construction! - and we put these back on the queue + # for the next round. + + if not (geom1 == -1 and geom2 == -1): + # If one of the boundaries is exterior (and in the rare case that BOTH + # boundaries are exterior, skip this pair and go on to the next one): + # Find the closest point on the exterior boundary to the non-exterior + # geometry, construct a "triangle" by taking the shortest paths from + # the endpoints of the non-exterior geometry to this point, and + # (assuming it has positive area) adjoin it to the non-exterior + # geometry. 
+ if geom1 == -1 or geom2 == -1: + if geom1 == -1: + nhb_ext = nhb1 + nhb_int = nhb2 + geom_int = geom2 + elif geom2 == -1: + nhb_ext = nhb2 + nhb_int = nhb1 + geom_int = geom1 + point1 = nhb_int.boundary.geoms[0] + point2 = nhb_int.boundary.geoms[1] + nearest_ext_boundary_point = nearest_points(nhb_int, extract_unique_points(nhb_ext))[1] + path1 = LineString(shortest_path_in_polygon(this_hole, point1, nearest_ext_boundary_point, full_triangulation=this_hole_triangulation)) + path2 = LineString(shortest_path_in_polygon(this_hole, point2, nearest_ext_boundary_point, full_triangulation=this_hole_triangulation)) + polys_to_add_boundary = shapely.node(MultiLineString([nhb_int, path1, path2])) + polys_to_add = polygonize(polys_to_add_boundary) + if len(polys_to_add) > 0: + for poly_to_add in polys_to_add: + if poly_to_add.area > 0: + found_triangles = True + geometries_df["geometry"][geom_int] = unary_union([geometries_df["geometry"][geom_int], poly_to_add]) + this_hole = this_hole.difference(poly_to_add) + + else: + # Start by constructing the shortest paths between the initial point + # of each boundary and the terminal point of the other. If these + # paths are disjoint, then this pair of boundaries is strongly + # mutually visible and we want to connect them. Otherwise, + # skip this pair and move on to the next one. + point11 = nhb1.boundary.geoms[0] + point12 = nhb1.boundary.geoms[1] + point21 = nhb2.boundary.geoms[0] + point22 = nhb2.boundary.geoms[1] + + test_path1_vertices = shortest_path_in_polygon(this_hole, point11, point22, full_triangulation=this_hole_triangulation) + test_path2_vertices = shortest_path_in_polygon(this_hole, point12, point21, full_triangulation=this_hole_triangulation) + if len(set(test_path1_vertices).intersection(set(test_path2_vertices))) == 0: + # In this case we should be good to add triangles formed + # by crossing paths between the initial and terminal + # points between the two boundaries! 
+ # A minor exception is when both boundaries target the same + # geometry, in which case we want the paths to NOT cross in + # order to preserve convexity of the geometry boundaries in the + # new holes, and we'll add a single polygon instead of a + # pair of triangles. + + found_triangles = True + + if geom1 == geom2: + path1 = LineString(shortest_path_in_polygon(this_hole, point11, point22, full_triangulation=this_hole_triangulation)) + path2 = LineString(shortest_path_in_polygon(this_hole, point12, point21, full_triangulation=this_hole_triangulation)) + else: + path1 = LineString(shortest_path_in_polygon(this_hole, point11, point21, full_triangulation=this_hole_triangulation)) + path2 = LineString(shortest_path_in_polygon(this_hole, point12, point22, full_triangulation=this_hole_triangulation)) + + polys_to_add_boundary = shapely.node(MultiLineString([nhb1, nhb2, path1, path2])) + polys_to_add = polygonize(polys_to_add_boundary) + # polys_to_add will consist of either 1 or 2 polygons, + # each sharing a positive-length boundary with a unique geometry. + # Add each polygon to the geometry that it shares a boundary with. + nhb1_segments = segments(nhb1) + nhb2_segments = segments(nhb2) + for poly_to_add in polys_to_add: + poly_to_add = orient(poly_to_add) + # Cover all bases with both possible orientations for + # boundary segments, even though the proper orientation + # SHOULD always be correct. 
+ poly_segments_oriented = segments(poly_to_add.boundary) + poly_segments_reverse = [shapely.reverse(segment) for segment in poly_segments_oriented] + poly_segments_all = set(poly_segments_oriented + poly_segments_reverse) + if (len(set(nhb1_segments).intersection(poly_segments_all)) > 0) and (len(set(nhb2_segments).intersection(poly_segments_all)) == 0): + geometries_df["geometry"][geom1] = unary_union([geometries_df["geometry"][geom1], poly_to_add]) + this_hole = this_hole.difference(poly_to_add) + + elif (len(set(nhb1_segments).intersection(poly_segments_all)) == 0) and (len(set(nhb2_segments).intersection(poly_segments_all)) > 0): + geometries_df["geometry"][geom2] = unary_union([geometries_df["geometry"][geom2], poly_to_add]) + this_hole = this_hole.difference(poly_to_add) + + elif geom1 == geom2: + geometries_df["geometry"][geom1] = unary_union([geometries_df["geometry"][geom1], poly_to_add]) + this_hole = this_hole.difference(poly_to_add) + + else: + print("Internal triangle construction went weird!") + print("Hole boundaries:") + for i in this_hole_boundaries_df.index: + print("Target:", this_hole_boundaries_df["target"][i]) + print(list(this_hole_boundaries_df["geometry"][i].coords)) + print("poly_to_add boundaries:") + print(list(poly_to_add.boundary.coords)) + + # Now put the new hole(s) created by removing triangles back in the queue: + if found_triangles and not this_hole.is_empty: + if this_hole.geom_type == "MultiPolygon": # 2 holes to add + holes_to_add = [orient(geom) for geom in this_hole.geoms] + elif this_hole.geom_type == "Polygon": # 1 hole to add + holes_to_add = [orient(this_hole)] + holes_to_process.extend(holes_to_add) + pbar_increment -= len(holes_to_add) + + elif found_triangles is False: + # This is rare, but it does happen occasionally in the scenario where + # there's a large external boundary that, if it weren't external, + # would grab most (or maybe even all) of the hole in the + # convexification process. 
+ # In this case, just assign the entire hole to the geometry with which + # it shares the largest perimeter. (This is fairly close to what + # tends to happen with large, non-convexified external boundaries + # anyway!) + shared_perimeters = [] + for i in this_hole_boundaries_df.index: + if this_hole_boundaries_df["target"][i] != -1: + shared_perimeters.append((this_hole_boundaries_df["target"][i], this_hole_boundaries_df["geometry"][i].length)) + if len(shared_perimeters) > 0: + max_shared_perim = sorted(shared_perimeters, key=lambda tup: tup[1])[-1] + poly_to_add_to = max_shared_perim[0] + geometries_df["geometry"][poly_to_add_to] = unary_union( + [geometries_df["geometry"][poly_to_add_to], this_hole]) + + pbar.update(pbar_increment) + + pbar.close() + + return geometries_df + + +def small_rook_to_queen(geometries_df, min_rook_length): + """ + Convert all rook adjacencies between geometries with total adjacency length less + than min_rook_length to queen adjacencies. + """ + + geometries_df = geometries_df.copy() + + # The input should be clean, so these should all be 1-D or less: + adj_df = adjacencies(geometries_df, output_type="geodataframe") + adj_df.crs = geometries_df.crs + + # Identify the adjacencies whose TOTAL length is less than the threshold; + # since this is all about adjacency relations, there's no need to fix + # a small component when there's also a large component that will create + # an adjacency regardless. + # Add column for boundary length and select the small ones: + adj_df["boundary length"] = adj_df["geometry"].length + small_adj_df = adj_df[adj_df["boundary length"] < min_rook_length] + + # Get rid of point geometries, linemerge the MultiLineStrings, and then + # explode into components. (Then get rid of points again.) 
+ for ind in small_adj_df.index: + if small_adj_df["geometry"][ind].geom_type == "GeometryCollection": + small_adj_list = list(small_adj_df["geometry"][ind].geoms) + small_adj_list_no_point = [x for x in small_adj_list if x.geom_type != "Point"] + small_adj_df["geometry"][ind] = MultiLineString(small_adj_list_no_point) + + if small_adj_df["geometry"][ind].geom_type == "MultiLineString": + small_adj_df["geometry"][ind] = linemerge(small_adj_df["geometry"][ind]) + + small_adj_df = small_adj_df.explode(index_parts=False).reset_index(drop=True) + + small_adj_df_indices_to_drop = [] + for ind in small_adj_df.index: + if small_adj_df["geometry"][ind].geom_type == "Point": + small_adj_df_indices_to_drop.append(ind) + + if len(small_adj_df_indices_to_drop) > 0: + small_adj_df = small_adj_df.drop(small_adj_df_indices_to_drop) + + # Next, construct small disks around each adjacency, and add them to a list. + # We'll take their unary union later in case any of them overlap. + disks_to_remove_list = [] + for a_ind in small_adj_df.index: + this_adj = small_adj_df["geometry"][a_ind] + adj_diam = this_adj.length + fat_point_radius = 0.6*adj_diam # slightly more than the radius from the midpoint to the endpoints + endpoint1 = this_adj.coords[0] + endpoint2 = this_adj.coords[-1] + midpoint = LineString([endpoint1, endpoint2]).centroid + disk_to_remove = midpoint.buffer(fat_point_radius) + disks_to_remove_list.append(disk_to_remove) + + if len(disks_to_remove_list) > 0: + # Make a list of the convex hulls of all the components of the unary union, + # and make sure none of THOSE intersect. (Note that this is only necessary if there + # is more than 1 disk to remove.) 
+ polys_to_remove_list = disks_to_remove_list + polys_to_remove_complete = False + while polys_to_remove_complete is False: + all_polys_to_remove = unary_union(polys_to_remove_list) + if all_polys_to_remove.geom_type == "Polygon": # if it's all one big polygon now + merged_polys_to_remove_list = [all_polys_to_remove] + else: + merged_polys_to_remove_list = list(all_polys_to_remove.geoms) + + convex_polys_to_remove_list = [shapely.convex_hull(x) for x in merged_polys_to_remove_list] + + if len(convex_polys_to_remove_list) == 1: + polys_to_remove_complete = True + elif unary_union(convex_polys_to_remove_list).geom_type == "MultiPolygon": + # Note that if the unary union is a Polygon, then this next condition + # below can't hold anyway and we want polys_to_remove_complete to remain + # False. + if len(unary_union(convex_polys_to_remove_list).geoms) == len(convex_polys_to_remove_list): + polys_to_remove_complete = True + + polys_to_remove_list = convex_polys_to_remove_list + + # Build an STRtree to use for finding intersecting geometries. + g_spatial_index = STRtree(geometries_df["geometry"]) + g_index_by_iloc = dict((i, list(geometries_df.index)[i]) for i in range(len(geometries_df))) + + for a_ind in range(len(polys_to_remove_list)): + poly_to_remove = polys_to_remove_list[a_ind] + + # Identify geometries that might intersect this polygon. + possible_geom_integer_indices = [*set(numpy.ndarray.flatten(g_spatial_index.query(poly_to_remove)))] + possible_geom_indices = [g_index_by_iloc[k] for k in possible_geom_integer_indices] + + # Use the boundaries of these geometries together with the boundary of the disk to + # polygonize and divide geometries into pieces inside and outside the disk. 
+ boundaries = [geometries_df["geometry"][i].boundary for i in possible_geom_indices] + boundaries.append(LineString(list(poly_to_remove.exterior.coords))) + + boundaries_exploded = [] + for geom in boundaries: + if geom.geom_type == "LineString": + boundaries_exploded.append(geom) + elif geom.geom_type == "MultiLineString": + boundaries_exploded += list(geom.geoms) + boundaries_union = shapely.node(MultiLineString(boundaries_exploded)) + + pieces_df = GeoDataFrame(columns=["polygon indices"], + geometry=GeoSeries(list(polygonize(boundaries_union))), + crs=geometries_df.crs) + + # Associate the pieces to the main geometries. (Note that if there are + # gaps, some pieces may be unassigned.) + for i in pieces_df.index: + pieces_df["polygon indices"][i] = set() + + for i in pieces_df.index: + temp_possible_geom_integer_indices = [*set(numpy.ndarray.flatten(g_spatial_index.query(pieces_df["geometry"][i])))] + temp_possible_geom_indices = [g_index_by_iloc[k] for k in temp_possible_geom_integer_indices] + + for j in temp_possible_geom_indices: + if pieces_df["geometry"][i].representative_point().intersects(geometries_df["geometry"][j]): + pieces_df["polygon indices"][i] = pieces_df["polygon indices"][i].union({j}) + + # Now rebuild the disk from the pieces that are inside the circle, and drop them from + # pieces_df. Then we'll give the pieces outside the circle back to the geometries that they came from. 
+ + poly_to_remove_refined = Polygon() + + pieces_df_indices_to_drop = [] + for p_ind in pieces_df.index: + if pieces_df["geometry"][p_ind].representative_point().intersects(poly_to_remove): + poly_to_remove_refined = unary_union([poly_to_remove_refined, pieces_df["geometry"][p_ind]]) + pieces_df_indices_to_drop.append(p_ind) + if len(pieces_df_indices_to_drop) > 0: + pieces_df = pieces_df.drop(pieces_df_indices_to_drop) + + for g_ind in possible_geom_indices: + geometries_df["geometry"][g_ind] = Polygon() + + for p_ind in pieces_df.index: + if len(pieces_df["polygon indices"][p_ind]) == 1: # Note that it won't be >1 if the file is clean! + this_poly_ind = list(pieces_df["polygon indices"][p_ind])[0] + this_piece = pieces_df["geometry"][p_ind] + if this_poly_ind in possible_geom_indices: + # This check is needed because the geometries in possible_geom_indices can form a + # non-simply-connected region, in which case the interior holes - which may consist + # of multiple geometries each - may be assigned someplace they shouldn't be!
+ geometries_df["geometry"][this_poly_ind] = unary_union([geometries_df["geometry"][this_poly_ind], this_piece]) + + # Find the boundary arcs between geometries and poly_to_remove_refined (and make sure each arc is a connected piece): + possible_geoms = geometries_df.loc[possible_geom_indices] + poly_to_remove_boundaries_df = intersections(GeoDataFrame(geometry=GeoSeries([poly_to_remove_refined], crs=geometries_df.crs)), possible_geoms, output_type="geodataframe") + + for b_ind in poly_to_remove_boundaries_df.index: + if poly_to_remove_boundaries_df["geometry"][b_ind].geom_type == "MultiLineString": + poly_to_remove_boundaries_df["geometry"][b_ind] = linemerge(poly_to_remove_boundaries_df["geometry"][b_ind]) + + poly_to_remove_boundaries_df = poly_to_remove_boundaries_df.explode(index_parts=False).reset_index(drop=True) + poly_to_remove_centroid_coords = poly_to_remove_refined.centroid.coords[0] + + # For each boundary arc, create a "pie wedge" from the center of poly_to_remove_refined + # subtending this arc. (Since the polygon is convex, these are guaranteed to piece + # together nicely.) + for b_ind in poly_to_remove_boundaries_df.index: + boundary_arc_coords = list(poly_to_remove_boundaries_df["geometry"][b_ind].coords) + boundary_wedge_coords = boundary_arc_coords + [poly_to_remove_centroid_coords] + + g_ind = poly_to_remove_boundaries_df["target"][b_ind] + + geometries_df["geometry"][g_ind] = unary_union([geometries_df["geometry"][g_ind], Polygon(boundary_wedge_coords)]) + + return geometries_df + + +def construct_hole_boundaries(geometries_df, holes_df): + """ + Construct a GeoDataFrame with all positive-length intersections between hole + and geometry boundaries, including intersections between hole boundaries and + exterior boundaries, if applicable. 
+ """ + geometries_df = geometries_df.copy() + holes_df = holes_df.copy() + + # Be sure gaps are correctly oriented: + for h_ind in holes_df.index: + holes_df.geometry[h_ind] = orient(holes_df.geometry[h_ind]) + + # Do this WITHOUT using geometric intersection operations, which seem to be prone to + # inexplicable rounding errors (GEOS bugs?) + # Start by constructing an STRtree to find geometries that may intersect gaps. + + g_spatial_index = STRtree(geometries_df["geometry"]) + g_index_by_iloc = dict((i, list(geometries_df.index)[i]) for i in range(len(geometries_df))) + + # Initialize the geodataframe for the gap boundaries + hole_boundaries_df = GeoDataFrame(columns=["source", "target"], geometry=GeoSeries([]), crs=geometries_df.crs) + + # For each gap and each geometry that it might possibly intersect, find all + # common LineStrings in their boundaries (if any) and take their unary union to + # construct the appropriate boundary between them. (Note that this requires paying + # VERY careful attention to orientations!) 
+ for h_ind in holes_df.index: + this_hole = holes_df["geometry"][h_ind] + this_hole_segments = segments(this_hole.boundary) + this_hole_segments_used = [] + + possible_geom_integer_indices = [*set(numpy.ndarray.flatten(g_spatial_index.query(holes_df["geometry"][h_ind])))] + possible_geom_indices = [g_index_by_iloc[k] for k in possible_geom_integer_indices] + + for g_ind in possible_geom_indices: + + this_geom = geometries_df["geometry"][g_ind] + if this_geom.geom_type == "Polygon": + this_geom_geoms = [orient(this_geom)] + elif this_geom.geom_type == "MultiPolygon": + this_geom_geoms = [orient(geom) for geom in this_geom.geoms] + + this_geom_boundary_components = [] + for geom in this_geom_geoms: + this_geom_boundary = geom.boundary + if this_geom_boundary.geom_type == "LineString": + this_geom_boundary_components += [this_geom_boundary] + elif this_geom_boundary.geom_type == "MultiLineString": + this_geom_boundary_components += list(this_geom_boundary.geoms) + + this_geom_segments = set() + for component in this_geom_boundary_components: + this_geom_segments = this_geom_segments.union(set(segments(component))) + + this_hole_this_geom_segments = [segment for segment in this_hole_segments if (segment in this_geom_segments or shapely.reverse(segment) in this_geom_segments)] + + if len(this_hole_this_geom_segments) > 0: + + this_hole_segments_used += this_hole_this_geom_segments + this_hole_boundary_df = GeoDataFrame(geometry=GeoSeries([linemerge(this_hole_this_geom_segments)]), crs=geometries_df.crs) + this_hole_boundary_df.insert(0, "source", h_ind) + this_hole_boundary_df.insert(1, "target", g_ind) + + hole_boundaries_df = pandas.concat([hole_boundaries_df, this_hole_boundary_df]).reset_index(drop=True) + + # Finally, check for any exterior boundary: + if len(this_hole_segments) > len(this_hole_segments_used): + exterior_segments = [segment for segment in this_hole_segments if segment not in this_hole_segments_used] + this_hole_exterior_boundary_df = 
GeoDataFrame(geometry=GeoSeries([linemerge(exterior_segments)]), crs=geometries_df.crs) + this_hole_exterior_boundary_df.insert(0, "source", h_ind) + this_hole_exterior_boundary_df.insert(1, "target", -1) + hole_boundaries_df = pandas.concat([hole_boundaries_df, this_hole_exterior_boundary_df]).reset_index(drop=True) + + hole_boundaries_df = hole_boundaries_df.explode(index_parts=False).reset_index(drop=True) + + return hole_boundaries_df + + +def incenter(triangle): + """ + Find the incenter (intersection point of the angle bisectors) of a triangle. + """ + triangle_vertices = triangle.boundary.coords + triangle_segments = segments(triangle.boundary) + + if len(triangle_segments) != 3: + raise TypeError("Input must be a triangle!") + + x_a = triangle_vertices[0][0] + y_a = triangle_vertices[0][1] + x_b = triangle_vertices[1][0] + y_b = triangle_vertices[1][1] + x_c = triangle_vertices[2][0] + y_c = triangle_vertices[2][1] + a = triangle_segments[1].length + b = triangle_segments[2].length + c = triangle_segments[0].length + + # The incenter will be a weighted average of the coordinates of the vertices, + # with coefficients proportional to a,b,c. + + alpha = a/(a + b + c) + beta = b/(a + b + c) + gamma = c/(a + b + c) + + x_i = alpha*x_a + beta*x_b + gamma*x_c + y_i = alpha*y_a + beta*y_b + gamma*y_c + + # Occasionally for very tiny triangles, rounding errors produce a point not + # contained in the triangle. In this case, replace the computed point with + # the nearest vertex of the triangle. + if not triangle.contains(Point(x_i, y_i)): + point_to_return = nearest_points(Point(x_i, y_i), MultiPoint([Point(x_a, y_a), Point(x_b, y_b), Point(x_c, y_c)]))[1] + else: + point_to_return = Point(x_i, y_i) + + return point_to_return + + +def triangulate_polygon(polygon): + """ + Triangulate a not-necessarily-convex simple polygon, based on the ear clipping + method. 
+ """ + + triangles = [] + poly = polygon + + while len(segments(poly.boundary)) > 3: + poly_vertices = list(extract_unique_points(poly).geoms) + + # Find an ear to cut from the polygon and add it to the list of triangles. + for i in range(len(poly_vertices)): + triangle_to_check = Polygon([poly_vertices[i-1], poly_vertices[i], poly_vertices[i+1]]) + if poly.contains(triangle_to_check) and LineString([poly_vertices[i-1], poly_vertices[i+1]]).intersection(poly.boundary).difference(MultiPoint([poly_vertices[i-1], poly_vertices[i+1]])).is_empty: + triangles.append(triangle_to_check) + poly = poly.difference(triangle_to_check) + break + + # Remaining polygon is now a triangle, so add it to the list. + triangles.append(poly) + + return triangles + + +def shortest_path_in_polygon(polygon, start, end, full_triangulation=None): + """ + Finds the shortest path between any two vertices in a not-necessarily-convex + simple polygon. The polygon must be valid and simply connected, + and "start" and "end" must be vertices of the polygon. + + Optional input full_triangulation allows triangulation to be computed in + advance to avoid repetition when multiple paths need to be computed within + the same polygon. + """ + if not (polygon.is_valid and polygon.geom_type == "Polygon"): + raise TypeError("shortest_path_in_polygon: Input polygon must be a valid Polygon.") + if not extract_unique_points(polygon).contains(MultiPoint([start, end])): + raise TypeError("shortest_path_in_polygon: Start and end points must be vertices of the polygon.") + + # First check for the easy case: If the line segment between the start and end points is + # contained in the polygon, then that's the shortest path. (And the rest of the algorithm + # won't work correctly because the simplified polygon will degenerate.) 
+ + if polygon.contains(LineString([start, end])) or polygon.boundary.contains(LineString([start, end])): + return [start, end] + + else: + # First make sure the polygon is oriented so that we're clear on what + # "left" and "right" mean. + polygon = orient(polygon) + + # Create two paths around polygon.boundary to the start and end points, oriented + # appropriately. + boundary_points = list(extract_unique_points(polygon.boundary).geoms) + start_index = boundary_points.index(start) + end_index = boundary_points.index(end) + if start_index < end_index: + path_1 = LineString(boundary_points[start_index:end_index+1]) + path_2 = LineString(boundary_points[end_index:] + boundary_points[0:start_index+1]) + else: + path_1 = LineString(boundary_points[start_index:] + boundary_points[0:end_index+1]) + path_2 = LineString(boundary_points[end_index:start_index+1]) + + if (extract_unique_points(path_1).geoms[0] == start) and (extract_unique_points(path_2).geoms[0] == end): + right_path = path_1 + left_path = shapely.reverse(path_2) + + elif (extract_unique_points(path_2).geoms[0] == start) and (extract_unique_points(path_1).geoms[0] == end): + right_path = path_2 + left_path = shapely.reverse(path_1) + + right_path_points = list(extract_unique_points(right_path).geoms) + left_path_points = list(extract_unique_points(left_path).geoms) + + # Now triangulate the polygon, but only keep the triangles that have one vertex + # in each of the right and left paths (not counting the starting and ending points) + # in order to create a "sleeve" for the shortest path. 
+ if full_triangulation is None: + full_triangulation = triangulate_polygon(polygon) + + triangulation = [] + for triangle in full_triangulation: + if not triangle.boundary.intersection(MultiPoint(right_path_points[1:-1])).is_empty and not triangle.boundary.intersection(MultiPoint(left_path_points[1:-1])).is_empty: + triangulation.append(triangle) + + # Put the triangles for the sleeve in the correct order: + initial_triangle = [triangle for triangle in triangulation if start in extract_unique_points(triangle.boundary).geoms][0] + + ordered_triangulation = [initial_triangle] + triangulation.remove(initial_triangle) + + while len(triangulation) > 0: + leading_triangle = ordered_triangulation[-1] + next_triangle = [triangle for triangle in triangulation if leading_triangle.intersection(triangle).geom_type == "LineString"][0] + ordered_triangulation.append(next_triangle) + triangulation.remove(next_triangle) + + # Regard the sleeve given by the union of these triangles as the "simplified" + # polygon; the shortest path must be contained in this simplified polygon. + polygon_simplified = unary_union(ordered_triangulation) + + # Now use the ordered triangulation to order the vertices of the simplified polygon, + # as well as the left and right paths restricted to the simplified polygon.
+ ordered_path_vertices = [start] + right_path_simplified_points = [start] + left_path_simplified_points = [start] + + for triangle in ordered_triangulation: + this_triangle_vertices = set(extract_unique_points(triangle.boundary).geoms) + this_triangle_new_vertices = this_triangle_vertices.difference(set(ordered_path_vertices)) + ordered_path_vertices = ordered_path_vertices + list(this_triangle_new_vertices) + for vertex in this_triangle_new_vertices: + if vertex in right_path_points: + right_path_simplified_points.append(vertex) + if vertex in left_path_points: + left_path_simplified_points.append(vertex) + + # Initialize the found_shortest_path and the left and right funnel edges: + found_shortest_path = [start] + + left_funnel = [left_path_simplified_points[0], left_path_simplified_points[1]] + right_funnel = [right_path_simplified_points[0], right_path_simplified_points[1]] + + # We've already used the first 3 points on this list, so take them out. + ordered_path_vertices = ordered_path_vertices[3:] + + # Now find the shortest path! + for point in ordered_path_vertices: + apex = found_shortest_path[-1] + + if point in left_path_simplified_points: + this_funnel = left_funnel + other_funnel = right_funnel + reflex_sign = 1 + elif point in right_path_simplified_points: + this_funnel = right_funnel + other_funnel = left_funnel + reflex_sign = -1 + + # Start by checking whether this point can see the apex, and if so, the + # apex becomes the new predecessor on this funnel. + # Otherwise, find the first point on this funnel that can see point, + # and check whether this is a *reflex* vertex. If so, cut off the rest + # of this funnel and add point to it. If not, find the first seen + # vertex on the *other* funnel; it's guaranteed to be reflex. Make it + # the new apex, and add the other funnel up to this point to + # found_shortest_path. 
+ if polygon_simplified.contains(LineString([apex, point])) or polygon_simplified.boundary.contains(LineString([apex, point])): + this_funnel = [apex, point] + + else: + for i in range(1, len(this_funnel)): + if polygon_simplified.contains(LineString([this_funnel[i], point])) or polygon_simplified.boundary.contains(LineString([this_funnel[i], point])): + first_seen = i + break + + seg1 = list(LineString([this_funnel[first_seen-1], this_funnel[first_seen]]).coords) + seg2 = list(LineString([this_funnel[first_seen], point]).coords) + + vec1 = (seg1[1][0] - seg1[0][0], seg1[1][1] - seg1[0][1]) + vec2 = (seg2[1][0] - seg2[0][0], seg2[1][1] - seg2[0][1]) + + cross_prod = vec1[0]*vec2[1] - vec1[1]*vec2[0] + + if cross_prod*reflex_sign >= 0: + # If this vertex is reflex: + this_funnel = this_funnel[0:first_seen+1] + [point] + else: + first_seen = min(i for i in range(1, len(other_funnel)) if polygon_simplified.contains(LineString([other_funnel[i], point])) or polygon_simplified.boundary.contains(LineString([other_funnel[i], point]))) + found_shortest_path += other_funnel[1: first_seen+1] + apex = other_funnel[first_seen] + this_funnel = [apex, point] + other_funnel = other_funnel[first_seen:] + + # Reassign this_funnel and other_funnel to left_funnel and right_funnel: + if point in left_path_simplified_points: + left_funnel = this_funnel + right_funnel = other_funnel + elif point in right_path_simplified_points: + right_funnel = this_funnel + left_funnel = other_funnel + + # Still need to complete the path by adding the portion from the current apex to + # the endpoint: + found_shortest_path += left_funnel[1:] + + return found_shortest_path + + +def convexify_hole_boundaries(geometries_df, holes_df): + """ + Partially fill gaps as follows: + (1) Assign any gap that only adjoins 1 geometry to that geometry. 
+ (2) For each gap that adjoins at least 2 geometries, "convexify" the geometries + surrounding the gap by replacing the gap's boundary with each geometry by the + shortest path within the gap between its endpoints and "filling in" the + geometry up to the new boundary. (Exterior boundaries, if any, are left alone.) + If there are only 2 non-exterior (and no exterior) geometries intersecting + the gap, this will fill the gap completely; otherwise it will usually leave one or + more smaller gaps remaining. The convexity of the geometry boundaries will simplify + the process of filling the remaining gap(s). + """ + geometries_df = geometries_df.copy() + holes_df = holes_df.copy() + + completed_holes_df = GeoDataFrame(columns=["region"], geometry=GeoSeries([]), crs=holes_df.crs) + + if len(holes_df) > 0: + holes_to_process = deque(list(holes_df["geometry"])) + this_region = list(holes_df["region"])[0] # All holes in this dataframe should be from the same region + if this_region is None: + pbar = tqdm(desc="Gaps to simplify", total=len(holes_to_process)) + else: + pbar = tqdm(desc=f"Gaps to simplify in region {this_region}", total=len(holes_to_process)) + else: + holes_to_process = deque([]) + pbar = tqdm(desc="Gaps to simplify", total=len(holes_to_process)) + + while len(holes_to_process) > 0: + pbar_increment = 1 + this_hole = holes_to_process.popleft() + this_hole_df = GeoDataFrame(geometry=GeoSeries([this_hole]), crs=holes_df.crs) + this_hole_boundaries_df = construct_hole_boundaries(geometries_df, this_hole_df) + + # Take care of some trivial cases: + if len(set(this_hole_boundaries_df["target"]).difference({-1})) == 0: + # This is probably a small component of a region that isn't assigned to + # any geometry in that region. Just leave it alone and let it be a hole. 
+ if this_region is not None: + print("Found a component of the region at index", this_region, "that does not intersect any geometry assigned to that region.") + + elif len(set(this_hole_boundaries_df["target"]).difference({-1})) == 1: + # Attach the hole to the unique non-exterior geometry that it intersects: + poly_to_add_to = list(set(this_hole_boundaries_df["target"]).difference({-1}))[0] + geometries_df["geometry"][poly_to_add_to] = unary_union([geometries_df["geometry"][poly_to_add_to], this_hole]) + + else: + # Each remaining hole intersects at least 2 geometries nontrivially. + # "Convexify" the geometries surrounding this hole by replacing the + # boundary with each geometry by the shortest path within the hole + # between its endpoints and "filling in" the geometry up to the + # new boundary. (Exterior boundaries, if any, should be left alone.) + new_hole_in_progress = this_hole + this_hole_triangulation = triangulate_polygon(new_hole_in_progress) + + for thb_ind in this_hole_boundaries_df.index: + thb = this_hole_boundaries_df["geometry"][thb_ind] + this_geom = this_hole_boundaries_df["target"][thb_ind] + + if this_geom != -1: + start = list(extract_unique_points(thb).geoms)[0] + end = list(extract_unique_points(thb).geoms)[-1] + + sp = LineString(shortest_path_in_polygon(this_hole, start, end, full_triangulation=this_hole_triangulation)) + + piece_to_add_boundary = unary_union([thb, sp]) + if piece_to_add_boundary.geom_type == "MultiLineString": + piece_to_add_boundary = linemerge(piece_to_add_boundary) + + piece_to_add = unary_union(polygonize(piece_to_add_boundary)) + geometries_df["geometry"][this_geom] = unary_union([geometries_df["geometry"][this_geom], piece_to_add]) + new_hole_in_progress = new_hole_in_progress.difference(piece_to_add) + + if not new_hole_in_progress.is_empty: + if new_hole_in_progress.geom_type == "Polygon": + new_holes = [new_hole_in_progress] + elif new_hole_in_progress.geom_type == "MultiPolygon": + new_holes = 
list(new_hole_in_progress.geoms) + + for new_hole in new_holes: + new_hole = orient(new_hole) + new_hole_df = GeoDataFrame(geometry=GeoSeries([new_hole]), crs=holes_df.crs) + new_hole_df.insert(0, "region", this_region) + completed_holes_df = pandas.concat([completed_holes_df, new_hole_df]).reset_index(drop=True) + + pbar.update(pbar_increment) + + pbar.close() + + return geometries_df, completed_holes_df diff --git a/notebooks/.ipynb_checkpoints/Maup data management demo-checkpoint.ipynb b/notebooks/.ipynb_checkpoints/Maup data management demo-checkpoint.ipynb new file mode 100644 index 0000000..63cd062 --- /dev/null +++ b/notebooks/.ipynb_checkpoints/Maup data management demo-checkpoint.ipynb @@ -0,0 +1,5636 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1372272b", + "metadata": {}, + "source": [ + "### Demo notebook for data management using Maup, based on Denver County, CO" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "44231122", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import geopandas as gpd\n", + "import maup\n", + "\n", + "maup.progress.enabled = True\n", + "\n", + "pd.options.mode.chained_assignment = None\n", + "pd.set_option('display.max_columns', None)" + ] + }, + { + "cell_type": "markdown", + "id": "ccae0af8", + "metadata": {}, + "source": [ + "### Goal: Add population data and election data from 2016 and 2018 to 2020 precincts." 
+ ] + }, + { + "cell_type": "markdown", + "id": "bc693ada", + "metadata": {}, + "source": [ + "### Here are the shapefiles that we'll need:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2a4e791a", + "metadata": {}, + "outputs": [], + "source": [ + "blocks_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shp\")\n", + "precincts2016_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shp\")\n", + "precincts2018_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shp\")\n", + "precincts2020_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shp\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "b614fcef", + "metadata": {}, + "source": [ + "### Take a look at what information each of these shapefiles contains:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9d840231", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R geometry
0080310005013004080310005013004Block 300414341434G5040S113600+39.7445040-105.0362730000000000000000000000000000000000000000000000000POLYGON ((3130205.522 1696255.187, 3130218.309...
1080310043034019080310043034019Block 401916311231G5040S173060+39.7138779-104.932222733290110027250010002422100100240100023372500100029011002POLYGON ((3159569.782 1685233.398, 3159568.175...
2080310055031008080310055031008Block 100811161126G5040S171200+39.6308773-105.029607138250140176220140143152101401430000033862201401425014017POLYGON ((3132214.313 1655090.209, 3132219.592...
3080310032041004080310032041004Block 100418311631G5040S156480+39.7376302-104.96899931088810007121685100006101168510007830000761081685100006881000712POLYGON ((3149182.154 1694282.223, 3149242.061...
4080310031011022080310031011022Block 102218331831G5040S147590+39.7475700-104.960387338370000011360000012912800000110000003813600000137000001POLYGON ((3151562.306 1697493.116, 3151561.779...
............................................................................................................................................................................................................
10144080310069032005080310069032005Block 200519311932G5040S186260+39.6706226-104.911557251300110613113000001945112601106110011054511130000019300110613POLYGON ((3165300.448 1669539.900, 3165299.396...
10145080319800011033080319800011033Block 103317331733G5040S118600+39.8255061-104.7465304000000000000000000000000000000000000000000000000POLYGON ((3211448.603 1727256.205, 3211526.127...
10146080319800011024080319800011024Block 102417331733G5040S1696250+39.8342487-104.7591599000000000000000000000000000000000000000000000000POLYGON ((3204485.897 1729532.389, 3204502.996...
10147080310004013008080310004013008Block 300814341434G5040S176610+39.7711189-105.019748372690000123680000015825600000210000117236800000169000012POLYGON ((3134776.811 1706335.477, 3134790.024...
10148080310009031012080310009031012Block 101214341434G5040S185850+39.7189546-105.02940608320034033235418034013563414032018192000032208455180340132013403323POLYGON ((3132206.908 1687391.415, 3132550.655...
\n", + "

10149 rows × 67 columns

\n", + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "0 08 031 000501 3004 080310005013004 Block 3004 \n", + "1 08 031 004303 4019 080310043034019 Block 4019 \n", + "2 08 031 005503 1008 080310055031008 Block 1008 \n", + "3 08 031 003204 1004 080310032041004 Block 1004 \n", + "4 08 031 003101 1022 080310031011022 Block 1022 \n", + "... ... ... ... ... ... ... \n", + "10144 08 031 006903 2005 080310069032005 Block 2005 \n", + "10145 08 031 980001 1033 080319800011033 Block 1033 \n", + "10146 08 031 980001 1024 080319800011024 Block 1024 \n", + "10147 08 031 000401 3008 080310004013008 Block 3008 \n", + "10148 08 031 000903 1012 080310009031012 Block 1012 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "0 1 4 34 1 4 34 G5040 S \n", + "1 1 6 31 1 2 31 G5040 S \n", + "2 1 1 16 1 1 26 G5040 S \n", + "3 1 8 31 1 6 31 G5040 S \n", + "4 1 8 33 1 8 31 G5040 S \n", + "... ... ... ... ... ... ... ... ... \n", + "10144 1 9 31 1 9 32 G5040 S \n", + "10145 1 7 33 1 7 33 G5040 S \n", + "10146 1 7 33 1 7 33 G5040 S \n", + "10147 1 4 34 1 4 34 G5040 S \n", + "10148 1 4 34 1 4 34 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "0 11360 0 +39.7445040 -105.0362730 0 0 \n", + "1 17306 0 +39.7138779 -104.9322227 33 29 \n", + "2 17120 0 +39.6308773 -105.0296071 38 25 \n", + "3 15648 0 +39.7376302 -104.9689993 108 88 \n", + "4 14759 0 +39.7475700 -104.9603873 38 37 \n", + "... ... ... ... ... ... ... 
\n", + "10144 18626 0 +39.6706226 -104.9115572 51 30 \n", + "10145 11860 0 +39.8255061 -104.7465304 0 0 \n", + "10146 169625 0 +39.8342487 -104.7591599 0 0 \n", + "10147 17661 0 +39.7711189 -105.0197483 72 69 \n", + "10148 18585 0 +39.7189546 -105.0294060 83 20 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "0 0 0 0 0 0 0 0 0 \n", + "1 0 1 1 0 0 2 7 25 \n", + "2 0 1 4 0 1 7 6 22 \n", + "3 1 0 0 0 7 12 16 85 \n", + "4 0 0 0 0 0 1 1 36 \n", + "... ... ... ... ... ... ... ... ... \n", + "10144 0 1 1 0 6 13 11 30 \n", + "10145 0 0 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 0 0 \n", + "10147 0 0 0 0 1 2 3 68 \n", + "10148 0 3 4 0 33 23 54 18 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 0 1 0 0 0 \n", + "2 0 1 4 0 1 4 \n", + "3 1 0 0 0 0 6 \n", + "4 0 0 0 0 0 1 \n", + "... ... ... ... ... ... ... \n", + "10144 0 0 0 0 1 9 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 0 0 0 0 1 \n", + "10148 0 3 4 0 1 3 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "0 0 0 0 0 0 0 0 \n", + "1 24 2 21 0 0 1 0 \n", + "2 31 5 21 0 1 4 0 \n", + "3 101 16 85 1 0 0 0 \n", + "4 29 1 28 0 0 0 0 \n", + "... ... ... ... ... ... ... ... \n", + "10144 45 11 26 0 1 1 0 \n", + "10145 0 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 0 \n", + "10147 58 2 56 0 0 0 0 \n", + "10148 56 34 14 0 3 2 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 2 4 0 1 0 \n", + "2 1 4 3 0 0 0 \n", + "3 7 8 3 0 0 0 \n", + "4 0 1 1 0 0 0 \n", + "... ... ... ... ... ... ... 
\n", + "10144 6 11 0 0 1 1 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 2 1 0 0 0 \n", + "10148 18 19 2 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 0 2 33 7 25 \n", + "2 0 0 3 38 6 22 \n", + "3 0 7 6 108 16 85 \n", + "4 0 0 0 38 1 36 \n", + "... ... ... ... ... ... ... \n", + "10144 0 5 4 51 11 30 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 1 1 72 3 68 \n", + "10148 0 32 20 84 55 18 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 0 1 0 0 0 \n", + "2 0 1 4 0 1 4 \n", + "3 1 0 0 0 0 6 \n", + "4 0 0 0 0 0 1 \n", + "... ... ... ... ... ... ... \n", + "10144 0 0 0 0 1 9 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 0 0 0 0 1 \n", + "10148 0 3 4 0 1 3 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "0 0 0 0 0 0 0 0 \n", + "1 29 0 1 1 0 0 2 \n", + "2 25 0 1 4 0 1 7 \n", + "3 88 1 0 0 0 7 12 \n", + "4 37 0 0 0 0 0 1 \n", + "... ... ... ... ... ... ... ... \n", + "10144 30 0 1 1 0 6 13 \n", + "10145 0 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 0 \n", + "10147 69 0 0 0 0 1 2 \n", + "10148 20 1 3 4 0 33 23 \n", + "\n", + " geometry \n", + "0 POLYGON ((3130205.522 1696255.187, 3130218.309... \n", + "1 POLYGON ((3159569.782 1685233.398, 3159568.175... \n", + "2 POLYGON ((3132214.313 1655090.209, 3132219.592... \n", + "3 POLYGON ((3149182.154 1694282.223, 3149242.061... \n", + "4 POLYGON ((3151562.306 1697493.116, 3151561.779... \n", + "... ... \n", + "10144 POLYGON ((3165300.448 1669539.900, 3165299.396... \n", + "10145 POLYGON ((3211448.603 1727256.205, 3211526.127... \n", + "10146 POLYGON ((3204485.897 1729532.389, 3204502.996... \n", + "10147 POLYGON ((3134776.811 1706335.477, 3134790.024... \n", + "10148 POLYGON ((3132206.908 1687391.415, 3132550.655... 
\n", + "\n", + "[10149 rows x 67 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "22521abd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
346 rows × 14 columns
" + ], + "text/plain": [ + " COUNTYFP NAMELSAD G16PREDCli G16PRERTru G16PRELJoh G16PREGSte \\\n", + "0 031 Denver 101 723 163 34 21 \n", + "1 031 Denver 102 757 166 23 25 \n", + "2 031 Denver 103 678 241 44 22 \n", + "3 031 Denver 203 677 114 33 10 \n", + "4 031 Denver 208 986 88 59 38 \n", + ".. ... ... ... ... ... ... \n", + "341 031 Denver 916 929 449 72 23 \n", + "342 031 Denver 927 516 285 34 9 \n", + "343 031 Denver 923 739 336 55 25 \n", + "344 031 Denver 937 552 335 45 8 \n", + "345 031 Denver 933 403 222 28 11 \n", + "\n", + " G16PREIMcM G16PREOth G16USSDBen G16USSRGle G16USSLWil G16USSGMen \\\n", + "0 1 9 730 164 26 17 \n", + "1 7 7 738 174 28 14 \n", + "2 4 14 703 227 34 18 \n", + "3 2 5 651 143 20 15 \n", + "4 4 2 922 106 40 54 \n", + ".. ... ... ... ... ... ... \n", + "341 8 10 926 483 49 16 \n", + "342 12 6 521 306 20 12 \n", + "343 3 16 732 348 56 14 \n", + "344 7 3 553 355 19 10 \n", + "345 8 3 408 231 12 8 \n", + "\n", + " G16USSOth geometry \n", + "0 9 POLYGON ((3125703.139 1681147.799, 3125702.944... \n", + "1 9 POLYGON ((3129675.455 1682118.384, 3129674.269... \n", + "2 9 POLYGON ((3129642.865 1679001.901, 3129708.711... \n", + "3 2 POLYGON ((3146588.324 1694971.352, 3146587.373... \n", + "4 5 POLYGON ((3146274.972 1693705.886, 3146340.216... \n", + ".. ... ... \n", + "341 7 POLYGON ((3164780.350 1664424.144, 3165010.513... \n", + "342 1 POLYGON ((3163781.804 1663194.808, 3163746.863... \n", + "343 12 POLYGON ((3173239.768 1665060.235, 3173239.714... \n", + "344 2 POLYGON ((3173156.086 1655193.513, 3173161.344... \n", + "345 0 POLYGON ((3169603.158 1661791.346, 3169558.084... \n", + "\n", + "[346 rows x 14 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8849c8cf", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
356 rows × 45 columns
" + ], + "text/plain": [ + " COUNTYFP VTDST NAME CD116FP SLDUST SLDLST PRECID AG18D \\\n", + "0 031 031745 Denver 745 01 033 007 1330716745 1084 \n", + "1 031 031540 Denver 540 01 034 005 1340516540 760 \n", + "2 031 031744 Denver 744 01 033 007 1330716744 1000 \n", + "3 031 031530 Denver 530 01 031 005 1310516530 64 \n", + "4 031 031940 Denver 940 01 031 009 1310916940 395 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 031 031102 Denver 102 01 034 001 1340116102 577 \n", + "352 031 031101 Denver 101 01 034 001 1340116101 615 \n", + "353 031 031924 Denver 924 01 031 009 1310916924 463 \n", + "354 031 031604 Denver 604 01 033 006 1330616604 717 \n", + "355 031 031746 Denver 746 01 033 007 1330716746 462 \n", + "\n", + " AG18R SOS18D SOS18R TRE18D TRE18R GOV18D GOV18R REG18D REG18R \\\n", + "0 303 1073 316 1046 334 1137 272 1050 254 \n", + "1 193 751 215 715 256 768 196 713 176 \n", + "2 309 1003 312 994 307 1051 280 971 267 \n", + "3 5 62 8 64 8 67 4 62 5 \n", + "4 267 400 273 386 271 422 254 375 247 \n", + ".. ... ... ... ... ... ... ... ... ... \n", + "351 134 581 129 572 135 556 130 559 126 \n", + "352 122 619 123 615 125 613 127 599 122 \n", + "353 84 476 87 473 85 468 94 458 86 \n", + "354 133 734 126 718 130 745 121 708 114 \n", + "355 146 469 146 446 160 495 123 445 127 \n", + "\n", + " USH18D USH18R TOTPOP NH_WHITE NH_BLACK NH_AMIN NH_ASIAN NH_NHPI \\\n", + "0 1121 264 0 0 0 0 0 0 \n", + "1 771 179 0 0 0 0 0 0 \n", + "2 1044 261 91 31 30 1 1 0 \n", + "3 66 5 172 107 29 3 3 0 \n", + "4 417 244 274 258 1 0 8 0 \n", + ".. ... ... ... ... ... ... ... ... 
\n", + "351 585 123 4169 434 118 56 422 0 \n", + "352 634 115 4414 578 66 34 109 2 \n", + "353 468 87 4463 1586 746 19 685 3 \n", + "354 737 116 4578 893 1408 18 990 4 \n", + "355 490 127 0 0 0 0 0 0 \n", + "\n", + " NH_OTHER NH_2MORE HISP H_WHITE H_BLACK H_AMIN H_ASIAN H_NHPI \\\n", + "0 0 0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 0 \n", + "2 0 0 28 0 0 1 0 0 \n", + "3 0 8 22 11 0 0 0 0 \n", + "4 0 2 4 2 0 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 5 30 3101 1553 30 79 13 0 \n", + "352 7 34 3577 1780 25 98 4 4 \n", + "353 6 204 1209 637 15 5 5 0 \n", + "354 6 164 1089 413 26 26 7 0 \n", + "355 0 0 0 0 0 0 0 0 \n", + "\n", + " H_OTHER H_2MORE VAP HVAP WVAP BVAP AMINVAP ASIANVAP NHPIVAP \\\n", + "0 0 0 0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 0 0 \n", + "2 25 2 91 28 31 30 1 1 0 \n", + "3 9 1 171 22 106 29 3 3 0 \n", + "4 2 0 268 4 252 1 0 8 0 \n", + ".. ... ... ... ... ... ... ... ... ... \n", + "351 1310 115 2653 1864 353 55 39 321 0 \n", + "352 1509 154 2785 2113 499 31 28 79 2 \n", + "353 486 59 3641 841 1429 640 18 568 2 \n", + "354 508 106 3220 687 795 991 13 619 4 \n", + "355 0 0 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP 2MOREVAP geometry \n", + "0 0 0 POLYGON ((3167607.595 1714575.543, 3167607.566... \n", + "1 0 0 POLYGON ((3139202.439 1699577.829, 3139240.848... \n", + "2 0 0 POLYGON ((3180920.093 1705405.138, 3180920.088... \n", + "3 0 8 POLYGON ((3144153.912 1694345.882, 3144155.827... \n", + "4 0 2 POLYGON ((3165244.173 1654452.896, 3165241.340... \n", + ".. ... ... ... \n", + "351 1 17 POLYGON ((3133625.259 1681579.939, 3133626.217... \n", + "352 5 21 POLYGON ((3125719.959 1678986.315, 3125719.959... \n", + "353 6 133 POLYGON ((3180600.942 1663222.850, 3180600.864... \n", + "354 4 101 POLYGON ((3173051.077 1693417.146, 3173051.078... \n", + "355 0 0 POLYGON ((3178132.638 1716671.381, 3178133.440... 
\n", + "\n", + "[356 rows x 45 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2018_df" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "f6af11f7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
356 rows × 14 columns
" + ], + "text/plain": [ + " PRECID STATEFP20 COUNTYFP20 NAME CD18 SUD18 SLD18 VTDST20 \\\n", + "0 1310516530 08 031 Denver 530 1 31 05 031530 \n", + "1 1310616605 08 031 Denver 605 1 31 06 031605 \n", + "2 1310616606 08 031 Denver 606 1 31 06 031606 \n", + "3 1310616607 08 031 Denver 607 1 31 06 031607 \n", + "4 1310616609 08 031 Denver 609 1 31 06 031609 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 1330716746 08 031 Denver 746 1 33 07 031746 \n", + "352 1340516540 08 031 Denver 540 1 34 05 031540 \n", + "353 1340516541 08 031 Denver 541 1 34 05 031541 \n", + "354 1340516542 08 031 Denver 542 1 34 05 031542 \n", + "355 1340516543 08 031 Denver 543 1 34 05 031543 \n", + "\n", + " NOTES PRES20D SEN20D PRES20R SEN20R \\\n", + "0 None 73 75 16 12 \n", + "1 None 720 702 110 137 \n", + "2 None 832 819 107 121 \n", + "3 None 597 595 102 113 \n", + "4 None 241 233 59 71 \n", + ".. ... ... ... ... ... \n", + "351 None 1768 1699 386 469 \n", + "352 None 1558 1477 416 489 \n", + "353 None 1554 1504 246 297 \n", + "354 None 881 828 200 247 \n", + "355 None 837 836 93 90 \n", + "\n", + " geometry \n", + "0 POLYGON ((3144474.619 1694874.799, 3144509.314... \n", + "1 POLYGON ((3162200.074 1691024.061, 3162164.358... \n", + "2 POLYGON ((3162459.795 1693616.802, 3162459.063... \n", + "3 POLYGON ((3165142.137 1693683.542, 3165141.880... \n", + "4 POLYGON ((3152968.867 1690916.131, 3153065.627... \n", + ".. ... \n", + "351 POLYGON ((3172960.862 1716278.085, 3172953.814... \n", + "352 POLYGON ((3139192.261 1699587.973, 3139181.643... \n", + "353 POLYGON ((3148046.593 1705187.168, 3148047.303... \n", + "354 POLYGON ((3133494.258 1698947.396, 3133494.426... \n", + "355 POLYGON ((3140186.339 1689621.761, 3140186.210... 
\n", + "\n", + "[356 rows x 14 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df" + ] + }, + { + "cell_type": "markdown", + "id": "b93f583a", + "metadata": {}, + "source": [ + "### So the blocks file has lots of population data and the precinct files each have election data for one year.\n", + "### It might be convenient to rename some of the election columns in the 2016 file so that they have the same format as the other years." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ad1aa151", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['COUNTYFP', 'NAMELSAD', 'G16PREDCli', 'G16PRERTru', 'G16PRELJoh',\n", + " 'G16PREGSte', 'G16PREIMcM', 'G16PREOth', 'G16USSDBen', 'G16USSRGle',\n", + " 'G16USSLWil', 'G16USSGMen', 'G16USSOth', 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "4cd76b85", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2016_df = precincts2016_df.rename(columns = {\n", + " 'G16PREDCli': 'PRES16D',\n", + " 'G16PRERTru': 'PRES16R',\n", + " 'G16USSDBen': 'SEN16D',\n", + " 'G16USSRGle': 'SEN16R'\n", + "})" + ] + }, + { + "cell_type": "markdown", + "id": "6118e4a4", + "metadata": {}, + "source": [ + "### In order to move all this data around, we'll need assignments of blocks to precincts for each of the precinct files." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b226cfc9", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 786.59it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 371.12it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 892.50it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:01<00:00, 347.47it/s]\n", + "100%|████████████████████████████████████████| 346/346 [00:00<00:00, 725.00it/s]\n", + "100%|████████████████████████████████████████| 346/346 [00:01<00:00, 311.78it/s]\n" + ] + } + ], + "source": [ + "blocks_to_precincts2020_assignment = maup.assign(blocks_df.geometry, precincts2020_df.geometry)\n", + "blocks_to_precincts2018_assignment = maup.assign(blocks_df.geometry, precincts2018_df.geometry)\n", + "blocks_to_precincts2016_assignment = maup.assign(blocks_df.geometry, precincts2016_df.geometry)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cdcaf044", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 300\n", + "1 56\n", + "2 41\n", + "3 73\n", + "4 262\n", + " ... 
\n", + "10144 96\n", + "10145 234\n", + "10146 234\n", + "10147 292\n", + "10148 313\n", + "Length: 10149, dtype: int64" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_to_precincts2020_assignment" + ] + }, + { + "cell_type": "markdown", + "id": "fbed2841", + "metadata": {}, + "source": [ + "### First step: Aggregate population data from blocks to 2020 precincts.\n", + "### (We'll just use a few of the population columns for this demo.)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "61cf6d7e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['STATEFP20', 'COUNTYFP20', 'TRACTCE20', 'BLOCKCE20', 'GEOID20',\n", + " 'NAME20', 'CD116', 'SLDL20', 'SLDU20', 'CD118', 'SLDL22', 'SLDU22',\n", + " 'MTFCC20', 'FUNCSTAT20', 'ALAND20', 'AWATER20', 'INTPTLAT20',\n", + " 'INTPTLON20', 'TOTPOP20', 'WHITE20', 'BLACK20', 'AMIN20', 'ASIAN20',\n", + " 'NHPI20', 'OTHER20', '2MORE20', 'HISP20', 'NH_WHITE20', 'NH_BLACK20',\n", + " 'NH_AMIN20', 'NH_ASIAN20', 'NH_NHPI20', 'NH_OTHER20', 'NH_2MORE20',\n", + " 'VAP20', 'HVAP20', 'WVAP20', 'BVAP20', 'AMINVAP20', 'ASIANVAP20',\n", + " 'NHPIVAP20', 'OTHERVAP20', '2MOREVAP20', 'H_WHITE20', 'H_BLACK20',\n", + " 'H_AMIN20', 'H_ASIAN20', 'H_NHPI20', 'H_OTHER20', 'H_2MORE20',\n", + " 'TOTPOP20R', 'HISP20R', 'NHWHITE20R', 'NHBLACK20R', 'NHAMIN20R',\n", + " 'NHASIAN20R', 'NH_NHPI20R', 'NHOTHER20R', 'NH2MORE20R', 'WHITE20R',\n", + " 'BLACK20R', 'AMIN20R', 'ASIAN20R', 'NHPI20R', 'OTHER20R', '2MORE20R',\n", + " 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d1bf7b5e", + "metadata": {}, + "outputs": [], + "source": [ + "pop_cols = ['TOTPOP20', 'VAP20']" + ] + }, + { + "cell_type": "markdown", + "id": "5fd55604", + "metadata": {}, + "source": [ + 
"### We can use the assignment of blocks to precincts to aggregate populations from blocks up to precincts:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "5da80e36", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2020_df[pop_cols] = blocks_df[pop_cols].groupby(blocks_to_precincts2020_assignment).sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "9905a96f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " TOTPOP20 VAP20\n", + "0 171 171\n", + "1 1132 930\n", + "2 1207 1047\n", + "3 926 738\n", + "4 371 283" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[pop_cols].head()" + ] + }, + { + "cell_type": "markdown", + "id": "36a553ac", + "metadata": {}, + "source": [ + "### Check that we didn't gain/lose any population in the aggregation step:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "4d6bbc16", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "TOTPOP20 715522\n", + "VAP20 581062\n", + "dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[pop_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "e575bd32", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "TOTPOP20 715522\n", + "VAP20 581062\n", + "dtype: int64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[pop_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "1d9a3f5d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PRECID STATEFP20 COUNTYFP20 NAME CD18 SUD18 SLD18 VTDST20 \\\n", + "0 1310516530 08 031 Denver 530 1 31 05 031530 \n", + "1 1310616605 08 031 Denver 605 1 31 06 031605 \n", + "2 1310616606 08 031 Denver 606 1 31 06 031606 \n", + "3 1310616607 08 031 Denver 607 1 31 06 031607 \n", + "4 1310616609 08 031 Denver 609 1 31 06 031609 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 1330716746 08 031 Denver 746 1 33 07 031746 \n", + "352 1340516540 08 031 Denver 540 1 34 05 031540 \n", + "353 1340516541 08 031 Denver 541 1 34 05 031541 \n", + "354 1340516542 08 031 Denver 542 1 34 05 031542 \n", + "355 1340516543 08 031 Denver 543 1 34 05 031543 \n", + "\n", + " NOTES PRES20D SEN20D PRES20R SEN20R \\\n", + "0 None 73 75 16 12 \n", + "1 None 720 702 110 137 \n", + "2 None 832 819 107 121 \n", + "3 None 597 595 102 113 \n", + "4 None 241 233 59 71 \n", + ".. ... ... ... ... ... \n", + "351 None 1768 1699 386 469 \n", + "352 None 1558 1477 416 489 \n", + "353 None 1554 1504 246 297 \n", + "354 None 881 828 200 247 \n", + "355 None 837 836 93 90 \n", + "\n", + " geometry TOTPOP20 VAP20 \n", + "0 POLYGON ((3144474.619 1694874.799, 3144509.314... 171 171 \n", + "1 POLYGON ((3162200.074 1691024.061, 3162164.358... 1132 930 \n", + "2 POLYGON ((3162459.795 1693616.802, 3162459.063... 1207 1047 \n", + "3 POLYGON ((3165142.137 1693683.542, 3165141.880... 926 738 \n", + "4 POLYGON ((3152968.867 1690916.131, 3153065.627... 371 283 \n", + ".. ... ... ... \n", + "351 POLYGON ((3172960.862 1716278.085, 3172953.814... 3284 2196 \n", + "352 POLYGON ((3139192.261 1699587.973, 3139181.643... 3366 3268 \n", + "353 POLYGON ((3148046.593 1705187.168, 3148047.303... 2774 2643 \n", + "354 POLYGON ((3133494.258 1698947.396, 3133494.426... 1994 1830 \n", + "355 POLYGON ((3140186.339 1689621.761, 3140186.210... 
1678 1367 \n", + "\n", + "[356 rows x 16 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df" + ] + }, + { + "cell_type": "markdown", + "id": "35bf6d3f", + "metadata": {}, + "source": [ + "### Next step: Disaggregate votes from all three precinct files to blocks, using Voting Age Population (VAP20) for the weights." + ] + }, + { + "cell_type": "markdown", + "id": "c78b0a30", + "metadata": {}, + "source": [ + "### 2016:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "c3c840ad", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['COUNTYFP', 'NAMELSAD', 'PRES16D', 'PRES16R', 'G16PRELJoh',\n", + " 'G16PREGSte', 'G16PREIMcM', 'G16PREOth', 'SEN16D', 'SEN16R',\n", + " 'G16USSLWil', 'G16USSGMen', 'G16USSOth', 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "34f07c2b", + "metadata": {}, + "outputs": [], + "source": [ + "elec2016_cols = ['PRES16D', 'PRES16R', 'SEN16D', 'SEN16R']" + ] + }, + { + "cell_type": "markdown", + "id": "6b37197a", + "metadata": {}, + "source": [ + "### Now we assign a \"weight\" to each block equal to that block's share of the total voting-age population (VAP20) of ALL blocks assigned to the same precinct.\n", + "### IMPORTANT: Some precincts have zero voting-age population, which leads to a zero denominator and an undefined (NaN) weight for every block assigned to that precinct. We handle this by replacing NaN values with zeros.\n", + "### Occasionally a zero-population precinct contains a small, nonzero number of votes, and these votes will be lost in the disaggregation from precincts down to blocks.
" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e4694fc7", + "metadata": {}, + "outputs": [], + "source": [ + "weights2016 = blocks_df[\"VAP20\"] / blocks_to_precincts2016_assignment.map(blocks_df[\"VAP20\"].groupby(blocks_to_precincts2016_assignment).sum())\n", + "weights2016 = weights2016.fillna(0)" + ] + }, + { + "cell_type": "markdown", + "id": "97376acf", + "metadata": {}, + "source": [ + "### Here's the disaggregation step, using the \"prorate\" function:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "3fa669eb", + "metadata": {}, + "outputs": [], + "source": [ + "prorated2016 = maup.prorate(blocks_to_precincts2016_assignment, precincts2016_df[elec2016_cols], weights2016)\n", + "blocks_df[elec2016_cols] = prorated2016" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d4e5120a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PRES16D PRES16R SEN16D SEN16R\n", + "0 0.000000 0.000000 0.000000 0.000000\n", + "1 15.398230 5.139823 15.100885 5.883186\n", + "2 10.402685 8.010067 11.130872 8.197315\n", + "3 61.010847 5.957288 58.545763 6.505085\n", + "4 17.104558 1.574397 16.482574 2.313003" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2016_cols].head()" + ] + }, + { + "cell_type": "markdown", + "id": "b048f421", + "metadata": {}, + "source": [ + "### Check to see whether we gained/lost any votes:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "ede614d4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551\n", + "PRES16R 62690\n", + "SEN16D 238774\n", + "SEN16R 71078\n", + "dtype: int64" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b1b1bc43", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551.0\n", + "PRES16R 62690.0\n", + "SEN16D 238774.0\n", + "SEN16R 71078.0\n", + "dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "adfe2fa6", + "metadata": {}, + "source": [ + "### 2018:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "ff59b2e4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['COUNTYFP', 'VTDST', 'NAME', 'CD116FP', 'SLDUST', 'SLDLST', 'PRECID',\n", + " 'AG18D', 'AG18R', 'SOS18D', 'SOS18R', 'TRE18D', 'TRE18R', 'GOV18D',\n", + " 'GOV18R', 'REG18D', 'REG18R', 'USH18D', 'USH18R', 'TOTPOP', 'NH_WHITE',\n", + " 'NH_BLACK', 'NH_AMIN', 'NH_ASIAN', 'NH_NHPI', 'NH_OTHER', 'NH_2MORE',\n", + " 'HISP', 'H_WHITE', 'H_BLACK', 
'H_AMIN', 'H_ASIAN', 'H_NHPI', 'H_OTHER',\n", + " 'H_2MORE', 'VAP', 'HVAP', 'WVAP', 'BVAP', 'AMINVAP', 'ASIANVAP',\n", + " 'NHPIVAP', 'OTHERVAP', '2MOREVAP', 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2018_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "5cc8bbe8", + "metadata": {}, + "outputs": [], + "source": [ + "elec2018_cols = ['AG18D', 'AG18R', 'SOS18D', 'SOS18R', 'TRE18D', 'TRE18R', 'GOV18D', 'GOV18R', 'REG18D', 'REG18R']" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "6189e09e", + "metadata": {}, + "outputs": [], + "source": [ + "weights2018 = blocks_df[\"VAP20\"] / blocks_to_precincts2018_assignment.map(blocks_df[\"VAP20\"].groupby(blocks_to_precincts2018_assignment).sum())\n", + "weights2018 = weights2018.fillna(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "ec05ed4c", + "metadata": {}, + "outputs": [], + "source": [ + "prorated2018 = maup.prorate(blocks_to_precincts2018_assignment, precincts2018_df[elec2018_cols], weights2018)\n", + "blocks_df[elec2018_cols] = prorated2018" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "339eefea", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", + "1 14.378761 6.223009 13.614159 7.008850 13.741593 6.435398 14.357522 \n", + "2 9.861745 7.531544 10.444295 7.053020 10.423490 6.969799 10.069799 \n", + "3 59.778305 5.272542 58.066441 7.121356 58.819661 6.436610 61.147797 \n", + "4 17.940349 2.021448 17.473861 2.468499 17.415550 2.546247 18.387399 \n", + "\n", + " GOV18R REG18D REG18R \n", + "0 0.000000 0.000000 0.000000 \n", + "1 6.223009 13.465487 5.883186 \n", + "2 7.073826 10.028188 6.449664 \n", + "3 4.382373 57.587119 4.519322 \n", + "4 1.827078 17.007373 1.768767 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2018_cols].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "244b1927", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AG18D 232798\n", + "AG18R 64532\n", + "SOS18D 232255\n", + "SOS18R 67147\n", + "TRE18D 230382\n", + "TRE18R 66728\n", + "GOV18D 238762\n", + "GOV18R 60151\n", + "REG18D 223947\n", + "REG18R 57322\n", + "dtype: int64" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2018_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "8d8bf1bc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AG18D 232798.0\n", + "AG18R 64532.0\n", + "SOS18D 232255.0\n", + "SOS18R 67147.0\n", + "TRE18D 230382.0\n", + "TRE18R 66728.0\n", + "GOV18D 238762.0\n", + "GOV18R 60151.0\n", + "REG18D 223947.0\n", + "REG18R 57322.0\n", + "dtype: float64" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "45e4bc1b", + "metadata": {}, + "source": [ + "### If the goal is 
just to put all the data on 2020 precincts, we don't really have to disaggregate 2020 election data to blocks - but we might also want all the election data on blocks, so we'll go ahead and do it for completeness." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "b75e9d3c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['PRECID', 'STATEFP20', 'COUNTYFP20', 'NAME', 'CD18', 'SUD18', 'SLD18',\n", + " 'VTDST20', 'NOTES', 'PRES20D', 'SEN20D', 'PRES20R', 'SEN20R',\n", + " 'geometry', 'TOTPOP20', 'VAP20'],\n", + " dtype='object')" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "fe04f13c", + "metadata": {}, + "outputs": [], + "source": [ + "elec2020_cols = ['PRES20D', 'SEN20D', 'PRES20R', 'SEN20R']" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "735860ac", + "metadata": {}, + "outputs": [], + "source": [ + "weights2020 = blocks_df[\"VAP20\"] / blocks_to_precincts2020_assignment.map(blocks_df[\"VAP20\"].groupby(blocks_to_precincts2020_assignment).sum())\n", + "weights2020 = weights2020.fillna(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "1ace13c7", + "metadata": {}, + "outputs": [], + "source": [ + "prorated2020 = maup.prorate(blocks_to_precincts2020_assignment, precincts2020_df[elec2020_cols], weights2020)\n", + "blocks_df[elec2020_cols] = prorated2020" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "fdfe9942", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " PRES20D SEN20D PRES20R SEN20R\n", + "0 0.000000 0.000000 0.000000 0.000000\n", + "1 17.330973 16.566372 5.437168 6.711504\n", + "2 13.897987 13.689933 8.426174 8.863087\n", + "3 69.022373 67.584407 5.409492 6.847458\n", + "4 21.672252 20.914209 1.729893 2.468499" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2020_cols].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "0c02e6de", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES20D 313293\n", + "SEN20D 305602\n", + "PRES20R 71618\n", + "SEN20R 80163\n", + "dtype: int64" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[elec2020_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "530aa5d6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES20D 313293.0\n", + "SEN20D 305602.0\n", + "PRES20R 71618.0\n", + "SEN20R 80163.0\n", + "dtype: float64" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2020_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "4bfcf5fb", + "metadata": {}, + "source": [ + "### Last step: Aggregate 2016 and 2018 election data up from blocks to 2020 precincts." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "22c17f41", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2020_df[elec2016_cols] = blocks_df[elec2016_cols].groupby(blocks_to_precincts2020_assignment).sum()\n", + "precincts2020_df[elec2018_cols] = blocks_df[elec2018_cols].groupby(blocks_to_precincts2020_assignment).sum()" + ] + }, + { + "cell_type": "markdown", + "id": "df0347e2", + "metadata": {}, + "source": [ + "### Check to see whether we gained/lost any votes:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "c5555a15", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551.0\n", + "PRES16R 62690.0\n", + "SEN16D 238774.0\n", + "SEN16R 71078.0\n", + "dtype: float64" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "4e2ce9ec", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551.0\n", + "PRES16R 62690.0\n", + "SEN16D 238774.0\n", + "SEN16R 71078.0\n", + "dtype: float64" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "c6b8d1c7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AG18D 232798.0\n", + "AG18R 64532.0\n", + "SOS18D 232255.0\n", + "SOS18R 67147.0\n", + "TRE18D 230382.0\n", + "TRE18R 66728.0\n", + "GOV18D 238762.0\n", + "GOV18R 60151.0\n", + "REG18D 223947.0\n", + "REG18R 57322.0\n", + "dtype: float64" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "89074ca5", + "metadata": {}, + "outputs": [ + { + 
"data": { + "text/plain": [ + "AG18D 232798.0\n", + "AG18R 64532.0\n", + "SOS18D 232255.0\n", + "SOS18R 67147.0\n", + "TRE18D 230382.0\n", + "TRE18R 66728.0\n", + "GOV18D 238762.0\n", + "GOV18R 60151.0\n", + "REG18D 223947.0\n", + "REG18R 57322.0\n", + "dtype: float64" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "03e0cb1a", + "metadata": {}, + "source": [ + "### Success! Now we can save these shapefiles for later use:" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "dfab9d9e", + "metadata": {}, + "outputs": [], + "source": [ + "# blocks_df.to_file(\"./Shapefiles/DenverCo_blocks_with_data/DenverCo_blocks_with_data.shp\")\n", + "# precincts2020_df.to_file(\"./Shapefiles/DenverCo_precincts2020_with_data/DenverCo_precincts2020_with_data.shp\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11ab3377", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "c1d501b5", + "metadata": {}, + "source": [ + "### Now let's talk about potential problems! \n", + "### We started with \"cleaned-up\" versions of the precinct shapefiles. The \"doctor\" function is used to evaluate shapefiles for topological problems such as gaps and overlaps." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "01e1d0e2", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 962.45it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 10 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2020_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "f40b7134", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|███████████████████████████████████████| 356/356 [00:00<00:00, 1036.65it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 10 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2018_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "40f79791", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|███████████████████████████████████████| 346/346 [00:00<00:00, 1024.45it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 10 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2016_df)" + ] + }, + { + "cell_type": "markdown", + "id": "97281549", + "metadata": {}, + "source": [ + "### These holes are actually \"real\" because Denver County is not simply connected; there are \"islands\" that belong to Arapahoe County. So these holes are not indicative of problems." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "f7fb0e39", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "blocks_df.plot()" + ] + }, + { + "cell_type": "markdown", + "id": "25b38d08", + "metadata": {}, + "source": [ + "### But here are the \"original\" precinct files, extracted from statewide Colorado precinct files.\n", + "### (Sources: 2016 precinct file was compiled by VEST; 2018 precinct file was compiled by Haley Colgate with assistance from Todd Blees of the Colorado State Demographer's office, and 2020 file was compiled by Louis Pino of the Colorado State Legislative staff.)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "8c6da770", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2016_orig_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shp\")\n", + "precincts2018_orig_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shp\")\n", + "precincts2020_orig_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shp\")" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "6d4973d1", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 826.87it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 27 overlaps.\n", + "There are 33 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2020_orig_df)" + ] + }, + { + "cell_type": "markdown", + "id": "57ae1c50", + "metadata": {}, + "source": [ + "### When we assigned blocks to precincts above, every block was assigned to a precinct; we can confirm this by checking for unassigned blocks:" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": 
"8cb2a3ee", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0\n", + "0\n", + "0\n" + ] + } + ], + "source": [ + "print(len(blocks_df[blocks_to_precincts2020_assignment.isna()]))\n", + "print(len(blocks_df[blocks_to_precincts2018_assignment.isna()]))\n", + "print(len(blocks_df[blocks_to_precincts2016_assignment.isna()]))" + ] + }, + { + "cell_type": "markdown", + "id": "5726be17", + "metadata": {}, + "source": [ + "### But what if we assign blocks to the original versions?" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "812cf3ef", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 768.50it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 358.08it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 832.96it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:01<00:00, 302.34it/s]\n", + "100%|████████████████████████████████████████| 346/346 [00:00<00:00, 379.30it/s]\n", + "100%|███████████████████████████████████████| 346/346 [00:00<00:00, 1501.18it/s]\n" + ] + } + ], + "source": [ + "blocks_to_precincts2020_orig_assignment = maup.assign(blocks_df.geometry, precincts2020_orig_df.geometry)\n", + "blocks_to_precincts2018_orig_assignment = maup.assign(blocks_df.geometry, precincts2018_orig_df.geometry)\n", + "blocks_to_precincts2016_orig_assignment = maup.assign(blocks_df.geometry, precincts2016_orig_df.geometry)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "bbf676c1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n", + "1\n", + "5\n" + ] + } + ], + "source": [ + "print(len(blocks_df[blocks_to_precincts2020_orig_assignment.isna()]))\n", + "print(len(blocks_df[blocks_to_precincts2018_orig_assignment.isna()]))\n", + 
"print(len(blocks_df[blocks_to_precincts2016_orig_assignment.isna()]))" + ] + }, + { + "cell_type": "markdown", + "id": "72389fe9", + "metadata": {}, + "source": [ + "### So they all missed a few!" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "91f7c456", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "2896 08 031 980001 1037 080319800011037 Block 1037 \n", + "6683 08 031 980001 1039 080319800011039 Block 1039 \n", + "8656 08 031 980001 1038 080319800011038 Block 1038 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "2896 1 7 33 1 7 33 G5040 S \n", + "6683 1 7 33 1 7 33 G5040 S \n", + "8656 1 7 33 1 7 33 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "2896 2113 25620 +39.8260443 -104.6282154 0 0 \n", + "6683 70552 0 +39.8245484 -104.6206486 0 0 \n", + "8656 237358 0 +39.8247377 -104.6244984 0 0 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "2896 0 0 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 0 0 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "2896 0 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "2896 0 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 0 \n", + "\n", + " geometry PRES16D PRES16R \\\n", + "2896 POLYGON ((3244624.987 1726684.593, 3244656.485... 0.0 0.0 \n", + "6683 POLYGON ((3246737.729 1727192.367, 3247192.322... 
0.0 0.0 \n", + "8656 POLYGON ((3244825.443 1726402.364, 3244867.095... 0.0 0.0 \n", + "\n", + " SEN16D SEN16R AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "2896 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "6683 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8656 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "\n", + " GOV18R REG18D REG18R PRES20D SEN20D PRES20R SEN20R \n", + "2896 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "6683 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8656 0.0 0.0 0.0 0.0 0.0 0.0 0.0 " + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[blocks_to_precincts2020_orig_assignment.isna()]" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "id": "4689ef17", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "9812 08 031 008388 2009 080310083882009 Block 2009 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "9812 1 7 33 1 7 33 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "9812 1611 0 +39.7728637 -104.8004050 0 0 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "9812 0 0 0 0 0 0 0 0 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " geometry PRES16D PRES16R \\\n", + "9812 POLYGON ((3196304.708 1707166.383, 3197014.777... 0.0 0.0 \n", + "\n", + " SEN16D SEN16R AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "\n", + " GOV18R REG18D REG18R PRES20D SEN20D PRES20R SEN20R \n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 " + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[blocks_to_precincts2018_orig_assignment.isna()]" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "96ab7a1e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "1997 08 031 008388 7008 080310083887008 Block 7008 \n", + "2592 08 031 008388 7007 080310083887007 Block 7007 \n", + "2729 08 031 015600 2004 080310156002004 Block 2004 \n", + "8262 08 031 008388 7009 080310083887009 Block 7009 \n", + "9812 08 031 008388 2009 080310083882009 Block 2009 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "1997 1 7 33 1 7 33 G5040 S \n", + "2592 1 7 33 1 7 33 G5040 S \n", + "2729 1 2 32 1 1 32 G5040 S \n", + "8262 1 7 33 1 7 33 G5040 S \n", + "9812 1 7 33 1 7 33 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "1997 10239 0 +39.7729509 -104.8078405 0 0 \n", + "2592 1466 0 +39.7731435 -104.8088405 0 0 \n", + "2729 3049 2100 +39.6658476 -105.0041615 0 0 \n", + "8262 6260 0 +39.7729837 -104.8051077 0 0 \n", + "9812 1611 0 +39.7728637 -104.8004050 0 0 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "1997 0 0 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 0 0 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "1997 0 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + 
"9812 0 0 0 0 0 0 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "1997 0 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " geometry PRES16D PRES16R \\\n", + "1997 POLYGON ((3193984.432 1707215.315, 3193983.602... 0.0 0.0 \n", + "2592 POLYGON ((3194032.231 1707250.650, 3194531.988... 0.0 0.0 \n", + "2729 POLYGON ((3139496.394 1667748.388, 3139520.253... 0.0 0.0 \n", + "8262 POLYGON ((3194531.881 1707232.230, 3194531.988... 0.0 0.0 \n", + "9812 POLYGON ((3196304.708 1707166.383, 3197014.777... 0.0 0.0 \n", + "\n", + " SEN16D SEN16R AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "1997 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2592 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2729 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8262 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "\n", + " GOV18R REG18D REG18R PRES20D SEN20D PRES20R SEN20R \n", + "1997 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2592 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2729 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8262 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 " + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[blocks_to_precincts2016_orig_assignment.isna()]" + ] + }, + { + "cell_type": "markdown", + "id": "220f4948", + "metadata": {}, + "source": [ + "### In this case the unassigned blocks have zero population and zero votes, so they wouldn't affect the aggregation/disaggregation of data - but this isn't always the case." 
+ ] + }, + { + "cell_type": "markdown", + "id": "8e731eaf", + "metadata": {}, + "source": [ + "### MORAL: Shapefiles often come with significant topological problems that can affect data transfer in important ways! These problems should be diagnosed and repaired to the greatest extent possible prior to moving data around. \n", + "### For details about how Maup can repair these problems, see the \"Maup smart_repair demo\" notebook!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "324077ca", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12fe456f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a2ae944", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37a767b0", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ebcc0a9e", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/Maup data management demo.ipynb b/notebooks/Maup data management demo.ipynb new file mode 100644 index 0000000..63cd062 --- /dev/null +++ b/notebooks/Maup data management demo.ipynb @@ -0,0 +1,5636 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1372272b", + "metadata": {}, + "source": [ + "### Demo notebook for data management using Maup, based on Denver County, CO" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": 
"44231122", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import geopandas as gpd\n", + "import maup\n", + "\n", + "maup.progress.enabled = True\n", + "\n", + "pd.options.mode.chained_assignment = None\n", + "pd.set_option('display.max_columns', None)" + ] + }, + { + "cell_type": "markdown", + "id": "ccae0af8", + "metadata": {}, + "source": [ + "### Goal: Add population data and election data from 2016 and 2018 to 2020 precincts." + ] + }, + { + "cell_type": "markdown", + "id": "bc693ada", + "metadata": {}, + "source": [ + "### Here are the shapefiles that we'll need:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "2a4e791a", + "metadata": {}, + "outputs": [], + "source": [ + "blocks_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_blocks/DenverCo_blocks.shp\")\n", + "precincts2016_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2016_repaired/DenverCo_precincts2016_repaired.shp\")\n", + "precincts2018_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2018_repaired/DenverCo_precincts2018_repaired.shp\")\n", + "precincts2020_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2020_repaired/DenverCo_precincts2020_repaired.shp\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "b614fcef", + "metadata": {}, + "source": [ + "### Take a look at what information each of these shapefiles contains:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9d840231", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
STATEFP20COUNTYFP20TRACTCE20BLOCKCE20GEOID20NAME20CD116SLDL20SLDU20CD118SLDL22SLDU22MTFCC20FUNCSTAT20ALAND20AWATER20INTPTLAT20INTPTLON20TOTPOP20WHITE20BLACK20AMIN20ASIAN20NHPI20OTHER202MORE20HISP20NH_WHITE20NH_BLACK20NH_AMIN20NH_ASIAN20NH_NHPI20NH_OTHER20NH_2MORE20VAP20HVAP20WVAP20BVAP20AMINVAP20ASIANVAP20NHPIVAP20OTHERVAP202MOREVAP20H_WHITE20H_BLACK20H_AMIN20H_ASIAN20H_NHPI20H_OTHER20H_2MORE20TOTPOP20RHISP20RNHWHITE20RNHBLACK20RNHAMIN20RNHASIAN20RNH_NHPI20RNHOTHER20RNH2MORE20RWHITE20RBLACK20RAMIN20RASIAN20RNHPI20ROTHER20R2MORE20Rgeometry
0080310005013004080310005013004Block 300414341434G5040S113600+39.7445040-105.0362730000000000000000000000000000000000000000000000000POLYGON ((3130205.522 1696255.187, 3130218.309...
1080310043034019080310043034019Block 401916311231G5040S173060+39.7138779-104.932222733290110027250010002422100100240100023372500100029011002POLYGON ((3159569.782 1685233.398, 3159568.175...
2080310055031008080310055031008Block 100811161126G5040S171200+39.6308773-105.029607138250140176220140143152101401430000033862201401425014017POLYGON ((3132214.313 1655090.209, 3132219.592...
3080310032041004080310032041004Block 100418311631G5040S156480+39.7376302-104.96899931088810007121685100006101168510007830000761081685100006881000712POLYGON ((3149182.154 1694282.223, 3149242.061...
4080310031011022080310031011022Block 102218331831G5040S147590+39.7475700-104.960387338370000011360000012912800000110000003813600000137000001POLYGON ((3151562.306 1697493.116, 3151561.779...
............................................................................................................................................................................................................
10144080310069032005080310069032005Block 200519311932G5040S186260+39.6706226-104.911557251300110613113000001945112601106110011054511130000019300110613POLYGON ((3165300.448 1669539.900, 3165299.396...
10145080319800011033080319800011033Block 103317331733G5040S118600+39.8255061-104.7465304000000000000000000000000000000000000000000000000POLYGON ((3211448.603 1727256.205, 3211526.127...
10146080319800011024080319800011024Block 102417331733G5040S1696250+39.8342487-104.7591599000000000000000000000000000000000000000000000000POLYGON ((3204485.897 1729532.389, 3204502.996...
10147080310004013008080310004013008Block 300814341434G5040S176610+39.7711189-105.019748372690000123680000015825600000210000117236800000169000012POLYGON ((3134776.811 1706335.477, 3134790.024...
10148080310009031012080310009031012Block 101214341434G5040S185850+39.7189546-105.02940608320034033235418034013563414032018192000032208455180340132013403323POLYGON ((3132206.908 1687391.415, 3132550.655...
\n", + "

10149 rows × 67 columns

\n", + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "0 08 031 000501 3004 080310005013004 Block 3004 \n", + "1 08 031 004303 4019 080310043034019 Block 4019 \n", + "2 08 031 005503 1008 080310055031008 Block 1008 \n", + "3 08 031 003204 1004 080310032041004 Block 1004 \n", + "4 08 031 003101 1022 080310031011022 Block 1022 \n", + "... ... ... ... ... ... ... \n", + "10144 08 031 006903 2005 080310069032005 Block 2005 \n", + "10145 08 031 980001 1033 080319800011033 Block 1033 \n", + "10146 08 031 980001 1024 080319800011024 Block 1024 \n", + "10147 08 031 000401 3008 080310004013008 Block 3008 \n", + "10148 08 031 000903 1012 080310009031012 Block 1012 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "0 1 4 34 1 4 34 G5040 S \n", + "1 1 6 31 1 2 31 G5040 S \n", + "2 1 1 16 1 1 26 G5040 S \n", + "3 1 8 31 1 6 31 G5040 S \n", + "4 1 8 33 1 8 31 G5040 S \n", + "... ... ... ... ... ... ... ... ... \n", + "10144 1 9 31 1 9 32 G5040 S \n", + "10145 1 7 33 1 7 33 G5040 S \n", + "10146 1 7 33 1 7 33 G5040 S \n", + "10147 1 4 34 1 4 34 G5040 S \n", + "10148 1 4 34 1 4 34 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "0 11360 0 +39.7445040 -105.0362730 0 0 \n", + "1 17306 0 +39.7138779 -104.9322227 33 29 \n", + "2 17120 0 +39.6308773 -105.0296071 38 25 \n", + "3 15648 0 +39.7376302 -104.9689993 108 88 \n", + "4 14759 0 +39.7475700 -104.9603873 38 37 \n", + "... ... ... ... ... ... ... 
\n", + "10144 18626 0 +39.6706226 -104.9115572 51 30 \n", + "10145 11860 0 +39.8255061 -104.7465304 0 0 \n", + "10146 169625 0 +39.8342487 -104.7591599 0 0 \n", + "10147 17661 0 +39.7711189 -105.0197483 72 69 \n", + "10148 18585 0 +39.7189546 -105.0294060 83 20 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "0 0 0 0 0 0 0 0 0 \n", + "1 0 1 1 0 0 2 7 25 \n", + "2 0 1 4 0 1 7 6 22 \n", + "3 1 0 0 0 7 12 16 85 \n", + "4 0 0 0 0 0 1 1 36 \n", + "... ... ... ... ... ... ... ... ... \n", + "10144 0 1 1 0 6 13 11 30 \n", + "10145 0 0 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 0 0 \n", + "10147 0 0 0 0 1 2 3 68 \n", + "10148 0 3 4 0 33 23 54 18 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 0 1 0 0 0 \n", + "2 0 1 4 0 1 4 \n", + "3 1 0 0 0 0 6 \n", + "4 0 0 0 0 0 1 \n", + "... ... ... ... ... ... ... \n", + "10144 0 0 0 0 1 9 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 0 0 0 0 1 \n", + "10148 0 3 4 0 1 3 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "0 0 0 0 0 0 0 0 \n", + "1 24 2 21 0 0 1 0 \n", + "2 31 5 21 0 1 4 0 \n", + "3 101 16 85 1 0 0 0 \n", + "4 29 1 28 0 0 0 0 \n", + "... ... ... ... ... ... ... ... \n", + "10144 45 11 26 0 1 1 0 \n", + "10145 0 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 0 \n", + "10147 58 2 56 0 0 0 0 \n", + "10148 56 34 14 0 3 2 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 2 4 0 1 0 \n", + "2 1 4 3 0 0 0 \n", + "3 7 8 3 0 0 0 \n", + "4 0 1 1 0 0 0 \n", + "... ... ... ... ... ... ... 
\n", + "10144 6 11 0 0 1 1 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 2 1 0 0 0 \n", + "10148 18 19 2 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 0 2 33 7 25 \n", + "2 0 0 3 38 6 22 \n", + "3 0 7 6 108 16 85 \n", + "4 0 0 0 38 1 36 \n", + "... ... ... ... ... ... ... \n", + "10144 0 5 4 51 11 30 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 1 1 72 3 68 \n", + "10148 0 32 20 84 55 18 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "0 0 0 0 0 0 0 \n", + "1 0 0 1 0 0 0 \n", + "2 0 1 4 0 1 4 \n", + "3 1 0 0 0 0 6 \n", + "4 0 0 0 0 0 1 \n", + "... ... ... ... ... ... ... \n", + "10144 0 0 0 0 1 9 \n", + "10145 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 \n", + "10147 0 0 0 0 0 1 \n", + "10148 0 3 4 0 1 3 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "0 0 0 0 0 0 0 0 \n", + "1 29 0 1 1 0 0 2 \n", + "2 25 0 1 4 0 1 7 \n", + "3 88 1 0 0 0 7 12 \n", + "4 37 0 0 0 0 0 1 \n", + "... ... ... ... ... ... ... ... \n", + "10144 30 0 1 1 0 6 13 \n", + "10145 0 0 0 0 0 0 0 \n", + "10146 0 0 0 0 0 0 0 \n", + "10147 69 0 0 0 0 1 2 \n", + "10148 20 1 3 4 0 33 23 \n", + "\n", + " geometry \n", + "0 POLYGON ((3130205.522 1696255.187, 3130218.309... \n", + "1 POLYGON ((3159569.782 1685233.398, 3159568.175... \n", + "2 POLYGON ((3132214.313 1655090.209, 3132219.592... \n", + "3 POLYGON ((3149182.154 1694282.223, 3149242.061... \n", + "4 POLYGON ((3151562.306 1697493.116, 3151561.779... \n", + "... ... \n", + "10144 POLYGON ((3165300.448 1669539.900, 3165299.396... \n", + "10145 POLYGON ((3211448.603 1727256.205, 3211526.127... \n", + "10146 POLYGON ((3204485.897 1729532.389, 3204502.996... \n", + "10147 POLYGON ((3134776.811 1706335.477, 3134790.024... \n", + "10148 POLYGON ((3132206.908 1687391.415, 3132550.655... 
\n", + "\n", + "[10149 rows x 67 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "22521abd", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
COUNTYFPNAMELSADG16PREDCliG16PRERTruG16PRELJohG16PREGSteG16PREIMcMG16PREOthG16USSDBenG16USSRGleG16USSLWilG16USSGMenG16USSOthgeometry
0031Denver 10172316334211973016426179POLYGON ((3125703.139 1681147.799, 3125702.944...
1031Denver 10275716623257773817428149POLYGON ((3129675.455 1682118.384, 3129674.269...
2031Denver 103678241442241470322734189POLYGON ((3129642.865 1679001.901, 3129708.711...
3031Denver 20367711433102565114320152POLYGON ((3146588.324 1694971.352, 3146587.373...
4031Denver 2089868859384292210640545POLYGON ((3146274.972 1693705.886, 3146340.216...
.............................................
341031Denver 916929449722381092648349167POLYGON ((3164780.350 1664424.144, 3165010.513...
342031Denver 92751628534912652130620121POLYGON ((3163781.804 1663194.808, 3163746.863...
343031Denver 9237393365525316732348561412POLYGON ((3173239.768 1665060.235, 3173239.714...
344031Denver 9375523354587355335519102POLYGON ((3173156.086 1655193.513, 3173161.344...
345031Denver 9334032222811834082311280POLYGON ((3169603.158 1661791.346, 3169558.084...
\n", + "

346 rows × 14 columns

\n", + "
" + ], + "text/plain": [ + " COUNTYFP NAMELSAD G16PREDCli G16PRERTru G16PRELJoh G16PREGSte \\\n", + "0 031 Denver 101 723 163 34 21 \n", + "1 031 Denver 102 757 166 23 25 \n", + "2 031 Denver 103 678 241 44 22 \n", + "3 031 Denver 203 677 114 33 10 \n", + "4 031 Denver 208 986 88 59 38 \n", + ".. ... ... ... ... ... ... \n", + "341 031 Denver 916 929 449 72 23 \n", + "342 031 Denver 927 516 285 34 9 \n", + "343 031 Denver 923 739 336 55 25 \n", + "344 031 Denver 937 552 335 45 8 \n", + "345 031 Denver 933 403 222 28 11 \n", + "\n", + " G16PREIMcM G16PREOth G16USSDBen G16USSRGle G16USSLWil G16USSGMen \\\n", + "0 1 9 730 164 26 17 \n", + "1 7 7 738 174 28 14 \n", + "2 4 14 703 227 34 18 \n", + "3 2 5 651 143 20 15 \n", + "4 4 2 922 106 40 54 \n", + ".. ... ... ... ... ... ... \n", + "341 8 10 926 483 49 16 \n", + "342 12 6 521 306 20 12 \n", + "343 3 16 732 348 56 14 \n", + "344 7 3 553 355 19 10 \n", + "345 8 3 408 231 12 8 \n", + "\n", + " G16USSOth geometry \n", + "0 9 POLYGON ((3125703.139 1681147.799, 3125702.944... \n", + "1 9 POLYGON ((3129675.455 1682118.384, 3129674.269... \n", + "2 9 POLYGON ((3129642.865 1679001.901, 3129708.711... \n", + "3 2 POLYGON ((3146588.324 1694971.352, 3146587.373... \n", + "4 5 POLYGON ((3146274.972 1693705.886, 3146340.216... \n", + ".. ... ... \n", + "341 7 POLYGON ((3164780.350 1664424.144, 3165010.513... \n", + "342 1 POLYGON ((3163781.804 1663194.808, 3163746.863... \n", + "343 12 POLYGON ((3173239.768 1665060.235, 3173239.714... \n", + "344 2 POLYGON ((3173156.086 1655193.513, 3173161.344... \n", + "345 0 POLYGON ((3169603.158 1661791.346, 3169558.084... \n", + "\n", + "[346 rows x 14 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8849c8cf", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
COUNTYFPVTDSTNAMECD116FPSLDUSTSLDLSTPRECIDAG18DAG18RSOS18DSOS18RTRE18DTRE18RGOV18DGOV18RREG18DREG18RUSH18DUSH18RTOTPOPNH_WHITENH_BLACKNH_AMINNH_ASIANNH_NHPINH_OTHERNH_2MOREHISPH_WHITEH_BLACKH_AMINH_ASIANH_NHPIH_OTHERH_2MOREVAPHVAPWVAPBVAPAMINVAPASIANVAPNHPIVAPOTHERVAP2MOREVAPgeometry
0031031745Denver 7450103300713307167451084303107331610463341137272105025411212640000000000000000000000000POLYGON ((3167607.595 1714575.543, 3167607.566...
1031031540Denver 5400103400513405165407601937512157152567681967131767711790000000000000000000000000POLYGON ((3139202.439 1699577.829, 3139240.848...
2031031744Denver 74401033007133071674410003091003312994307105128097126710442619131301100028001002529128313011000POLYGON ((3180920.093 1705405.138, 3180920.088...
3031031530Denver 53001031005131051653064562864867462566517210729330082211000091171221062933008POLYGON ((3144153.912 1694345.882, 3144155.827...
4031031940Denver 940010310091310916940395267400273386271422254375247417244274258108002420000202684252108002POLYGON ((3165244.173 1654452.896, 3165241.340...
..........................................................................................................................................
351031031102Denver 102010340011340116102577134581129572135556130559126585123416943411856422053031011553307913013101152653186435355393210117POLYGON ((3133625.259 1681579.939, 3133626.217...
352031031101Denver 101010340011340116101615122619123615125613127599122634115441457866341092734357717802598441509154278521134993128792521POLYGON ((3125719.959 1678986.315, 3125719.959...
353031031924Denver 92401031009131091692446384476874738546894458864688744631586746196853620412096371555048659364184114296401856826133POLYGON ((3180600.942 1663222.850, 3180600.864...
354031031604Denver 604010330061330616604717133734126718130745121708114737116457889314081899046164108941326267050810632206877959911361944101POLYGON ((3173051.077 1693417.146, 3173051.078...
355031031746Denver 7460103300713307167464621464691464461604951234451274901270000000000000000000000000POLYGON ((3178132.638 1716671.381, 3178133.440...
\n", + "

356 rows × 45 columns

\n", + "
" + ], + "text/plain": [ + " COUNTYFP VTDST NAME CD116FP SLDUST SLDLST PRECID AG18D \\\n", + "0 031 031745 Denver 745 01 033 007 1330716745 1084 \n", + "1 031 031540 Denver 540 01 034 005 1340516540 760 \n", + "2 031 031744 Denver 744 01 033 007 1330716744 1000 \n", + "3 031 031530 Denver 530 01 031 005 1310516530 64 \n", + "4 031 031940 Denver 940 01 031 009 1310916940 395 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 031 031102 Denver 102 01 034 001 1340116102 577 \n", + "352 031 031101 Denver 101 01 034 001 1340116101 615 \n", + "353 031 031924 Denver 924 01 031 009 1310916924 463 \n", + "354 031 031604 Denver 604 01 033 006 1330616604 717 \n", + "355 031 031746 Denver 746 01 033 007 1330716746 462 \n", + "\n", + " AG18R SOS18D SOS18R TRE18D TRE18R GOV18D GOV18R REG18D REG18R \\\n", + "0 303 1073 316 1046 334 1137 272 1050 254 \n", + "1 193 751 215 715 256 768 196 713 176 \n", + "2 309 1003 312 994 307 1051 280 971 267 \n", + "3 5 62 8 64 8 67 4 62 5 \n", + "4 267 400 273 386 271 422 254 375 247 \n", + ".. ... ... ... ... ... ... ... ... ... \n", + "351 134 581 129 572 135 556 130 559 126 \n", + "352 122 619 123 615 125 613 127 599 122 \n", + "353 84 476 87 473 85 468 94 458 86 \n", + "354 133 734 126 718 130 745 121 708 114 \n", + "355 146 469 146 446 160 495 123 445 127 \n", + "\n", + " USH18D USH18R TOTPOP NH_WHITE NH_BLACK NH_AMIN NH_ASIAN NH_NHPI \\\n", + "0 1121 264 0 0 0 0 0 0 \n", + "1 771 179 0 0 0 0 0 0 \n", + "2 1044 261 91 31 30 1 1 0 \n", + "3 66 5 172 107 29 3 3 0 \n", + "4 417 244 274 258 1 0 8 0 \n", + ".. ... ... ... ... ... ... ... ... 
\n", + "351 585 123 4169 434 118 56 422 0 \n", + "352 634 115 4414 578 66 34 109 2 \n", + "353 468 87 4463 1586 746 19 685 3 \n", + "354 737 116 4578 893 1408 18 990 4 \n", + "355 490 127 0 0 0 0 0 0 \n", + "\n", + " NH_OTHER NH_2MORE HISP H_WHITE H_BLACK H_AMIN H_ASIAN H_NHPI \\\n", + "0 0 0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 0 \n", + "2 0 0 28 0 0 1 0 0 \n", + "3 0 8 22 11 0 0 0 0 \n", + "4 0 2 4 2 0 0 0 0 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 5 30 3101 1553 30 79 13 0 \n", + "352 7 34 3577 1780 25 98 4 4 \n", + "353 6 204 1209 637 15 5 5 0 \n", + "354 6 164 1089 413 26 26 7 0 \n", + "355 0 0 0 0 0 0 0 0 \n", + "\n", + " H_OTHER H_2MORE VAP HVAP WVAP BVAP AMINVAP ASIANVAP NHPIVAP \\\n", + "0 0 0 0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 0 0 \n", + "2 25 2 91 28 31 30 1 1 0 \n", + "3 9 1 171 22 106 29 3 3 0 \n", + "4 2 0 268 4 252 1 0 8 0 \n", + ".. ... ... ... ... ... ... ... ... ... \n", + "351 1310 115 2653 1864 353 55 39 321 0 \n", + "352 1509 154 2785 2113 499 31 28 79 2 \n", + "353 486 59 3641 841 1429 640 18 568 2 \n", + "354 508 106 3220 687 795 991 13 619 4 \n", + "355 0 0 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP 2MOREVAP geometry \n", + "0 0 0 POLYGON ((3167607.595 1714575.543, 3167607.566... \n", + "1 0 0 POLYGON ((3139202.439 1699577.829, 3139240.848... \n", + "2 0 0 POLYGON ((3180920.093 1705405.138, 3180920.088... \n", + "3 0 8 POLYGON ((3144153.912 1694345.882, 3144155.827... \n", + "4 0 2 POLYGON ((3165244.173 1654452.896, 3165241.340... \n", + ".. ... ... ... \n", + "351 1 17 POLYGON ((3133625.259 1681579.939, 3133626.217... \n", + "352 5 21 POLYGON ((3125719.959 1678986.315, 3125719.959... \n", + "353 6 133 POLYGON ((3180600.942 1663222.850, 3180600.864... \n", + "354 4 101 POLYGON ((3173051.077 1693417.146, 3173051.078... \n", + "355 0 0 POLYGON ((3178132.638 1716671.381, 3178133.440... 
\n", + "\n", + "[356 rows x 45 columns]" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2018_df" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "f6af11f7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
PRECIDSTATEFP20COUNTYFP20NAMECD18SUD18SLD18VTDST20NOTESPRES20DSEN20DPRES20RSEN20Rgeometry
0131051653008031Denver 53013105031530None73751612POLYGON ((3144474.619 1694874.799, 3144509.314...
1131061660508031Denver 60513106031605None720702110137POLYGON ((3162200.074 1691024.061, 3162164.358...
2131061660608031Denver 60613106031606None832819107121POLYGON ((3162459.795 1693616.802, 3162459.063...
3131061660708031Denver 60713106031607None597595102113POLYGON ((3165142.137 1693683.542, 3165141.880...
4131061660908031Denver 60913106031609None2412335971POLYGON ((3152968.867 1690916.131, 3153065.627...
.............................................
351133071674608031Denver 74613307031746None17681699386469POLYGON ((3172960.862 1716278.085, 3172953.814...
352134051654008031Denver 54013405031540None15581477416489POLYGON ((3139192.261 1699587.973, 3139181.643...
353134051654108031Denver 54113405031541None15541504246297POLYGON ((3148046.593 1705187.168, 3148047.303...
354134051654208031Denver 54213405031542None881828200247POLYGON ((3133494.258 1698947.396, 3133494.426...
355134051654308031Denver 54313405031543None8378369390POLYGON ((3140186.339 1689621.761, 3140186.210...
\n", + "

356 rows × 14 columns

\n", + "
" + ], + "text/plain": [ + " PRECID STATEFP20 COUNTYFP20 NAME CD18 SUD18 SLD18 VTDST20 \\\n", + "0 1310516530 08 031 Denver 530 1 31 05 031530 \n", + "1 1310616605 08 031 Denver 605 1 31 06 031605 \n", + "2 1310616606 08 031 Denver 606 1 31 06 031606 \n", + "3 1310616607 08 031 Denver 607 1 31 06 031607 \n", + "4 1310616609 08 031 Denver 609 1 31 06 031609 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 1330716746 08 031 Denver 746 1 33 07 031746 \n", + "352 1340516540 08 031 Denver 540 1 34 05 031540 \n", + "353 1340516541 08 031 Denver 541 1 34 05 031541 \n", + "354 1340516542 08 031 Denver 542 1 34 05 031542 \n", + "355 1340516543 08 031 Denver 543 1 34 05 031543 \n", + "\n", + " NOTES PRES20D SEN20D PRES20R SEN20R \\\n", + "0 None 73 75 16 12 \n", + "1 None 720 702 110 137 \n", + "2 None 832 819 107 121 \n", + "3 None 597 595 102 113 \n", + "4 None 241 233 59 71 \n", + ".. ... ... ... ... ... \n", + "351 None 1768 1699 386 469 \n", + "352 None 1558 1477 416 489 \n", + "353 None 1554 1504 246 297 \n", + "354 None 881 828 200 247 \n", + "355 None 837 836 93 90 \n", + "\n", + " geometry \n", + "0 POLYGON ((3144474.619 1694874.799, 3144509.314... \n", + "1 POLYGON ((3162200.074 1691024.061, 3162164.358... \n", + "2 POLYGON ((3162459.795 1693616.802, 3162459.063... \n", + "3 POLYGON ((3165142.137 1693683.542, 3165141.880... \n", + "4 POLYGON ((3152968.867 1690916.131, 3153065.627... \n", + ".. ... \n", + "351 POLYGON ((3172960.862 1716278.085, 3172953.814... \n", + "352 POLYGON ((3139192.261 1699587.973, 3139181.643... \n", + "353 POLYGON ((3148046.593 1705187.168, 3148047.303... \n", + "354 POLYGON ((3133494.258 1698947.396, 3133494.426... \n", + "355 POLYGON ((3140186.339 1689621.761, 3140186.210... 
\n", + "\n", + "[356 rows x 14 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df" + ] + }, + { + "cell_type": "markdown", + "id": "b93f583a", + "metadata": {}, + "source": [ + "### So the blocks file has lots of population data and the precinct files each have election data for one year.\n", + "### It might be convenient to rename some of the election columns in the 2016 file so that they have the same format as the other years." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "ad1aa151", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['COUNTYFP', 'NAMELSAD', 'G16PREDCli', 'G16PRERTru', 'G16PRELJoh',\n", + " 'G16PREGSte', 'G16PREIMcM', 'G16PREOth', 'G16USSDBen', 'G16USSRGle',\n", + " 'G16USSLWil', 'G16USSGMen', 'G16USSOth', 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "4cd76b85", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2016_df = precincts2016_df.rename(columns = {\n", + " 'G16PREDCli': 'PRES16D',\n", + " 'G16PRERTru': 'PRES16R',\n", + " 'G16USSDBen': 'SEN16D',\n", + " 'G16USSRGle': 'SEN16R'\n", + "})" + ] + }, + { + "cell_type": "markdown", + "id": "6118e4a4", + "metadata": {}, + "source": [ + "### In order to move all this data around, we'll need assignments of blocks to precincts for each of the precinct files." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "b226cfc9", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 786.59it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 371.12it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 892.50it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:01<00:00, 347.47it/s]\n", + "100%|████████████████████████████████████████| 346/346 [00:00<00:00, 725.00it/s]\n", + "100%|████████████████████████████████████████| 346/346 [00:01<00:00, 311.78it/s]\n" + ] + } + ], + "source": [ + "blocks_to_precincts2020_assignment = maup.assign(blocks_df.geometry, precincts2020_df.geometry)\n", + "blocks_to_precincts2018_assignment = maup.assign(blocks_df.geometry, precincts2018_df.geometry)\n", + "blocks_to_precincts2016_assignment = maup.assign(blocks_df.geometry, precincts2016_df.geometry)" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cdcaf044", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 300\n", + "1 56\n", + "2 41\n", + "3 73\n", + "4 262\n", + " ... 
\n", + "10144 96\n", + "10145 234\n", + "10146 234\n", + "10147 292\n", + "10148 313\n", + "Length: 10149, dtype: int64" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_to_precincts2020_assignment" + ] + }, + { + "cell_type": "markdown", + "id": "fbed2841", + "metadata": {}, + "source": [ + "### First step: Aggregate population data from blocks to 2020 precincts.\n", + "### (We'll just use a few of the population columns for this demo.)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "61cf6d7e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['STATEFP20', 'COUNTYFP20', 'TRACTCE20', 'BLOCKCE20', 'GEOID20',\n", + " 'NAME20', 'CD116', 'SLDL20', 'SLDU20', 'CD118', 'SLDL22', 'SLDU22',\n", + " 'MTFCC20', 'FUNCSTAT20', 'ALAND20', 'AWATER20', 'INTPTLAT20',\n", + " 'INTPTLON20', 'TOTPOP20', 'WHITE20', 'BLACK20', 'AMIN20', 'ASIAN20',\n", + " 'NHPI20', 'OTHER20', '2MORE20', 'HISP20', 'NH_WHITE20', 'NH_BLACK20',\n", + " 'NH_AMIN20', 'NH_ASIAN20', 'NH_NHPI20', 'NH_OTHER20', 'NH_2MORE20',\n", + " 'VAP20', 'HVAP20', 'WVAP20', 'BVAP20', 'AMINVAP20', 'ASIANVAP20',\n", + " 'NHPIVAP20', 'OTHERVAP20', '2MOREVAP20', 'H_WHITE20', 'H_BLACK20',\n", + " 'H_AMIN20', 'H_ASIAN20', 'H_NHPI20', 'H_OTHER20', 'H_2MORE20',\n", + " 'TOTPOP20R', 'HISP20R', 'NHWHITE20R', 'NHBLACK20R', 'NHAMIN20R',\n", + " 'NHASIAN20R', 'NH_NHPI20R', 'NHOTHER20R', 'NH2MORE20R', 'WHITE20R',\n", + " 'BLACK20R', 'AMIN20R', 'ASIAN20R', 'NHPI20R', 'OTHER20R', '2MORE20R',\n", + " 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "d1bf7b5e", + "metadata": {}, + "outputs": [], + "source": [ + "pop_cols = ['TOTPOP20', 'VAP20']" + ] + }, + { + "cell_type": "markdown", + "id": "5fd55604", + "metadata": {}, + "source": [ + 
"### We can use the assignment of blocks to precincts to aggregate populations from blocks up to precincts:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "5da80e36", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2020_df[pop_cols] = blocks_df[pop_cols].groupby(blocks_to_precincts2020_assignment).sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "9905a96f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
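<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>TOTPOP20</th>\n", + "      <th>VAP20</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", +
"    <tr>\n", + "      <th>0</th>\n", + "      <td>171</td>\n", + "      <td>171</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>1</th>\n", + "      <td>1132</td>\n", + "      <td>930</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>2</th>\n", + "      <td>1207</td>\n", + "      <td>1047</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>3</th>\n", + "      <td>926</td>\n", + "      <td>738</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>4</th>\n", + "      <td>371</td>\n", + "      <td>283</td>\n", + "    </tr>\n", +
"  </tbody>\n", + "</table>\n", + "</div>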
" + ], + "text/plain": [ + " TOTPOP20 VAP20\n", + "0 171 171\n", + "1 1132 930\n", + "2 1207 1047\n", + "3 926 738\n", + "4 371 283" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[pop_cols].head()" + ] + }, + { + "cell_type": "markdown", + "id": "36a553ac", + "metadata": {}, + "source": [ + "### Check that we didn't gain/lose any population in the aggregation step:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "4d6bbc16", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "TOTPOP20 715522\n", + "VAP20 581062\n", + "dtype: int64" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[pop_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "e575bd32", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "TOTPOP20 715522\n", + "VAP20 581062\n", + "dtype: int64" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[pop_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "1d9a3f5d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
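<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>PRECID</th>\n", + "      <th>STATEFP20</th>\n", + "      <th>COUNTYFP20</th>\n", + "      <th>NAME</th>\n", + "      <th>CD18</th>\n", + "      <th>SUD18</th>\n", + "      <th>SLD18</th>\n", + "      <th>VTDST20</th>\n", + "      <th>NOTES</th>\n", + "      <th>PRES20D</th>\n", + "      <th>SEN20D</th>\n", + "      <th>PRES20R</th>\n", + "      <th>SEN20R</th>\n", + "      <th>geometry</th>\n", + "      <th>TOTPOP20</th>\n", + "      <th>VAP20</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", +
"    <tr>\n", + "      <th>0</th>\n", + "      <td>1310516530</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 530</td>\n", + "      <td>1</td>\n", + "      <td>31</td>\n", + "      <td>05</td>\n", + "      <td>031530</td>\n", + "      <td>None</td>\n", + "      <td>73</td>\n", + "      <td>75</td>\n", + "      <td>16</td>\n", + "      <td>12</td>\n", + "      <td>POLYGON ((3144474.619 1694874.799, 3144509.314...</td>\n", + "      <td>171</td>\n", + "      <td>171</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>1</th>\n", + "      <td>1310616605</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 605</td>\n", + "      <td>1</td>\n", + "      <td>31</td>\n", + "      <td>06</td>\n", + "      <td>031605</td>\n", + "      <td>None</td>\n", + "      <td>720</td>\n", + "      <td>702</td>\n", + "      <td>110</td>\n", + "      <td>137</td>\n", + "      <td>POLYGON ((3162200.074 1691024.061, 3162164.358...</td>\n", + "      <td>1132</td>\n", + "      <td>930</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>2</th>\n", + "      <td>1310616606</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 606</td>\n", + "      <td>1</td>\n", + "      <td>31</td>\n", + "      <td>06</td>\n", + "      <td>031606</td>\n", + "      <td>None</td>\n", + "      <td>832</td>\n", + "      <td>819</td>\n", + "      <td>107</td>\n", + "      <td>121</td>\n", + "      <td>POLYGON ((3162459.795 1693616.802, 3162459.063...</td>\n", + "      <td>1207</td>\n", + "      <td>1047</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>3</th>\n", + "      <td>1310616607</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 607</td>\n", + "      <td>1</td>\n", + "      <td>31</td>\n", + "      <td>06</td>\n", + "      <td>031607</td>\n", + "      <td>None</td>\n", + "      <td>597</td>\n", + "      <td>595</td>\n", + "      <td>102</td>\n", + "      <td>113</td>\n", + "      <td>POLYGON ((3165142.137 1693683.542, 3165141.880...</td>\n", + "      <td>926</td>\n", + "      <td>738</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>4</th>\n", + "      <td>1310616609</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 609</td>\n", + "      <td>1</td>\n", + "      <td>31</td>\n", + "      <td>06</td>\n", + "      <td>031609</td>\n", + "      <td>None</td>\n", + "      <td>241</td>\n", + "      <td>233</td>\n", + "      <td>59</td>\n", + "      <td>71</td>\n", + "      <td>POLYGON ((3152968.867 1690916.131, 3153065.627...</td>\n", + "      <td>371</td>\n", + "      <td>283</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>...</th>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "      <td>...</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>351</th>\n", + "      <td>1330716746</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 746</td>\n", + "      <td>1</td>\n", + "      <td>33</td>\n", + "      <td>07</td>\n", + "      <td>031746</td>\n", + "      <td>None</td>\n", + "      <td>1768</td>\n", + "      <td>1699</td>\n", + "      <td>386</td>\n", + "      <td>469</td>\n", + "      <td>POLYGON ((3172960.862 1716278.085, 3172953.814...</td>\n", + "      <td>3284</td>\n", + "      <td>2196</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>352</th>\n", + "      <td>1340516540</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 540</td>\n", + "      <td>1</td>\n", + "      <td>34</td>\n", + "      <td>05</td>\n", + "      <td>031540</td>\n", + "      <td>None</td>\n", + "      <td>1558</td>\n", + "      <td>1477</td>\n", + "      <td>416</td>\n", + "      <td>489</td>\n", + "      <td>POLYGON ((3139192.261 1699587.973, 3139181.643...</td>\n", + "      <td>3366</td>\n", + "      <td>3268</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>353</th>\n", + "      <td>1340516541</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 541</td>\n", + "      <td>1</td>\n", + "      <td>34</td>\n", + "      <td>05</td>\n", + "      <td>031541</td>\n", + "      <td>None</td>\n", + "      <td>1554</td>\n", + "      <td>1504</td>\n", + "      <td>246</td>\n", + "      <td>297</td>\n", + "      <td>POLYGON ((3148046.593 1705187.168, 3148047.303...</td>\n", + "      <td>2774</td>\n", + "      <td>2643</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>354</th>\n", + "      <td>1340516542</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 542</td>\n", + "      <td>1</td>\n", + "      <td>34</td>\n", + "      <td>05</td>\n", + "      <td>031542</td>\n", + "      <td>None</td>\n", + "      <td>881</td>\n", + "      <td>828</td>\n", + "      <td>200</td>\n", + "      <td>247</td>\n", + "      <td>POLYGON ((3133494.258 1698947.396, 3133494.426...</td>\n", + "      <td>1994</td>\n", + "      <td>1830</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>355</th>\n", + "      <td>1340516543</td>\n", + "      <td>08</td>\n", + "      <td>031</td>\n", + "      <td>Denver 543</td>\n", + "      <td>1</td>\n", + "      <td>34</td>\n", + "      <td>05</td>\n", + "      <td>031543</td>\n", + "      <td>None</td>\n", + "      <td>837</td>\n", + "      <td>836</td>\n", + "      <td>93</td>\n", + "      <td>90</td>\n", + "      <td>POLYGON ((3140186.339 1689621.761, 3140186.210...</td>\n", + "      <td>1678</td>\n", + "      <td>1367</td>\n", + "    </tr>\n", +
"  </tbody>\n", + "</table>\n", + "<p>356 rows × 16 columns</p>\n", + "</div>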
" + ], + "text/plain": [ + " PRECID STATEFP20 COUNTYFP20 NAME CD18 SUD18 SLD18 VTDST20 \\\n", + "0 1310516530 08 031 Denver 530 1 31 05 031530 \n", + "1 1310616605 08 031 Denver 605 1 31 06 031605 \n", + "2 1310616606 08 031 Denver 606 1 31 06 031606 \n", + "3 1310616607 08 031 Denver 607 1 31 06 031607 \n", + "4 1310616609 08 031 Denver 609 1 31 06 031609 \n", + ".. ... ... ... ... ... ... ... ... \n", + "351 1330716746 08 031 Denver 746 1 33 07 031746 \n", + "352 1340516540 08 031 Denver 540 1 34 05 031540 \n", + "353 1340516541 08 031 Denver 541 1 34 05 031541 \n", + "354 1340516542 08 031 Denver 542 1 34 05 031542 \n", + "355 1340516543 08 031 Denver 543 1 34 05 031543 \n", + "\n", + " NOTES PRES20D SEN20D PRES20R SEN20R \\\n", + "0 None 73 75 16 12 \n", + "1 None 720 702 110 137 \n", + "2 None 832 819 107 121 \n", + "3 None 597 595 102 113 \n", + "4 None 241 233 59 71 \n", + ".. ... ... ... ... ... \n", + "351 None 1768 1699 386 469 \n", + "352 None 1558 1477 416 489 \n", + "353 None 1554 1504 246 297 \n", + "354 None 881 828 200 247 \n", + "355 None 837 836 93 90 \n", + "\n", + " geometry TOTPOP20 VAP20 \n", + "0 POLYGON ((3144474.619 1694874.799, 3144509.314... 171 171 \n", + "1 POLYGON ((3162200.074 1691024.061, 3162164.358... 1132 930 \n", + "2 POLYGON ((3162459.795 1693616.802, 3162459.063... 1207 1047 \n", + "3 POLYGON ((3165142.137 1693683.542, 3165141.880... 926 738 \n", + "4 POLYGON ((3152968.867 1690916.131, 3153065.627... 371 283 \n", + ".. ... ... ... \n", + "351 POLYGON ((3172960.862 1716278.085, 3172953.814... 3284 2196 \n", + "352 POLYGON ((3139192.261 1699587.973, 3139181.643... 3366 3268 \n", + "353 POLYGON ((3148046.593 1705187.168, 3148047.303... 2774 2643 \n", + "354 POLYGON ((3133494.258 1698947.396, 3133494.426... 1994 1830 \n", + "355 POLYGON ((3140186.339 1689621.761, 3140186.210... 
1678 1367 \n", + "\n", + "[356 rows x 16 columns]" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df" + ] + }, + { + "cell_type": "markdown", + "id": "35bf6d3f", + "metadata": {}, + "source": [ + "### Next step: Disaggregate votes from all three precinct files to blocks, using Voting Age Population (VAP20) for the weights. " + ] + }, + { + "cell_type": "markdown", + "id": "c78b0a30", + "metadata": {}, + "source": [ + "### 2016:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "c3c840ad", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['COUNTYFP', 'NAMELSAD', 'PRES16D', 'PRES16R', 'G16PRELJoh',\n", + " 'G16PREGSte', 'G16PREIMcM', 'G16PREOth', 'SEN16D', 'SEN16R',\n", + " 'G16USSLWil', 'G16USSGMen', 'G16USSOth', 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "34f07c2b", + "metadata": {}, + "outputs": [], + "source": [ + "elec2016_cols = ['PRES16D', 'PRES16R', 'SEN16D', 'SEN16R']" + ] + }, + { + "cell_type": "markdown", + "id": "6b37197a", + "metadata": {}, + "source": [ + "### Now we assign a \"weight\" to each block that's equal to the fraction of the total population of ALL blocks assigned to the same precinct that's contained in that block.\n", + "### IMPORTANT: Some precincts have zero population, which leads to a zero denominator and an undefined weight for all blocks assigned to that precinct. We can solve this problem by replacing NaN values with zeros.\n", + "### Occasionally some zero-population precincts contain a small, nonzero number of votes, and these votes will be lost in the disaggregation from precincts down to blocks. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e4694fc7", + "metadata": {}, + "outputs": [], + "source": [ + "weights2016 = blocks_df[\"VAP20\"] / blocks_to_precincts2016_assignment.map(blocks_df[\"VAP20\"].groupby(blocks_to_precincts2016_assignment).sum())\n", + "weights2016 = weights2016.fillna(0)" + ] + }, + { + "cell_type": "markdown", + "id": "97376acf", + "metadata": {}, + "source": [ + "### Here's the disaggregation step, using the \"prorate\" function:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "3fa669eb", + "metadata": {}, + "outputs": [], + "source": [ + "prorated2016 = maup.prorate(blocks_to_precincts2016_assignment, precincts2016_df[elec2016_cols], weights2016)\n", + "blocks_df[elec2016_cols] = prorated2016" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d4e5120a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
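<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>PRES16D</th>\n", + "      <th>PRES16R</th>\n", + "      <th>SEN16D</th>\n", + "      <th>SEN16R</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", +
"    <tr>\n", + "      <th>0</th>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>1</th>\n", + "      <td>15.398230</td>\n", + "      <td>5.139823</td>\n", + "      <td>15.100885</td>\n", + "      <td>5.883186</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>2</th>\n", + "      <td>10.402685</td>\n", + "      <td>8.010067</td>\n", + "      <td>11.130872</td>\n", + "      <td>8.197315</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>3</th>\n", + "      <td>61.010847</td>\n", + "      <td>5.957288</td>\n", + "      <td>58.545763</td>\n", + "      <td>6.505085</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>4</th>\n", + "      <td>17.104558</td>\n", + "      <td>1.574397</td>\n", + "      <td>16.482574</td>\n", + "      <td>2.313003</td>\n", + "    </tr>\n", +
"  </tbody>\n", + "</table>\n", + "</div>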
" + ], + "text/plain": [ + " PRES16D PRES16R SEN16D SEN16R\n", + "0 0.000000 0.000000 0.000000 0.000000\n", + "1 15.398230 5.139823 15.100885 5.883186\n", + "2 10.402685 8.010067 11.130872 8.197315\n", + "3 61.010847 5.957288 58.545763 6.505085\n", + "4 17.104558 1.574397 16.482574 2.313003" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2016_cols].head()" + ] + }, + { + "cell_type": "markdown", + "id": "b048f421", + "metadata": {}, + "source": [ + "### Check to see whether we gained/lost any votes:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "ede614d4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551\n", + "PRES16R 62690\n", + "SEN16D 238774\n", + "SEN16R 71078\n", + "dtype: int64" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2016_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b1b1bc43", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551.0\n", + "PRES16R 62690.0\n", + "SEN16D 238774.0\n", + "SEN16R 71078.0\n", + "dtype: float64" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "adfe2fa6", + "metadata": {}, + "source": [ + "### 2018:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "ff59b2e4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['COUNTYFP', 'VTDST', 'NAME', 'CD116FP', 'SLDUST', 'SLDLST', 'PRECID',\n", + " 'AG18D', 'AG18R', 'SOS18D', 'SOS18R', 'TRE18D', 'TRE18R', 'GOV18D',\n", + " 'GOV18R', 'REG18D', 'REG18R', 'USH18D', 'USH18R', 'TOTPOP', 'NH_WHITE',\n", + " 'NH_BLACK', 'NH_AMIN', 'NH_ASIAN', 'NH_NHPI', 'NH_OTHER', 'NH_2MORE',\n", + " 'HISP', 'H_WHITE', 'H_BLACK', 
'H_AMIN', 'H_ASIAN', 'H_NHPI', 'H_OTHER',\n", + " 'H_2MORE', 'VAP', 'HVAP', 'WVAP', 'BVAP', 'AMINVAP', 'ASIANVAP',\n", + " 'NHPIVAP', 'OTHERVAP', '2MOREVAP', 'geometry'],\n", + " dtype='object')" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2018_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "5cc8bbe8", + "metadata": {}, + "outputs": [], + "source": [ + "elec2018_cols = ['AG18D', 'AG18R', 'SOS18D', 'SOS18R', 'TRE18D', 'TRE18R', 'GOV18D', 'GOV18R', 'REG18D', 'REG18R']" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "6189e09e", + "metadata": {}, + "outputs": [], + "source": [ + "weights2018 = blocks_df[\"VAP20\"] / blocks_to_precincts2018_assignment.map(blocks_df[\"VAP20\"].groupby(blocks_to_precincts2018_assignment).sum())\n", + "weights2018 = weights2018.fillna(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "ec05ed4c", + "metadata": {}, + "outputs": [], + "source": [ + "prorated2018 = maup.prorate(blocks_to_precincts2018_assignment, precincts2018_df[elec2018_cols], weights2018)\n", + "blocks_df[elec2018_cols] = prorated2018" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "339eefea", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
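<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>AG18D</th>\n", + "      <th>AG18R</th>\n", + "      <th>SOS18D</th>\n", + "      <th>SOS18R</th>\n", + "      <th>TRE18D</th>\n", + "      <th>TRE18R</th>\n", + "      <th>GOV18D</th>\n", + "      <th>GOV18R</th>\n", + "      <th>REG18D</th>\n", + "      <th>REG18R</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", +
"    <tr>\n", + "      <th>0</th>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>1</th>\n", + "      <td>14.378761</td>\n", + "      <td>6.223009</td>\n", + "      <td>13.614159</td>\n", + "      <td>7.008850</td>\n", + "      <td>13.741593</td>\n", + "      <td>6.435398</td>\n", + "      <td>14.357522</td>\n", + "      <td>6.223009</td>\n", + "      <td>13.465487</td>\n", + "      <td>5.883186</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>2</th>\n", + "      <td>9.861745</td>\n", + "      <td>7.531544</td>\n", + "      <td>10.444295</td>\n", + "      <td>7.053020</td>\n", + "      <td>10.423490</td>\n", + "      <td>6.969799</td>\n", + "      <td>10.069799</td>\n", + "      <td>7.073826</td>\n", + "      <td>10.028188</td>\n", + "      <td>6.449664</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>3</th>\n", + "      <td>59.778305</td>\n", + "      <td>5.272542</td>\n", + "      <td>58.066441</td>\n", + "      <td>7.121356</td>\n", + "      <td>58.819661</td>\n", + "      <td>6.436610</td>\n", + "      <td>61.147797</td>\n", + "      <td>4.382373</td>\n", + "      <td>57.587119</td>\n", + "      <td>4.519322</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>4</th>\n", + "      <td>17.940349</td>\n", + "      <td>2.021448</td>\n", + "      <td>17.473861</td>\n", + "      <td>2.468499</td>\n", + "      <td>17.415550</td>\n", + "      <td>2.546247</td>\n", + "      <td>18.387399</td>\n", + "      <td>1.827078</td>\n", + "      <td>17.007373</td>\n", + "      <td>1.768767</td>\n", + "    </tr>\n", +
"  </tbody>\n", + "</table>\n", + "</div>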
" + ], + "text/plain": [ + " AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n", + "1 14.378761 6.223009 13.614159 7.008850 13.741593 6.435398 14.357522 \n", + "2 9.861745 7.531544 10.444295 7.053020 10.423490 6.969799 10.069799 \n", + "3 59.778305 5.272542 58.066441 7.121356 58.819661 6.436610 61.147797 \n", + "4 17.940349 2.021448 17.473861 2.468499 17.415550 2.546247 18.387399 \n", + "\n", + " GOV18R REG18D REG18R \n", + "0 0.000000 0.000000 0.000000 \n", + "1 6.223009 13.465487 5.883186 \n", + "2 7.073826 10.028188 6.449664 \n", + "3 4.382373 57.587119 4.519322 \n", + "4 1.827078 17.007373 1.768767 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2018_cols].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "244b1927", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AG18D 232798\n", + "AG18R 64532\n", + "SOS18D 232255\n", + "SOS18R 67147\n", + "TRE18D 230382\n", + "TRE18R 66728\n", + "GOV18D 238762\n", + "GOV18R 60151\n", + "REG18D 223947\n", + "REG18R 57322\n", + "dtype: int64" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2018_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "8d8bf1bc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AG18D 232798.0\n", + "AG18R 64532.0\n", + "SOS18D 232255.0\n", + "SOS18R 67147.0\n", + "TRE18D 230382.0\n", + "TRE18R 66728.0\n", + "GOV18D 238762.0\n", + "GOV18R 60151.0\n", + "REG18D 223947.0\n", + "REG18R 57322.0\n", + "dtype: float64" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "45e4bc1b", + "metadata": {}, + "source": [ + "### If the goal is 
just to put all the data on 2020 precincts, we don't really have to disaggregate 2020 election data to blocks - but we might also want all the election data on blocks, so we'll go ahead and do it for completeness." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "b75e9d3c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['PRECID', 'STATEFP20', 'COUNTYFP20', 'NAME', 'CD18', 'SUD18', 'SLD18',\n", + " 'VTDST20', 'NOTES', 'PRES20D', 'SEN20D', 'PRES20R', 'SEN20R',\n", + " 'geometry', 'TOTPOP20', 'VAP20'],\n", + " dtype='object')" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "fe04f13c", + "metadata": {}, + "outputs": [], + "source": [ + "elec2020_cols = ['PRES20D', 'SEN20D', 'PRES20R', 'SEN20R']" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "735860ac", + "metadata": {}, + "outputs": [], + "source": [ + "weights2020 = blocks_df[\"VAP20\"] / blocks_to_precincts2020_assignment.map(blocks_df[\"VAP20\"].groupby(blocks_to_precincts2020_assignment).sum())\n", + "weights2020 = weights2020.fillna(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "1ace13c7", + "metadata": {}, + "outputs": [], + "source": [ + "prorated2020 = maup.prorate(blocks_to_precincts2020_assignment, precincts2020_df[elec2020_cols], weights2020)\n", + "blocks_df[elec2020_cols] = prorated2020" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "fdfe9942", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
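<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + "  <thead>\n", + "    <tr style=\"text-align: right;\">\n", + "      <th></th>\n", + "      <th>PRES20D</th>\n", + "      <th>SEN20D</th>\n", + "      <th>PRES20R</th>\n", + "      <th>SEN20R</th>\n", + "    </tr>\n", + "  </thead>\n", + "  <tbody>\n", +
"    <tr>\n", + "      <th>0</th>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "      <td>0.000000</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>1</th>\n", + "      <td>17.330973</td>\n", + "      <td>16.566372</td>\n", + "      <td>5.437168</td>\n", + "      <td>6.711504</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>2</th>\n", + "      <td>13.897987</td>\n", + "      <td>13.689933</td>\n", + "      <td>8.426174</td>\n", + "      <td>8.863087</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>3</th>\n", + "      <td>69.022373</td>\n", + "      <td>67.584407</td>\n", + "      <td>5.409492</td>\n", + "      <td>6.847458</td>\n", + "    </tr>\n", +
"    <tr>\n", + "      <th>4</th>\n", + "      <td>21.672252</td>\n", + "      <td>20.914209</td>\n", + "      <td>1.729893</td>\n", + "      <td>2.468499</td>\n", + "    </tr>\n", +
"  </tbody>\n", + "</table>\n", + "</div>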
" + ], + "text/plain": [ + " PRES20D SEN20D PRES20R SEN20R\n", + "0 0.000000 0.000000 0.000000 0.000000\n", + "1 17.330973 16.566372 5.437168 6.711504\n", + "2 13.897987 13.689933 8.426174 8.863087\n", + "3 69.022373 67.584407 5.409492 6.847458\n", + "4 21.672252 20.914209 1.729893 2.468499" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2020_cols].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "0c02e6de", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES20D 313293\n", + "SEN20D 305602\n", + "PRES20R 71618\n", + "SEN20R 80163\n", + "dtype: int64" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[elec2020_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "530aa5d6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES20D 313293.0\n", + "SEN20D 305602.0\n", + "PRES20R 71618.0\n", + "SEN20R 80163.0\n", + "dtype: float64" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2020_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "4bfcf5fb", + "metadata": {}, + "source": [ + "### Last step: Aggregate 2016 and 2018 election data up from blocks to 2020 precincts." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "22c17f41", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2020_df[elec2016_cols] = blocks_df[elec2016_cols].groupby(blocks_to_precincts2020_assignment).sum()\n", + "precincts2020_df[elec2018_cols] = blocks_df[elec2018_cols].groupby(blocks_to_precincts2020_assignment).sum()" + ] + }, + { + "cell_type": "markdown", + "id": "df0347e2", + "metadata": {}, + "source": [ + "### Check to see whether we gained/lost any votes:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "c5555a15", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551.0\n", + "PRES16R 62690.0\n", + "SEN16D 238774.0\n", + "SEN16R 71078.0\n", + "dtype: float64" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "4e2ce9ec", + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "PRES16D 244551.0\n", + "PRES16R 62690.0\n", + "SEN16D 238774.0\n", + "SEN16R 71078.0\n", + "dtype: float64" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[elec2016_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "c6b8d1c7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "AG18D 232798.0\n", + "AG18R 64532.0\n", + "SOS18D 232255.0\n", + "SOS18R 67147.0\n", + "TRE18D 230382.0\n", + "TRE18R 66728.0\n", + "GOV18D 238762.0\n", + "GOV18R 60151.0\n", + "REG18D 223947.0\n", + "REG18R 57322.0\n", + "dtype: float64" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "89074ca5", + "metadata": {}, + "outputs": [ + { + 
"data": { + "text/plain": [ + "AG18D 232798.0\n", + "AG18R 64532.0\n", + "SOS18D 232255.0\n", + "SOS18R 67147.0\n", + "TRE18D 230382.0\n", + "TRE18R 66728.0\n", + "GOV18D 238762.0\n", + "GOV18R 60151.0\n", + "REG18D 223947.0\n", + "REG18R 57322.0\n", + "dtype: float64" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "precincts2020_df[elec2018_cols].sum()" + ] + }, + { + "cell_type": "markdown", + "id": "03e0cb1a", + "metadata": {}, + "source": [ + "### Success! Now we can save these shapefiles for later use:" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "dfab9d9e", + "metadata": {}, + "outputs": [], + "source": [ + "# blocks_df.to_file(\"./Shapefiles/DenverCo_blocks_with_data/DenverCo_blocks_with_data.shp\")\n", + "# precincts2020_df.to_file(\"./Shapefiles/DenverCo_precincts2020_with_data/DenverCo_precincts2020_with_data.shp\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "11ab3377", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "c1d501b5", + "metadata": {}, + "source": [ + "### Now let's talk about potential problems! \n", + "### We started with \"cleaned-up\" versions of the precinct shapefiles. The \"doctor\" function is used to evaluate shapefiles for topological problems such as gaps and overlaps." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "01e1d0e2", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 962.45it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 10 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2020_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "id": "f40b7134", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|███████████████████████████████████████| 356/356 [00:00<00:00, 1036.65it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 10 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2018_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "40f79791", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|███████████████████████████████████████| 346/346 [00:00<00:00, 1024.45it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 10 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2016_df)" + ] + }, + { + "cell_type": "markdown", + "id": "97281549", + "metadata": {}, + "source": [ + "### These holes are actually \"real\" because Denver County is not simply connected; there are \"islands\" that belong to Arapahoe County. So these holes are not indicative of problems." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "f7fb0e39", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "blocks_df.plot()" + ] + }, + { + "cell_type": "markdown", + "id": "25b38d08", + "metadata": {}, + "source": [ + "### But here are the \"original\" precinct files, extracted from statewide Colorado precinct files.\n", + "### (Sources: 2016 precinct file was compiled by VEST; 2018 precinct file was compiled by Haley Colgate with assistance from Todd Blees of the Colorado State Demographer's office, and 2020 file was compiled by Louis Pino of the Colorado State Legislative staff.)" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "8c6da770", + "metadata": {}, + "outputs": [], + "source": [ + "precincts2016_orig_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2016_orig/DenverCo_precincts2016_orig.shp\")\n", + "precincts2018_orig_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2018_orig/DenverCo_precincts2018_orig.shp\")\n", + "precincts2020_orig_df = gpd.read_file(\"../examples/Shapefiles/DenverCo_precincts2020_orig/DenverCo_precincts2020_orig.shp\")" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "6d4973d1", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 826.87it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 27 overlaps.\n", + "There are 33 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(precincts2020_orig_df)" + ] + }, + { + "cell_type": "markdown", + "id": "57ae1c50", + "metadata": {}, + "source": [ + "### When we assigned blocks to precincts above, every block was assigned to a precinct; we can confirm this by checking for unassigned blocks:" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": 
"8cb2a3ee", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0\n", + "0\n", + "0\n" + ] + } + ], + "source": [ + "print(len(blocks_df[blocks_to_precincts2020_assignment.isna()]))\n", + "print(len(blocks_df[blocks_to_precincts2018_assignment.isna()]))\n", + "print(len(blocks_df[blocks_to_precincts2016_assignment.isna()]))" + ] + }, + { + "cell_type": "markdown", + "id": "5726be17", + "metadata": {}, + "source": [ + "### But what if we assign blocks to the original versions?" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "812cf3ef", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 768.50it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 358.08it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:00<00:00, 832.96it/s]\n", + "100%|████████████████████████████████████████| 356/356 [00:01<00:00, 302.34it/s]\n", + "100%|████████████████████████████████████████| 346/346 [00:00<00:00, 379.30it/s]\n", + "100%|███████████████████████████████████████| 346/346 [00:00<00:00, 1501.18it/s]\n" + ] + } + ], + "source": [ + "blocks_to_precincts2020_orig_assignment = maup.assign(blocks_df.geometry, precincts2020_orig_df.geometry)\n", + "blocks_to_precincts2018_orig_assignment = maup.assign(blocks_df.geometry, precincts2018_orig_df.geometry)\n", + "blocks_to_precincts2016_orig_assignment = maup.assign(blocks_df.geometry, precincts2016_orig_df.geometry)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "bbf676c1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3\n", + "1\n", + "5\n" + ] + } + ], + "source": [ + "print(len(blocks_df[blocks_to_precincts2020_orig_assignment.isna()]))\n", + "print(len(blocks_df[blocks_to_precincts2018_orig_assignment.isna()]))\n", + 
"print(len(blocks_df[blocks_to_precincts2016_orig_assignment.isna()]))" + ] + }, + { + "cell_type": "markdown", + "id": "72389fe9", + "metadata": {}, + "source": [ + "### So they all missed a few!" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "91f7c456", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "2896 08 031 980001 1037 080319800011037 Block 1037 \n", + "6683 08 031 980001 1039 080319800011039 Block 1039 \n", + "8656 08 031 980001 1038 080319800011038 Block 1038 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "2896 1 7 33 1 7 33 G5040 S \n", + "6683 1 7 33 1 7 33 G5040 S \n", + "8656 1 7 33 1 7 33 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "2896 2113 25620 +39.8260443 -104.6282154 0 0 \n", + "6683 70552 0 +39.8245484 -104.6206486 0 0 \n", + "8656 237358 0 +39.8247377 -104.6244984 0 0 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "2896 0 0 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 0 0 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "2896 0 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "2896 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "2896 0 0 0 0 0 0 0 \n", + "6683 0 0 0 0 0 0 0 \n", + "8656 0 0 0 0 0 0 0 \n", + "\n", + " geometry PRES16D PRES16R \\\n", + "2896 POLYGON ((3244624.987 1726684.593, 3244656.485... 0.0 0.0 \n", + "6683 POLYGON ((3246737.729 1727192.367, 3247192.322... 
0.0 0.0 \n", + "8656 POLYGON ((3244825.443 1726402.364, 3244867.095... 0.0 0.0 \n", + "\n", + " SEN16D SEN16R AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "2896 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "6683 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8656 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "\n", + " GOV18R REG18D REG18R PRES20D SEN20D PRES20R SEN20R \n", + "2896 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "6683 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8656 0.0 0.0 0.0 0.0 0.0 0.0 0.0 " + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[blocks_to_precincts2020_orig_assignment.isna()]" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "id": "4689ef17", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "9812 08 031 008388 2009 080310083882009 Block 2009 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "9812 1 7 33 1 7 33 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "9812 1611 0 +39.7728637 -104.8004050 0 0 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "9812 0 0 0 0 0 0 0 0 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "9812 0 0 0 0 0 0 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " geometry PRES16D PRES16R \\\n", + "9812 POLYGON ((3196304.708 1707166.383, 3197014.777... 0.0 0.0 \n", + "\n", + " SEN16D SEN16R AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "\n", + " GOV18R REG18D REG18R PRES20D SEN20D PRES20R SEN20R \n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 " + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[blocks_to_precincts2018_orig_assignment.isna()]" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "96ab7a1e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " STATEFP20 COUNTYFP20 TRACTCE20 BLOCKCE20 GEOID20 NAME20 \\\n", + "1997 08 031 008388 7008 080310083887008 Block 7008 \n", + "2592 08 031 008388 7007 080310083887007 Block 7007 \n", + "2729 08 031 015600 2004 080310156002004 Block 2004 \n", + "8262 08 031 008388 7009 080310083887009 Block 7009 \n", + "9812 08 031 008388 2009 080310083882009 Block 2009 \n", + "\n", + " CD116 SLDL20 SLDU20 CD118 SLDL22 SLDU22 MTFCC20 FUNCSTAT20 \\\n", + "1997 1 7 33 1 7 33 G5040 S \n", + "2592 1 7 33 1 7 33 G5040 S \n", + "2729 1 2 32 1 1 32 G5040 S \n", + "8262 1 7 33 1 7 33 G5040 S \n", + "9812 1 7 33 1 7 33 G5040 S \n", + "\n", + " ALAND20 AWATER20 INTPTLAT20 INTPTLON20 TOTPOP20 WHITE20 \\\n", + "1997 10239 0 +39.7729509 -104.8078405 0 0 \n", + "2592 1466 0 +39.7731435 -104.8088405 0 0 \n", + "2729 3049 2100 +39.6658476 -105.0041615 0 0 \n", + "8262 6260 0 +39.7729837 -104.8051077 0 0 \n", + "9812 1611 0 +39.7728637 -104.8004050 0 0 \n", + "\n", + " BLACK20 AMIN20 ASIAN20 NHPI20 OTHER20 2MORE20 HISP20 NH_WHITE20 \\\n", + "1997 0 0 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 0 0 \n", + "\n", + " NH_BLACK20 NH_AMIN20 NH_ASIAN20 NH_NHPI20 NH_OTHER20 NH_2MORE20 \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 \n", + "\n", + " VAP20 HVAP20 WVAP20 BVAP20 AMINVAP20 ASIANVAP20 NHPIVAP20 \\\n", + "1997 0 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " OTHERVAP20 2MOREVAP20 H_WHITE20 H_BLACK20 H_AMIN20 H_ASIAN20 \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 \n", + "\n", + " H_NHPI20 H_OTHER20 H_2MORE20 TOTPOP20R HISP20R NHWHITE20R \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + 
"9812 0 0 0 0 0 0 \n", + "\n", + " NHBLACK20R NHAMIN20R NHASIAN20R NH_NHPI20R NHOTHER20R NH2MORE20R \\\n", + "1997 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 \n", + "\n", + " WHITE20R BLACK20R AMIN20R ASIAN20R NHPI20R OTHER20R 2MORE20R \\\n", + "1997 0 0 0 0 0 0 0 \n", + "2592 0 0 0 0 0 0 0 \n", + "2729 0 0 0 0 0 0 0 \n", + "8262 0 0 0 0 0 0 0 \n", + "9812 0 0 0 0 0 0 0 \n", + "\n", + " geometry PRES16D PRES16R \\\n", + "1997 POLYGON ((3193984.432 1707215.315, 3193983.602... 0.0 0.0 \n", + "2592 POLYGON ((3194032.231 1707250.650, 3194531.988... 0.0 0.0 \n", + "2729 POLYGON ((3139496.394 1667748.388, 3139520.253... 0.0 0.0 \n", + "8262 POLYGON ((3194531.881 1707232.230, 3194531.988... 0.0 0.0 \n", + "9812 POLYGON ((3196304.708 1707166.383, 3197014.777... 0.0 0.0 \n", + "\n", + " SEN16D SEN16R AG18D AG18R SOS18D SOS18R TRE18D TRE18R GOV18D \\\n", + "1997 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2592 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2729 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8262 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "\n", + " GOV18R REG18D REG18R PRES20D SEN20D PRES20R SEN20R \n", + "1997 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2592 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "2729 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "8262 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", + "9812 0.0 0.0 0.0 0.0 0.0 0.0 0.0 " + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "blocks_df[blocks_to_precincts2016_orig_assignment.isna()]" + ] + }, + { + "cell_type": "markdown", + "id": "220f4948", + "metadata": {}, + "source": [ + "### In this case the unassigned blocks have zero population and zero votes, so they wouldn't affect the aggregation/disaggregation of data - but this isn't always the case." 
+ ] + }, + { + "cell_type": "markdown", + "id": "8e731eaf", + "metadata": {}, + "source": [ + "### MORAL: Shapefiles often come with significant topological problems that can affect data transfer in important ways! These problems should be diagnosed and repaired to the greatest extent possible prior to moving data around. \n", + "### For details about how Maup can repair these problems, see the \"Maup smart_repair demo\" notebook!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "324077ca", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "12fe456f", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a2ae944", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37a767b0", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ebcc0a9e", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/notebooks/Maup smart_repair demo.ipynb b/notebooks/Maup smart_repair demo.ipynb new file mode 100644 index 0000000..45bd408 --- /dev/null +++ b/notebooks/Maup smart_repair demo.ipynb @@ -0,0 +1,874 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "f09d728f", + "metadata": {}, + "source": [ + "### Demo notebook for Maup's smart_repair function for fixing topological problems in shapefiles" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + 
"id": "a60f23c3", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import geopandas as gpd\n", + "\n", + "from geopandas import GeoSeries, GeoDataFrame\n", + "\n", + "import shapely\n", + "from shapely.geometry import Polygon\n", + "\n", + "import maup\n", + "from maup import smart_repair, quick_repair\n", + "\n", + "import matplotlib.pyplot as plt\n" + ] + }, + { + "cell_type": "markdown", + "id": "6d6a30fd", + "metadata": {}, + "source": [ + "### First we'll explore a toy example of \"precincts\" and \"counties\":" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "333decdb", + "metadata": {}, + "outputs": [], + "source": [ + "import random\n", + "random.seed(2023) # For reproducibility\n", + "\n", + "ppolys = []\n", + "\n", + "for i in range(4):\n", + " for j in range(4):\n", + " poly = Polygon(\n", + " [(0.5*i + 0.1*k, 0.5*j + (random.random() - 0.5)/12) for k in range(6)] +\n", + " [(0.5*(i+1) + (random.random() - 0.5)/12, 0.5*j + 0.1*k) for k in range(1,6)] +\n", + " [(0.5*(i+1) - 0.1*k, 0.5*(j+1) + (random.random() - 0.5)/12) for k in range(1,6)] +\n", + " [(0.5*i + (random.random() - 0.5)/12, 0.5*(j+1) - 0.1*k) for k in range(1,5)]\n", + " )\n", + " ppolys.append(poly)\n", + " \n", + "toy_precincts_df = gpd.GeoDataFrame(geometry = gpd.GeoSeries(ppolys))" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b4958891", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "toy_precincts_df.plot(cmap='tab20', alpha=0.7)" + ] + }, + { + "cell_type": "markdown", + "id": "810a47d7", + "metadata": {}, + "source": [ + "### Check for gaps and overlaps:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5db0e210", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 28 overlaps.\n", + "There are 23 holes.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "maup.doctor(toy_precincts_df)" + ] + }, + { + "cell_type": "markdown", + "id": "88cfb3eb", + "metadata": {}, + "source": [ + "### First do a basic repair of the toy precincts:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "37602357", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Snapping all geometries to a grid with precision 10^( -10 ) to avoid GEOS errors.\n", + "Identifying overlaps...\n", + "Resolving overlaps...\n", + "Assigning order 2 pieces...\n", + "Assigning order 3 pieces...\n", + "Filling gaps...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Gaps to simplify: 100%|█████████████████████████| 23/23 [00:00<00:00, 36.40it/s]\n", + "Gaps to fill: 100%|█████████████████████████████| 10/10 [00:00<00:00, 16.40it/s]\n" + ] + } + ], + "source": [ + "toy_precincts_repaired_df = smart_repair(toy_precincts_df)" + ] + }, + { + "cell_type": "markdown", + "id": "da5cfb42", + "metadata": {}, + "source": [ + "### Check that the repair succeeded:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "099facc7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"maup.doctor(toy_precincts_repaired_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a9649dd3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "toy_precincts_repaired_df.plot(cmap='tab20', alpha=0.7)" + ] + }, + { + "cell_type": "markdown", + "id": "1ef948c0", + "metadata": {}, + "source": [ + "### Now suppose that the precincts are intended to nest cleanly into the following \"toy counties\":" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "f1b508e0", + "metadata": {}, + "outputs": [], + "source": [ + "cpoly1 = Polygon([(0,0), (1,0), (1,1), (0,1)])\n", + "cpoly2 = Polygon([(1,0), (2,0), (2,1), (1,1)])\n", + "cpoly3 = Polygon([(0,1), (1,1), (1,2), (0,2)])\n", + "cpoly4 = Polygon([(1,1), (2,1), (2,2), (1,2)])\n", + "\n", + "toy_counties_df = gpd.GeoDataFrame(geometry = gpd.GeoSeries([cpoly1, cpoly2, cpoly3, cpoly4]))" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fe27dff1", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "toy_counties_df.plot(cmap='tab20')" + ] + }, + { + "cell_type": "markdown", + "id": "703128a2", + "metadata": {}, + "source": [ + "### We can perform a \"county-aware\" repair as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "b36d3de2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Snapping all geometries to a grid with precision 10^( -10 ) to avoid GEOS errors.\n", + "Identifying overlaps...\n", + "Resolving overlaps and filling gaps...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Gaps to simplify in region 0: 100%|█████████████| 16/16 [00:00<00:00, 59.89it/s]\n", + "Gaps to fill in region 0: 100%|███████████████████| 4/4 [00:00<00:00, 46.28it/s]\n", + "Gaps to simplify in region 1: 100%|█████████████| 17/17 [00:00<00:00, 43.15it/s]\n", + "Gaps to fill in region 1: 100%|███████████████████| 7/7 [00:00<00:00, 21.96it/s]\n", + "Gaps to simplify in region 2: 100%|█████████████| 15/15 [00:00<00:00, 48.20it/s]\n", + "Gaps to fill in region 2: 100%|███████████████████| 7/7 [00:00<00:00, 26.13it/s]\n", + "Gaps to simplify in region 3: 100%|█████████████| 17/17 [00:00<00:00, 45.25it/s]\n", + "Gaps to fill in region 3: 100%|███████████████████| 5/5 [00:00<00:00, 18.95it/s]\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "toy_precincts_repaired_county_aware_df = smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df)\n", + "toy_precincts_repaired_county_aware_df.plot(cmap = \"tab20\", alpha=0.7)" + ] + }, + { + "cell_type": "markdown", + "id": "55e77296", + "metadata": {}, + "source": [ + "### Next, suppose that we'd like to get rid of small rook adjacencies at corner points where 4 precincts meet. We might reasonably estimate that these all have length less than 0.1, so we can accomplish this as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "84a7eb10", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Snapping all geometries to a grid with precision 10^( -10 ) to avoid GEOS errors.\n", + "Identifying overlaps...\n", + "Resolving overlaps and filling gaps...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Gaps to simplify in region 0: 100%|█████████████| 16/16 [00:00<00:00, 56.65it/s]\n", + "Gaps to fill in region 0: 100%|███████████████████| 4/4 [00:00<00:00, 44.55it/s]\n", + "Gaps to simplify in region 1: 100%|█████████████| 17/17 [00:00<00:00, 43.17it/s]\n", + "Gaps to fill in region 1: 100%|███████████████████| 7/7 [00:00<00:00, 22.09it/s]\n", + "Gaps to simplify in region 2: 100%|█████████████| 15/15 [00:00<00:00, 47.50it/s]\n", + "Gaps to fill in region 2: 100%|███████████████████| 7/7 [00:00<00:00, 26.23it/s]\n", + "Gaps to simplify in region 3: 100%|█████████████| 17/17 [00:00<00:00, 46.05it/s]\n", + "Gaps to fill in region 3: 100%|███████████████████| 5/5 [00:00<00:00, 19.08it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Converting small rook adjacencies to queen...\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + 
"image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "toy_precincts_repaired_county_aware_rook_to_queen_df = smart_repair(toy_precincts_df, nest_within_regions = toy_counties_df, min_rook_length = 0.1)\n", + "toy_precincts_repaired_county_aware_rook_to_queen_df.plot(cmap = \"tab20\", alpha=0.7)" + ] + }, + { + "cell_type": "markdown", + "id": "5c5473a3", + "metadata": {}, + "source": [ + "### The difference is hard to see, so let's zoom in on the gap between the 4 original precincts in the upper left-hand corner.\n", + "\n", + "### Original precincts:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3c5faa10", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "f, ax = plt.subplots(1)\n", + "\n", + "ax.set_xlim(0.35, 0.7)\n", + "ax.set_ylim(1.35, 1.7)\n", + "#remove x axis and y axis\n", + "plt.yticks([])\n", + "plt.xticks([])\n", + "\n", + "toy_precincts_df.plot(ax=ax, cmap='tab20', alpha=0.7)\n", + "\n", + "plt.box(on=True)\n", + "\n", + "plt.show()\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "76172f30", + "metadata": {}, + "source": [ + "### County-aware repair:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "d78b3465", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAGFCAYAAAASI+9IAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAWmklEQVR4nO3d225c12HG8W/t2TM8SNTZVi1bkRQ7ieNIbhAkRdugBeo2CNoGOblOkPQib9AH6BP0QQr0old5hdz0Jjc+xgdRosSDxDM5M5zZe/Zh9YIUzcgiZw85s9eeWf8fYBiwZOszIPGvWWvPyFhrrQAAkBS4HgAAqA6iAAA4RBQAAIeIAgDgEFEAABwiCgCAQ0QBAHAoLPKd8jzXysqK5ubmZIwZ9SYAwJBZa9VqtXTjxg0FwfGvBwpFYWVlRTdv3hzaOACAG4uLi3rttdeO/fZCUZibm5Mk/e5/fqfGbGM4yzAWvjZ9Sz+u/dz1DABn1N5r652f/fXh1/PjFIrCsyOjxmxDjXNEwSevnv+aztuTfxIBGB/9rgC4aMaJLtUuu54AoEREASe6EFx0PQFAiYgCTnRO511PAFAiooATTWnG9QQAJSIKOFEj58ECwCdEAccKbCCT1lzPAFAiooBjXW28JCPewQ74hCjgWC/VX3Y9AUDJiAKOdSW86noCgJIRBRzrYnDJ9QQAJSMKOFZiE9cTAJSMKOBYy8lj1xMAlIwo4Fjz0X3XEwCUjCjgWFHeVRpyhAT4hCjgRHtBy/UEACUiCjjRll13PQFAiYgCTrScLLqeAKBERAEnmo++cD0BQImIAk7UzlrKwtT1DAAlIQroq1Nru54AoCREAX1t2Q3XEwCUhCigrxUumwFvEAX09YB3NgPeIAroayfdVl7LXM8AUAKigEK6XDYDXiAKKGRLXDYDPiAKKGQlXXY9AUAJiAIKWYjmXU8AUAKigEI2euvKAy6bgUlHFFCMkaKw43oFgBEjCihsR5uuJwAYMaKAwp5mT1xPADBiRAGFPeSdzcDEIwoo7Gn8RDbIXc8AMEJEAcUZKQ67rlcAGCGigIHsaMv1BAAjRBQwkFUum4GJRhQwkEfxA9cTAIwQUcBAFqPHsobLZmBSEQUMxki9MHK9AsCIEAUMbNfsuJ4AYESIAga2mq24ngBgRIgCBva499D1BAAjQhQwsMfdR7LGup4BYASIAgaWm1xJGLueAWAEiAJOpRnsuJ4AYASIAk5lPXvqegKAESAKOJXHvQXXEwCMAFHAqSx0H3LZDEwgooBTSZUoDXuuZwAYMqKAU2ubXdcTAAwZUcCpredrricAGDKigFNbTB65ngBgyIg
CTu1h976suGwGJglRwKnFNlYWJq5nABgiooAzaQdN1xMADBFRwJlsWi6bgUlCFHAmS8mi6wkAhogo4Ezmu5+7ngBgiIgCzqSbd7lsBiYIUcCZ7QUt1xMADAlRwJlt2nXXEwAMCVHAmS1z2QxMDKKAM3sQfeF6AoAhIQo4s1bWUhamrmcAGAKigKHo1tquJwAYAqKAodiyG64nABgCooChWEmWXE8AMAREAUMxz2UzMBGIAoZiJ91WXstczwBwRkQBQ9Ot7bmeAOCMiAKGZttuup4A4IyIAobmSbbsegKAMyIKGJqH0bzrCQDOiChgaNZ7q8qD3PUMAGdAFDA8RorCjusVAM6AKGCodsRlMzDOiAKG6mm24noCgDMgChiqhfiB6wkAzoAoYKieRMuyXDYDY4soYLiMFIdd1ysAnBJRwNDtatv1BACnRBQwdFw2A+OLKGDoHsUPXU8AcEpEAUO3FD2WNdb1DACnQBQwdNZY9cLI9QwAp0AUMBLNgMtmYBwRBYzEWvbU9QQAp0AUMBKP4wXXEwCcAlHASCxED7hsBsYQUcBI5MqVhLHrGQAGRBQwMi2z43oCgAERBYzMer7qegKAAREFjMzj3iPXEwAMiChgZBa687LishkYJ0QBI5MoUVrvuZ4BYABEASPVDnZdTwAwAKKAkVrP1lxPADAAooCRWuKyGRgrRAEj9SDishkYJ0QBIxXbSFmYup4BoCCigJHbC5quJwAoiChg5DYsl83AuCAKGLnlZNH1BAAFEQWM3Hz3C9cTABREFDBynXxPWZi4ngGgAKKAUuwFbdcTABRAFFCKLbvuegKAAogCSrGcctkMjAOigFI8iO67ngCgAKKAUjTTXWU13tkMVB1RQGm6NS6bgaojCijNljZcTwDQB1FAaVaSZdcTAPRBFFCaBxHvbAaqjiigNNvplvJa5noGgBMQBZQqqu25ngDgBEQBpdrWlusJAE5AFFCqlWTJ9QQAJyAKKNVCPO96AoATEAWUaq23qjzIXc8AcAyigHIZKQ47rlcAOAZRQOl2uGwGKosooHRPsxXXEwAcgyigdAvRA9cTAByDKKB0K/GSLJfNQCURBZTPSHHYdb0CwAsQBTixq23XEwC8AFGAE2vZU9cTALwAUYATCzGXzUAVEQU4sRg9kjXW9QwAzyEKcMIaqySMXM8A8ByiAGd2DZfNQNUQBTizlnPZDFQNUYAzj+MF1xMAPIcowJmF6IGsuGwGqoQowJlcuZJ67HoGgCOIApxqBbuuJwA4gijAqfV81fUEAEcQBTi12HvkegKAI4gCnHrYuc9lM1AhRAFOJUqU1nuuZwA4QBTgXNs0XU8AcIAowLmNfM31BAAHiAKcW0oeu54A4ABRgHMPovuuJwA4QBTgXJR3lYaJ6xkARBRQEXsBl81AFRAFVMKG5bIZqAKigEpYThZdTwAgooCK4LIZqAaigErYy9rKwtT1DMB7RAGV0QlaricA3iMKqIxNu+F6AuA9ooDKWE55ZzPgGlFAZXDZDLhHFFAZzXRXeS1zPQPwGlFApXRqbdcTAK8RBVTKtrhsBlwiCqiUlXTZ9QTAa0QBlbIQzbueAHiNKKBSNnrrXDYDDhEFVIuRolrH9QrAW0QBlbOjTdcTAG8RBVQOl82AO0QBlbMQc9kMuEIUUDmr8VPZIHc9A/ASUUD1GCkKuWwGXCAKqKQdbbmeAHiJKKCSnqYrricAXiIKqKRH8QPXEwAvEQVU0nK8JGu4bAbKRhRQTUaK65HrFYB3iAIqq6lt1xMA7xAFVNZq9sT1BMA7RAGV9aj30PUEwDtEAZW12H0ka6zrGYBXiAIqKze5eiGXzUCZiAIqrRXsup4AeIUooNLWsqeuJwBeIQqotMXegusJwFgyxsiYQLJGWWqUxsX+vXC0s4Czedidlz1nZWRcTwGcM8ZIxsjmVjaX8swq7eVKe7l6caZeJ1W8lyntffXPOe9024V+DKKASsuUKanHaiTTrqcAI2PM/m9
6rDXKM6s8tUp7mZI4VxJlirqp4nYqm4/+aTyigMprBbu6KqKA8WMCI1mjPLfKMylLc2Xxwe/qu5niTqpeN5NsdR69JgqovPVsVVd13fUMQNKR39Vr/xgnT6zSNFca5+pFmXrdVFE7U5Z89QhnHBAFVN5i8khvmrddz8CE2/9ib2Tt/ll9lubKermSOFMvyhV3UsV75RzhuEQUUHkL3XnZWS6bcQpm/4u9tZLNJZtZZcn+5WwvytSLqnmE4xJRQOX1bE9ZPVGYNFxPQUU8O8J5dl6fpVZZkh9+sY8PLmazhD+TY1BEAWOhHTR1Sddcz8CIPbuYtfbIxWwvVxLn+xezXT+OcFwiChgLG/kqURhXB0c4OjjCyQ+OcJJepiTa/519tJcq6aaul0JEARVWk9Hr9YuaUqCXdcH1HDzPHNzyvPAIJ1XcyRTvcYQzbogCKuVc0NDd2nndi7r69taKpvI1SVLvXEdrr/y72vkVWS6cR+q4I5xenCvhCGfiEQU4ZvUX9TndtQ293W7q9s6yghd8zW/sLeq1+/+lLJxT6/o7as7e1V52USIQxRgdPL21/2x9lu7/lT47wummijoZRzggCiifkfR6/aLuZUb3dtf1cufhn3/jCWppS5eWf69L+r3SqatqvfSOdmfeUjc7P9LNVfXss3B0+Gz9kSOcbqq4m+0/hZNyhINiiAJKMWVCvVW/oLejnt7aXtG5dP3M/80w3tTlpf/VZUnJ7KtqXvsHNRvfVJTPnH2wY19+Fo6UP/tdfWKVxNn+EU4nVdzhCAfDRxQwMlfCGd0zs7q319Ib208U6snIfqx6Z1lXH/+3rkqKz7+h5rW/VzN8Xb28Wu9tODyvPzzC2f9dfRLl+2f1HOHAMaKAIbK6Vb+oe3moe61N3Wg++vJNRiWaat/XS+37uiYpvnRPu5d/qGbtltLc4U93I8V7uZY/bfLOWVQaUcCZ1E2gb9Uv6V4v1d2dJ7oYb3z5jQ6CcJSRNL3zoaZ3PtTLkrpXvq/m5b9R07yqLK+VuCTQyudNRc2kxB8TOB2igIHNHXls9M2tZTXyVdeT+jKSZrf+qNmtP+q6qWnv2g/VuvA9xeaiUk0ryWsa9pNMxhi1Nntae1jsDzcBqoAooACrV8I5va0p3W3v6PbOkpNjoWExNtP59T/o/PofDv+ZVaBk9obSmVeUNK4raVxVEl5SEpxXohkleUPWFv9/zlOjxU92x/bjk+EvooAXCmT0Rv2i3s6kuzurutY9+tjo+AbhOEa5Gp0lNTpLL/x2KylrXFE6+6qSqetKGteU1C8rCS4oMeeU2IayvCZjAm087mh3rVvu/wAwJB5H4dll3+R9gTutmSDUd2oXdC+O9e3tFc2ma64nVYaRFPa2FPa2NK0PX/h98qCh/0v+Q7vNaj3xBAzCmyjMBQ3drp3X7dzqdretW811NdJEvbCuOGwoChvqBaGiWqg4CBXXavt/N0ZxEOz/3Wj/L1lF1qpnrCKbKc5TxcoV54kSm2mcQnMtnNU9zehup6nXR/zY6KQL8p5uzM5rp/lt11OAU5vIKNRkdLM+pzuq63Yc6XZrS1e6a189BzfSdJZoOkt0Md4byo+dWykK6+qFU4rDuuKgrigM94MTBOoFNUVBoNgEB5EximTVkxSZXD17EBplivNMPZspsamGFRoj6U7jku6mRm83N3W9vTDW9wNVcyP5gz4RUcD4moAoWF0Lz+m2mdbtLNPtvZZebW2obp87+ijpC19gpNks0WyWSPFw/puZleJ6Q3FtSnEYKq7VFdX2QxPXaopNcBCcQJGkngkUKT98VRPbXBcU6F4v092dFc31jrybmCAM1VS+qZcu72l9+5zrKcCpjF0UpkyoW+F53bGB7kRd3Wqta6630f9fHGM1I82mPc2mvaGFBqNzY+pPWtf3Xc8ATqXSUTCSXqnP6bYaup0kutPe1vX2+gs/RROoikv55xJRwJiqVBTOBw3dee4yeDp77oPTCAIq7rP0X11
PAE7NWRT2L4Mv6I5C3Yoj3Wlv6UrnBZfBwBhZm/pbPX160fUM4NRKikK1LoOBUcjU0Me7f+d6BnAmI4mCj5fBwP3Ge+psl/lBe8DwnTkKXAYDUju8pfm1r7meAZzZwFHgMhj4c9ZKHye/GOgD84CqGigK/9mKdau1w2UwcMSTmX/UxpNZ1zOAoQgG+c5Xui2CAByRmFl9svUD1zOAoRkoCgD+3OfhrxXH/DLC5OBnM3BKu/U3tbD2iusZwFARBeAUrJU+injnMiYPUQBO4fH0T7TTnHI9Axg6ogAMKA6u6LPNe65nACNBFIABfWreU5LwFB4mE1EABrDZ+K6WNq66ngGMDFEACspV00ftH7meAYwUUQAKejj1C7X36q5nACNFFIACuuEr+mL9G65nACNHFIACPs7eVZZxuYzJRxSAPjYb39Pq1gXXM4BSEAWgjyjgaSP4gygAfRjlricApSEKQB9EAT4hCkAfxhIF+IMoAH3wSgE+IQpAH0aZ6wlAaYgC0AdRgE+IAtCHsUQB/iAKQB/GEAX4gygAfZicKMAfRAHoIzDW9QSgNEQB6Ic7BXiEKAB9BDx9BI8QBaAPo9T1BKA0RAHog4+5gE+IAtAHb16DT4gC0AdRgE+IAtAH72iGT4gC0AefkgqfEAWgD2N5+gj+IApAH9wpwCdEAeiDVwrwCVEA+uCiGT4hCkAfHB/BJ0QB6CPg+AgeIQpAX3x0NvxBFIA+jJFkXK8AykEUgAICogBPEAWgAMOfvgZPEAWgAMMrBXiCKAAFEAX4gigABRAF+IIoAAVwpwBfEAWgAF4pwBdEASiAVwrwBVEACuCFAnxBFIACeKUAXxAFoADuFOALogAUYPhQPHiCKAAFmIAowA9EASiAOwX4gigABXClAF8QBaCAgFcK8ARRAArg+Ai+IApAATx9BF8QBaAAXinAF0QBKIBXCvAFUQAK4JUCfEEUgAKIAnxBFIACjHLXE4BSEAWgAO4U4AuiABTA8RF8QRSAAogCfEEUgAK4U4AviAJQAJ99BF8QBaAQXinAD0QBKIBXCvAFUQCKsLxSgB+IAlAArxTgC6IAFMDTR/AFUQAKIQrwA1EACgj4mAt4gigABRiTuZ4AlIIoAAUYnj6CJ4gCUACffQRfEAWgAJ4+gi+IAlCAEXcK8ANRAArg+Ai+IApAAVw0wxdEASjAGKIAPxAFoADuFOALogAUEHB8BE8QBaAQogA/EAWggIDjI3iCKACF8EoBfiAKQAEBUYAniAJQAE8fwRdEASiAzz6CL4gCUACvFOALogAUYCxRgB+IAlAAx0fwBVEACiAK8AVRAAowNnU9ASgFUQAK4JUCfEEUgAKMeKUAPxAFoACePoIviAJQAO9TgC+IAlDAbvgt1xOAUhAFoI84uKyPN3/gegZQCqIA9PGxfqtej18q8AM/04ETrEy9oyebF13PAEpDFIBjxMFVfbT5V65nAKUiCsALWCt9pN8oSfglAr/wMx54gSfT/6SnmxdczwBKRxSA50TBVX3E00bwFFEAjrBW+sj+VkliXE8BnCAKwBEr0z/S6tac6xmAM0QBOBDVXtLHm993PQNwiigA2j82+jD/DcdG8B5RACQtz/xYa1vnXc8AnCMK8F5Uu66P17/negZQCUQBXrNW+iD7tdKUYyNAIgrw3NLMP2t9m2Mj4BmiAG91a9f1yfp3Xc8AKoUowEvWSh9ybAR8BVGAl5Zm/oVjI+AFiAK80w1f0Sfrf+l6BlBJRAFesVb6IPkVx0bAMYgCvLI48xNt7JxzPQOoLKIAb3RqN/TJ2j3XM4BKIwrwgrXSB+mvlGUcGwEnIQrwwuPpn2pzZ9b1DKDyiAImXid8VX9a/47rGcBYIAqYaM+eNuLYCCiGKGCiPZr5qTZ3ZlzPAMYGUcDE2gtf1adrHBsBgyAKmEjWSh/0ODYCBkUUMJEWpn+mrV2OjYBBEQVMnL3wpj5bf8v1DGAsEQVMlP1jo/c4NgJOiShgoix
M/0Jbu9OuZwBjiyhgYuyFt/Tp2puuZwBjjShgIlhr9H78rvKcYyPgLIgCJsLD6V9qu8mxEXBWRAFjr12/rc/WvuV6BjARiALGmrVG70fvKs9dLwEmA1HAWHs4/a52mlOuZwATgyhgbLXCO/ps7ZuuZwAThShgLOU20PvxLzk2AoaMKGAsPZh+V7scGwFDRxQwdlr1r+vz1TdczwAmElHAWMltoPe7v5S1vEkNGAWigLHyYPrftNtquJ4BTCyigLHRDN/Q52scGwGjRBQwFnLV9EH0c1meNgJGiihgLMxPvcexEVACooDKa9a/qS/Wvu56BuAFooBKy1XT+52fcmwElIQooNLmp36lZptjI6AsYZHvZK2VJDW7yUjHAEft1r+h95+8JGvbrqcAY68T7Un68uv5cYzt9z0kLS0t6ebNm8NZBgBwZnFxUa+99tqx314oCnmea2VlRXNzczKGd5ICwLix1qrVaunGjRsKguNvDgpFAQDgBy6aAQCHiAIA4BBRAAAcIgoAgENEAQBwiCgAAA4RBQDAof8H6EslzJCM+1YAAAAASUVORK5CYII=", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "f, ax = plt.subplots(1)\n", + "\n", + "ax.set_xlim(0.35, 0.7)\n", + "ax.set_ylim(1.35, 1.7)\n", + "#remove x axis and y axis\n", + "plt.yticks([])\n", + "plt.xticks([])\n", + "\n", + "toy_precincts_repaired_county_aware_df.plot(ax=ax, cmap='tab20', alpha=0.7)\n", + "\n", + "plt.box(on=True)\n", + "\n", + "plt.show()\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4f13d55", + "metadata": {}, + "source": [ + "### County-aware repair with rook adjacency converted to queen" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "1ad963e8", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "f, ax = plt.subplots(1)\n", + "\n", + "ax.set_xlim(0.35, 0.7)\n", + "ax.set_ylim(1.35, 1.7)\n", + "#remove x axis and y axis\n", + "plt.yticks([])\n", + "plt.xticks([])\n", + "\n", + "toy_precincts_repaired_county_aware_rook_to_queen_df.plot(ax=ax, cmap='tab20', alpha=0.7)\n", + "\n", + "plt.box(on=True)\n", + "\n", + "plt.show()\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66485de1", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9d2c67ca", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "48dfabae", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "382b91a3", + "metadata": {}, + "source": [ + "### Now let's look at a particularly gnarly gap from the Colorado 2020 precinct shapefile. This region consists of 15 precincts that all adjoin a single gap along a county boundary." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "bcfcf407", + "metadata": {}, + "outputs": [], + "source": [ + "bad_gap_region_df = gpd.read_file(\"../examples/Shapefiles/bad_gap_region/bad_gap_region.shp\")" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "91e76d9d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "bad_gap_region_df.plot(cmap='tab20')" + ] + }, + { + "cell_type": "markdown", + "id": "9cb29918", + "metadata": {}, + "source": [ + "### The gap is hard to see, so let's stretch it out:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "84c4b87e", + "metadata": { + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "f, ax = plt.subplots(1)\n", + "\n", + "plt.yticks([])\n", + "plt.xticks([])\n", + "\n", + "plt.xlim([3028000, 3029500])\n", + "plt.ylim([1600000, 1756000])\n", + "\n", + "bad_gap_region_df.plot(ax=ax, cmap='tab20', aspect=0.01)" + ] + }, + { + "cell_type": "markdown", + "id": "e251bad0", + "metadata": {}, + "source": [ + "### Maup's old \"autorepair\" function (now called \"quick_repair\") will assign this entire gap to a single precinct, thereby creating inaccurate adjacency relations between precincts adjoining the gap:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "5bf6378c", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/jnc/Research and Presentations/Gerrymandering and elections/Maup stuff/maup_2_0_0/repair.py:384: UserWarning: The indices of the two GeoSeries are different.\n", + " result = targets.union(sources_to_absorb)\n", + "/Users/jnc/Research and Presentations/Gerrymandering and elections/Maup stuff/maup_2_0_0/repair.py:384: UserWarning: The indices of the two GeoSeries are different.\n", + " result = targets.union(sources_to_absorb)\n" + ] + } + ], + "source": [ + "bad_gap_region_quick_repair_df = quick_repair(bad_gap_region_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "1d73521a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "f, ax = plt.subplots(1)\n", + "\n", + "plt.yticks([])\n", + "plt.xticks([])\n", + "\n", + "plt.xlim([3028000, 3029500])\n", + "plt.ylim([1600000, 1756000])\n", + "\n", + "bad_gap_region_quick_repair_df.plot(ax=ax, cmap='tab20', aspect=0.01)" + ] + }, + { + "cell_type": "markdown", + "id": "0ffaeff5", + "metadata": {}, + "source": [ + "### The \"smart_repair\" function does a much better job:" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "2f16f2a4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Snapping all geometries to a grid with precision 10^( -5 ) to avoid GEOS errors.\n", + "Identifying overlaps...\n", + "Resolving overlaps...\n", + "Assigning order 2 pieces...\n", + "Filling gaps...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Gaps to simplify: 100%|███████████████████████████| 3/3 [00:10<00:00, 3.34s/it]\n", + "Gaps to fill: 100%|███████████████████████████████| 1/1 [00:04<00:00, 4.98s/it]\n" + ] + } + ], + "source": [ + "bad_gap_region_smart_repair_df = smart_repair(bad_gap_region_df)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "0f2a3b6d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "f, ax = plt.subplots(1)\n", + "\n", + "plt.yticks([])\n", + "plt.xticks([])\n", + "\n", + "plt.xlim([3028000, 3029500])\n", + "plt.ylim([1600000, 1756000])\n", + "\n", + "bad_gap_region_smart_repair_df.plot(ax=ax, cmap='tab20', aspect=0.01)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "006d434e", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "679dc6f7", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81a6aaea", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b8bb2b0", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9ea7919f", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.5" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/pyproject.toml b/pyproject.toml index 57fda45..175fad7 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "maup" -version = "1.1.3" +version = "2.0.0" description = "The geospatial toolkit for redistricting data" authors = [ "Metric Geometry and Gerrymandering Group ", diff --git a/tests/__init__.py b/tests/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/tests/conftest.py b/tests/conftest.py index 34d7f68..3f6c1b0 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -1,7 +1,7 @@ import geopandas as gp 
+import pandas as pd import pytest from shapely.geometry import Polygon -import pandas as pd import maup CRS = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" @@ -90,10 +90,9 @@ def square_mostly_in_top_left(): def squares_some_neat_some_overlapping( square_mostly_in_top_left, squares_within_four_square_grid ): - result = pd.concat( - [squares_within_four_square_grid, square_mostly_in_top_left], - ignore_index=True, - ) + result = pd.concat([squares_within_four_square_grid, + square_mostly_in_top_left], ignore_index=True + ) result.crs = CRS return result diff --git a/tests/test_crs.py b/tests/test_crs.py index f617870..7fe09ea 100644 --- a/tests/test_crs.py +++ b/tests/test_crs.py @@ -5,8 +5,8 @@ def test_require_same_crs(square, four_square_grid): square_gdf = gpd.GeoDataFrame([{"geometry": square}]) - square_gdf.crs = "epsg:4269" - four_square_grid.crs = "epsg:4326" + square_gdf.crs = 4432 + four_square_grid.crs = 4433 @require_same_crs def f(sources, targets): diff --git a/tests/test_repair.py b/tests/test_repair.py index 0801258..6747e10 100644 --- a/tests/test_repair.py +++ b/tests/test_repair.py @@ -1,6 +1,6 @@ import geopandas import maup -from maup.repair import count_overlaps +from maup.repair import count_overlaps, autorepair, quick_repair import pytest # These tests are losely based off the test_example_case in test_prorate.py @@ -35,12 +35,14 @@ def test_example_autorepair_MI(): assert holes.unary_union.area > 100 assert len(holes) > 0 - shp["geometry"] = maup.autorepair(shp, relative_threshold=None) + shp["geometry"] = maup.quick_repair(shp, relative_threshold=None) assert count_overlaps(shp) == 0 holes = maup.repair.holes_of_union(shp) assert holes.empty or holes.unary_union.area < 1e-10 # overlaps are not guaranteed to disappear - assert maup.doctor(shp) + # This assertion is a terrible idea since there will almost certainly still be small + # gaps and overlaps remaining! 
+ # assert maup.doctor(shp) def test_snap_shp_to_grid(): shp = geopandas.read_file("zip://./examples/MI.zip") # MI shapefile @@ -97,4 +99,8 @@ def test_crop_to(): def test_apply_func_error(): with pytest.raises(TypeError): - maup.repair.apply_func_to_polygon_parts("not a Polygon object", lambda x: x) \ No newline at end of file + maup.repair.apply_func_to_polygon_parts("not a Polygon object", lambda x: x) + + +# def test_quick_repair_equals_autorepair(): + \ No newline at end of file diff --git a/tests/test_smart_repair.py b/tests/test_smart_repair.py new file mode 100644 index 0000000..a6912d1 --- /dev/null +++ b/tests/test_smart_repair.py @@ -0,0 +1,83 @@ +import random +import geopandas +import maup +import pytest +from shapely.geometry import Point, Polygon + +from maup import assign, doctor +from maup.adjacencies import adjacencies +from maup.smart_repair import smart_repair + + +@pytest.fixture +def toy_precincts_geoseries(): + random.seed(2023) + ppolys = [] + for i in range(4): + for j in range(4): + poly = Polygon( + [(0.5*i + 0.1*k, 0.5*j + (random.random() - 0.5)/12) for k in range(6)] + + [(0.5*(i+1) + (random.random() - 0.5)/12, 0.5*j + 0.1*k) for k in range(1,6)] + + [(0.5*(i+1) - 0.1*k, 0.5*(j+1) + (random.random() - 0.5)/12) for k in range(1,6)] + + [(0.5*i + (random.random() - 0.5)/12, 0.5*(j+1) - 0.1*k) for k in range(1,5)] + ) + ppolys.append(poly) + + return geopandas.GeoSeries(ppolys) + +@pytest.fixture +def toy_precincts_geodataframe(): + random.seed(2023) + ppolys = [] + for i in range(4): + for j in range(4): + poly = Polygon( + [(0.5*i + 0.1*k, 0.5*j + (random.random() - 0.5)/12) for k in range(6)] + + [(0.5*(i+1) + (random.random() - 0.5)/12, 0.5*j + 0.1*k) for k in range(1,6)] + + [(0.5*(i+1) - 0.1*k, 0.5*(j+1) + (random.random() - 0.5)/12) for k in range(1,6)] + + [(0.5*i + (random.random() - 0.5)/12, 0.5*(j+1) - 0.1*k) for k in range(1,5)] + ) + ppolys.append(poly) + + return geopandas.GeoDataFrame(geometry = 
geopandas.GeoSeries(ppolys)) + +@pytest.fixture +def toy_counties_geodataframe(): + cpoly1 = Polygon([(0,0), (1,0), (1,1), (0,1)]) + cpoly2 = Polygon([(1,0), (2,0), (2,1), (1,1)]) + cpoly3 = Polygon([(0,1), (1,1), (1,2), (0,2)]) + cpoly4 = Polygon([(1,1), (2,1), (2,2), (1,2)]) + + return geopandas.GeoDataFrame(geometry = geopandas.GeoSeries([cpoly1, cpoly2, cpoly3, cpoly4])) + + +class TestSmartRepair: + def test_smart_repair_basic_output_from_gdf_clean(self, toy_precincts_geodataframe): + repaired_gdf = smart_repair(toy_precincts_geodataframe) + assert isinstance(repaired_gdf, geopandas.GeoDataFrame) + assert doctor(repaired_gdf) + + def test_smart_repair_basic_output_from_gs_clean(self, toy_precincts_geoseries): + repaired_gs = smart_repair(toy_precincts_geoseries) + assert isinstance(repaired_gs, geopandas.GeoSeries) + assert doctor(repaired_gs) + + def test_nest_within_regions(self, toy_precincts_geodataframe, toy_counties_geodataframe): + repaired_with_regions_gdf = smart_repair(toy_precincts_geodataframe, + nest_within_regions = toy_counties_geodataframe + ) + p_to_c = assign(toy_precincts_geodataframe, toy_counties_geodataframe) + for p in p_to_c.index: + assert toy_counties_geodataframe.geometry[p_to_c[p]].contains(repaired_with_regions_gdf.geometry[p]) + + def test_small_rook_to_queen(self, toy_precincts_geodataframe): + repaired_basic_gdf = smart_repair(toy_precincts_geodataframe) + assert min(adjacencies(repaired_basic_gdf).length) < 0.05 + + repaired_srtq_gdf = smart_repair(toy_precincts_geodataframe, min_rook_length=0.05) + assert min(adjacencies(repaired_srtq_gdf).length) > 0.05 + + +# There should also be a lot of unit tests for all the component functions, +# but this could mushroom into a BIG project that will have to wait for another day! +
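The `toy_precincts_geoseries` and `toy_precincts_geodataframe` fixtures in `tests/test_smart_repair.py` repeat the same polygon construction. The following is a minimal sketch, not part of this diff, of how that construction might be factored into a shared helper. The name `toy_precinct_rings` and its parameters are hypothetical; the coordinate formulas are copied from the fixtures, and only the standard library is used so the geometry can be checked without geopandas installed.

```python
import random


def toy_precinct_rings(seed=2023, n=4):
    """Return coordinate rings for an n x n grid of jittered squares.

    Hypothetical helper (not part of maup or this PR); it mirrors the
    polygon construction used in the two toy-precinct fixtures.
    """
    rng = random.Random(seed)  # local RNG, so the global random state is untouched

    def jitter():
        # Uniform offset in [-1/24, 1/24), the same expression the fixtures use.
        return (rng.random() - 0.5) / 12

    rings = []
    for i in range(n):
        for j in range(n):
            ring = (
                # bottom edge, left to right
                [(0.5 * i + 0.1 * k, 0.5 * j + jitter()) for k in range(6)]
                # right edge, bottom to top
                + [(0.5 * (i + 1) + jitter(), 0.5 * j + 0.1 * k) for k in range(1, 6)]
                # top edge, right to left
                + [(0.5 * (i + 1) - 0.1 * k, 0.5 * (j + 1) + jitter()) for k in range(1, 6)]
                # left edge, top to bottom, stopping before the start vertex
                + [(0.5 * i + jitter(), 0.5 * (j + 1) - 0.1 * k) for k in range(1, 5)]
            )
            rings.append(ring)
    return rings
```

Under this sketch, each fixture could wrap the rings itself, for example with `geopandas.GeoSeries([Polygon(r) for r in toy_precinct_rings()])`, so the GeoSeries and GeoDataFrame variants would share one source of geometry. Because neighbouring squares are jittered independently, their shared edges do not line up, which is what produces the gaps and overlaps that `smart_repair` is tested against.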