Authors: Jae Yeon Kim, Milan de Vries, Hahrie Han
This repository contains datasets and source files used to produce the MapAgora civic opportunity datasets, developed for the study of American local politics, civil society, and public policy.
- R version 4.4.0 (2024-04-24)
- Platform: aarch64-apple-darwin20
- Running under: macOS 15.3.2
This project provides three core datasets:
- Dataset 1: Organization-level dataset of de-identified nonprofit organizations
- Datasets 2-3: Aggregated civic opportunity counts at the county and ZIP code levels
- Datasets 4-5: Aggregated organizational type breakdowns at the county and ZIP code levels
Datasets 2-5 are derived from Dataset 1 using `01_dataset_generation.Rmd`, which also generates Figure 1, Supplementary Table S2, and Figure S1.
Dataset 1 includes 1,774,798 de-identified nonprofit organizations. To protect privacy and reduce the risk of misinterpretation, all identifying information (e.g., organization names and EINs) has been removed.
Each observation includes:
- Unique identifier:
  - `id`: a row index used for reference; contains no identifying information.
- Geographic identifiers:
  - `state`: two-letter state abbreviation
  - `city`: city name listed in the IRS record
  - `FIPS`: county FIPS code
  - `ZCTA`: ZIP Code Tabulation Area
  - `is_po`: indicates whether the organization lists a P.O. Box as its mailing address (1 = yes, 0 = no)
- Civic opportunity indicators:
  - `membership`, `volunteer`, `events`, `take_action`: binary variables indicating whether the organization provides each type of civic opportunity (1 = provides the opportunity; 0 = does not provide it or information is unavailable)
  - `opp_binary`: equals 1 if the organization provides at least one civic opportunity; 0 otherwise
  - `opp_mean`: the mean of the four civic opportunity binary indicators
- Organizational type:
  - `predicted`: machine-learned classification of the organization (e.g., religious, political, professional)
- Federated indicator:
  - `grouping_value`: an anonymized internal identifier used to track federated organizations (e.g., national networks with local chapters)
- Financial attributes:
  - `asset_amt`: total assets
  - `income_amt`: total income
  - `revenue_amt`: total revenue
Dimensions: 1,774,798 rows
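The two summary variables follow directly from the four binary indicators. The snippet below is a minimal sketch of that relationship, written in Python for illustration rather than taken from the repository's R code; the function name `add_opportunity_summaries` and the example record are hypothetical.

```python
INDICATORS = ("membership", "volunteer", "events", "take_action")


def add_opportunity_summaries(org):
    """Return a copy of an organization record with opp_binary and opp_mean added."""
    values = [org[name] for name in INDICATORS]
    out = dict(org)
    # opp_binary: 1 if at least one of the four indicators is 1, else 0.
    out["opp_binary"] = 1 if any(values) else 0
    # opp_mean: mean of the four binary indicators (0, 0.25, 0.5, 0.75, or 1).
    out["opp_mean"] = sum(values) / len(values)
    return out


# Hypothetical record; the values are made up for illustration.
example = {"id": 1, "membership": 1, "volunteer": 0, "events": 1, "take_action": 0}
result = add_opportunity_summaries(example)
print(result["opp_binary"], result["opp_mean"])
```

Because a 0 can mean either "does not provide" or "information unavailable", `opp_mean` is best read as a lower bound on the share of opportunity types an organization offers.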
File Access and Format Differences:
Format | File Size | Available At |
---|---|---|
`.parquet` | 41.9 MB | GitHub and Harvard Dataverse |
`.csv` | 125.4 MB | Harvard Dataverse only (not hosted on GitHub due to file size limits) |
`.rds` | 286.1 MB | Harvard Dataverse only (not hosted on GitHub due to file size limits) |
Datasets 2-3, derived from Dataset 1, aggregate civic opportunity indicators and socioeconomic characteristics at the county and ZIP code (ZCTA) levels. Each observation corresponds to a geographic unit and includes counts of civic opportunity types, a composite score and index, normalized indicators, and contextual variables from the American Community Survey (ACS).
Each observation includes:
- Geographic identifiers:
  - `state`: two-letter state abbreviation
  - `FIPS`: county FIPS code
  - `ZCTA`: ZIP Code Tabulation Area
- Organizational counts:
  - `n`: total nonprofit organizations
  - `civic_org_sum`: total civic opportunity organizations
  - `membership_sum`: total organizations providing membership opportunities
  - `volunteer_sum`: total organizations providing volunteer opportunities
  - `events_sum`: total organizations providing public event opportunities
  - `take_action_sum`: total organizations providing political or civic action opportunities
- Composite civic opportunity scores:
  - `civic_opp_sum`: total opportunity score
- Normalized civic opportunity indicators (per capita):
  - `civic_org_sum_normalized`: total number of civic organizations per capita
  - `civic_opp_sum_normalized`: total civic opportunities per capita
  - `civic_opp_index`: quintile-based civic opportunity index, derived by dividing `civic_opp_sum_normalized` into five equal-sized bins
  - `membership_sum_normalized`: total number of organizations providing membership opportunities per capita
  - `volunteer_sum_normalized`: total number of organizations providing volunteer opportunities per capita
  - `events_sum_normalized`: total number of organizations providing public event opportunities per capita
  - `take_action_sum_normalized`: total number of organizations providing political or civic action opportunities per capita
- Sociodemographic indicators:
  - `TotalPopulation`: total population
  - `POV150`: poverty rate
  - `SNGPNT`: single-parent households
  - `BROAD`: households without broadband access
  - `NOHSDP`: adults without a high school diploma
  - `UNEMP`: unemployment rate
  - `REMNRTY`: share of racial or ethnic minority residents
Dimensions:
- County level: 3,281 rows $\times$ 24 columns
- ZIP code level: 30,988 rows $\times$ 24 columns
County Level Files:
ZIP Code Level Files:
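To make the aggregation concrete, here is a minimal Python sketch of how organization-level records could be rolled up into the count, per-capita, and quintile variables described for these datasets. It is not the code in `01_dataset_generation.Rmd`. Two points are assumptions: that `civic_opp_sum` is the sum of the four indicator counts, and that the quintile index is assigned by rank with ties broken by input order. The function names and the example data are hypothetical.

```python
from collections import defaultdict

INDICATORS = ("membership", "volunteer", "events", "take_action")


def aggregate_by_geography(orgs, population, geo_key="FIPS"):
    """Roll organization records up to per-geography counts and per-capita rates."""
    rows = defaultdict(lambda: defaultdict(int))
    for org in orgs:
        row = rows[org[geo_key]]
        row["n"] += 1                              # all nonprofits in the unit
        row["civic_org_sum"] += org["opp_binary"]  # orgs with at least one opportunity
        for name in INDICATORS:
            row[f"{name}_sum"] += org[name]

    count_cols = ["civic_org_sum", "civic_opp_sum"] + [f"{name}_sum" for name in INDICATORS]
    result = {}
    for geo, row in rows.items():
        out = dict(row)
        # Assumption: the composite score is the sum of the four indicator counts.
        out["civic_opp_sum"] = sum(out[f"{name}_sum"] for name in INDICATORS)
        pop = population[geo]
        out["TotalPopulation"] = pop
        for col in count_cols:
            # Per-capita normalization; undefined when the population is zero.
            out[f"{col}_normalized"] = out[col] / pop if pop else None
        result[geo] = out
    return result


def quintile_index(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    index = [0] * len(values)
    for rank, i in enumerate(order):
        index[i] = rank * 5 // len(values) + 1
    return index


# Hypothetical records and populations, made up for illustration.
orgs = [
    {"FIPS": "01001", "membership": 1, "volunteer": 0, "events": 1, "take_action": 0, "opp_binary": 1},
    {"FIPS": "01001", "membership": 0, "volunteer": 0, "events": 0, "take_action": 0, "opp_binary": 0},
    {"FIPS": "01003", "membership": 1, "volunteer": 1, "events": 1, "take_action": 1, "opp_binary": 1},
]
population = {"01001": 1000, "01003": 2000}
agg = aggregate_by_geography(orgs, population)
quintiles = quintile_index([0.1, 0.5, 0.3, 0.9, 0.7])
```

Passing `geo_key="ZCTA"` with ZCTA-level populations would give the ZIP code version of the same table.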
Datasets 4-5, also derived from Dataset 1, summarize the types of organizations that provide civic opportunities at the county and ZIP code levels. Each observation corresponds to a unique pair of geography and organization type, which enables analysis of regional patterns in the composition of civic infrastructure.
Each observation includes:
- Geographic identifiers:
  - `FIPS`: county FIPS code
  - `ZCTA`: ZIP Code Tabulation Area
- Organizational counts:
  - `n`: number of civic opportunity-providing organizations of a given type in the geography
- Organization type classification:
  - `class`: predicted organizational type (e.g., religious, political, professional)
- Relative frequency:
  - `freq`: proportion of civic opportunity organizations in the geography that fall into the given class
- Primary provider type:
  - `primary_org_cat`: the most common civic opportunity organization type in the geography; appears once per unit
Dimensions:
- County level: 29,687 rows $\times$ 5 columns
- ZIP code level: 150,162 rows $\times$ 5 columns
County Level Files:
ZIP Code Level Files:
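The type breakdown can be pictured as a grouped count over the organizations that provide at least one civic opportunity. The Python sketch below illustrates that idea under stated assumptions and is not the repository's R code: the function name `summarize_org_types` and the example records are hypothetical, and for simplicity the sketch repeats `primary_org_cat` on every row of a geography, whereas the released files report it once per unit.

```python
from collections import Counter, defaultdict


def summarize_org_types(orgs, geo_key="FIPS"):
    """Build one row per (geography, class) pair with n, freq, and primary_org_cat."""
    counts = defaultdict(Counter)
    for org in orgs:
        # Only organizations offering at least one civic opportunity are counted.
        if org["opp_binary"] == 1:
            counts[org[geo_key]][org["predicted"]] += 1

    rows = []
    for geo, by_class in counts.items():
        total = sum(by_class.values())
        # Most common class in the geography; ties resolve to the first class seen.
        primary = by_class.most_common(1)[0][0]
        for cls, n in by_class.items():
            rows.append({
                geo_key: geo,
                "class": cls,
                "n": n,
                "freq": n / total,
                "primary_org_cat": primary,
            })
    return rows


# Hypothetical records, made up for illustration.
type_orgs = [
    {"FIPS": "01001", "predicted": "religious", "opp_binary": 1},
    {"FIPS": "01001", "predicted": "religious", "opp_binary": 1},
    {"FIPS": "01001", "predicted": "political", "opp_binary": 1},
    {"FIPS": "01001", "predicted": "professional", "opp_binary": 0},
]
type_rows = summarize_org_types(type_orgs)
```

Within each geography the `freq` values sum to 1, which is a quick consistency check when working with the county or ZIP code files.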
- Data description: `02_description.Rmd`
  - Produces Figures 2-3 and Tables 4-5
- Data validation: `03_validation.Rmd`
  - Produces Figures 4-7