tx-2020-election-latinos

Split Ticket TX 2020 Election Latinos Analysis by Andrew Hong

Regression Output

Weighted (by Population) Least Squares Regression Table

Dep. Variable: Trump Gain

R² = 0.47

| Variable | Coef | Std Err | t | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| const | -0.0194 | 0.001 | -13.349 | 0.000 | -0.022 | -0.017 |
| White | 0.0099 | 0.002 | 5.926 | 0.000 | 0.007 | 0.013 |
| Black or African American | 0.0230 | 0.002 | 12.219 | 0.000 | 0.019 | 0.027 |
| Asian | 0.0272 | 0.003 | 10.179 | 0.000 | 0.022 | 0.032 |
| Hispanic or Latino | 0.0663 | 0.002 | 41.734 | 0.000 | 0.063 | 0.069 |
| Sanders Gain | 0.1467 | 0.003 | 48.189 | 0.000 | 0.141 | 0.153 |
| Median Income | -0.0006 | 0.001 | -0.869 | 0.385 | -0.002 | 0.001 |
| County Pop. Density | -0.0285 | 0.001 | -39.252 | 0.000 | -0.030 | -0.027 |
| Median Age | 0.0144 | 0.001 | 23.135 | 0.000 | 0.013 | 0.016 |
| College Attainment | 0.0101 | 0.001 | 7.818 | 0.000 | 0.008 | 0.013 |

The table above is the output of a population-weighted least-squares regression predicting Donald Trump's gains (2016-2020) from a set of demographic and political variables: race, age, income, (4-year) college attainment, county population density, and Bernie Sanders' gains (2016-2020 Democratic primary). The key takeaway is that Sanders Gain and Latino population proportion were the two strongest predictors of Trump Gain (both p<0.0001). Interestingly, despite the overwhelming focus on Latinos driving Trump's gains in Texas, Sanders Gain was actually more predictive of Trump Gain than Latino population proportion was. A 10-percentage-point greater Sanders Gain predicts a Trump Gain about 1.5 points greater per Census Block Group (BG), while a 10-point greater Latino population proportion predicts a Trump Gain only about 0.7 points greater.
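The per-10-point figures follow directly from the coefficients, since every share and gain variable is expressed as a proportion. A minimal sketch of that arithmetic, using only the two coefficients from the table (the function name is illustrative and not from the repository):

```python
# Coefficients copied from the first regression table.
COEF_SANDERS_GAIN = 0.1467
COEF_LATINO_SHARE = 0.0663


def predicted_trump_gain_change(coef, delta):
    """Change in predicted Trump Gain (as a proportion) when one predictor
    moves by `delta` and all other predictors are held fixed."""
    return coef * delta


TEN_POINTS = 0.10  # a 10-percentage-point increase, expressed as a proportion

sanders_effect = predicted_trump_gain_change(COEF_SANDERS_GAIN, TEN_POINTS)
latino_effect = predicted_trump_gain_change(COEF_LATINO_SHARE, TEN_POINTS)

# Multiply by 100 to report the change in percentage points.
print(f"Sanders Gain +10 pts -> Trump Gain {sanders_effect * 100:+.2f} pts")
print(f"Latino share +10 pts -> Trump Gain {latino_effect * 100:+.2f} pts")
```

These are holding-all-else-equal comparisons within the fitted model, not causal estimates.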

After this initial regression, I added an interaction term between Latino population proportion and Sanders Gain, which computationally is just the product of the two variables. In this context, the interaction term tests whether the relationship between Sanders’ gains and Trump’s gains is stronger in areas with large Latino populations.
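To make the mechanics concrete, here is a dependency-free sketch of a population-weighted least-squares fit with an interaction column. It is not the repository's code: the six block groups, their populations, and the "true" coefficients are all made up for illustration, and in practice a statistics library would replace the hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_least_squares(rows, y, weights):
    """Return beta minimising sum_i w_i * (y_i - x_i . beta)^2,
    computed from the normal equations (X'WX) beta = X'Wy."""
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * yi for r, yi, w in zip(rows, y, weights))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)


# Hypothetical block groups: (Latino share, Sanders gain, population).
toy_block_groups = [
    (0.05, 0.10, 1200), (0.10, -0.05, 800), (0.40, 0.02, 1500),
    (0.60, 0.15, 900), (0.85, 0.20, 2000), (0.90, -0.10, 600),
]

# Design matrix: intercept, Latino share, Sanders gain, and their product.
design = [[1.0, latino, sanders, latino * sanders]
          for latino, sanders, _ in toy_block_groups]
weights = [pop for _, _, pop in toy_block_groups]

# Made-up coefficients used only to generate noise-free toy outcomes.
true_beta = [-0.01, 0.05, -0.04, 0.44]
trump_gain = [sum(b * x for b, x in zip(true_beta, row)) for row in design]

beta_hat = weighted_least_squares(design, trump_gain, weights)
print([round(b, 4) for b in beta_hat])
```

Because the toy outcomes contain no noise, the fit recovers the made-up coefficients; the point is only that the interaction enters the model as one extra column equal to the product of the two predictors.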

Weighted (by Population) Least Squares Regression Table (w/Sanders-Latino Interaction)

Dep. Variable: Trump Gain

R² = 0.52

| Variable | Coef | Std Err | t | p-value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| const | -0.0089 | 0.001 | -6.415 | 0.000 | -0.012 | -0.006 |
| White | -0.0050 | 0.002 | -3.094 | 0.002 | -0.008 | -0.002 |
| Black or African American | 0.0219 | 0.002 | 12.289 | 0.000 | 0.018 | 0.025 |
| Asian | 0.0100 | 0.003 | 3.905 | 0.000 | 0.005 | 0.015 |
| Hispanic or Latino | 0.0526 | 0.002 | 34.256 | 0.000 | 0.050 | 0.056 |
| Sanders Gain | -0.0387 | 0.005 | -7.631 | 0.000 | -0.049 | -0.029 |
| Latino-Sanders Gain Interaction | 0.4400 | 0.010 | 44.464 | 0.000 | 0.421 | 0.459 |
| Median Income | -0.0012 | 0.001 | -1.658 | 0.097 | -0.003 | 0.000 |
| County Pop. Density | -0.0300 | 0.001 | -43.481 | 0.000 | -0.031 | -0.029 |
| Median Age | 0.0115 | 0.001 | 19.473 | 0.000 | 0.010 | 0.013 |
| College Attainment | 0.0091 | 0.001 | 7.417 | 0.000 | 0.007 | 0.011 |

Indeed, Sanders’ gains were much more predictive of Trump’s gains in heavily Latino areas. The new Latino-Sanders Gain interaction term had by far the largest coefficient, and its inclusion reduced the coefficients on both Latino population and Sanders Gain, suggesting that the two variables together predict Trump’s gains better than either does alone. Furthermore, the Sanders Gain coefficient flipped from the largest positive to the largest negative, suggesting that Sanders Gain predicts Trump gains in high-Latino areas but actually predicts Trump losses in areas with few Latinos.
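With an interaction term in the model, the slope on Sanders Gain is no longer a single number: it equals the Sanders Gain coefficient plus the interaction coefficient times the Latino share. A short sketch using the two coefficients from the second table shows where that slope changes sign, at a Latino share of roughly 9% (the function and variable names are illustrative only):

```python
# Coefficients copied from the regression table with the interaction term.
COEF_SANDERS_GAIN = -0.0387
COEF_INTERACTION = 0.4400


def sanders_slope(latino_share):
    """Slope of predicted Trump Gain with respect to Sanders Gain
    at a given Latino population share (0.0 to 1.0)."""
    return COEF_SANDERS_GAIN + COEF_INTERACTION * latino_share


# Latino share at which the slope is exactly zero.
break_even_share = -COEF_SANDERS_GAIN / COEF_INTERACTION
print(f"Slope changes sign at a Latino share of {break_even_share:.3f}")

for share in (0.0, 0.25, 0.50, 0.90):
    print(f"Latino share {share:.2f}: slope on Sanders Gain = {sanders_slope(share):+.4f}")
```

Below the break-even share the model associates a larger Sanders Gain with a smaller Trump Gain; above it, with a larger one.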

One possible explanation for the large Latino-Sanders interaction coefficient is that multiplying two terms of magnitude ≤1.00 yields a term of small magnitude, prompting the regression to inflate its coefficient to compensate. To test that explanation, I squared the Latino and Sanders terms from the original regression and ran each squared term in its own regression. The resulting coefficients were much smaller than the interaction coefficient (0.0816 and 0.2163, versus 0.44), indicating that the large coefficient reflects the strong predictive value of the interaction term rather than a scaling artifact.


Notes: Each observation in the regression was a unique 2020 Census Block Group, the most granular geography for which the Census publishes income and education data. Using the Maup Python package, I aggregated or disaggregated Census block-level 2016-2020 Presidential General Election, block-level 2016 Democratic Primary, and precinct-level 2016 Democratic Primary data to the Block Group level. To standardize variable magnitudes, Income, County Density, and Median Age were transformed to a 0-1 uniform distribution (a 0.0-1.0 percentile scale) to align with the 0-1 (or -1 to 1 for Sanders/Trump Gain) proportion format of the race, political, and College Attainment variables. I tested different approaches to standardizing these variables, but the general regression result did not change.
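The notes do not say exactly how the percentile scale was computed. One common approach, sketched here as an assumption rather than the method actually used, is a mid-rank percentile, under which tied values share a rank and every result falls strictly between 0 and 1. The income figures are made up.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(values):
    """Map each value to its mid-rank percentile in (0, 1).

    A value's rank is the average of the count of values strictly below it
    and the count at or below it, divided by the sample size, so ties share
    a rank.
    """
    ordered = sorted(values)
    n = len(ordered)
    scaled = []
    for v in values:
        below = bisect_left(ordered, v)         # values strictly less than v
        at_or_below = bisect_right(ordered, v)  # values less than or equal to v
        scaled.append((below + at_or_below) / (2 * n))
    return scaled


# Hypothetical median incomes for five block groups.
toy_incomes = [30000, 52000, 52000, 75000, 140000]
income_percentiles = percentile_rank(toy_incomes)
print(income_percentiles)
```

A rank-based scale like this is insensitive to outliers such as very high incomes or densities, which is one reason to prefer it over min-max scaling when the goal is comparable coefficient magnitudes.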

Some precincts from the 2016 Democratic Presidential Primary were missing or could not be matched geospatially, but 90% of precincts could be included. Importantly, the major metropolitan areas of Texas were covered, including more than 99% of the Rio Grande Valley. The bulk of the missing precincts were in scattered small rural counties and pockets of Denton County (Dallas-Fort Worth suburbs). While the missing data is minor enough not to change the overall statistical pattern of the results, it likely still compromised the precision of certain estimates.

Full repository with source files and code will be uploaded shortly.
