Split Ticket TX 2020 Election Latinos Analysis by Andrew Hong
Dep. Variable: Trump Gain
R² = 0.47
Variable | Coef | Std Err | t | P>\|t\| | 95% CI lower | 95% CI upper |
---|---|---|---|---|---|---|
const | -0.0194 | 0.001 | -13.349 | 0.000 | -0.022 | -0.017 |
White | 0.0099 | 0.002 | 5.926 | 0.000 | 0.007 | 0.013 |
Black or African American | 0.0230 | 0.002 | 12.219 | 0.000 | 0.019 | 0.027 |
Asian | 0.0272 | 0.003 | 10.179 | 0.000 | 0.022 | 0.032 |
Hispanic or Latino | 0.0663 | 0.002 | 41.734 | 0.000 | 0.063 | 0.069 |
Sanders Gain | 0.1467 | 0.003 | 48.189 | 0.000 | 0.141 | 0.153 |
Median Income | -0.0006 | 0.001 | -0.869 | 0.385 | -0.002 | 0.001 |
County Pop. Density | -0.0285 | 0.001 | -39.252 | 0.000 | -0.030 | -0.027 |
Median Age | 0.0144 | 0.001 | 23.135 | 0.000 | 0.013 | 0.016 |
College Attainment | 0.0101 | 0.001 | 7.818 | 0.000 | 0.008 | 0.013 |
The table above is the output of a population-weighted least-squares regression predicting Donald Trump's gains (2016-2020) from a set of demographic and political variables: race, age, income, (4-year) college attainment, county population density, and Bernie Sanders' gains (2016-2020 Dem. Primary). The key takeaway is that Sanders Gain and Latino population proportion were the two strongest predictors of Trump Gain (both p<0.0001). Interestingly, despite the overwhelming focus on Latinos driving Trump's gains in Texas, Sanders Gain was actually more predictive of Trump Gain than Latino population proportion was. A 10-percentage-point greater Sanders Gain predicts a Trump Gain about 1.5 points greater per Census Block Group (BG), while a 10-point greater Latino population proportion predicts a Trump Gain only about 0.7 points greater.
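As a quick check on the arithmetic, the per-10-point effects can be read straight off the coefficients. The sketch below simply multiplies each coefficient from the first table by a 0.10 change in the predictor, holding the other variables fixed; the function name is made up for illustration.

```python
# Coefficients copied from the first regression table above.
COEF_SANDERS_GAIN = 0.1467
COEF_LATINO_SHARE = 0.0663


def predicted_change(coef: float, delta: float) -> float:
    """Predicted change in Trump Gain when one predictor moves by `delta`,
    with every other predictor held fixed (hypothetical helper)."""
    return coef * delta


step = 0.10  # a 10-percentage-point increase on the 0-1 proportion scale
sanders_effect = predicted_change(COEF_SANDERS_GAIN, step)
latino_effect = predicted_change(COEF_LATINO_SHARE, step)

print(f"Sanders Gain +10 pts -> Trump Gain {sanders_effect * 100:+.2f} pts")
print(f"Latino share +10 pts -> Trump Gain {latino_effect * 100:+.2f} pts")
```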
After this initial regression, I added an interaction term between Latino population proportion and Sanders Gain to the original regression; computationally, this is just the product of the two variables. In this context, the interaction term tests whether the relationship between Sanders’ gains and Trump’s gains is stronger in areas with large Latino populations.
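To make the mechanics concrete, here is a minimal sketch of a population-weighted regression with an interaction column, run on synthetic data with NumPy. It is not the code behind the tables: the data are randomly generated, the variable names are invented, the "true" coefficients are borrowed from the second table only so the numbers look familiar, and only three of the predictors are included.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # number of synthetic "block groups" (illustrative only)

# Synthetic predictors on the same scales the write-up describes.
latino_share = rng.uniform(0.0, 1.0, size=n)    # 0-1 proportion
sanders_gain = rng.uniform(-0.3, 0.3, size=n)   # within the -1 to 1 range
population = rng.integers(500, 3000, size=n).astype(float)  # regression weights

# The interaction term is just the elementwise product of the two variables.
interaction = latino_share * sanders_gain

# Assumed "true" coefficients: const, Latino, Sanders Gain, interaction.
true_beta = np.array([-0.0089, 0.0526, -0.0387, 0.44])
X = np.column_stack([np.ones(n), latino_share, sanders_gain, interaction])
trump_gain = X @ true_beta + rng.normal(0.0, 0.005, size=n)

# Weighted least squares: minimizing sum(w * residual^2) is equivalent to
# ordinary least squares after scaling each row by sqrt(w).
sqrt_w = np.sqrt(population)
beta_hat, *_ = np.linalg.lstsq(X * sqrt_w[:, None], trump_gain * sqrt_w, rcond=None)

for name, b in zip(["const", "latino", "sanders", "interaction"], beta_hat):
    print(f"{name:12s} {b:+.4f}")
```

Because the synthetic outcome was built with a large interaction coefficient, the fit recovers a large interaction coefficient and a slightly negative Sanders Gain coefficient, mirroring the pattern in the table that follows.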
Dep. Variable: Trump Gain
R² = 0.52
Variable | Coef | Std Err | t | P>\|t\| | 95% CI lower | 95% CI upper |
---|---|---|---|---|---|---|
const | -0.0089 | 0.001 | -6.415 | 0.000 | -0.012 | -0.006 |
White | -0.0050 | 0.002 | -3.094 | 0.002 | -0.008 | -0.002 |
Black or African American | 0.0219 | 0.002 | 12.289 | 0.000 | 0.018 | 0.025 |
Asian | 0.0100 | 0.003 | 3.905 | 0.000 | 0.005 | 0.015 |
Hispanic or Latino | 0.0526 | 0.002 | 34.256 | 0.000 | 0.050 | 0.056 |
Sanders Gain | -0.0387 | 0.005 | -7.631 | 0.000 | -0.049 | -0.029 |
Latino-Sanders Gain Interaction | 0.4400 | 0.010 | 44.464 | 0.000 | 0.421 | 0.459 |
Median Income | -0.0012 | 0.001 | -1.658 | 0.097 | -0.003 | 0.000 |
County Pop. Density | -0.0300 | 0.001 | -43.481 | 0.000 | -0.031 | -0.029 |
Median Age | 0.0115 | 0.001 | 19.473 | 0.000 | 0.010 | 0.013 |
College Attainment | 0.0091 | 0.001 | 7.417 | 0.000 | 0.007 | 0.011 |
Indeed, Sanders’ gains were much more predictive of Trump’s gains in heavily Latino areas. The new Latino-Sanders Gain interaction term was the strongest variable, and its inclusion weakened both the Latino population and Sanders Gain terms, suggesting that the two variables predict Trump gains best in combination. Furthermore, the Sanders Gain coefficient flipped from the largest positive to the largest negative, suggesting that Sanders Gain predicts Trump gains in high-Latino areas but actually predicts Trump losses in areas with few Latinos (the implied crossover, 0.0387 / 0.44, is a Latino share of roughly 9%).
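With an interaction in the model, the effect of Sanders Gain is no longer a single number: it is the Sanders Gain coefficient plus the interaction coefficient times the Latino share. The sketch below works that out from the second table's coefficients; the function name and the example Latino shares are chosen for illustration.

```python
# Coefficients copied from the second regression table above.
COEF_SANDERS_GAIN = -0.0387
COEF_INTERACTION = 0.4400


def sanders_slope(latino_share: float) -> float:
    """Change in predicted Trump Gain per unit of Sanders Gain at a given
    Latino population share on the 0-1 scale (hypothetical helper)."""
    return COEF_SANDERS_GAIN + COEF_INTERACTION * latino_share


# Latino share at which the Sanders Gain slope switches sign.
crossover = -COEF_SANDERS_GAIN / COEF_INTERACTION

for share in (0.0, 0.05, 0.25, 0.50, 0.90):
    print(f"Latino share {share:.2f}: Sanders Gain slope {sanders_slope(share):+.4f}")
print(f"Slope crosses zero at a Latino share of about {crossover:.3f}")
```

Below the crossover the slope is negative (Sanders gains go with Trump losses); above it the slope is positive and grows with the Latino share.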
One possible explanation for the large Latino-Sanders interaction coefficient is that multiplying two terms that are each ≤1.00 in magnitude yields a small product, prompting the regression to inflate the coefficient to compensate. To test that explanation, I squared the Latino and Sanders terms from the original regression and ran each squared term in its own regression. The resulting coefficients were much smaller than the interaction coefficient (0.0816 and 0.2163, versus 0.44), suggesting that the large coefficient reflects genuine predictive value of the interaction term.
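The scale concern being tested here is a general property of least squares: shrinking a predictor by some factor inflates its coefficient by the same factor without changing the fit. A toy illustration with a one-variable regression and made-up numbers (unrelated to the Texas data):

```python
def ols_slope(x, y):
    """Slope of a simple one-predictor least-squares fit (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Made-up predictor on a 0-1 scale and an outcome that depends on it linearly.
x = [0.1, 0.2, 0.4, 0.5, 0.8]
y = [0.01 + 0.3 * v for v in x]

# The same predictor expressed on a scale ten times smaller.
x_small = [v / 10 for v in x]

slope_original = ols_slope(x, y)
slope_rescaled = ols_slope(x_small, y)

print(f"slope on original scale: {slope_original:.3f}")
print(f"slope on 10x smaller scale: {slope_rescaled:.3f}")
```

This is why a large coefficient on a small-magnitude product term is not, by itself, evidence of a strong effect, and why comparing against the squared-term regressions is a reasonable sanity check.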
Notes: Each observation in the regression was a unique 2020 Census Block Group, the most granular geography for which the Census publishes income and education data. Using the maup Python package, I aggregated or disaggregated block-level 2016-2020 Presidential General Election, block-level 2016 Democratic Primary, and precinct-level 2016 Democratic Primary data to the Block Group level. To put the variables on comparable scales, Income, County Density, and Median Age were transformed to a 0-1 percentile scale (a uniform distribution on 0.0-1.0) to align with the 0-1 (or -1 to 1 for Sanders/Trump Gain) proportion format of the race, political, and College Attainment variables. I tested different approaches to standardizing these variables, but the overall regression results did not change.
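One way to produce a 0-1 percentile scale is a rank transform, sketched below in plain Python. This is an assumption about the general approach, not the exact transformation used for the tables: the tie handling (average ranks) and the choice to map the minimum to 0.0 and the maximum to 1.0 are illustrative, and the function name is hypothetical.

```python
def percentile_scale(values):
    """Map values to a 0-1 percentile scale by rank.

    The smallest value maps to 0.0 and the largest to 1.0; tied values
    share the average of their ranks. (Illustrative choice of conventions.)
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [r / (n - 1) for r in ranks]


# Made-up median incomes for three block groups.
incomes = [30000, 90000, 60000]
scaled_incomes = percentile_scale(incomes)
print(scaled_incomes)
```

A rank transform like this discards the distances between values and keeps only their order, which is what makes the result uniformly spread over 0-1 regardless of how skewed the raw variable is.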
There were some missing and (geospatially) unmatchable precincts from the 2016 Democratic Presidential Primary, but 90% of precincts could be included. Importantly, the major metropolitan areas of Texas were covered, including >99% of the Rio Grande Valley. Most of the missing precincts were in scattered small rural counties and pockets of Denton County (Dallas-Fort Worth suburbs). While the missing data is minor enough not to change the overall statistical pattern of the results, it likely reduced the precision of certain estimates.
Full repository with source files and code will be uploaded shortly.