Description
The relevant code will be in policyengine_us_data/db/etl_irs_soi.py, which is not yet checked in at the time of writing this issue.
Context: In the interest of time, every target in the IRS SOI data set except AGI was loaded as an amount only, rather than being further split into amount and count; those amounts were used as targets at the various geographic levels. One partial exception is eitc, which was split by number of children and required new strata, but the variable aggregated was still the eitc amount.
Given that the IRS SOI data set provides tax unit counts as well as amounts, we are arguably leaving something on the table by not getting them. On the other hand, there are likely diminishing returns from adding extra targets which yield additional complexity.
If the assignee wants to take this issue further: in addition to incorporating tax_unit_counts for each variable, the IRS data set is also split by AGI band, so both the counts and the amounts of each variable could be further partitioned by AGI. This would create many more strata and could push the number of tax units in some cells close to zero.
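As a rough illustration of what partitioning by AGI band would involve, here is a minimal sketch in plain Python. It does not reflect the contents of etl_irs_soi.py; the band floors, the record layout, and the names `AGI_BAND_FLOORS`, `band_index`, and `targets_by_band` are all hypothetical, and the sketch assumes a tax unit contributes to the count only when its value is positive.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical AGI band lower bounds (NOT the actual IRS SOI bands).
AGI_BAND_FLOORS = [0, 25_000, 50_000, 100_000]

# Hypothetical tax-unit records: (agi, eitc amount, weight).
tax_units = [
    (12_000, 3_000.0, 1.0),
    (18_000, 0.0, 1.0),
    (30_000, 1_500.0, 2.0),
    (120_000, 0.0, 1.0),
]


def band_index(agi):
    """Return the index of the AGI band containing `agi` (negative AGI maps to band 0)."""
    return max(bisect_right(AGI_BAND_FLOORS, agi) - 1, 0)


def targets_by_band(units):
    """Aggregate weighted amounts and weighted counts per AGI band."""
    amounts = defaultdict(float)
    counts = defaultdict(float)
    for agi, value, weight in units:
        band = band_index(agi)
        amounts[band] += weight * value
        # Assumption: a tax unit is counted only if it has a positive value.
        if value > 0:
            counts[band] += weight
    return dict(amounts), dict(counts)


amounts, counts = targets_by_band(tax_units)
print("amounts by band:", amounts)
print("counts by band:", counts)
```

Even in this toy data, one band has no tax units at all and another has a zero count, which is the sparse-cell concern raised above: each extra partition multiplies the number of strata while thinning out each cell.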