Possible issue with missing data when using subsidized_residential_feasibility #214

theocharides · 2020-09-03T01:24:57Z

While adding the "jobs housing fee" policy to BAUS, an error occurred that led to the following caution about using the "subsidized_residential_feasibility" model. The items below could lead to missing values and are worth looking into:

If two policies are activated and both use subsidized_residential_feasibility to create subsidized units, summary.parcel_output will join parcels_geography to the feasibility table twice. For each duplicated column, the newer copy gets a '_y' suffix.
For a parcels_geography attribute like 'tra_id', the newer column would be the correct one, because summary.parcel_output is a dynamic table that grows with each iteration; however, it is not the one that is kept.
We avoid this problem because some attributes like 'tra_id' (with all of its values) are added to parcels at the outset and remain the primary column throughout this process. But this is worth keeping in mind for any columns we hope to get from the parcels_geography join.
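A minimal pandas sketch of the duplicate-column behavior described above, with toy data standing in for the feasibility and parcels_geography tables. It assumes the join is a plain index merge with `suffixes=("", "_y")`, so the existing column keeps its name and only the incoming copy is renamed; pandas' default is `("_x", "_y")`, which would rename both. The table contents are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the feasibility table (hypothetical data).
# It already carries 'tra_id', copied from parcels at the outset.
feasibility = pd.DataFrame(
    {"parcel_id": [1, 2, 3], "tra_id": ["tra1", "tra2", "tra2"], "max_profit": [10.0, 5.0, 8.0]}
).set_index("parcel_id")

# Toy stand-in for parcels_geography; parcel 2 has a newer 'tra_id' value.
parcels_geography = pd.DataFrame(
    {"parcel_id": [1, 2, 3], "tra_id": ["tra1", "tra3", "tra2"], "juris": ["a", "b", "c"]}
).set_index("parcel_id")

# Assumption: suffixes=("", "_y"), so the existing column keeps its name and
# the incoming duplicate is renamed with '_y'.
merged = feasibility.merge(
    parcels_geography, left_index=True, right_index=True, how="left", suffixes=("", "_y")
)

print(sorted(merged.columns))
# ['juris', 'max_profit', 'tra_id', 'tra_id_y']

# Code that reads 'tra_id' silently gets the older value for parcel 2.
print(merged.loc[2, "tra_id"], merged.loc[2, "tra_id_y"])
# tra2 tra3
```

Anything downstream that reads 'tra_id' gets the older value for parcel 2, while the newer value sits unused in 'tra_id_y'.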

yuqiww · 2020-09-03T19:32:50Z

The overlapping columns might have been introduced in two places in subsidies.py:

calculating the fee, which uses county as the index and could therefore duplicate the "county" column; this was fixed by renaming the column (here and here).
merging "parcels_geography" into the "feasibility" variable in the "subsidized_residential_feasibility" step, which creates "_y" columns for duplicates. As @theocharides mentioned above, we think this doesn't affect the initial columns already in "feasibility", which come from the "parcels" variable. "Feasibility" is built from scratch in each iteration.
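One way to keep the parcels_geography join from producing "_y" columns is to decide explicitly, per column, which copy wins before joining. The helper below is hypothetical (it is not part of BAUS or subsidies.py): columns already present in feasibility are left alone unless listed in `refresh`, in which case the geography value replaces them. That would address the 'tra_id' case, where the newer value is the correct one.

```python
import pandas as pd


def merge_geography(feasibility: pd.DataFrame, geography: pd.DataFrame, refresh=()) -> pd.DataFrame:
    """Join geography columns onto feasibility without creating '_y' duplicates.

    Hypothetical helper. Columns already in `feasibility` are kept as-is,
    except those named in `refresh`, which take the values from `geography`.
    """
    overlap = feasibility.columns.intersection(geography.columns)
    refresh = [c for c in refresh if c in overlap]
    # Drop the stale copies that should be refreshed from geography.
    left = feasibility.drop(columns=refresh)
    # Bring in only geography columns that are new, plus the refreshed ones.
    keep = [c for c in geography.columns if c not in overlap or c in refresh]
    return left.join(geography[keep], how="left")
```

For instance, `merge_geography(feasibility, parcels_geography, refresh=["tra_id"])` would add the new geography columns and overwrite 'tra_id', leaving no suffixed columns behind. Making the choice explicit also documents which source is authoritative for each overlapping column.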

The attached "parcel_output_fields" spreadsheet shows when the duplicated columns were created:

The yellow-highlighted fields didn't appear in parcel_output until year 2025. These fields belong to "parcels_geography", which means this table somehow entered the process in 2025; this also created the "_y" fields. "Parcels_geography" is introduced in the "subsidized_residential_feasibility" step.
All the red-highlighted fields also first appear in year 2025. They are created in the "run_subsidized_developer" step: here, here, and here.
In Draft Blueprint, we didn't use the revenue from "vmt-fee" or "jobs-housing fee" to subsidize commercial or housing development, so only the 'lump-sum account' strategy used these two functions (here and here). I checked the run log: no building was subsidized before 2025, which explains why these new fields didn't show up until that year.

Finally, the "overlapping columns" branch added code to print out the columns of key variables in summaries.py in each iteration, for reference.
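A per-iteration check in the spirit of that branch could also record the year each field first appears in parcel_output (the same information the spreadsheet shows) and flag any "_x"/"_y" columns. This is a hypothetical sketch: the class name and how it would be called from summaries.py are assumptions, and the branch itself may simply print column lists.

```python
import pandas as pd


class ColumnTracker:
    """Record the first year each column appears and flag suffixed duplicates.

    Hypothetical helper; not assumed to exist in BAUS.
    """

    def __init__(self, suffixes=("_x", "_y")):
        self.suffixes = tuple(suffixes)
        self.first_seen = {}  # column name -> first year it was observed

    def check(self, year: int, df: pd.DataFrame) -> list:
        # Columns not seen in any earlier iteration
        new_cols = [c for c in df.columns if c not in self.first_seen]
        for c in new_cols:
            self.first_seen[c] = year
        # Columns whose names end with a merge suffix such as '_y'
        dupes = [c for c in df.columns if str(c).endswith(self.suffixes)]
        if new_cols:
            print(f"{year}: new columns {new_cols}")
        if dupes:
            print(f"{year}: suffixed duplicates {dupes}")
        return dupes
```

Calling `tracker.check(year, parcel_output)` once per simulation year would, for the run described above, report the parcels_geography fields and their "_y" copies as new in 2025.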

yuqiww · 2020-09-03T19:33:27Z

parcel_output_fields (1).xlsx
