Possible issue with missing data when using subsidized_residential_feasibility #214

Open
theocharides opened this issue Sep 3, 2020 · 2 comments

theocharides commented Sep 3, 2020

When adding the "jobs housing fee" policy to BAUS, we ran into an error that led to the following caution about using the "subsidized_residential_feasibility" model. The following items could lead to missing values and are worth looking into:

  • If two policies are activated and both use subsidized_residential_feasibility to create subsidized units, summary.parcel_output will join parcels_geography to the feasibility table twice. For duplicate columns, the newer copy is given a '_y' suffix.

  • For a parcels_geography attribute like 'tra_id', because summary.parcel_output is a dynamic table that grows with each iteration, the newer column would hold the correct values, yet it is not the one that is kept.

  • We currently avoid this problem because some attributes, like 'tra_id' (with all of its values), are added to parcels at the outset and remain the dominant column throughout this process. But this is worth keeping in mind for any columns we hope to get from the parcels_geography join.
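To make the '_y' behaviour concrete, here is a small pandas sketch with made-up table contents. It assumes the join keeps the left-hand column name and suffixes only the right-hand duplicate (`suffixes=("", "_y")`); we have not checked which merge call BAUS actually uses, and pandas' default suffixes would rename both copies to '_x' and '_y'.

```python
import pandas as pd

# Made-up toy tables; only the column names are borrowed from the issue.
# The existing 'tra_id' in feasibility is missing a value.
feasibility = pd.DataFrame({
    "parcel_id": [1, 2, 3],
    "tra_id": ["tra_1", None, "tra_3"],
})

# parcels_geography holds the complete tra_id values.
parcels_geography = pd.DataFrame({
    "parcel_id": [1, 2, 3],
    "tra_id": ["tra_1", "tra_2", "tra_3"],
})

# Assumption: left-hand names are kept and the right-hand duplicate gets "_y".
merged = feasibility.merge(
    parcels_geography, on="parcel_id", how="left", suffixes=("", "_y")
)

print(list(merged.columns))             # ['parcel_id', 'tra_id', 'tra_id_y']
print(merged["tra_id"].isna().sum())    # 1: the column that is kept still has the gap
print(merged["tra_id_y"].isna().sum())  # 0: the complete values sit in tra_id_y
```

Anything downstream that reads 'tra_id' sees the stale copy, which is how a duplicated join can quietly turn into missing data.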

yuqiww commented Sep 3, 2020

The overlapping columns might have been introduced in two places in subsidies.py:

  • when calculating the fee, which uses county as the index and could therefore duplicate the "county" column; this was fixed by renaming the column (here and here).

  • when merging "parcels_geography" into the "feasibility" variable in the "subsidized_residential_feasibility" step, which creates "_y" columns for duplicated names. As @theocharides mentioned above, we think this doesn't affect the initial columns already in "feasibility", which come from the "parcels" variable. "feasibility" is built from scratch in each iteration.
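An alternative to renaming overlapping columns case by case is to drop the overlapping non-key columns from one side before merging, so no suffixed copies are ever created. The helper below, `merge_dropping_overlap`, is hypothetical and not part of BAUS. The example prefers the parcels_geography copy, following the point above that the newer column holds the correct values, but which side should win is a per-column judgment.

```python
import pandas as pd


def merge_dropping_overlap(left, right, on, prefer="right", how="left"):
    """Hypothetical helper (not part of BAUS): merge without '_x'/'_y' columns.

    Non-key columns present in both tables are dropped from the side that is
    not preferred, so exactly one copy survives under its original name.
    """
    if prefer not in ("left", "right"):
        raise ValueError("prefer must be 'left' or 'right'")
    keys = [on] if isinstance(on, str) else list(on)
    overlap = [c for c in left.columns if c in right.columns and c not in keys]
    if prefer == "left":
        right = right.drop(columns=overlap)
    else:
        left = left.drop(columns=overlap)
    return left.merge(right, on=keys, how=how)


# Made-up example tables with column names borrowed from the issue.
feasibility = pd.DataFrame({
    "parcel_id": [1, 2],
    "county": ["Alameda", "Marin"],
    "tra_id": ["tra_1", None],      # stale copy with a gap
})
parcels_geography = pd.DataFrame({
    "parcel_id": [1, 2],
    "tra_id": ["tra_1", "tra_2"],   # complete copy
})

out = merge_dropping_overlap(feasibility, parcels_geography, on="parcel_id")
print(list(out.columns))       # ['parcel_id', 'county', 'tra_id']
print(out["tra_id"].tolist())  # ['tra_1', 'tra_2']
```

Dropping rather than suffixing makes the choice of source explicit at the merge site instead of leaving it to whichever column happens to keep the unsuffixed name.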

The attached spreadsheet, parcel_output_fields (1).xlsx, helps show when the duplicated columns were created:

  • The yellow-highlighted fields did not appear in parcel_output until year 2025. These fields belong to "parcels_geography", which means this table somehow entered the process in 2025; this is also what created the "_y" fields. "parcels_geography" is introduced in the "subsidized_residential_feasibility" step.
  • All the red-highlighted fields also first appear in 2025. They are created in the "run_subsidized_developer" step: here, here, and here.
  • In the Draft Blueprint, we did not use revenue from the "vmt-fee" or "jobs-housing fee" to subsidize commercial or housing development, so only the 'lump-sum account' strategy used these two functions (here and here). The run log shows that no building was subsidized before 2025, which explains why these new fields did not appear until that year.

Finally, the "overlapping columns" branch added code to print out the columns of key variables in summaries.py in each iteration, for reference.
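A diagnostic along the same lines could record each table's columns per iteration and flag new or suffixed ones, so fields like the yellow and red ones in the spreadsheet are reported automatically. This is a sketch only: `report_new_columns` and the `history` dict are hypothetical names, and it does not reproduce the branch's actual code.

```python
import pandas as pd


def report_new_columns(history, year, df, name):
    """Hypothetical diagnostic: log which columns of `df` are new this iteration.

    `history` maps a table name to the set of columns seen in earlier
    iterations; it is updated in place. Returns (new_columns, suffixed_columns).
    """
    first = name not in history
    seen = history.setdefault(name, set())
    current = {str(c) for c in df.columns}
    # On the first iteration every column is "new", so treat it as the baseline.
    new_cols = [] if first else sorted(current - seen)
    # Columns that look like pandas merge suffixes from a duplicated join.
    suffixed = sorted(c for c in current if c.endswith(("_x", "_y")))
    seen.update(current)
    if new_cols:
        print(f"{year} {name}: new columns {new_cols}")
    if suffixed:
        print(f"{year} {name}: suffixed columns {suffixed}")
    return new_cols, suffixed


# Made-up column sets for two iterations of a parcel-level output table.
history = {}
report_new_columns(history, 2020, pd.DataFrame(columns=["parcel_id", "tra_id"]), "parcel_output")
new, suffixed = report_new_columns(
    history, 2025,
    pd.DataFrame(columns=["parcel_id", "tra_id", "tra_id_y"]),
    "parcel_output",
)
# prints: 2025 parcel_output: new columns ['tra_id_y']
#         2025 parcel_output: suffixed columns ['tra_id_y']
```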


yuqiww commented Sep 3, 2020
