KeyError: zone not in index #189

Open
4Step opened this issue Dec 6, 2024 · 3 comments
@4Step

4Step commented Dec 6, 2024

Describe the bug
The bug occurs in the Group Quarters (GQ) model when the control file is set at the zonal level and the land-use data contains GQ data (Group_quarters_pop_noninstitutionalized) for only a few zones (about 25% of all zones). The PUMA-to-TAZ crosswalk file includes all zones, even those with no GQ data. As PopulationSim runs, it loops over each PUMA and selects zones to process. The bug appears to be somewhere in this step; the run crashes with the following error:
KeyError: u'the label [8619] is not in the [index]'
Closing remaining open files:C:\TSM_NextGen_v5\PopSim\Florida\Setup\output\GQ\pipeline.h5...done

Workaround
Use a separate crosswalk file for GQ that contains only the zones and PUMAs for which GQ data exist. This is inconvenient, however, because each model year could have a different set of GQ zones.
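
The workaround can be scripted so that the GQ crosswalk is regenerated for each model year rather than maintained by hand. This is a minimal sketch, not part of PopulationSim: the `TAZ` and `PUMA` column names and the `build_gq_crosswalk` helper are assumptions for illustration, and zones with a blank or zero value in `Group_quarters_pop_noninstitutionalized` are treated as having no GQ data.

```python
import pandas as pd

GQ_FIELD = "Group_quarters_pop_noninstitutionalized"  # field named in this issue


def build_gq_crosswalk(land_use, crosswalk, zone_col="TAZ", gq_field=GQ_FIELD):
    """Return only the crosswalk rows whose zone has GQ population > 0.

    zone_col is an assumed column name; adjust it to match your files.
    """
    # Blank or non-numeric entries are treated as zero GQ population.
    gq = pd.to_numeric(land_use[gq_field], errors="coerce").fillna(0)
    gq_zones = land_use.loc[gq > 0, zone_col]
    return crosswalk[crosswalk[zone_col].isin(gq_zones)].reset_index(drop=True)


# Tiny in-memory example; a real run would load the CSV files with pd.read_csv.
land_use = pd.DataFrame({"TAZ": [1, 2, 3, 4], GQ_FIELD: [120, 0, None, 35]})
crosswalk = pd.DataFrame({"TAZ": [1, 2, 3, 4], "PUMA": [8618, 8618, 8619, 8620]})

gq_crosswalk = build_gq_crosswalk(land_use, crosswalk)
print(gq_crosswalk)
```

In this toy input only zones 1 and 4 have GQ population, so the filtered crosswalk keeps those two rows and drops PUMA 8619 entirely.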

The log file prints the following details

INFO - initial_seed_balancing seed id 8619
Traceback (most recent call last):
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\run_populationsim.py", line 62, in <module>
pipeline.run(models=steps, resume_after=resume_after)
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\activitysim\core\pipeline.py", line 571, in run
run_model(model)
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\activitysim\core\pipeline.py", line 472, in run_model
orca.run([step_name])
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\orca\orca.py", line 1992, in run
step()
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\orca\orca.py", line 797, in __call__
return self._func(**kwargs)
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\populationsim\steps\initial_seed_balancing.py", line 82, in initial_seed_balancing
control_totals=seed_controls_df.loc[seed_id],
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1478, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1911, in _getitem_axis
self._validate_key(key, axis)
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1798, in _validate_key
error()
File "C:\TSM_NextGen_v5\PopSim\Florida\Setup\software\Anaconda2\envs\popsim\lib\site-packages\pandas\core\indexing.py", line 1785, in error
axis=self.obj._get_axis_name(axis)))
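
The traceback ends in a label lookup with `.loc`, which raises `KeyError` whenever the requested label is missing from the index. The sketch below reproduces that behaviour with plain pandas and shows one defensive pattern. It is not PopulationSim's code: the table contents, the `gq_pop` column, and the `controls_for_seed` helper are invented for illustration, and the exact error text differs between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for seed_controls_df: there is no row for seed id 8619.
seed_controls_df = pd.DataFrame(
    {"gq_pop": [150, 35]},
    index=pd.Index([8618, 8620], name="seed_id"),
)


def controls_for_seed(df, seed_id):
    """Return the control row for seed_id, or None if the index has no such label."""
    if seed_id not in df.index:
        return None  # the caller can skip this seed instead of crashing
    return df.loc[seed_id]


try:
    seed_controls_df.loc[8619]  # same kind of lookup as in the traceback
except KeyError as err:
    print("KeyError:", err)

for seed_id in [8618, 8619, 8620]:
    row = controls_for_seed(seed_controls_df, seed_id)
    print(seed_id, "skipped" if row is None else row.to_dict())
```

Checking membership before the lookup is only one option; whether skipping a seed is the right behaviour inside the balancing step is a decision for the maintainers.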

To Reproduce
Steps to reproduce the behavior:

  1. Select a few random zones and remove their GQ data from the field referenced in controls.csv.
  2. Run PopulationSim; it should produce the "not in index" KeyError.
  3. The zone number reported in the error is the one after the actual zone with no GQ data.
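
The first step above can be scripted to build a test input. This sketch assumes that "removing GQ data" means blanking the GQ field for the selected zones; the `TAZ` column name and the `blank_random_gq` helper are hypothetical, and the toy table stands in for the real land-use file.

```python
import numpy as np
import pandas as pd

GQ_FIELD = "Group_quarters_pop_noninstitutionalized"  # field named in this issue


def blank_random_gq(land_use, n_zones, gq_field=GQ_FIELD, seed=0):
    """Return (copy of land_use with gq_field blanked for n_zones random rows,
    sorted list of the row labels that were blanked)."""
    out = land_use.copy()
    out[gq_field] = out[gq_field].astype(float)  # allow NaN in an integer column
    picked = out.sample(n=n_zones, random_state=seed).index
    out.loc[picked, gq_field] = np.nan
    return out, sorted(picked)


# Toy land-use table; a real test would start from the model's land-use CSV.
land_use = pd.DataFrame(
    {"TAZ": list(range(1, 9)), GQ_FIELD: [10, 20, 30, 40, 50, 60, 70, 80]}
)

test_input, blanked = blank_random_gq(land_use, n_zones=3)
print(test_input)
print("blanked row labels:", blanked)
```

Writing `test_input` back out with `to_csv(..., index=False)` would leave the blanked cells empty in the file.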

Expected behavior
Skip the zones with no GQ data.


Additional context
Supplying a separate crosswalk file that lists only the TAZs with GQ data works fine. The problem might be related to how the crosswalk file is looped over and processed. A temporary internal crosswalk that includes only the zones with GQ controls might help.
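
To see where the crosswalk and the GQ data disagree before a run, a small diagnostic can list the crosswalk zones without GQ data and any PUMAs left with no GQ zones at all. This is a sketch under assumptions: the `TAZ` and `PUMA` column names and the `gq_coverage_report` helper are hypothetical, and nothing in this thread confirms that an empty PUMA is what triggers the error.

```python
import pandas as pd

GQ_FIELD = "Group_quarters_pop_noninstitutionalized"  # field named in this issue


def gq_coverage_report(land_use, crosswalk, zone_col="TAZ", puma_col="PUMA",
                       gq_field=GQ_FIELD):
    """Return (zones in the crosswalk without GQ data, PUMAs with no GQ zones)."""
    merged = crosswalk.merge(land_use[[zone_col, gq_field]], on=zone_col, how="left")
    # Zones that are blank, zero, or absent from land_use count as "no GQ data".
    merged["has_gq"] = merged[gq_field].fillna(0) > 0
    zones_without_gq = merged.loc[~merged["has_gq"], zone_col].tolist()
    puma_has_gq = merged.groupby(puma_col)["has_gq"].any()
    pumas_without_gq = puma_has_gq[~puma_has_gq].index.tolist()
    return zones_without_gq, pumas_without_gq


# Toy inputs standing in for the real land-use and crosswalk files.
land_use = pd.DataFrame({"TAZ": [1, 2, 3, 4], GQ_FIELD: [120, 0, None, 35]})
crosswalk = pd.DataFrame({"TAZ": [1, 2, 3, 4], "PUMA": [8618, 8618, 8619, 8620]})

zones_without_gq, pumas_without_gq = gq_coverage_report(land_use, crosswalk)
print("zones without GQ data:", zones_without_gq)
print("PUMAs without any GQ zone:", pumas_without_gq)
```

With these toy inputs, zones 2 and 3 have no GQ data and PUMA 8619 has no GQ zone left.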

@4Step 4Step added the bug label Dec 6, 2024
@bettinardi
Collaborator

Shouldn't this be posted on the popsim issues page?
For context, Oregon always runs GQ popsim separately from the general population popsim run (and then stitches them back together). This is done because most of the controls relate only to the general population, so trying to complete both GP and GQ in the same simulation/run creates an internal fight within the popsim balancing.

@jfdman jfdman transferred this issue from ActivitySim/activitysim Dec 6, 2024
@jfdman
Collaborator

jfdman commented Dec 6, 2024

@4Step it might be helpful if you could post the data somewhere so we can replicate the error you are receiving. Note that I transferred this issue from ActivitySim to PopulationSim.

@4Step
Author

4Step commented Dec 9, 2024

> Shouldn't this be posted on the popsim issues page? For context, Oregon always runs GQ popsim separately from the general population popsim run (and then stitches them back together). This is done because most of the controls relate only to the general population, so trying to complete both GP and GQ in the same simulation/run creates an internal fight within the popsim balancing.

@bettinardi, our implementation is similar to Oregon's: we run the general population and the GQ population separately and then combine them. Just to clarify, the issue I posted concerns the GQ run, which requires a second crosswalk file listing only the zones with non-zero GQ (using the general population crosswalk file results in the error I posted).
