Onboarding new city: Baltimore #121

Open
terryf82 opened this issue Jun 7, 2018 · 9 comments
@terryf82
Collaborator

terryf82 commented Jun 7, 2018

No description provided.

@terryf82
Collaborator Author

terryf82 commented Jun 7, 2018

@alicefeng I can't remember the specific issues you encountered when trying to run Baltimore through the pipeline. Is it working now?

It doesn't look as though OSM has a polygon for the city, so it'll probably need to be handled through a separate approach, like Brisbane.

@alicefeng
Collaborator

@terryf82 Oh, this was the issue where the individual crash ids were alphanumeric rather than strictly numeric, which clashed with our data standards (at the time; not sure if we've modified the standards since then).

@terryf82
Collaborator Author

@alicefeng I've updated the crashes & concerns standards in the data_standards branch to allow for both string and numeric ids. Give it a run on Baltimore when you get a chance and let me know how it goes!
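As a rough illustration of accepting both string and numeric ids, here is a minimal sketch. It is not the code in the data_standards branch; `normalize_crash_id` is a hypothetical helper, and converting every id to a string is an assumption made for the example.

```python
from typing import Union


def normalize_crash_id(raw_id: Union[str, int]) -> str:
    """Hypothetical helper: accept a string or integer id and return it as a string."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(raw_id, bool) or not isinstance(raw_id, (str, int)):
        raise TypeError(f"crash id must be a string or integer, got {type(raw_id).__name__}")
    text = str(raw_id).strip()
    if not text:
        raise ValueError("crash id must not be empty")
    return text


# Numeric and alphanumeric ids both pass through as strings.
print(normalize_crash_id(12345))
print(normalize_crash_id(" AB123 "))
```

Storing every id as a string is one way to keep later joins on the id column consistent when a source mixes numeric and alphanumeric values.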

@terryf82
Collaborator Author

Hey @alicefeng the latest commits to the data_standards branch should allow you to get past the graph_from_place() problem that was preventing us from onboarding Baltimore.

Basically there's a new function there that checks the OSM API (nominatim) for a polygon. If it finds one it returns the position, which is fed into graph_from_place() as which_result=x (sometimes the polygon isn't the first result). If there's no polygon for a city, we use graph_from_point() against the city lat+lng instead.
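The selection logic described above might look roughly like the sketch below. It is not the function from the data_standards branch: `find_polygon_result` and `choose_graph_call` are hypothetical names, the results are assumed to be already-fetched Nominatim search results that carry a `geojson` geometry (as returned when polygon geometry is requested), and the position is assumed to be 1-based to match `which_result`.

```python
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple

POLYGON_TYPES = {"Polygon", "MultiPolygon"}


def find_polygon_result(results: Sequence[Mapping[str, Any]]) -> Optional[int]:
    """Return the 1-based position of the first result with polygon geometry, or None."""
    for position, result in enumerate(results, start=1):
        geometry = result.get("geojson") or {}
        if geometry.get("type") in POLYGON_TYPES:
            return position
    return None


def choose_graph_call(
    results: Sequence[Mapping[str, Any]],
    center_point: Tuple[float, float],
) -> Tuple[str, Dict[str, Any]]:
    """Pick which graph-building call to make and the keyword argument to pass it."""
    position = find_polygon_result(results)
    if position is not None:
        # A polygon exists: build the graph from the place, using that result.
        return "graph_from_place", {"which_result": position}
    # No polygon: fall back to building the graph around the city's lat/lng.
    return "graph_from_point", {"center_point": center_point}


# Made-up sample results: the polygon is the second result, not the first.
sample = [
    {"display_name": "Baltimore (node)", "geojson": {"type": "Point"}},
    {"display_name": "Baltimore (boundary)", "geojson": {"type": "Polygon"}},
]
print(find_polygon_result(sample))  # 2
print(choose_graph_call([], (39.29, -76.61)))
```

Keeping the decision separate from the network request makes it possible to test the fallback without calling the Nominatim API.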

I've been testing the Baltimore pipeline using crash data from https://data.maryland.gov/Public-Safety/MDTA-Accidents/rqid-652u (not sure if this is the same source you were using?). Even though the map is now built properly, it still breaks in train_model. I tried a few different config file setups (start_year, end_year etc.) but had no luck; mostly I hit this error (@bpben any thoughts?):

Training model...
Outputting to: /app/data/baltimore/processed/
Segment features included: ['width', 'lanes', 'hwy_type', 'osm_speed', 'oneway']
Traceback (most recent call last):
  File "/opt/conda/envs/boston-crash-model/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/boston-crash-model/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/src/models/train_model.py", line 206, in <module>
    crash_lags = format_crash_data(data_nonzero, 'crash', week, year)
  File "/app/src/models/model_utils.py", line 14, in format_crash_data
    target_idx = all_dates[(all_dates.year==target_year)&(all_dates.week==target_week)].index.values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
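The IndexError in the traceback above means the filter on target_year and target_week matched no rows, so taking element 0 of the empty result failed. A minimal, dependency-free sketch of the same lookup with a clearer failure message is below. It is not the project's format_crash_data; `find_target_index` is a hypothetical helper, and the dates are assumed to be plain (year, week) pairs rather than a pandas object.

```python
from typing import Sequence, Tuple


def find_target_index(
    all_dates: Sequence[Tuple[int, int]],
    target_year: int,
    target_week: int,
) -> int:
    """Return the position of (target_year, target_week), or raise a descriptive error."""
    matches = [
        i
        for i, (year, week) in enumerate(all_dates)
        if year == target_year and week == target_week
    ]
    if not matches:
        if all_dates:
            covered = f"data covers {min(all_dates)} to {max(all_dates)}"
        else:
            covered = "data is empty"
        raise ValueError(
            f"no rows for year={target_year}, week={target_week}; {covered}"
        )
    return matches[0]


# Made-up (year, week) pairs for illustration.
dates = [(2017, 51), (2017, 52), (2018, 1)]
print(find_target_index(dates, 2018, 1))  # 2
```

A message that reports the covered date range makes it easier to tell whether the configured target week falls outside the crash data.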

@alicefeng
Collaborator

@terryf82 Awesome about the function for checking if there's a polygon. And yes, that looks to be the dataset I was using.

@alicefeng
Collaborator

I just tried running the updated data_standards branch on Baltimore and it failed with the same error @terryf82 pasted above.

@alicefeng
Collaborator

I fixed an error on my end and tried rerunning the pipeline for Baltimore. It's still failing at the model training script, though this time I got a different error from before:

  File "/opt/conda/envs/boston-crash-model/lib/python3.6/site-packages/sklearn/metrics/ranking.py", line 268, in _binary_roc_auc_score
    raise ValueError("Only one class present in y_true. ROC AUC score "
ValueError: Only one class present in y_true. ROC AUC score is not defined in that case.

@bpben
Collaborator

bpben commented Jun 30, 2018

Either there are no crashes or very few crashes. I just took a look at the canonical dataset you sent me and there are zero crashes in it. Maybe send me the original crash dataset or just the full baltimore folder?
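ROC AUC needs both a positive and a negative example in the labels, which is why a dataset with no crashes triggers the "Only one class present in y_true" error. A minimal sketch of checking for that before scoring is below. It is not code from train_model: `safe_auc` is a hypothetical wrapper, and `pairwise_auc` is a small stand-in for the scikit-learn scorer, used here only to keep the example dependency-free.

```python
from typing import Optional, Sequence


def pairwise_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Stand-in AUC: fraction of (positive, negative) pairs ranked correctly, ties count half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def safe_auc(y_true: Sequence[int], y_score: Sequence[float]) -> Optional[float]:
    """Return the AUC, or None when the labels contain fewer than two classes."""
    if len(set(y_true)) < 2:
        print(f"skipping AUC: only classes {sorted(set(y_true))} present in y_true")
        return None
    return pairwise_auc(y_true, y_score)


# Made-up labels: all zeros mimics a dataset with no crashes.
print(safe_auc([0, 0, 0], [0.1, 0.2, 0.3]))
print(safe_auc([0, 1, 0, 1], [0.1, 0.9, 0.2, 0.8]))  # 1.0
```

Checking the label counts up front turns the downstream ValueError into an explicit signal that the crash data needs another look.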

@alicefeng
Collaborator

Yeah, that was due to an error on my part. I fixed it, reran the pipeline and now have a canonical dataset that has non-zero crashes in it. Using that dataset led me to the second error posted here. I'll send you that file.

But you said even having all zeroes shouldn't lead to the first error, right? (@terryf82's dataset had non-zero weeks and he also got the first error.)
