last_id is silently not supported for Subject.where() #212

mwalmsley · 2019-04-04T12:59:16Z

Adding last_id={id} to Subject.where() appears to have no effect and raises no error.

Test Case

Executing:

`from panoptes_client import Subject

subjects = Subject.where(
    scope='project',
    project_id='5733'
)

# Pull the first ten subjects from the paginated result set
for n in range(10):
    s = subjects.next()`

Gives the following result:

Adding last_id=30091682 gives the same result as above.

camallen · 2019-04-04T13:45:44Z

This is because the subject API resource lacks the optimized last_id support. That support was added to speed up the classifications API, but it should be ported to each resource.

Paging through the resource result sets via next / previous links is the standard support for resources and subject does work this way. Does that meet your use case here?

mwalmsley · 2019-04-09T14:21:41Z

I think that I didn't give enough thought to what I actually needed to accomplish here.

I realised that in order for iteratively downloaded (yay for last_id) classifications to be useful, I need the metadata from the subject to link those classifications back to the science catalog.

classification <-(links.subject, subject_id)-> subject (metadata.science_id, science_id) <- science_catalog
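To make the linkage above concrete, here is a minimal sketch of the join using plain dictionaries. It assumes a classification carries its subject id under `links.subjects` and that the subject stores `science_id` in its metadata, as described in this thread. The function name, the two lookup tables, and the sample values are all hypothetical and only illustrate the shape of the data.

```python
def attach_catalog_row(classification, subjects_by_id, catalog_by_science_id):
    """Follow classification -> subject -> science catalog and return the joined record."""
    # Assumes a single-subject project, so the first linked subject is the only one.
    subject_id = classification["links"]["subjects"][0]
    subject = subjects_by_id[subject_id]
    science_id = subject["metadata"]["science_id"]
    return {
        "classification_id": classification["id"],
        "subject_id": subject_id,
        "science_id": science_id,
        # None if the catalog has no entry for this science_id
        "catalog_row": catalog_by_science_id.get(science_id),
    }


# Illustrative, made-up records (not real project data)
classification = {"id": "c1", "links": {"subjects": ["s1"]}}
subjects_by_id = {"s1": {"id": "s1", "metadata": {"science_id": "gal-001"}}}
catalog_by_science_id = {"gal-001": {"ra": 10.0, "dec": -5.0}}

linked = attach_catalog_row(classification, subjects_by_id, catalog_by_science_id)
```

The point is that the subject record is the only bridge between a classification and the catalog, which is why the subject metadata has to be fetched at all.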

My first thought was to download all new subjects with last_id - but of course, that's not how subjects work! Old subjects can get new classifications.

Paging would work to download all subjects, but doing that daily would be slow and would repeat many of the same calls.

My current solution is to get the specific subject for each new classification:

subject_id = classification['links']['subjects'][0]  # only works for single-subject projects
subject = get_subject(project_id, subject_id)  # assume id is unique
classification['links']['subject'] = subject.raw
save_classification_to_file(classification, save_loc)

and decorate get_subject (which is simply the Python client) with a huge lru_cache, on the assumption that subjects tend to appear repeatedly at similar times (i.e. the currently active subject set).

This saves me having to maintain an up-to-date duplicate database of all subjects, but is a bit slow vs. the optimised classification interface.
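The caching idea described above might look roughly like the following sketch. It wraps an arbitrary fetch function in `functools.lru_cache`, keyed on `(project_id, subject_id)`. Here `fake_fetch` is a hypothetical stand-in for the real client call so that the example runs without network access, and the `maxsize` value is an arbitrary assumption rather than a recommendation.

```python
from functools import lru_cache


def make_cached_get_subject(fetch_subject, maxsize=100_000):
    """Wrap a subject-fetching callable in an LRU cache keyed on (project_id, subject_id)."""

    @lru_cache(maxsize=maxsize)
    def get_subject(project_id, subject_id):
        return fetch_subject(project_id, subject_id)

    return get_subject


# Hypothetical stand-in for the real API call; records every uncached request.
calls = []


def fake_fetch(project_id, subject_id):
    calls.append(subject_id)
    return {"id": subject_id, "metadata": {"science_id": f"sci-{subject_id}"}}


get_subject = make_cached_get_subject(fake_fetch)

# Subjects from the active subject set tend to repeat, so most lookups hit the cache.
for subject_id in ["1", "2", "1", "1", "2"]:
    get_subject("5733", subject_id)

info = get_subject.cache_info()
```

Note that `lru_cache` returns the same object on every hit, so callers should treat cached subjects as read-only or copy them before modifying.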

I would guess that wanting to get the subject details along with the classification details would be quite useful for others, though I'm not sure how best to implement this.

mwalmsley added the bug label Apr 4, 2019
