Support for generating a set of tracking-ids from a slicing operation into an aggregation. #451
Thanks @bnlawrence, great write-up relating to what we all discussed today. No comments as yet, since I will need time to think this over and study some background, but just FYI for now, I'm …
This is resolvable with NCAS-CMS/cfa-conventions#41. With this change to the CFA conventions, cf-python could automatically create auxiliary coordinate constructs from any non-standardised aggregation metadata, making it available for slicing. For example:

```python
>>> import cf
>>> f = cf.read('example_1b.nc')[0]  # aggregated array has 12 months split over two files
>>> f.coord('long_name=tracking_id')
<CF AuxiliaryCoordinate: long_name=tracking_id(12, 1, 73, 144) >
# Each element of "f" has a tracking_id, but there are only two different values
>>> print(f.coord('long_name=tracking_id').array[:, 0, 0, 0])
[[[['764489ad-7bee-4228' '764489ad-7bee-4228' '764489ad-7bee-4228'
'764489ad-7bee-4228' '764489ad-7bee-4228' '764489ad-7bee-4228'
'a4f8deb3-fae1-26b6' 'a4f8deb3-fae1-26b6' 'a4f8deb3-fae1-26b6'
'a4f8deb3-fae1-26b6' 'a4f8deb3-fae1-26b6' 'a4f8deb3-fae1-26b6']]]]
# Find unique tracking IDs
>>> print(f.coord('long_name=tracking_id').data.unique())
<CF Data(2): [764489ad-7bee-4228, a4f8deb3-fae1-26b6]>
# Find unique tracking IDs corresponding to a subspace:
>>> g = f.subspace(T=cf.wi(cf.dt('1959-12-01'), cf.dt('1960-03-01')))
>>> print(g.coord('long_name=tracking_id').data.unique())
<CF Data(1): [764489ad-7bee-4228]>
```

Memory-wise, this is cheap, because each fragment's tracking ID array will be a `cf.FullArray` instance, which stores just the scalar common to that fragment. However, when we come to get the (unique) values, the array will be expanded in memory into the full shape of the subspace. This will be managed by …
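The memory point can be illustrated with a small, self-contained sketch. It does not use cf-python: `Fragment`, `expand` and `unique_ids_in_range` are hypothetical names, and the split of the twelve time steps between the two files is assumed. Each fragment records only its scalar tracking ID and the time-index range it covers, in the spirit of the per-fragment constant storage described above. `expand` builds the full per-element array (for this shape and 18-character IDs that is roughly 9 MB of strings), while `unique_ids_in_range` derives the same unique set from fragment overlaps alone.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment record: one tracking ID covering time indices [start, stop)."""

    start: int
    stop: int
    tracking_id: str


def expand(fragments, shape):
    """Materialise the full per-element tracking ID array (time axis first).

    This is the memory-hungry route: every element holds its own copy of the string.
    """
    width = max(len(frag.tracking_id) for frag in fragments)
    out = np.full(shape, "", dtype=f"U{width}")
    for frag in fragments:
        out[frag.start:frag.stop, ...] = frag.tracking_id
    return out


def unique_ids_in_range(fragments, start, stop):
    """Sorted unique tracking IDs for time indices [start, stop), without expanding anything.

    Only fragments overlapping the requested range are inspected, and each one
    contributes its single scalar tracking ID.
    """
    return sorted({f.tracking_id for f in fragments if f.start < stop and start < f.stop})


# Twelve time steps split over two files, mirroring the example above
# (assumption: the first six steps come from the first file).
fragments = [
    Fragment(0, 6, "764489ad-7bee-4228"),
    Fragment(6, 12, "a4f8deb3-fae1-26b6"),
]

full = expand(fragments, (12, 1, 73, 144))
print(full.nbytes)                             # bytes held by the expanded array
print(np.unique(full).tolist())                # ['764489ad-7bee-4228', 'a4f8deb3-fae1-26b6']
print(unique_ids_in_range(fragments, 0, 12))   # same set, no expansion needed
print(unique_ids_in_range(fragments, 0, 4))    # ['764489ad-7bee-4228']
```

The overlap test only needs each fragment's index range, so its cost scales with the number of fragments rather than the number of array elements.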
This is all implemented in #630.
Closing now that #630 is merged.
Consider the following use case:
`cf.within` (or any other sort of valid slice) into that aggregation to extract a mean value of a particular variable over a week. (This is obviously a trivial case; it gets more interesting if, say, these are calculations carried out across ensembles from multiple institutions.)
The feature request is that `cf.within` (or another slice) can return a set of tracking-ids which can be added to a list of "provenance sources", so that a series of cf calculations can potentially generate a list of all the files needed for reproduction and/or citation.
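One way to picture the requested behaviour is a running log that each slicing step feeds with the tracking IDs it touched, so that a chain of calculations ends with a deduplicated list of source files. The sketch below is plain Python and purely illustrative: `ProvenanceLog` and its methods are hypothetical, not part of cf-python, and how the IDs are extracted from a slice (for instance from a tracking_id auxiliary coordinate) is assumed. The tracking IDs reused here are placeholders.

```python
class ProvenanceLog:
    """Hypothetical accumulator of the tracking IDs touched by a chain of calculations."""

    def __init__(self):
        # tracking_id -> labels of the operations that used it.
        # Plain dicts preserve insertion order, so sources() lists IDs in first-seen order.
        self._sources = {}

    def record(self, operation, tracking_ids):
        """Register the tracking IDs that one slicing/calculation step depended on."""
        for tid in tracking_ids:
            users = self._sources.setdefault(tid, [])
            if operation not in users:
                users.append(operation)

    def sources(self):
        """Deduplicated list of every tracking ID recorded so far."""
        return list(self._sources)

    def used_by(self, tracking_id):
        """Operations that depended on the given tracking ID (empty list if none)."""
        return list(self._sources.get(tracking_id, []))


log = ProvenanceLog()
# Hypothetical steps; in practice the IDs would come from the slice feeding each step.
log.record("weekly mean, member 1", ["764489ad-7bee-4228"])
log.record("weekly mean, member 2", ["a4f8deb3-fae1-26b6"])
log.record("ensemble mean", ["764489ad-7bee-4228", "a4f8deb3-fae1-26b6"])

print(log.sources())                       # ['764489ad-7bee-4228', 'a4f8deb3-fae1-26b6']
print(log.used_by("764489ad-7bee-4228"))   # ['weekly mean, member 1', 'ensemble mean']
```

Keying the log by tracking ID gives deduplication for free, and keeping the operation labels alongside each ID makes it possible to say which result depended on which file when citing.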