Test the 2 datacat Python APIs #7

Closed
zonca opened this issue Sep 21, 2021 · 11 comments

@zonca
Member

zonca commented Sep 21, 2021

Related to #2

Install the Python client https://gitlab.com/supercdms/slaclab-datacat
Documentation of data catalog: https://confluence.slac.stanford.edu/display/CDMS/SuperCDMS+Data+Catalog

  • Are both independent of the cdms package?
  • Try to launch a query using search
  • Would it be easier to directly use the REST API? (probably not)
@zonca zonca self-assigned this Sep 21, 2021
@zonca
Member Author

zonca commented Oct 20, 2021

slaclab-datacat

I installed the Python package and am trying to understand how to run it. According to
https://gitlab.com/supercdms/slaclab-datacat#dump-datacat-path-resource-path-vetoevents-metadata-value-for-datasets-at-slac
it needs a configuration file, ~/.datacat/default.cfg. How do I get it?

@pibion who can I ask for help with this? Thanks!

The package is also available on CVMFS, but it gives the same error, ValueError: Client has no API URL configured, probably because the configuration file is missing.
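
One way to confirm that diagnosis is to check for the file before creating the client. The following is a minimal sketch using only the standard library. The path is the one named in the README; the assumption that the file is INI-style and readable by configparser is mine, and I don't know which sections or keys the client actually expects. The function name describe_datacat_config is made up for this example.

```python
import configparser
from pathlib import Path


def describe_datacat_config(path=None):
    """Return a short status string for the datacat configuration file."""
    # Default location mentioned in the slaclab-datacat README.
    cfg_path = Path(path) if path is not None else Path.home() / ".datacat" / "default.cfg"
    if not cfg_path.is_file():
        return f"missing: {cfg_path}"
    parser = configparser.ConfigParser()
    try:
        # Assumption: the file is INI-style; a parse error is reported, not raised.
        parser.read(cfg_path)
    except configparser.Error as exc:
        return f"unreadable: {cfg_path} ({exc})"
    return f"found: {cfg_path} with sections {parser.sections()}"
```

Calling describe_datacat_config() with no argument checks the default location. A "missing" result would be consistent with the ValueError above, although it does not prove that the file is the only cause.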

@zonca
Member Author

zonca commented Oct 20, 2021

CDMSDataCatalog

This package is available on the SuperCDMS JupyterHub on Jetstream via CVMFS and it also seems preconfigured:

dc=CDMSDataCatalog()
dc.ls('/') #really searches '/CDMS/'
dc.ls('/CDMS/ANIMAL')
/CDMS/ANIMAL
/CDMS/AuxiliaryData
/CDMS/CUTE
/CDMS/NEXUS
/CDMS/SLAC
/CDMS/SNOLAB
/CDMS/Scratch
/CDMS/Simulations
/CDMS/Soudan
/CDMS/StaticDataGroups
/CDMS/SuperSim
/CDMS/TRIUMF
/CDMS/Test
/CDMS/UCB
/CDMS/UMN
/CDMS/eTravelerFiles
/CDMS/ANIMAL/R68
/CDMS/ANIMAL/R70

However, the example query doesn't work:

datasets=dc.search('/CDMS/**',query='DataLevel eq "RRQ" and Source eq "Ba" and Bulldozed eq "1"')

returns an empty list.

@zonca
Member Author

zonca commented Oct 22, 2021

@bloer I'm trying to access the CDMS data catalog; what is the recommended method? Could you please check above what I am doing wrong? Thanks!

@pibion

pibion commented Oct 22, 2021

@thathayhaykid I'm pulling you into this thread as well since you have some experience working with the data catalog!

@pibion

pibion commented Oct 22, 2021

@zonca I'd recommend taking off the "Bulldozed eq 1" part of the query, although you might get a long list of files back?
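
Dropping clauses one at a time can be automated. This is a rough sketch, not part of CDMSDataCatalog: build_query and relax_until_match are hypothetical helpers, and the only catalog call assumed is the dc.search(path, query=...) form already used earlier in the thread, which would be passed in as a callable.

```python
from itertools import combinations


def build_query(clauses):
    """Join {key: value} pairs into the 'Key eq "value" and ...' form."""
    return " and ".join(f'{key} eq "{value}"' for key, value in clauses.items())


def relax_until_match(search, clauses):
    """Try the full query, then every smaller subset of clauses, largest first.

    `search` is any callable that takes a query string and returns a list,
    for example: lambda q: dc.search('/CDMS/**', query=q)
    Returns (query, results) for the first non-empty result, else (None, []).
    """
    keys = list(clauses)
    for size in range(len(keys), 0, -1):
        for subset in combinations(keys, size):
            query = build_query({key: clauses[key] for key in subset})
            results = search(query)
            if results:
                return query, results
    return None, []
```

With clauses = {"DataLevel": "RRQ", "Source": "Ba", "Bulldozed": "1"}, the first query tried is the one from the example above. If even the single-clause queries come back empty, the metadata labels themselves are the likely problem rather than their combination.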

@bloer

bloer commented Oct 22, 2021

There's slightly more documentation here: https://www.slac.stanford.edu/exp/cdms/software/releasedocs/V04-05/CDMSDataCatalog-0.6.1/ . I suggest using the higher-level CDMSDataCatalog.findData method https://www.slac.stanford.edu/exp/cdms/software/releasedocs/V04-05/CDMSDataCatalog-0.6.1/CDMSDataCatalog.html#CDMSDataCatalog.CDMSDataCatalog.CDMSDataCatalog.findData

I find the low-level query syntax hard to use, especially the path multi-wildcard bit, but it's documented here: https://github.com/slaclab/datacat/wiki/Search-Syntax . I don't see where you got the example query from, but it doesn't look malformed. We've changed some of the metadata labels around recently, so most likely there just isn't any data matching the query.

@zonca
Member Author

zonca commented Oct 23, 2021

Thanks @bloer
Can you please provide an example of using findData that returns a bunch of "interesting" (for your definition of "interesting") files?

@zonca
Member Author

zonca commented Oct 26, 2021

> @zonca I'd recommend taking off the "Bulldozed eq 1" part of the query, although you might get a long list of files back?

Still no results.

@zonca
Member Author

zonca commented Oct 26, 2021

I tried findData as suggested by @bloer, picking the first example in the docs:

dc.findData(Facility='CUTE', nFridgeRun=14, ProdStep='BatNoise')
[<CDMSDataset Class, Name: Prod2T_Filter_23200301_140601.root>,
 <CDMSDataset Class, Name: Prod2T_Filter_23200303_172847.root>,
 <CDMSDataset Class, Name: Prod2T_Filter_23200303_231114.root>,
 <CDMSDataset Class, Name: ProdTest1_Filter_23200301_140601.root>,
 <CDMSDataset Class, Name: ProdTest1_Filter_23200303_172847.root>,
 <CDMSDataset Class, Name: ProdTest1_Filter_23200303_231114.root>,
 <CDMSDataset Class, Name: TestCode_Filter_23200303_172847.root>,
 <CDMSDataset Class, Name: TestCode_Filter_23200303_231114.root>]

But then I don't know what to do with those objects. @thathayhaykid, how do I plot them? Is there an example notebook I can look at?
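
Before opening any file, it can help to see how the returned names are organised. The sketch below works only on the file names copied by hand from the output above, since the thread does not show which attribute of CDMSDataset holds the name. It assumes the names follow a `<tag>_Filter_<id>.root` pattern, which is my reading of the listing and not documented behaviour; group_by_tag is a hypothetical helper.

```python
from collections import defaultdict


def group_by_tag(names, separator="_Filter_"):
    """Group file names by the text that precedes `separator`."""
    groups = defaultdict(list)
    for name in names:
        tag, found, _rest = name.partition(separator)
        # Names that do not contain the separator are collected under None.
        groups[tag if found else None].append(name)
    return dict(groups)


names = [
    "Prod2T_Filter_23200301_140601.root",
    "Prod2T_Filter_23200303_172847.root",
    "Prod2T_Filter_23200303_231114.root",
    "ProdTest1_Filter_23200301_140601.root",
    "ProdTest1_Filter_23200303_172847.root",
    "ProdTest1_Filter_23200303_231114.root",
    "TestCode_Filter_23200303_172847.root",
    "TestCode_Filter_23200303_231114.root",
]

for tag, members in group_by_tag(names).items():
    print(tag, len(members))
```

For this listing the loop prints three groups: Prod2T with 3 files, ProdTest1 with 3 and TestCode with 2. That suggests the eight results are the same few inputs processed under different tags.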

@zonca
Member Author

zonca commented Oct 26, 2021

@pibion @thathayhaykid it seems that what findData returns are ROOT files, so I tried to follow:

https://gitlab.com/supercdms/Analysis/tutorials/-/blob/master/tutorials/CUTE/CUTE.ipynb

but the keys in the ROOT file are all different, so I'm stuck.
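
To compare a file against the tutorial, listing its keys together with their ROOT class names is a reasonable first step. This is a sketch assuming the uproot package is installed and that `path` points to a local copy of one of the files. How to get from a CDMSDataset object to a local path is not shown in this thread, so that step is left out. Both function names are made up for this example.

```python
def summarize_root_directory(directory):
    """Return sorted (key, class name) pairs for an opened ROOT directory.

    `directory` is expected to behave like the object returned by
    uproot.open(): it must provide classnames() returning {key: class name}.
    """
    return sorted(directory.classnames().items())


def list_root_contents(path):
    """Open a local ROOT file with uproot and summarize what it contains."""
    # Third-party dependency, imported lazily so the helper above works without it.
    import uproot

    with uproot.open(path) as directory:
        return summarize_root_directory(directory)
```

Running list_root_contents on one file from findData and on the file used in the tutorial, then comparing the two lists, should show whether the trees were renamed or whether the file holds a different kind of output altogether.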

Anyway, findData seems the way to go; I'll open another issue for defining the API.

@zonca zonca closed this as completed Oct 26, 2021