feat(records): add LHCb Ntuples example from the hackathon #3720

Open
wants to merge 1 commit into
base: master

@tiborsimko (Member) commented Feb 10, 2025

This is a work in progress illustrating what the open data portal records resulting from an LHCb Ntupling Service request could look like.

It is for illustration only, so that we can discuss and amend the content.

CC @dillfitz @pietnogga

dillfitz commented Feb 18, 2025

I did a first pass of editing, but this is still a work in progress on my end. I will also start some threads on the changed file for finer details.

I tried to push some changes so we could discuss the updated diff, but I did not have the permissions. Note in particular that, in the description of the umbrella record, I included some text for communication purposes that will not be included in the resulting record. I tried to highlight this in my review comments. I will paste the diff below for now:

@@ -1,7 +1,7 @@
 [
   {
     "abstract": {
-      "description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}.</p><p>FIXME Here should come a long explanation of inputs, ntuples, etc.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place wher they would be added."
+      "description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>. NOTE -- IN THE FUTURE, NEED TO FETCH FROM REQUEST DETAILS HERE https://gitlab.cern.ch/cernopendata/lhcb-ntupling-service-requests-dev/-/issues/142 OR A MORE SECURE PLACE! </p><p> Ntuples are created using DaVinci version {v46r9 - extract from info.yaml} and the Analysis Productions batch processing system. Quantities saved to the Ntuple are specified during the request phase and detailed in the following code configuration files.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place where they would be added."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -54,7 +54,7 @@
       "secondary": ["Collision"]
     },
     "usage": {
-      "description": "<p>You can clone and amend this ntupling request here:",
+      "description": "<p>Ntuples and instructions are provided in the links under Related datasets. They are ready to be downloaded and used for analysis. If these do not suit your needs, you can clone and amend this ntupling request here:",
       "links": [
         {
           "description": "LHCb Open Data Ntupling Service",
@@ -65,7 +65,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -180,7 +180,7 @@
       "stream": "BHADRON",
       "version": "stripping21r1"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K-{piplus}pi+) {piminus}pi-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) pi-]CC -- 2011 Magnet Down",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -197,7 +197,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -272,7 +272,7 @@
       "stream": "BHADRON",
       "version": "stripping21r1p2"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K- {piplus}pi+) {Kminus_0}K-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) K-]CC -- 2011 Magnet Down",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -289,7 +289,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -379,7 +379,7 @@
       "stream": "BHADRON",
       "version": "stripping21r1"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K-{piplus}pi+) {piminus}pi-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) pi-]CC -- 2011 Magnet Up",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -396,7 +396,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -461,7 +461,7 @@
       "stream": "BHADRON",
       "version": "stripping21r1p2"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K- {piplus}pi+) {Kminus_0}K-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) K-]CC -- 2011 Magnet Up",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -478,7 +478,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -593,7 +593,7 @@
       "stream": "BHADRON",
       "version": "stripping21"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K-{piplus}pi+) {piminus}pi-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) pi-]CC -- 2012 Magnet Down",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -610,7 +610,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -715,7 +715,7 @@
       "stream": "BHADRON",
       "version": "stripping21r0p2"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K- {piplus}pi+) {Kminus_0}K-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) K-]CC -- 2012 Magnet Down",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -732,7 +732,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -882,7 +882,7 @@
       "stream": "BHADRON",
       "version": "stripping21"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K-{piplus}pi+) {piminus}pi-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) pi-]CC -- 2012 Magnet Up",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]
@@ -899,7 +899,7 @@
   },
   {
     "abstract": {
-      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}."
+      "description": "Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>."
     },
     "accelerator": "CERN-LHC",
     "collaboration": {
@@ -1004,7 +1004,7 @@
       "stream": "BHADRON",
       "version": "stripping21r0p2"
     },
-    "title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K- {piplus}pi+) {Kminus_0}K-]CC",
+    "title": "[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) K-]CC -- 2012 Magnet Up",
     "type": {
       "primary": "Dataset",
       "secondary": ["Collision"]

@dillfitz dillfitz left a comment

All done with a first pass of review, though some points would be nice to iterate on. Since I couldn't push changes directly, I have included my diff in a different comment on the PR so you can see my suggestions so far.

I can update this diff as I make more changes, unless I am able to push directly in the future. Let me know what you prefer.

Thanks a lot for setting this up!

[
{
"abstract": {
"description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}.</p><p>FIXME Here should come a long explanation of inputs, ntuples, etc.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place wher they would be added."

RE: Here should come a long explanation of inputs, ntuples, etc.

This could be pretty tricky depending on how detailed we are. If we endeavor to provide a similar level of detail on the inputs as https://opendata.cern.ch/record/28071, there is a bit of groundwork to do (e.g. steps for how DST files were produced). At the very least we can point to the proper DST files used to create the Ntuples, as well as software versions used in the Ntuple Making step.

@tiborsimko (Member, Author) replied:

The explanation can stay brief: just include anything you would like to highlight and/or make available for search.

[
{
"abstract": {
"description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}.</p><p>FIXME Here should come a long explanation of inputs, ntuples, etc.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place wher they would be added."

RE: If we can attach the code...

I will have to check with DPA, but I agree this would be nice. Then we can mention that information about the tools used to save quantities to the Ntuples is specified in the config files.

"stream": "BHADRON",
"version": "stripping21r1"
},
"title": "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> {pplus}p+ {Kminus}K-{piplus}pi+) {piminus}pi-]CC",

This applies here and to the other occurrences. Right now the same decay descriptor is repeated for 4 different "related datasets" (we have 8 related datasets -- 2 decays x 2 years x 2 magnet polarities). I have added the year and magnet polarity to the title; both can be reliably extracted from the info.yaml file (and will need to be for other fields in the json file, though I am not sure how those are currently used). I have also made the decay descriptor more readable by removing the Ntuple branch names, which could be implemented algorithmically by removing substrings enclosed in "{}".
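
The brace-stripping idea could be sketched roughly as follows. This is only an illustration, not the code used to produce the records: the function names `clean_decay_descriptor` and `make_title` are hypothetical, and the year and magnet polarity are passed in directly on the assumption that they have already been read from info.yaml (whose layout is not shown in this thread). Each `{...}` label is replaced by a space rather than deleted outright, because one of the existing titles contains `K-{piplus}pi+` with no separating space.

```python
import re


def clean_decay_descriptor(descriptor: str) -> str:
    """Drop the `{branch_name}` labels from a decay descriptor (hypothetical helper)."""
    # Replace every brace-enclosed label with a space so that neighbouring
    # particles such as "K-{piplus}pi+" do not get glued together.
    without_labels = re.sub(r"\{[^{}]*\}", " ", descriptor)
    # Collapse the runs of whitespace left behind by the substitution.
    collapsed = " ".join(without_labels.split())
    # Remove any space introduced right after an opening bracket.
    return re.sub(r"([(\[])\s+", r"\1", collapsed)


def make_title(descriptor: str, year: int, polarity: str) -> str:
    """Build a record title; year and polarity are assumed to come from info.yaml."""
    return f"{clean_decay_descriptor(descriptor)} -- {year} Magnet {polarity}"


raw = (
    "{Lambda_b0}[Lambda_b0 -> {Lambda_cplus}(Lambda_c+ -> "
    "{pplus}p+ {Kminus}K-{piplus}pi+) {piminus}pi-]CC"
)
title = make_title(raw, 2011, "Down")
print(title)  # [Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) pi-]CC -- 2011 Magnet Down
```

With this approach both spellings of the original title (with and without a space before `{piplus}`) reduce to the same readable descriptor.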

@tiborsimko (Member, Author) replied:

The titles look good 👍

Shall we also amend the umbrella record title? Currently it says:

LHCb Ntuples from user request 142

Perhaps it should similarly say:

[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) K-]CC ntuples

or including request ID:

[Lambda_b0 -> (Lambda_c+ -> p+ K- pi+) K-]CC ntuples from user request 142 

@tiborsimko (Member, Author) replied:

FYI went for the following titles in the PR update. (Since the request number 142 is already prominently displayed in the record, we probably don't need it in the title?)

image

[
{
"abstract": {
"description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}.</p><p>FIXME Here should come a long explanation of inputs, ntuples, etc.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place wher they would be added."

RE {insert reason for request} -- we should extract this from the request details here or perhaps an immutable source if it is stored elsewhere

@tiborsimko (Member, Author) replied:

I guess this needs to be entered by the LHCb open data team, since free text supplied by a user could contain slang, internal information, or other language that would not be understandable outside of its original context.

(In the request reason, one user is talking to the N members of the LHCb open data team, whereas the promoted ntuples can be accessed by anybody in the world, who will be missing the original context.)

"secondary": ["Collision"]
},
"usage": {
"description": "<p>You can clone and amend this ntupling request here:",

I suggested a modification to encourage folks to check out the existing Ntuples and only clone and amend the request if these Ntuples do not suit their needs.

[
{
"abstract": {
"description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}.</p><p>FIXME Here should come a long explanation of inputs, ntuples, etc.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place wher they would be added."

@dillfitz dillfitz Feb 25, 2025

Let me summarize my previous comments on this line in light of our discussion this morning with a suggestion:

Suggested change
"description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for {insert reason for request}.</p><p>FIXME Here should come a long explanation of inputs, ntuples, etc.</p><p>FIXME If we can attach the <code>info.yaml</code> etc, this is the place wher they would be added."
"description": "<p>Data from proton-proton (pp) collisions collected by the LHCb experiment filtered to produce Ntuples for exploring heavy baryon decays used as control channels for CP violation studies. This was used as an example during the <a href=https://indico.cern.ch/event/1429526/ target=_blank>First LHCb Open Data and Ntuple Wizard Workshop</a>.</p> <p>Ntuples are created using DaVinci version v46r11 and the Analysis Productions system to process the BHADRON.MDST datasets for both magnet polarities in years 2016, 2017, and 2018. Quantities saved to the Ntuple are specified during the request phase and detailed in the following code configuration files.</p><p>PLACEHOLDER FOR CODE</p>

I am not sure how to insert viewable code into the records, but the files are located here (https://gitlab.cern.ch/cernopendata/lhcb-ntupling-service-requests-dev/-/tree/opendata-ntupling-service-request-142-development/request/142/baryon_example_run1). We probably only need to include the yaml files, but I am open to suggestions. I recommend we just include the code and I can run it by DPA when I show them what we have been working on.

Note: I have not tested this, since I am not in the office and I have not yet set up a local deployment of the ODP on my laptop. The changes are straightforward enough though.

Final comment from this morning and before merging: I think we should add something like "Originated from request 142" in the description of the related datasets (not the parent) so that if another request is promoted with the same decay descriptor, the records can still be uniquely identified visually when performing a search. We probably want to put it early in the description so it is always visible in the search results.
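
One way this could be automated is sketched below. It is a rough illustration under stated assumptions, not what the PR does: the helper name `tag_related_records` is hypothetical, the first entry of the records list is assumed to be the umbrella (parent) record as in the diff above, and the note is prepended to `abstract.description` purely for simplicity; the note could equally live in another field of the record.

```python
import copy


def tag_related_records(records, request_id):
    """Return a copy of `records` in which every record except the first
    (assumed to be the umbrella record) has its abstract description
    prefixed with the originating request number."""
    prefix = f"Originated from request {request_id}. "
    tagged = copy.deepcopy(records)  # leave the caller's data untouched
    for record in tagged[1:]:
        abstract = record.setdefault("abstract", {})
        description = abstract.get("description", "")
        # Skip records that already carry the note, so reruns are harmless.
        if not description.startswith(prefix):
            abstract["description"] = prefix + description
    return tagged


# Tiny stand-in for the records JSON: one umbrella record, one related dataset.
records = [
    {"abstract": {"description": "<p>Umbrella record description.</p>"}},
    {"abstract": {"description": "Data from proton-proton (pp) collisions."}},
]
tagged = tag_related_records(records, 142)
print(tagged[1]["abstract"]["description"])
```

Putting the note at the very start of the text keeps it visible even when search results truncate the description.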

@tiborsimko (Member, Author) replied:

> We probably only need to include the yaml files, but I am open to suggestions. I recommend we just include the code and I can run it by DPA when I show them what we have been working on.

I'll amend the example attaching all these input files, and we can see how it looks.

@tiborsimko (Member, Author) replied:

> I think we should add something like "Originated from request 142" in the description of the related datasets (not the parent)

Yes, I'll enrich the "selection" part for daughter records in this sense.

@tiborsimko (Member, Author) replied:

> I'll amend the example attaching all these input files, and we can see how it looks.

Added all the input files that go into the AnaProd merge request. (This may prove useful in the longer term, thinking about a future where the Ntupling Service might be deprecated or discontinued.)

image

@tiborsimko (Member, Author) replied:

> Yes, I'll enrich the "selection" part for daughter records in this sense.

Enriched the selection information in this sense.

image

@tiborsimko (Member, Author) replied:

@dillfitz BTW a terminology question. Do we spell it "Ntuples" with uppercase N in regular sentences? In ATLAS and CMS documents, we usually just say "ntuples" with lowercase in the regular text...

@tiborsimko (Member, Author) commented:

> I can update this diff as I make more changes, unless I am able to push directly in the future.

I'll modify the current PR so we can continue the discussion here, but for future PRs we can both edit the same upstream branch, which will make edits easier.

(If you prefer, I could even close this PR and open a new one in a shared-editable state already.)

@tiborsimko tiborsimko force-pushed the lhcb-ntuples-hackathon branch from 4b8937b to d0dd9b8 Compare February 28, 2025 09:12
@tiborsimko tiborsimko changed the title WIP LHCb ntuple records from the hackathon feat(records): add LHCb Ntuples example from the hackathon Feb 28, 2025
@tiborsimko tiborsimko marked this pull request as ready for review February 28, 2025 09:17
@tiborsimko tiborsimko force-pushed the lhcb-ntuples-hackathon branch 2 times, most recently from 2eedfc2 to 80b0b25 Compare March 3, 2025 21:31
@tiborsimko tiborsimko force-pushed the lhcb-ntuples-hackathon branch from 80b0b25 to 1903699 Compare March 11, 2025 10:37