Expanding dataset validity information in DBS #118
Just to summarize the solution which we have discussed and proposed with @vkuznet here and here: We are about to
The information should be populated manually for the time being. Once we see it working, we may think of automating the process through the WM system.
Hi Dima, as was briefly mentioned in the private e-mails exchanged last night, an initial set of flags could help a lot during the next deletion campaign, even before the full functionality is delivered. I am therefore pushing harder to at least create the new database schema and test it in integration. Could you check the set of flags as proposed/summarized in the issue description here, and give your opinion:
I am quoting Dima from the Jira issue here, to keep this issue in line with the discussions happening in Jira as well:
"""
We found no reason to create an additional table just to add these 4 extra columns. VARCHAR2 is an efficient string format using only actual string size plus 1 byte for string length. We have also reviewed the option to add information about superseded datasets, but we concluded that it will be very hard to make it work automatically since there is no mechanism to track relationships between datasets in different campaigns. Therefore, if people want to protect datasets that are not available in other campaigns or have other unique properties, we will use the new isProtected flag. Valentin Y Kuznetsov, do you see any issue with this proposal?
"""
Impact of the new feature
This is a request related to improving data (version) management and protecting Open Data during tape deletion campaigns.
Is your feature request related to a problem? Please describe.
Original issue created in Jira: https://its.cern.ch/jira/browse/CMSTRANSF-857
We have multiple versions of valid datasets in the system. It's necessary to differentiate them for efficient disk space management and tape deletions.
Use cases:
We need to know for each dataset whether it was superseded by another dataset. This is especially important for MiniAOD and NanoAOD, which may have many different versions. The best way to achieve that is to save the id/name of the new dataset. For this to work, we need PdmV to supply which datasets the current request supersedes, and the workflow management needs to handle this properly.
It's important to keep in mind that some data and MC samples don't get reprocessed, so we cannot simply rely on campaigns or RegExp patterns.
Open Data needs to be protected separately, as a unique use case.
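To make these use cases concrete, here is a minimal Python sketch of how a "superseded by" link plus a protection flag could drive the selection of deletion candidates. The record type, the field names `superseded_by` and `is_protected`, and both helper functions are hypothetical illustrations, not the DBS schema or API; the actual set of flags and their data types are still under discussion. Datasets that never get reprocessed simply have no successor, so they are never selected, without relying on campaign names or RegExp patterns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical per-dataset metadata; real DBS column names/types are not decided."""
    name: str
    superseded_by: Optional[str] = None  # name of the dataset that replaces this one, if any
    is_protected: bool = False  # e.g. Open Data, or datasets with unique properties


def latest_version(name: str, records: Dict[str, DatasetRecord]) -> str:
    """Follow the superseded_by links from `name` to the newest known dataset."""
    seen = {name}
    current = name
    while True:
        nxt = records[current].superseded_by
        # Stop when there is no successor or the successor is not in our records.
        if nxt is None or nxt not in records:
            return current
        # Guard against inconsistent metadata that links datasets in a loop.
        if nxt in seen:
            raise ValueError(f"cycle in superseded_by chain at {nxt}")
        seen.add(nxt)
        current = nxt


def deletion_candidates(records: Dict[str, DatasetRecord]) -> List[str]:
    """Return superseded, unprotected datasets, sorted by name.

    Datasets that were never reprocessed have superseded_by=None and are
    therefore never selected.
    """
    return sorted(
        r.name
        for r in records.values()
        if r.superseded_by is not None and not r.is_protected
    )
```

With this design, a protected dataset stays off the candidate list even if a newer version exists, which mirrors the intent of handling Open Data as a separate case.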
Describe the solution you'd like
As initially suggested in the Jira ticket, the solution should be built around adding 2 more flags for each dataset in DBS:
This was later extended to the following set of 4 flags:
The set of flags and flag data types are still subject to discussion.
Describe alternatives you've considered
No alternatives suggested
Additional context
N/A