✋ BLOCKED - Program reports violations for data_object_set.was_generated_by references #5
Comments
Excerpts from the schema (with unrelated parts replaced with # ...):
Base slots:
# ...
was_generated_by:
  name: was_generated_by
  from_schema: https://w3id.org/nmdc/nmdc
  mappings:
  - prov:wasGeneratedBy
  range: WorkflowExecution
Class definition:
classes:
  # ...
  DataObject:
    name: DataObject
    # ...
    slots:
    # ...
    - was_generated_by
    slot_usage:
      # ...
      was_generated_by:
        name: was_generated_by
        pattern: ^^(nmdc):(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas|wfnom|wfrbt|wfrqc)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})(\.[0-9]{1,})$|^^(nmdc):(omprc|dgms|dgns)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$
        structured_pattern:
          syntax: ^{id_nmdc_prefix}:(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas|wfnom|wfrbt|wfrqc)-{id_shoulder}-{id_blade}{id_version}$|^{id_nmdc_prefix}:(omprc|dgms|dgns)-{id_shoulder}-{id_blade}$
          interpolated: true
    class_uri: nmdc:DataObject
Database slot:
classes:
  # ...
  Database:
    name: Database
    # ...
    slots:
    # ...
    - workflow_execution_set
    class_uri: nmdc:Database
    tree_root: true
Base slots:
# ...
workflow_execution_set:
  name: workflow_execution_set
  description: This property links a database object to the set of workflow executions.
  from_schema: https://w3id.org/nmdc/nmdc
  mixins:
  - object_set
  range: WorkflowExecution
Class:
classes:
  # ...
  WorkflowExecution:
    name: WorkflowExecution
    # ...
    is_a: PlannedProcess
    abstract: true
    slots:
    # ...
    - was_informed_by
    slot_usage:
      # ...
      has_input:
        name: has_input
        required: true
        pattern: ^(nmdc):(dobj)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$
        structured_pattern:
          syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
          interpolated: true
      has_output:
        name: has_output
        pattern: ^(nmdc):(dobj)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$
        structured_pattern:
          syntax: '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$'
          interpolated: true
      # ...
      was_informed_by:
        name: was_informed_by
        required: true
    class_uri: nmdc:WorkflowExecution
    rules:
      # ...
Here's a curl command that fetches the document:
curl -X 'GET' \
  'https://api-berkeley.microbiomedata.org/nmdcschema/ids/nmdc%3Aomprc-13-4wkf0639' \
  -H 'accept: application/json'
Here is that document:
{
  "id": "nmdc:omprc-13-4wkf0639",
  "name": "Rachael_21T_04-15A_M_14Mar17_leopard_Infuse",
  "has_input": [
    "nmdc:bsm-13-4bfysc34"
  ],
  "has_output": [
    "nmdc:dobj-13-xx781m34"
  ],
  "description": "High resolution MS spectra only",
  "processing_institution": "EMSL",
  "type": "nmdc:MassSpectrometry",
  "alternative_identifiers": [
    "emsl:570856"
  ],
  "analyte_category": "nom",
  "associated_studies": [
    "nmdc:sty-11-33fbta56"
  ],
  "instrument_used": [
    "nmdc:inst-14-nstrhv39"
  ]
}
I used this Mongo query to find out which collection that document resides in.
const id = "nmdc:omprc-13-4wkf0639";
const db = db.getSiblingDB("nmdc");
const collectionNames = db.getCollectionNames();
for (const collectionName of collectionNames) {
  print("Processing: " + collectionName);
  // Skip these collections.
  if (collectionName.startsWith("system.")
      || collectionName.startsWith("minter.")
      || collectionName.startsWith("_")
      || collectionName.startsWith("ids_")) {
    continue;
  }
  const collection = db.getCollection(collectionName);
  // Search this collection.
  const cursor = collection.find({id: id}).limit(1);
  if (cursor.hasNext()) {
    print("Found in: " + collectionName);
    break;
  }
}
It resides in the
I think my teammate expected
Turns out this particular violation (and presumably others, although I haven't confirmed it) is due to this bug in the
Now that the missing
Edit: I am not able to re-run it. See the comment below this one for details.
Progress is blocked by the absence of the necessary YAML-formatted schema file in the
Here is a single example.
Excerpt from violations.tsv:
Excerpt from references.tsv:
One of my teammates, who is very familiar with the schema and database, thinks this result is erroneous.
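As a quick sanity check on that suspicion, the id from the example (nmdc:omprc-13-4wkf0639) can be tested against the pattern that the DataObject slot_usage declares for was_generated_by. The following is only a sketch: the pattern is copied verbatim from the schema excerpt (doubled "^" included), while the constant and function names are hypothetical and are not part of the program that reported the violation. It checks identifier syntax only. It says nothing about whether the referenced document's class satisfies the slot's range.

```python
import re

# Pattern copied verbatim from the DataObject slot_usage excerpt for
# was_generated_by. The doubled "^" comes from the materialized pattern;
# it is redundant but still a valid regular expression.
WAS_GENERATED_BY_PATTERN = re.compile(
    r"^^(nmdc):(wfmag|wfmb|wfmgan|wfmgas|wfmsa|wfmp|wfmt|wfmtan|wfmtas|wfnom|wfrbt|wfrqc)"
    r"-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})(\.[0-9]{1,})$"
    r"|^^(nmdc):(omprc|dgms|dgns)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$"
)


def matches_was_generated_by_pattern(identifier: str) -> bool:
    """Return True if the identifier is syntactically allowed by the pattern.

    Hypothetical helper for illustration; it does not look anything up in
    the database.
    """
    return WAS_GENERATED_BY_PATTERN.search(identifier) is not None


if __name__ == "__main__":
    # Ids taken from the example document in this issue.
    for identifier in (
        "nmdc:omprc-13-4wkf0639",  # the document's own id
        "nmdc:dobj-13-xx781m34",   # its has_output value
        "nmdc:bsm-13-4bfysc34",    # its has_input value
    ):
        print(identifier, matches_was_generated_by_pattern(identifier))
```

The omprc id matches the second alternative of the pattern, so the schema's own pattern permits a DataObject to point at that document, even though the base slot's range is WorkflowExecution.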
Configuration
# Download the raw content of https://github.com/microbiomedata/berkeley-schema-fy24/blob/main/nmdc_schema/nmdc_materialized_patterns.yaml
curl -o schema.yaml https://raw.githubusercontent.com/microbiomedata/berkeley-schema-fy24/main/nmdc_schema/nmdc_materialized_patterns.yaml
Tasks