change nmdc-schema's MIxS import to GSC's 6.2 YAML #1368
Prepare for rebuild of …, then …
MAM 2024-04-15: below, I think I changed the
ValueError: Conflicting URIs (https://raw.githubusercontent.com/microbiomedata/nmdc-schema/main/src/schema/mixs.yaml, https://w3id.org/nmdc/core) for item: sequencing

sequencing is a subset in both mixs.yaml and core.yaml. It's not in use by nmdc-schema, so I commented it out of core.yaml. Same thing for …
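The error appears to come from the same name being declared in two imported schemas with different URIs. A minimal sketch of spotting such collisions before a build, assuming both YAML files have already been parsed into plain dicts (parsing not shown) and that the collision is in the `subsets` section; the function name and the trimmed sample data are hypothetical.

```python
def find_conflicting_names(schema_a, schema_b, section="subsets"):
    """Return the names declared in `section` of both schemas, sorted.

    Assumes each schema is a dict parsed from LinkML YAML; a missing or
    empty section is treated as declaring nothing.
    """
    names_a = set((schema_a.get(section) or {}).keys())
    names_b = set((schema_b.get(section) or {}).keys())
    return sorted(names_a & names_b)


# Hypothetical, heavily trimmed stand-ins for mixs.yaml and core.yaml.
mixs = {"subsets": {"sequencing": {}, "environment": {}}}
core = {"subsets": {"sequencing": {}}}

print(find_conflicting_names(mixs, core))
```

Running the same check with `section="slots"` or `section="enums"` would flag other overlapping names the same way.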
Comment out … from nmdc.yaml. Ignore assets/old_python/reconsititute_mixs.py. Same with …
ValueError: File "nmdc.yaml", line 639, col 9 Class "Biosample" - unknown slot: "has numeric value" ???
New enum names. nmdc.yaml still has:

oxy_stat_samp:
  range: oxy_stat_samp_enum

which fails with:

ValueError: File "nmdc.yaml", line 1105, col 16 slot: Biosample_oxy_stat_samp - unrecognized range (oxy_stat_samp_enum)

while MIxS 6.2 defines:

oxy_stat_samp:
  description: Oxygenation status of sample
  title: oxygenation status of sample
  examples:
  - value: aerobic
  from_schema: https://w3id.org/mixs
  keywords:
  - oxygen
  - sample
  - status
  slot_uri: MIXS:0000753
  range: OXY_STAT_SAMP_ENUM
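One way to catch these stale lowercase enum ranges in bulk is sketched below. It assumes the stale range lives in a class's `slot_usage` (as the induced name Biosample_oxy_stat_samp in the error suggests) and that the block has already been parsed into a dict; `upper_snake` and `update_enum_ranges` are hypothetical helpers, not part of nmdc-schema.

```python
import re


def upper_snake(name):
    """Convert e.g. 'oxy_stat_samp_enum' to 'OXY_STAT_SAMP_ENUM'.

    Runs of non-alphanumeric characters become a single underscore.
    camelCase is NOT split; that case is out of scope for this sketch.
    """
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").upper()


def update_enum_ranges(slot_usage, enum_names):
    """Return a copy of slot_usage with stale enum ranges renamed.

    A range is rewritten only if it is not a known enum name but its
    upper-snake-case form is, so ranges like QuantityValue are untouched.
    """
    fixed = {}
    for slot, usage in slot_usage.items():
        rng = usage.get("range")
        if rng and rng not in enum_names and upper_snake(rng) in enum_names:
            usage = {**usage, "range": upper_snake(rng)}  # new dict; input untouched
        fixed[slot] = usage
    return fixed


# Hypothetical sample data shaped like a Biosample slot_usage block.
biosample_slot_usage = {
    "oxy_stat_samp": {"range": "oxy_stat_samp_enum"},
    "depth": {"range": "QuantityValue"},
}
mixs_enums = {"OXY_STAT_SAMP_ENUM"}

print(update_enum_ranges(biosample_slot_usage, mixs_enums))
```

Requiring the upper-snake-case form to exist in the MIxS enum set keeps the rewrite from touching ranges that were never enums.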
I'm sacrificing the ability to assign "new" MIxS slots to classes in nmdc.yaml without defining them. Before this branch, the build process would detect MIxS slots that were assigned (in Biosample and OmicsProcessing) but undefined, and add them to our mixs.yaml. But it could only retrieve slots from a single MIxS class, like MimsSoil. Moving forward, if an undefined slot is assigned to a class, the build will fail with an error message like ??? In many cases, that error should be read as a sign that the slot, and one of its MIxS classes, should be added to assets/other_mixs_yaml_files/mixs_slots_import_sheet.tsv. This has resulted in a much shorter project.Makefile.
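Since the build now fails on assigned-but-undefined slots, a quick pre-check can list them per class so they can be considered for the import sheet. A minimal sketch, assuming the classes and the union of all defined slot names have already been loaded into Python structures; the function name and sample slot names are hypothetical.

```python
def undefined_slots(classes, defined_slots):
    """Map each class to the slots it lists that no loaded schema defines."""
    missing = {}
    for class_name, class_def in classes.items():
        unknown = [s for s in class_def.get("slots", []) if s not in defined_slots]
        if unknown:
            missing[class_name] = sorted(unknown)
    return missing


# Hypothetical sample data; real input would come from the parsed
# nmdc.yaml classes and the union of all imported slot definitions.
classes = {
    "Biosample": {"slots": ["oxy_stat_samp", "hypothetical_new_slot"]},
    "OmicsProcessing": {"slots": ["hypothetical_defined_slot"]},
}
defined = {"oxy_stat_samp", "hypothetical_defined_slot"}

for class_name, slots in undefined_slots(classes, defined).items():
    print(f"{class_name}: undefined slots {slots}")
```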
project.Makefile uses yq to convert flat MIxS slot ranges into structured nmdc-schema ranges like QuantityValue. In this branch, I have removed all range changes for slots that haven't been used in MongoDB yet. Files in the src/data path whose names contain the substring …
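The yq step can be pictured as the Python sketch below. It is not the actual yq logic. It assumes the MIxS slots are already a dict, that the desired structured range for each slot is listed in an explicit override mapping, and that the set of slots used in MongoDB is known; all names and sample values are hypothetical.

```python
def apply_range_overrides(slots, overrides, used_slots):
    """Return a copy of `slots` with structured ranges applied.

    overrides maps slot name -> nmdc-schema range (e.g. "QuantityValue").
    Only slots in used_slots are changed, mirroring the decision to drop
    range changes for slots not yet used in MongoDB.
    """
    result = {}
    for name, definition in slots.items():
        definition = dict(definition)  # shallow copy; input stays unchanged
        if name in used_slots and name in overrides:
            definition["range"] = overrides[name]
        result[name] = definition
    return result


# Hypothetical flat MIxS-style slots and override table.
slots = {"depth": {"range": "string"}, "samp_size": {"range": "string"}}
overrides = {"depth": "QuantityValue", "samp_size": "QuantityValue"}
used = {"depth"}

print(apply_range_overrides(slots, overrides, used))
```

Keeping the overrides in data rather than in individual yq expressions makes it easy to see which slots get a structured range and why.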
For the 2024-04-15 discussion with @sujaypatil96 about the current import process:

Checked out main, fetched, pulled
poetry update
make squeaky-clean all test

No MIxS cleanup is performed by make mixs-yaml-clean

make --dry-run src/schema/mixs.yaml

Depends on …

Note the use of … Most of the … That could have been done with modifications_and_validation from sheets_and_friends. I would like to change all of the …
As of this comment, the nmdc-schema includes 494 MIxS slots:

yq e '.slots | keys' src/schema/mixs.yaml | wc -l

Five of those are grouping slots which aren't associated with any class:

PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX MIXS: <https://w3id.org/mixs/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select *
where {
  {
    values ?t {
      owl:DatatypeProperty
      owl:ObjectProperty
    }
    graph nmdc:nmdc-no-use-native-uris {
      ?p rdf:type ?t .
      filter(strstarts(str(?p), "https://w3id.org/mixs/"))
    }
    optional {
      ?p rdfs:label ?l
    }
  }
  minus {
    graph nmdc:nmdc_relation_graph {
      ?s ?p ?o .
      filter(strstarts(str(?p), "https://w3id.org/mixs/"))
    }
  }
}
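Both numbers can be approximated directly from the YAML. A minimal sketch, assuming mixs.yaml has been parsed into a dict and that a slot counts as "associated" when some class lists it under `slots` or `slot_usage`. The SPARQL above instead checks the relation graph, so the two may disagree for slots reached only through inheritance; the helper and sample data are hypothetical.

```python
def slot_summary(schema):
    """Return (slot count, sorted slots not referenced by any class)."""
    slots = set((schema.get("slots") or {}).keys())
    referenced = set()
    for class_def in (schema.get("classes") or {}).values():
        referenced.update(class_def.get("slots") or [])
        referenced.update((class_def.get("slot_usage") or {}).keys())
    return len(slots), sorted(slots - referenced)


# Hypothetical, trimmed stand-in for the parsed src/schema/mixs.yaml.
schema = {
    "slots": {"depth": {}, "oxy_stat_samp": {}, "hypothetical_grouping_slot": {}},
    "classes": {"MimsSoil": {"slots": ["depth", "oxy_stat_samp"]}},
}

count, unattached = slot_summary(schema)
print(count, unattached)
```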
Only 73 have been used in MongoDB as of 2024-04-11:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select
  #?st ?p ?l ?ot (count(?s) as ?count)
  ?p ?l (count(?s) as ?count)
where {
  graph <https://api.microbiomedata.org> {
    ?s ?p ?o .
    optional {
      ?s a ?st
    }
    optional {
      ?o a ?ot
    }
    minus {
      ?s a ?o
    }
  }
  optional {
    ?p rdfs:label ?l
  }
  filter(strstarts(str(?p), "https://w3id.org/mixs/"))
}
group by ?p ?l
order by ?l
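The core of this query is a per-predicate count restricted to MIxS IRIs. A simplified Python sketch over in-memory triples, assuming the data is available as (subject, predicate, object) tuples; the sample triples are hypothetical. Note that in the SPARQL, count(?s) counts rows after the optional type joins, so a subject or object with several rdf:type values can be counted more than once, whereas this sketch counts each triple once.

```python
from collections import Counter

MIXS_PREFIX = "https://w3id.org/mixs/"


def count_mixs_predicates(triples):
    """Count (subject, predicate, object) triples per MIxS predicate IRI."""
    return Counter(p for _s, p, _o in triples if p.startswith(MIXS_PREFIX))


# Hypothetical triples; MIXS:0000753 is the oxy_stat_samp slot_uri.
triples = [
    ("ex:bsm-1", "https://w3id.org/mixs/0000753", "aerobic"),
    ("ex:bsm-2", "https://w3id.org/mixs/0000753", "anaerobic"),
    ("ex:bsm-1", "https://w3id.org/nmdc/name", "soil sample"),
]

for predicate, n in count_mixs_predicates(triples).most_common():
    print(predicate, n)
```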
wget https://raw.githubusercontent.com/GenomicsStandardsConsortium/mixs/v6.2.0/src/mixs/schema/mixs.yaml
yq e '.slots | keys' mixs.yaml | sed 's/^- //' | sort > mixs.6.2.slots.txt

Where …:

awk -F',' 'NR>1 {print $2}' used-73-mixs-slots.csv | sort > used-73-mixs-slot-names.txt

comm -23 used-73-mixs-slot-names.txt mixs.6.2.slots.txt

These are the only slots that we are using that aren't present verbatim in MIxS 6.2!
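The same comparison can be sketched in Python, assuming the query result is available as CSV text with a header row and the label in the second column (matching the awk command), and the MIxS 6.2 slot names as a list; the helper names and sample values are hypothetical. Unlike comm, which needs sorted input and respects duplicate lines, this version works on sets.

```python
import csv
import io


def second_column(csv_text):
    """Sorted second-column values, skipping the header row.

    Similar in spirit to: awk -F',' 'NR>1 {print $2}' | sort
    csv.reader also honors quoted fields, which a plain -F',' split does not.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip header
    return sorted(row[1] for row in reader if len(row) > 1)


def only_in_first(first, second):
    """Values present in `first` but not in `second`, like comm -23."""
    return sorted(set(first) - set(second))


# Hypothetical query export and MIxS 6.2 slot list.
used_csv = (
    "p,l,count\n"
    "https://w3id.org/mixs/0000753,oxy_stat_samp,2\n"
    "https://w3id.org/mixs/hypothetical,hypothetical_old_slot,5\n"
)
mixs_62_slots = ["depth", "oxy_stat_samp"]

print(only_in_first(second_column(used_csv), mixs_62_slots))
```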
Resuming work in this issue/branch in preparation for a MIxS environmental triad parent (or grouping?) slot, following @aclum's example for …
Just reran
I started fixing up some instantiation unit tests, but I think they have been deleted from other branches because they don't test anything that …, so I skipped one with …
Removed two of @eecavanna's doc tests in … I would like to understand this better and then put them back in.
was adding
There are 483 slots in …, and there are 489 lines (including a header) in …. Do we really think we're going to use all of those? See … If I were going to remove some of them, would I have to go through something like a deprecation r…? 30 of them are marked required based on …, and 25 of those appeared in the error report above.
Added … Making progress.
Unfortunately, I just noticed that … The next step: change them to: https://raw.githubusercontent.com/GenomicsStandardsConsortium/mixs/v6.2.0/src/mixs/schema/mixs.yaml
GenomicsStandardsConsortium/mixs@1da8493 is from Apr 3, 2023, while https://github.com/GenomicsStandardsConsortium/mixs/releases/tag/v6.2.0 is from Oct 18, 2023. I updated … See also target …
MIxS enums now all have upper snake case names, like …
Running …, or the …:

wc -l local/mongo_as_nmdc_database_validation.log.txt
Still 29820 local/mongo_as_nmdc_database_validation.log ???
Check submission schema.

Currently slightly ahead of MIxS 6.0:
https://raw.githubusercontent.com/microbiomedata/mixs/1da849346a80b717810a02d7c8ed74a22bcd84de/model/schema/mixs.yaml
Change to: https://raw.githubusercontent.com/GenomicsStandardsConsortium/mixs/v6.2.0/src/mixs/schema/mixs.yaml
See also …