dealing with ranges in the logsheets #21

Open
kmexter opened this issue Sep 25, 2024 · 10 comments

@kmexter
Contributor

kmexter commented Sep 25, 2024

We have discussed this in other issues, but I am moving it here to reduce the confusion.
There are some values in the logsheets that are entered with a <, a >, or a - (a range).

For the size fractions (size_frac_up and low) this is just wrong and should be flagged as such in the QC. There is an upper and a lower value, and if one of them is not known (as it did not exist, so to speak), then the value in that cell should be either NA or blank.
--> @kmexter make sure that this is not only flagged in the QC, but also transformed into just a value AND that any NA or 0 values are given a "no value for this" in the triples. @laurianvm may be able to advise on this.
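
A minimal sketch of the QC flag described above, assuming a valid size-fraction cell is blank, NA, or a single non-negative number. The function name `size_fraction_problem` and the message texts are hypothetical and not taken from the existing QC code:

```python
import re

# Assumption: a plain non-negative number such as "3" or "0.22" is the only valid value.
_NUMBER = re.compile(r"\d+(\.\d+)?")


def size_fraction_problem(cell):
    """Return a short QC message if the cell is not acceptable, else None."""
    text = cell.strip()
    if text == "" or text.upper() == "NA":
        return None  # bound not known: blank or NA is allowed
    if _NUMBER.fullmatch(text):
        return None  # a single value, as required
    if text[0] in "<>":
        return "inequality not allowed in size fraction"
    if "-" in text:
        return "range not allowed in size fraction"
    return "not a number"
```

How a flagged cell should then be turned into a single value is not settled in this thread, so the sketch only reports the problem.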

For depth, we will take the ranges given and translate them into an upper and lower limit, and if we do get a < or > we will do the same, but only in the TTL files. In the transformed logsheets these will continue to be just the string 3-4 or <10 or whatever. To be consistent over all stations, we will make all depths a max and a min. @laurianvm can you take care of this in your template? If not, let's chat.

For the enviro measurements: at the next harvest I will ask that this is flagged specifically, so I will see if there are any like this (was it my imagination that I saw this?). If there are, we will deal with that in the TTL file only, by making that a "max" value.

@laurianvm
Contributor

laurianvm commented Oct 16, 2024

'<' --> max depth
'>' --> min depth
3-4 --> min depth = 3 and max depth = 4
3 --> min depth = 3 and max depth = 3 (to be consistent!)

change ontology and templates to include this
goal = consistent description of depth across all data

(not for the depth in observatory logsheets/data/templates - rather sampling tab)
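
The mapping listed in this comment could be sketched as follows. This is only an illustration: the function name `depth_bounds` is hypothetical, and returning `None` for a bound that the logsheet does not give is an assumption, not something decided in this thread:

```python
def depth_bounds(text):
    """Return (min_depth, max_depth) as floats; None means the bound is not given."""
    s = text.strip()
    if s.startswith("<"):
        # '<10' only tells us the maximum depth
        return (None, float(s[1:]))
    if s.startswith(">"):
        # '>10' only tells us the minimum depth
        return (float(s[1:]), None)
    if "-" in s:
        # '3-4' gives both bounds
        low, high = s.split("-", 1)
        return (float(low), float(high))
    # a single value is used for both bounds, to be consistent
    value = float(s)
    return (value, value)
```

For example, `depth_bounds("3-4")` gives `(3.0, 4.0)` and `depth_bounds("3")` gives `(3.0, 3.0)`.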

@laurianvm
Contributor

@kmexter will check whether there are any max/min values occurring in measurements

@kmexter
Contributor Author

kmexter commented Oct 23, 2024

For the QC there is a check that depth is < depth max. For this, the check will either not check it (treat it as a string "float - float") or will check the max depth against the value from the observatory tab.
@cpavloud for ENA I think you said that depth has to be one value. Can you remind me whether that was to be the min, the max, or the average?

@kmexter
Contributor Author

kmexter commented Oct 23, 2024

@kmexter to check if this is also the case for measurements; if so, they have to be strings.
In fact, if there are any ranges in measurements, that should be an error - @cpavloud ?
But also check if there are max/min values.

@cpavloud

I think it's more appropriate if we keep the maximum depth for the ENA submission.

@kmexter
Contributor Author

kmexter commented Oct 23, 2024

Discussed with @bulricht: this is now to be treated as a string. However, as there is a QC check for this against the tot_depth_water_col in the observatory tab of each logsheet, the check will have to be done in a new way:
Take the string. If it has a - in it, then assume it is formatted as float - float and take the larger of the two floats. This is what you should check against max depth, and it is also what needs to get into the ENA XML files.
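
The steps above could look roughly like this. It is a sketch only: the function names are hypothetical, strings with < or > are not handled, and using `<=` rather than a strict `<` for the comparison is an assumption:

```python
def deepest_value(depth_text):
    """Largest depth in a 'float - float' string, or the single value if there is no '-'."""
    # Assumption: the string is either one number or two numbers separated by '-'
    parts = depth_text.split("-")
    return max(float(part) for part in parts)


def depth_qc_ok(depth_text, tot_depth_water_col):
    """True if the deepest sampled depth does not exceed the water column depth."""
    return deepest_value(depth_text) <= float(tot_depth_water_col)
```

The value returned by `deepest_value` is also the one that would go into the ENA XML files, in line with keeping the maximum depth for the ENA submission.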

@kmexter
Contributor Author

kmexter commented Oct 23, 2024

@laurianvm if you have finished your part of this issue, can you say so here so I know that it is only Bram's part left to do?

@kmexter
Contributor Author

kmexter commented Oct 24, 2024

Hmm. @laurianvm :-} we think that rather than you transforming these depths into a max and min, it is better that this is done in the transformed logsheets: we create 2 new columns, depth_min and depth_max. That may mean you backpedalling on some template changes? We think that because it is best to make as few data changes in the templates as possible.
I wonder though how I would indicate this in the logsheet_schema_extended, as there will be one column coming in and two going out, with different vocabulary terms (so they have to be in two rows in that file). @bulricht can you advise?

@kmexter
Contributor Author

kmexter commented Nov 27, 2024

@kmexter to find new BODC terms for these for the transformed logsheets

@kmexter
Contributor Author

kmexter commented Nov 27, 2024

@kmexter raise a new issue to say how to do this when some entries are a range and some are single values,
and change how it is in the logsheet schema - need to change the way the logsheet schema extended is written, as it should not be an xsd:string or float but just "xsd:float" or "range", and Bram's code has to understand that

I should double check if there are any unbounded ranges, and if so they need to be a range instead (0-5 for <5 and 5-maxdepth for >5)

And tell the stations how to indicate a range: 3-5.
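
Turning an unbounded entry into a proper range, as suggested above (0-5 for <5, 5-maxdepth for >5), might be sketched like this. The function name `close_unbounded_range` is hypothetical, and taking the maximum depth from the station's water column depth is an assumption:

```python
def close_unbounded_range(depth_text, max_depth):
    """Rewrite '<5' as '0-5' and '>5' as '5-<max_depth>'; leave other entries unchanged."""
    s = depth_text.strip()
    if s.startswith("<"):
        # shallower than the given value: the range starts at the surface
        return f"0-{s[1:].strip()}"
    if s.startswith(">"):
        # deeper than the given value: the range ends at the maximum depth
        return f"{s[1:].strip()}-{max_depth}"
    return s
```

With a maximum depth of 20, `close_unbounded_range(">5", 20)` would give the string `5-20`.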
