Observatory logsheet errors in the googlesheets to be fixed by HQ #13
There is clearly something bad about having col R (size_frac_up) of the water logsheets' sampling tab be part of the source_mat_id, as that column is often missing values or has NA in it. Sometimes this is not a problem (the IDs look odd but are unique), but for many logsheets it is a problem, as it creates duplicated IDs, and each row MUST have a unique ID. So that needs looking into. For sediment, EMT21 UVIGO has illegal source_mat_ids, but these are necessary, so we need to discuss how to change things to accommodate that. There is a similar problem for water for VB IMEC: different depths of sampling are creating non-unique IDs. |
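A minimal sketch of how the duplicate-ID problem described above could be detected, assuming each sheet row has already been read into a dict keyed by column name. The function name `find_duplicate_ids` and the example IDs are hypothetical, not taken from the validation code or the logsheets.

```python
from collections import defaultdict


def find_duplicate_ids(rows, id_field="source_mat_id"):
    """Map each repeated ID to the sheet row numbers it appears on."""
    seen = defaultdict(list)
    # Assumption: row 1 of the sheet is the header, so data starts at row 2.
    for row_number, row in enumerate(rows, start=2):
        seen[row[id_field]].append(row_number)
    return {sid: nums for sid, nums in seen.items() if len(nums) > 1}


# Hypothetical rows: two samples whose IDs collide because size_frac_up is NA.
rows = [
    {"source_mat_id": "EMOBON_XX_Wa_210917_NA_1"},
    {"source_mat_id": "EMOBON_XX_Wa_210917_3um_1"},
    {"source_mat_id": "EMOBON_XX_Wa_210917_NA_1"},
]
duplicates = find_duplicate_ids(rows)
print(duplicates)
```

Reporting the row numbers, rather than only the IDs, should make it easier for an observatory to find the offending rows in the googlesheet.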
(OK, I'm just adding this here rather than creating a new issue.) After the observatory 'sampling' sheets were updated and the 'new_sampling' sheets renamed 'sampling' (this occurred on 28th August), there are only 2 'source_material_id's in the run_information sheets of Batch 1 & 2 that do not have matching 'source_mat_id's in the observatory sample sheets:
Missing source_mat_id is EMOBON_HCMR-1_Wa_210917_3um_blank, row 72, batch 1
Missing source_mat_id is EMOBON_HCMR-1_Wa_210917_0.2um_blank, row 75, batch 1
I am going to assume that they match "blank1", but this needs to be corrected in the 'replicate' field of the 'sampling' sheets.
Also notice how the autoformatted "source_mat_id" in the sampling sheet, e.g. "EMOBON_HCMR-1_Wa_210628_3um_blank1", becomes "EMOBON_HCMR-1_Wa_210628_3um_blank" in the "measured" sheet even though it is copied from the correct cell in the "sampling" sheet. What is going on? |
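The cross-sheet comparison described above can be sketched as a simple set lookup. This is only an illustration: `unmatched_ids` is a hypothetical helper, and the sampling IDs in the example are assumed to end in "blank1" rather than being harvested from the real sheets.

```python
def unmatched_ids(run_info, sampling_ids):
    """Return (row, id) pairs from run_information with no exact match in sampling."""
    known = {sid.strip() for sid in sampling_ids}
    return [(row, sid) for row, sid in run_info if sid.strip() not in known]


# Illustrative data only; the sampling IDs below are assumed, not harvested.
run_info = [
    (72, "EMOBON_HCMR-1_Wa_210917_3um_blank"),
    (75, "EMOBON_HCMR-1_Wa_210917_0.2um_blank"),
]
sampling_ids = [
    "EMOBON_HCMR-1_Wa_210917_3um_blank1",
    "EMOBON_HCMR-1_Wa_210917_0.2um_blank1",
]
missing = unmatched_ids(run_info, sampling_ids)
for row, sid in missing:
    print(f"Missing source_mat_id is {sid} row {row}")
```

The match is deliberately exact (apart from surrounding whitespace), so "blank" and "blank1" are reported as a mismatch instead of being silently reconciled.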
I think I had created issues for the things that @kmexter is mentioning, individually in each observatory's repo, and @melinalou should have corrected them (at least most of them). |
Clearly there is still some fixing to do then, as there are ranges and > in many logsheets. |
I can check this and maybe delete the ranges in depth and in size_frac_low and size_frac_up? Or in every column with numbers? |
I think for the size_frac it should be clear what to do, but for the ranges we cannot delete them, as we need a value in there, so the observatory has to choose a value. |
Yes, that's right! I will change only the size_frac. |
Done. |
size_frac should be a range. Also, @cymon, you could add a QC step in your code to check whether the values are OK, since size_frac_low should always be lower than size_frac_up (if both numbers exist). |
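A minimal sketch of the suggested QC step, assuming each row is available as a dict of raw cell values. The helper names are hypothetical, and this is not how the existing validation code is necessarily structured. The check is skipped when either value is blank, "NA", or otherwise non-numeric.

```python
import math


def to_number(value):
    """Return a float, or None for blanks, 'NA' and other non-numeric cells."""
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return None if math.isnan(number) else number


def check_size_frac_order(row):
    """Return an error message, or None if the row passes or cannot be checked."""
    low = to_number(row.get("size_frac_low"))
    up = to_number(row.get("size_frac_up"))
    if low is None or up is None:
        return None  # only compare when both numbers exist
    if low >= up:
        return f"size_frac_low ({low:g}) must be lower than size_frac_up ({up:g})"
    return None


print(check_size_frac_order({"size_frac_low": "3", "size_frac_up": "0.2"}))
```

Whether equal values should count as an error is an assumption here; the comment above only says "lower than".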
On Mon, 2 Sept 2024 at 11:06, Christina Pavloudi ***@***.***> wrote:
> *size_frac* should be a range
size_frac is a range, e.g. 3-200 (int dash int), which means it's a string type.
Do you want to throw a validation error if the range is given with a float, e.g. 0.2-3? This is effectively the same question as below...
BTW size_frac would be better if auto-formatted on the sheets...
> *size_frac_low* and *size_frac_up* should be numbers
The current Updated Definitions define these as integers, but they are often mis-specified as floats, so do you want to throw a validation error rather than keep the original floats:
https://github.com/emo-bon/emo-bon-data-validation/blob/main/Batch1and2_combined_logsheets_2024-09-02.csv
> also, @cymon <https://github.com/cymon> you could add a QC step in your code to check if the values are ok, since size_frac_low should always be lower than size_frac_up (if both numbers exist)
Yes, you can validate values based on the values in other fields: I'll add this check.
|
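The open question above (reject floats in size_frac, or accept them) can be sketched as a single switch on the parser. This is an illustration under assumptions: `parse_size_frac` and its `allow_float` flag are hypothetical names, and the strict default simply mirrors the "int dash int" reading of the definitions.

```python
import re

INT_RANGE = re.compile(r"^(\d+)-(\d+)$")
NUM_RANGE = re.compile(r"^(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)$")


def parse_size_frac(text, allow_float=False):
    """Split a 'low-up' range string into two numbers, or raise ValueError."""
    pattern = NUM_RANGE if allow_float else INT_RANGE
    match = pattern.match(text.strip())
    if match is None:
        raise ValueError(f"size_frac {text!r} is not a valid range")
    return float(match.group(1)), float(match.group(2))


print(parse_size_frac("3-200"))
print(parse_size_frac("0.2-3", allow_float=True))
```

With the strict default, "0.2-3" raises a validation error; passing `allow_float=True` keeps the original floats instead.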
size_frac can be read just as a string. It is size_frac_up and size_frac_low that we actually use for ENA and triples. |
On Mon, 2 Sept 2024 at 12:39, Katrina Exter ***@***.***> wrote:
> size_frac can be read just as a string. it is size_frac_up and _low that we actually use for ena and triples
> note: if you did not know already, the data types are also specified in https://github.com/emo-bon/observatory-profile/blob/main/logsheet_schema_extended.csv -> col3
No, I didn't know that document existed... I'll take types directly from there rather than trying to interpret the examples in the Updated Definitions (which sometimes look wrong, but this may just be the way GoogleSheets is displaying the data).
Cymon J. Cox
|
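Taking the data types from the schema file rather than from the examples could look roughly like the sketch below. The layout is an assumption: only "col3 holds the type" comes from the comment above, while the header row, the field name in the first column, and the sample contents are hypothetical.

```python
import csv
import io


def load_field_types(csv_text, name_col=0, type_col=2):
    """Build a {field name: declared type} mapping from schema CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: the first row is a header
    return {
        row[name_col].strip(): row[type_col].strip()
        for row in reader
        if len(row) > type_col
    }


# Hypothetical schema contents, for illustration only.
sample = (
    "field,description,type\n"
    "size_frac_low,lower bound,integer\n"
    "size_frac,range,string\n"
)
field_types = load_field_types(sample)
print(field_types)
```

Keeping the column positions as parameters means the sketch can be adjusted once the real layout of logsheet_schema_extended.csv has been checked.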
I was going over all the googlesheets (manually!) to see which ones we could manually harvest and turn into triples (bypassing Bram's QC code until it is fixed, and perhaps using Cymon's output for those identified observatories to create the triples from).
As a result of that, I have comments about QC errors for all stations, and rather than raising separate issues for each observatory, I am putting them here. These errors REALLY need to be fixed by the stations themselves, but EMO BON HQ (whoever that is) will need to coordinate that. Tell them this is urgent! It took me a whole #$%@#$ day to investigate this, the least they can do is fix their logsheets (yes, I am VERY annoyed!).
(Christina and Cymon: I am assigning you more so that you see this issue than because I expect you to do anything. But please do read all of this text, because there are some questions in there that I would like you to think about/answer.)
In some cases, where I say there are missing values (for mandatory columns), they should ensure that they either put in a value, type in "expected MM-YY", or NA if there will never be a value.
Note: for tidal stages it is not possible to select "expected" or "NA", as this column is created as a drop-down. I strongly suggest that HCMR go to all logsheets and add 2 more options to all of them: "yet to be measured" and "not available", as there are a few stations that do seem to need these options for this column. Please, can EMOBON HQ confirm here when you have done this?
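The rule for mandatory columns (a value, "expected MM-YY", or NA) could be checked along these lines. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and the exact spelling "expected MM-YY" with a two-digit month and year is taken literally from the text above.

```python
import re

# "expected MM-YY", e.g. "expected 09-24"; the month must be 01-12.
EXPECTED = re.compile(r"^expected (0[1-9]|1[0-2])-\d{2}$")


def classify_mandatory_cell(value):
    """Label a mandatory cell as missing, not_available, expected or value."""
    text = "" if value is None else str(value).strip()
    if text == "":
        return "missing"
    if text == "NA":
        return "not_available"
    if EXPECTED.match(text):
        return "expected"
    if text.lower().startswith("expected"):
        return "malformed_expected"  # e.g. an impossible month
    return "value"


for cell in ["", "NA", "expected 09-24", "expected 13-24", "12.5"]:
    print(repr(cell), "->", classify_mandatory_cell(cell))
```

Only the "missing" and "malformed_expected" labels would need to go back to the station for fixing; the other three are all acceptable under the rule above.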
Sediment
UMF UmU
Sampling tab
Measured tab
ROSKOGO SBR
Sampling tab
Measured tab
STHVN MSS
Sampling
Measured tab
OOB Banylus
Sampling
Measured
EMT21 UVIGO
Sampling
Water
EMT21 UVIGO
Sampling
VB IMEC
Sampling
Measured
UMF_UmU
Sampling
STHVN MSS
No value at all....is this station "dead"? I see that we have not harvested this into GH, so I assume this station never happened in the end. In which case, perhaps remove the googlesheet?
ROSKOGO SBR
Sampling
Measured
PiEGetzo UPV/EHU
Sampling
NRMCB SZN
Sampling
MBAL4 MBA
Observatory
Sampling
1. Again, missing size_frac_up is leading to an odd-looking source_mat_id: see e.g. row 10. Can the value be added to col R for row 10, and for all blanks in this sheet? BUT note that this is a blank and, as I asked above, I wonder if blanks need size_fracs at all?
Measured
LMO LnU
Sampling
Measured
IUIEilat1 UIU
Observatory
FYI I think the UIU in the spreadsheet name should be IUI
HCMR-1-UBPC HCMR
Measured
ESC68N UiT
Measured
DOORS BlackSea
No data - should this logsheet be removed from the drive?
BPNS Belgium
Sampling
1. Again, NAs in size_frac_up, but here it is a problem, as it makes the source_mat_ids the same: rows 13 and 18, for example. This is a NONO. Perhaps we should use size_frac_low in the sample id?
BERGEN UiB
Sampling
AAOT CNR ISMAR
Sampling
Measured