adding the ARMS logsheets #20

Open
kmexter opened this issue Sep 5, 2024 · 33 comments

@kmexter
Contributor

kmexter commented Sep 5, 2024

I think the template ARMS logsheet is finished
https://docs.google.com/spreadsheets/d/1gGOmCKFP3LXlTGFRXDzWe4aNb8NDp_hRIyhCrFJkb5w/edit?gid=295305945#gid=295305945

if everyone agrees it is OK, then we need to copy the changes to the others and get data added into them

@cpavloud and @melanthia can you each say "yes" when you agree the template one is ready?
Who will fix the others and add data to them/get data added to them?

@cpavloud

cpavloud commented Sep 7, 2024

It cannot be a "mimarks-specimen" because there is no ARMS-mimarks sheet (no ARMS env_package). So, we should either choose another checklist or use the general ENA default checklist and add (most of) our extra terms as custom fields. Alternatively, we could use the GSC MIxS miscellaneous natural or artificial environment checklist (it would be much better than using the water or the sediment one).

Also, the preservative technically is DESS and not DMSO. They are two different things (and people are already confused about this, they have used the terms interchangeably).

Also, I have had a comment that using "unidentified" as a taxon is not correct (or at least not the best thing to use).
I have proposed replacing it with "metagenome".

Also, I don't think that "Metazoa" is the correct taxon for the actual samples. For the sessile fractions, it should be one of the metagenomes, e.g. aquatic metagenome, biofilm metagenome or marine metagenome. For the motile fractions, it could be marine plankton metagenome, as it is for the water checklists.

@kmexter
Contributor Author

kmexter commented Sep 9, 2024

To be discussed? I'm not sure I am the right person to make the decisions here.

@cpavloud
Copy link

cpavloud commented Sep 9, 2024

Maybe we should discuss it with Matthias?

@kmexter
Contributor Author

kmexter commented Sep 10, 2024

yup. he is not on this gh tho...

@cpavloud

I think the next ENVO release will have the ARMS terms we had requested :)
EnvironmentOntology/envo#1402

@kmexter
Contributor Author

kmexter commented Sep 24, 2024

@cpavloud I looked at that issue, but it does not say what the URL is to the term and I cannot find it via https://www.ebi.ac.uk/ols4/ontologies/envo. Do you know the URL I can add to the logsheets metadata file?

@cpavloud

I don't know if a URL exists yet.
From what I can tell, the terms will appear in the next ENVO release.

@kmexter
Contributor Author

kmexter commented Sep 24, 2024

Ok, will raise a new issue to keep an eye on that, since it should not block the resolution of this issue

@kmexter
Contributor Author

kmexter commented Sep 25, 2024

> It cannot be a "mimarks-specimen" because there is no ARMS-mimarks sheet (no ARMS env_package). So, we should either choose another checklist to use or use the general ENA default checklist and add (most of) our extra terms as custom fields. We could alternatively use the GSC MIxS miscellaneous natural or artificial environment checklist (it would be much better than using the water or the sediment one.

I would go for the general one and add our extra terms as custom fields; if you agree, I will raise a new issue to sort out what to add.
So sampl_description would be "EMO BON ARMS hard bottom sample from station VH1. ARMS unit deployed on 2021-06-08 and collected on 2022-06-08. Size fraction SF 40 um.", right?
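If this description is generated from other columns rather than typed by hand, the template could be sketched as below. This is a minimal Python illustration, not the logsheet's own formula; the function and parameter names are hypothetical stand-ins for whatever the real columns are called.

```python
from datetime import date


def build_samp_description(station: str, deployed: date, collected: date,
                           size_fraction: str) -> str:
    """Assemble the ARMS sample description in the wording proposed above.

    Parameter names are hypothetical; the real logsheet columns may differ.
    """
    return (
        f"EMO BON ARMS hard bottom sample from station {station}. "
        f"ARMS unit deployed on {deployed.isoformat()} and collected on "
        f"{collected.isoformat()}. Size fraction {size_fraction}."
    )


if __name__ == "__main__":
    print(build_samp_description("VH1", date(2021, 6, 8), date(2022, 6, 8), "SF 40 um"))
```

Building the text from structured fields keeps the dates in ISO format and avoids typos creeping into hand-written descriptions.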

> Also, the preservative technically is DESS and not DMSO. They are two different things (and people are already confused about this, they have used the terms interchangeably).

I see no problem with saying DESS here instead of DMSO. If no-one disagrees, I will change this.

> Also, I have had a comment about using "unidentified" as a taxon, it's not correct (it's not the best thing to use). I have proposed to change it with "metagenome".

I see no problem with this - if no-one disagrees, I will change this

> Also, I also don't think that "Metazoa" is the correct taxon for the actual samples. For the sessile fractions, it should be one of the metagenomes, e.g. aquatic metagenome or biofilm metagenome or marine metagenome. For the motile fractions, it could be marine plankton metagenome, as it is for the water checklists.

I think the word Metazoa is the translation of the tax_id given (33208). To follow what is done for water and sediment, we should instead change this to "metagenome", which is what 256318 is. Is that OK? Or should we use "marine metagenome" in both the tax_id and the scientific_name columns?

@cpavloud

I would prefer using the GSC MIxS miscellaneous natural or artificial environment checklist instead of the ENA default checklist.
There are differences between them which will require different handling from our part. E.g. in the ENA default checklist there is one column for lat_lon whereas in the GSC MIxS miscellaneous natural or artificial environment checklist there are two columns, one for latitude and one for longitude (as in the MIxS water and in the MIxS sediment). So data management wise, it would probably be easier to go for a MIxS checklist.
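Moving between the two layouts described here (one combined lat_lon column versus separate latitude and longitude columns) is mechanical. The sketch below assumes the combined value holds two whitespace-separated decimal-degree numbers in the order latitude, longitude; that format and the function names are assumptions, not the exact ENA checklist requirements, which should be checked before submission.

```python
def split_lat_lon(lat_lon: str) -> tuple[float, float]:
    """Split a combined 'lat lon' string into (latitude, longitude).

    Assumes two whitespace-separated decimal-degree values, e.g. "50.586825 6.408977".
    """
    parts = lat_lon.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'lat lon', got {lat_lon!r}")
    lat, lon = (float(p) for p in parts)
    # Reject values that cannot be valid coordinates (e.g. swapped columns).
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError(f"coordinates out of range: {lat_lon!r}")
    return lat, lon


def join_lat_lon(lat: float, lon: float) -> str:
    """Inverse of split_lat_lon: combine two columns into one 'lat lon' string."""
    return f"{lat} {lon}"
```

The range check catches the most common data-entry slip, a longitude pasted into the latitude column.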

Sure, we can use "metagenome" instead of "Metazoa" in the scientific_name column. And in the tax_id column we can use "256318" instead of "33208".

@kmexter
Contributor Author

kmexter commented Sep 26, 2024

OK, I will make the suggested changes to the template and then pass this on to HCMR to move those changes out to the rest of the logsheets.
I will raise a new issue for all of us on creating our chosen ENA checklist for the ARMS units.

@kmexter
Contributor Author

kmexter commented Sep 26, 2024

Ah @cpavloud, I see the "unidentified" for the blanks, which is the same approach taken for water and sediment.
I agree that this is not a good one, but I suspect that there is no better tax_id. Similarly, will we fill out the same ENA checklist for the blanks? If not, the investigation_type column would have to change too. But we can discuss this later (in some ways, does it matter?); in any case, whatever we do for the ARMS logsheets we would also need to do for the water and sediment ones, so that can be a new issue. Feel free to raise one if you want.

@cpavloud

The term "metagenome" is a much better one.
This is the one I have used for the samples I already submitted to ENA. I had created an issue about this, so that it could be corrected to all the logsheets... See this.
"unidentified" is a very general one and it's not true actually.

We don't need to change the column "investigation_type". This is a different thing.

@kmexter
Contributor Author

kmexter commented Sep 26, 2024

Yes, I realised I had incorrectly changed that column. I have now created a new template (as it was the only way to remove the old tabs), so check out https://docs.google.com/spreadsheets/d/19YKWWezII-MLPPMYC3u9ELEwszw75U88nZwJ0vh5PRQ/edit?gid=1737828808#gid=1737828808. Should the investigation type be "metagenome" or "mimarks-specimen"?

@kmexter
Contributor Author

kmexter commented Sep 26, 2024

> The term "metagenome" is a much better one. This is the one I have used for the samples I already submitted to ENA. I had created an issue about this, so that it could be corrected to all the logsheets... See this. "unidentified" is a very general one and it's not true actually.

ok I see that, however that means we have metagenome and tax_id 256318 for the blanks and the actual samples for the ARMS. Just checking that this is OK?

@cpavloud

Yeah, there is no reason for the blanks to have a different "tax_id" and "scientific_name". It's fine if everything is "metagenome".

The investigation type should be "metagenome", yes.
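Since samples and blanks end up with identical taxon fields, the change amounts to overwriting three columns on every row. A minimal sketch, assuming each sampling-tab row has been exported as a Python dict keyed by the column names used in this thread (tax_id, scientific_name, investigation_type); the helper name is hypothetical:

```python
from typing import Iterable

# Values agreed in this thread for both ARMS samples and blanks.
METAGENOME_FIELDS = {
    "tax_id": "256318",
    "scientific_name": "metagenome",
    "investigation_type": "metagenome",
}


def harmonise_taxon(rows: Iterable[dict]) -> list[dict]:
    """Return copies of the rows with the agreed metagenome fields set.

    The input rows are not modified, so the old values remain available for comparison.
    """
    return [{**row, **METAGENOME_FIELDS} for row in rows]
```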

@kmexter
Contributor Author

kmexter commented Sep 26, 2024

OK, @melanthia and @melinalou the ARMS template logsheet - which you can find on https://docs.google.com/spreadsheets/d/19YKWWezII-MLPPMYC3u9ELEwszw75U88nZwJ0vh5PRQ/edit?gid=1737828808#gid=1737828808 - is ready to be turned into station logsheets

The template has numerous changes in it that need to be transmitted to the other ones, and I suggest that the easiest way to do this is simply to copy the template into new ones with the names of the ARMS stations.
Now, there are already logsheets for all the stations in the Google Drive folder https://drive.google.com/drive/folders/1rpHNzvrVoRKhFGbeeAH9hVeuDsHaZsyR but they are now wrong. I suggest you move them all to an "old" subfolder, so you still have them, and then create new ones from the new template in the main folder. Then you will have to go to the "old" ones and copy the entries in them over into the new logsheets. For most stations this is just a few new sample IDs in column 1 of the sampling tab.
TZS_UHels
Toralla_UVigo
Svalbald_IOPAN
ROSKOGO_SBR
Piran_Slovenia
PiEGetxo_PiE
Italy_UniBologna
GURR_UGalway
Gdnia_UGdansk
ARMS_Eilat_IUI
Denmark_AU
Crete_HCMR
Bodo_NU
BAS_UK

So it is not a lot of work. However, there are two stations with quite a few entries in them:
BelgianCoast_VLIZ
Koster_UGoth

Remember that you cannot just copy-paste, because the changes we made to the observatory and sampling tabs in the template mean that you need to carefully compare the new rows for these two stations with the example entries in the template logsheet (the rows in yellow).

Once the new logsheets have been made and filled in as described here, someone then needs to add to them the information that already exists in PlutoF and the ARMS-MBON googlesheet. You EMO BON/HCMR guys will have to decide who that will be: you, Matthias, or the stations themselves. (You can tell them that until those logsheets are filled in, the ARMS data will not be part of the EMO BON collection :-D). Good luck!

@melinalou

question:
If the extra entries in source_mat_id_orig look like this, "EMO BON Toralla JTA ARMS 210701 211009 SF (1)", do we need to convert them into "EMO_BON_Toralla_JTA_ARMS_210701_211009_SF_1", or do we keep them in the original format?

@kmexter
Contributor Author

kmexter commented Sep 30, 2024

the original material sample ID needs to be kept, because presumably that is what is written on the sample labels. So do not change what is in source_mat_id_orig. The equation in source_mat_id should be sufficient to produce the correct sample ID, you should not have to do anything there except drag-and-drop the cell down to activate the equation
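The principle here is to leave the original label untouched and derive any machine-friendly ID in a separate column. The sketch below reproduces melinalou's example conversion purely for illustration; it is a hypothetical rule (the helper name and the normalised_label key included), not the source_mat_id equation used in the logsheets, which is not shown in this thread.

```python
import re


def normalise_label(orig: str) -> str:
    """Turn a free-text sample label into an underscore-separated token (hypothetical rule)."""
    # Drop parentheses but keep their content, then join whitespace-separated words.
    without_parens = re.sub(r"[()]", "", orig)
    return "_".join(without_parens.split())


row = {"source_mat_id_orig": "EMO BON Toralla JTA ARMS 210701 211009 SF (1)"}
# The original label stays as written on the sample; the derived value goes in its own key.
row["normalised_label"] = normalise_label(row["source_mat_id_orig"])
```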

@melinalou

ok thank you!

@melinalou

All done, can you please confirm that they are correct?
I've moved the old files into a folder; the files outside that folder are the new ones: https://drive.google.com/drive/folders/1rpHNzvrVoRKhFGbeeAH9hVeuDsHaZsyR

@kmexter
Contributor Author

kmexter commented Oct 4, 2024

I looked at them briefly as I don't have the time to check them all
Why is column O green in all of them? Probably a copy-paste thing, as it is in the template one (which you should move out of the "old" folder so it is with the rest of the new ones)?
As with another issue raised, please also colour the mandatory columns (title row)

There is probably quite a bit of metadata in the ARMS overview googlesheet and PlutoF for these stations, but someone who knows these should fill in the metadata from there into these logsheets. @JustinePa has the knowledge, but I am not sure if she has the time. If not, Justine, could you ask Matthias to do it?
....Remembering that we have to stick to the EMO BON IDs now, not the ARMS ones, you will have to translate the IDs yourself. @cpavloud has the IDs for the EMO BON ARMS samples in her omics spreadsheets (I don't know where they are), so perhaps between you, you can do this? We could then add ARMS to the EMO BON harvesting process.

@JustinePa

Hi,
I can try and do it today and/or on Monday.
For the IDs, I would then use the ones in column D here. @cpavloud can you confirm?
And just one thing I'm still unsure of, are the IDs of non-EMOBON observatories "ARMS_(...)" or "EMOBON_(...)"?

@kmexter
Contributor Author

kmexter commented Oct 4, 2024

Any observatory that is in an EMO BON logsheet gets an EMO BON ID.
you don't have to do it today - when you are back is also fine

@JustinePa

Okay great, then I'll need to correct the column I mentioned.
Will let you know when I'm done with this!

@melinalou

> I looked at them briefly as I don't have the time to check them all Why is col O in green in all of them? Probably a copy-paste thing as it template one (which you should move out of the old folder so it is with the rest of the new ones)? As with another issue raised, please also colour the mandatory columns (title row)

done!

@JustinePa

Hi!

I've filled up the logsheets as I could with info from PlutoF and the ARMS GoogleSheet. There are still some fields that should probably be completed by the providers.

Also, I have a few observations/questions:

  1. Some logsheets have the field "obs_id" in column B (and not column U as the others). Also, those do not have the field "ha_id". These logsheets are:
    BAS_UK
    BelgianCoast_VLIZ
    Bodo_NU
    Crete_HCMR
    Italy_UniBologna
    (this makes the automatic filling of the field "samp description" unreliable.)

  2. Some logsheets are combining different observatories (or at least observatories in the sense of ARMS-MBON), meaning there are different obs_id in one logsheet: is that normal?
    For example: Denmark (observatories Nuuk, Daneborg, Laeso, Limfjord) or Italy (observatories AdriaticEastCoast, RavennaHarbour, RavennaMarina)

  3. Should the observatory IDs be the ones found in here, for the ones present in the table? And those IDs are then used in source_mat_id and sampling_event?
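A check like the one in point 1 can be automated once the header rows are available. The sketch below assumes each logsheet's header row has already been read into a list of strings (how it is exported is left open); it compares the positions of obs_id and ha_id against a reference sheet and reports them as spreadsheet column letters. Function names are hypothetical.

```python
def column_letter(index: int) -> str:
    """Convert a 0-based column index to a spreadsheet letter (0 -> A, 20 -> U, 26 -> AA)."""
    letters = ""
    index += 1
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def check_headers(headers_by_sheet: dict[str, list[str]], reference: str,
                  required: tuple[str, ...] = ("obs_id", "ha_id")) -> dict[str, list[str]]:
    """Report, per logsheet, required columns that are missing or misplaced.

    Positions are compared with the header row of the `reference` logsheet.
    """
    ref = headers_by_sheet[reference]
    problems: dict[str, list[str]] = {}
    for name, headers in headers_by_sheet.items():
        issues = []
        for col in required:
            if col not in headers:
                issues.append(f"missing {col}")
            elif col in ref and headers.index(col) != ref.index(col):
                issues.append(
                    f"{col} in column {column_letter(headers.index(col))}, "
                    f"expected {column_letter(ref.index(col))}"
                )
        if issues:
            problems[name] = issues
    return problems
```

Sheets that match the reference are simply absent from the result, so an empty dict means every logsheet is consistent.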

@cpavloud could you check and answer these?

@kmexter
Contributor Author

kmexter commented Oct 28, 2024

  1. That is a mistake that will need to be corrected by @melinalou. Melina, can you check that the columns obs_id and ARMS_unit_id are always in the same place for all observatories? I don't actually mind where they go; given that we have different obs_id values for the same, um, observatory owners, having them in columns B and C is fine for me.

We need to find another naming convention: "observatory" is what we have used in ARMS-MBON for a grouping of units in a place with a unique habitat/location, but "observatory" is also the name we are using for the "owners" of the activities. Perhaps we need to change the word "observatory" to something else for ARMS? @cpavloud (I know you are on vacation right now; it is OK to comment when you get back), @melanthia, thoughts?

...that leads to point 3.
3) Yes, damn, you are right: we need to change this, both the file and potentially the name we use for the source_mat_ids. For example, Italy_UniBologna is the name of the observatory as written into the ARMS logsheet, but in the observatories file (https://github.com/emo-bon/governance-data/blob/main/observatories.csv) there is no observatory with that name. However, the Italian observatory AAOT is marked as having hard substrates, so I am guessing it is the same "observatory" as Italy_UniBologna. In this case we would either have to change the observatory ("italy_unibologna") to "AAOT", or add new observatories to that file just for the Italian hard substrate sites. Ditto for the other hard substrate observatories. Both are OK for me; @melanthia, @cpavloud, @JustinePa, any preferences?
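Finding which logsheet observatory names have no counterpart in observatories.csv is a simple set lookup. In this sketch the name of the ID column in that file (id_column, defaulting to "obs_id") is an assumption, as is the case-insensitive comparison; the demo CSV content is made up for illustration only.

```python
import csv
import io
from typing import Iterable, TextIO


def unknown_observatories(names: Iterable[str], observatories_csv: TextIO,
                          id_column: str = "obs_id") -> list[str]:
    """Return the names that have no matching row in the observatories CSV.

    `id_column` is an assumed header name; comparison ignores case and surrounding spaces.
    """
    known = {row[id_column].strip().lower() for row in csv.DictReader(observatories_csv)}
    return [name for name in names if name.strip().lower() not in known]


if __name__ == "__main__":
    demo_csv = io.StringIO("obs_id,country\nAAOT,Italy\n")  # illustrative content only
    print(unknown_observatories(["Italy_UniBologna", "AAOT"], demo_csv))
```

With the real file, pass a handle opened with `open("observatories.csv", newline="")` instead of the in-memory demo.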

And back to point 2: hmm, yes, this is hard to get around, as one "owner" (aka EMO BON "observatory") can have multiple ARMS unit groupings (aka ARMS-MBON "observatories"). The way it is on the observatory tab of the logsheets is fine, HOWEVER we need to make the link between the rows in the sampling tab and the obs_id+ARMS_unit_id more explicit. At present this is only embedded in the event_id, but as that ID is made manually, I think it is prone to error. I think we need to add a new column, "expanded_unit_id", to the sampling tab (e.g. AdriaticEastCoast_Ven1), and then we can see if we can make the sampling_event column be formed via an equation. Thoughts?
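The proposed expanded_unit_id is a plain concatenation, and an event ID could in principle be derived the same way. The sketch below is hypothetical: the event_id layout (expanded unit ID plus deployment and recovery dates as YYMMDD, mirroring the dates in the original sample labels) is an assumption, not an agreed convention. In the sheet itself, the concatenation could be a formula such as `=B2&"_"&C2`, with the column letters again hypothetical.

```python
from datetime import date


def expanded_unit_id(obs_id: str, arms_unit_id: str) -> str:
    """Join an ARMS-MBON observatory ID and an ARMS unit ID, e.g. AdriaticEastCoast_Ven1."""
    obs_id, arms_unit_id = obs_id.strip(), arms_unit_id.strip()
    if not obs_id or not arms_unit_id:
        raise ValueError("both obs_id and ARMS_unit_id must be filled in")
    return f"{obs_id}_{arms_unit_id}"


def event_id(obs_id: str, arms_unit_id: str, deployed: date, collected: date) -> str:
    """Hypothetical event ID: expanded unit ID plus deployment and recovery dates (YYMMDD)."""
    return f"{expanded_unit_id(obs_id, arms_unit_id)}_{deployed:%y%m%d}_{collected:%y%m%d}"
```

Deriving both IDs from the obs_id and ARMS_unit_id cells means a typo can only enter in one place, rather than separately in each hand-typed ID.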

@melinalou

I will do it by tomorrow!

@kmexter
Contributor Author

kmexter commented Oct 29, 2024

Well, do point 1 yes please, but points 2 and 3 need discussion first!

@melinalou

1 done!

@JustinePa

  1. I think creating "expanded_unit_id" is a good idea. Couldn't we also make the "event_id" with an equation then?

  2. Choosing between your two suggestions really depends on the new definition/naming convention we decide on for "observatory". For example, in the case of Italy_UniBologna, I don't know if AAOT is supposed to encompass the three "observatories" AdriaticEastCoast, RavennaHarbour and RavennaMarina. Those are not that far from each other, so maybe they could be grouped together, but this does not work for the Danish ones, for example. This comes back to the question "what makes an observatory?"

@kmexter
Copy link
Contributor Author

kmexter commented Oct 30, 2024

  1. If others agree on a new column, I will look to see what one can do with equations
  2. needs a larger discussion with @melanthia and @cpavloud and @isanti as I don't know enough about the different emo bon partners to know how they should be grouped.


5 participants