Do not use parentEventID if not necessary #8
Comments
I think there needs to be a way to group events per animal which does not involve parsing character-delimited strings. Also, I think encoding hierarchy in strings makes it harder to check referential integrity. Of course any
@peterdesmet I'm interested to understand why you'd like to avoid hierarchical events? They seem to offer a lot of flexibility. I get that they might be difficult to interpret/ingest from system to system.
I deleted parentEventID from my event table. So now, the only place where unique animals are identified is in organismName in the occurrence table. But I agree with @pieterprovoost's point: I think something is missing here and the event table (the closest to a 'summary' table) needs to define unique individuals somewhere. Otherwise it is more difficult to check the data or compile it into a data frame, and there is a good chance of confusion, e.g. that deployments get confused with individuals, or that the user doesn't notice that multiple records are about the same individual. I have a similar concern with FOM records that don't include a unique animal identifier and have no associated occurrenceID (measurements not taken at the same time as a GPS fix). However, I don't see any other good place to define individuals in the event table. Happy for any ideas!
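To illustrate the point about compiling records per individual, here is a minimal sketch in Python. It assumes the occurrence table has been read into a list of dicts; only the column names (occurrenceID, eventID, organismName) and the ID F53:capture1 come from this thread, and all other values are made up for the example.

```python
from collections import defaultdict

# Hypothetical occurrence records; apart from F53:capture1 the values are invented.
occurrences = [
    {"occurrenceID": "occ1", "eventID": "F53:capture1", "organismName": "F53"},
    {"occurrenceID": "occ2", "eventID": "F53:gps1", "organismName": "F53"},
    {"occurrenceID": "occ3", "eventID": "M12:capture1", "organismName": "M12"},
]


def group_by_individual(records):
    """Collect occurrenceIDs per individual using the explicit organismName column."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["organismName"]].append(rec["occurrenceID"])
    return dict(groups)


individuals = group_by_individual(occurrences)
```

This only works for records that actually carry an organismName, which is the gap described above for measurement records that have no associated occurrenceID.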
Using string parsing to define relations gives me ER nightmares. But complicated hierarchies won't be universally handled well. I think we should use parentEventID to tie together all of the events and occurrence records, but make a recommendation to use it for a simple parent-child relationship with no further levels.
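A rough sketch of what such a recommendation could be checked against, assuming the event table is loaded as a list of dicts. The function name check_parent_links and the sample rows are hypothetical; only eventID, parentEventID and F53:capture1 appear in this thread.

```python
def check_parent_links(events):
    """Flag events whose parentEventID is unknown or more than one level deep."""
    ids = {e["eventID"] for e in events}
    parent_of = {e["eventID"]: e.get("parentEventID") for e in events}
    problems = []
    for event_id, parent in parent_of.items():
        if not parent:
            continue  # top-level event, nothing to check
        if parent not in ids:
            problems.append((event_id, "unknown parent"))
        elif parent_of[parent]:
            # The parent itself has a parent, so this is a second level.
            problems.append((event_id, "more than one level"))
    return problems


# Hypothetical event rows for illustration only.
events = [
    {"eventID": "F53", "parentEventID": None},
    {"eventID": "F53:capture1", "parentEventID": "F53"},
    {"eventID": "F53:gps1", "parentEventID": "F53:capture1"},  # nested too deep
    {"eventID": "M12:capture1", "parentEventID": "M12"},  # M12 is not defined
]
problems = check_parent_links(events)
```

Because the check compares whole identifiers, referential integrity can be verified without splitting any eventID on a delimiter.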
IMO matching measurements to occurrences is not a good solution. (1) There are bio-logging datasets with no occurrences at all except the capture events (e.g. datasets of light level, conductivity and temperature). (2) I really doubt there is one good method for doing this; it will depend on sensor sampling schedules, species/habitat and analysis question. We would be processing the data unnecessarily, in ways that are not necessarily biologically meaningful and might confuse interpretation. For now I'll add parentEventID back in.
I'm wondering if looking at this from different user points of view might help? Can we think of some different users and work back from there / make sure the data will be presented to them in a way that's most useful? User 1: General GBIF/OBIS user. Just wants to know where individuals of a species occur in space. Most important that they understand all of these occurrences are the same individual. Are there other users we can think of? Who will be using the acceleration data?
What @pieterprovoost suggests is probably the best way, but it feels like trying to fit a square peg in a round hole. "Organism" is a concept in Darwin Core, it's just not a "core" file now. Using the Event Core concept has the advantage that GBIF/OBIS can currently handle that data, but it might be good to do the exercise in how we would express biologging data - reusing Darwin Core terms - if we had more freedom in how to structure it. /cc @timrobertson100
In "Mahoney-data-DwC-A-test-2" I noticed that parentEventID is populated with the animalID. I would not do this: eventID is written, e.g. F53:capture1