Occurrence cube downloads #1978

Open
MortenHofft opened this issue Nov 4, 2024 · 7 comments

Comments

@MortenHofft
Member

MortenHofft commented Nov 4, 2024

ping @peterdesmet and @MattBlissett

I've started the work of adding a new download option

And an SQL UI for downloads

Could you please help evaluate if this is functionally what you had in mind?
Functionally there are 2 things that aren't working

If the functions are as you expected, then we could think about usability (bearing in mind that we will have to reimplement this in the near future)

  • Do we need articles or help pages to refer to?
  • More help texts in the UI?
  • Different labels?
  • Extend the general download API with a description field, which we could auto-generate (from the form) for more human-readable context.
  • Allow comments in the SQL?
  • What is a good name for the download type? And what title?
  • How do we make this meaningful with the table we have for other formats to show what you can expect?
  • Does an about page for the SQL editor make sense? There is a placeholder for now. If so, someone should write it.
  • ...?
@peterdesmet
Member

peterdesmet commented Nov 6, 2024

Nice!! Here's some feedback.

Download page

  • Download type name: Cube rather than SQL cube
  • Coordinates column can be ✔ (if selected). Currently this uses a symbol that is different than others ( not )

Modal

  • Title Download cube is fine

  • Modal help text:

    This download format allows you to aggregate occurrences by their taxonomic, temporal and/or spatial properties. For example, a data cube can be configured to aggregate occurrences by family, month and grid cell of the European Environment Agency reference grid (three dimensions) and count the number of occurrences (a measure) per combination. The result is a CSV file.

    Once configured, a SQL query will be created to generate the data cube. For more advanced use, it is possible to further customize the query by editing the created SQL.

    You can read more about species occurrence cubes here.

  • Dimensions help text:

    A dimension represents an aspect along which data can be aggregated. Selecting a higher resolution (e.g. species over family, date over year, 100 m over 10 km) will result in more categories and therefore more records.

  • Taxonomic dimension help text:

    This dimension aggregates occurrences by their taxonomic rank.

  • Temporal dimension help text:

    This dimension aggregates occurrences by time.

  • Spatial dimension help text:

    This dimension aggregates occurrences in a spatial grid.

  • Spatial resolution help text:

    The size of each grid cell.

  • I would update the values and order for Spatial resolution + add sections:

Global
- Military grid reference systems (MGRS)
- Extended quarter degree grid (QDGC)
- ISEA3H grid
Europe
- EEA reference grid
  • The values for Spatial resolution should probably be updated slightly. For one, I think that the spatial resolution for MGRS is in meters.

  • EEA spatial resolution: add a space between value and unit (1 km not 1km etc.)

  • MGRS spatial resolution: move finest resolution to top of dropdown, so values are in order. I don't know what this value entails though (I thought 1 m was the finest)

  • Can we have better spatial resolution labels than Level 0-6 for EXTENDED_QUARTER_DEGREE_GRID? E.g. name them after how big they are in degrees. Input from @MattBlissett needed.

  • Can we have better spatial resolution labels than Level 0-22 for ISEA3H_GRID? E.g. name them after how big they are (in meters?). Input from @MattBlissett needed.

  • Randomize points within uncertainty circle help text:

    For occurrence records with a coordinate uncertainty that covers more than one grid cell, should a random cell be chosen? If no is chosen, then the cell containing the centroid of the record is used.

  • Labels are good, would use regular case for Randomize points within uncertainty circle

  • Rename Measurements to Measures

  • Measures help text:

    A calculated quantitative value for each combination of dimensions.

  • Currently unclear that occurrence count is always included as a measurement. How to best indicate this?

  • Occurrence count (always included) help text:

    The number of occurrences.

  • Occurrence count at higher taxonomic level help text:

    Additional higher taxonomic ranks for which the number of occurrences should also be included. Useful to assess sampling bias.

  • Include minimum coordinate uncertainty help text:

    The lowest recorded coordinate uncertainty (in meters). Useful to assess the spatial precision of the data.

  • Include minimum temporal uncertainty help text:

    The lowest recorded temporal uncertainty (in seconds). Useful to assess the temporal precision of the data.
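
To make the dimension and measure terminology above concrete, here is a minimal Python sketch of the aggregation a cube performs. It is not the GBIF implementation: the record fields (`family`, `year_month`, `cell`, `uncertainty_m`), the cell codes and the `build_cube` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical occurrence records; field names and cell codes are illustrative only.
occurrences = [
    {"family": "Felidae", "year_month": "2024-05", "cell": "A1", "uncertainty_m": 30.0},
    {"family": "Felidae", "year_month": "2024-05", "cell": "A1", "uncertainty_m": 1000.0},
    {"family": "Canidae", "year_month": "2024-05", "cell": "A1", "uncertainty_m": 250.0},
    {"family": "Felidae", "year_month": "2024-06", "cell": "B2", "uncertainty_m": None},
]


def build_cube(records, dimensions):
    """Group records by the chosen dimensions and compute two measures per group."""
    cube = defaultdict(lambda: {"occurrences": 0, "min_uncertainty_m": None})
    for rec in records:
        # One key per unique combination of dimension values.
        key = tuple(rec[d] for d in dimensions)
        row = cube[key]
        row["occurrences"] += 1  # the measure that is always included
        u = rec["uncertainty_m"]
        # Track the lowest known uncertainty, skipping records without one.
        if u is not None and (row["min_uncertainty_m"] is None or u < row["min_uncertainty_m"]):
            row["min_uncertainty_m"] = u
    return dict(cube)


cube = build_cube(occurrences, ("family", "year_month", "cell"))
for key, measures in sorted(cube.items()):
    print(key, measures)
```

Calling `build_cube(occurrences, ("family",))` instead yields fewer rows, which is the effect the Dimensions help text describes: a coarser resolution means fewer categories and therefore fewer records.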

SQL editor

  • Comments in SQL are fine by me, though I am not sure whether they are retained in the query string
  • Update help text at bottom to (no then):

The easiest way to download and explore data is via the occurrence search user interface. But for complex queries and aggregations, the SQL editor provides more freedom.

@MortenHofft
Member Author

Thanks @peterdesmet

On the SQL editor, I would include the link to the occurrence search in the text, rather than a button:

I agree it is nicer. It is a button only because that makes life easier for translators. Having them write markdown with variables has caused issues in the past.

@MortenHofft
Member Author

MortenHofft commented Nov 11, 2024

The values for Spatial resolution should probably be updated slightly. For one, I think that the spatial resolution for MGRS is in meters.

Yeah, I know those are wrong. I'm waiting for you or Matt to tell me what they should be, please. I've changed the MGRS values as you specified above.

EXTENDED_QUARTER_DEGREE_GRID should be?
ISEA3H_GRID should be?

@MortenHofft
Member Author

MortenHofft commented Nov 11, 2024

I've added mock help texts to all fields and added 2 mock articles (one for SQL downloads and one for cubes).

Help texts
If someone with better English skills and understanding can correct the help texts, that would be great. Alternatively, I can try my best, but it is the type of thing that takes me forever. If you believe some fields are self-explanatory, let me know and I can remove the help text.

  • review help texts
  • Style the help texts a bit. The amount of help text makes it all a bit bland, I find. More spacing would probably help.

Articles
For the articles, someone needs to write them if we still want them.
https://www.gbif-uat.org/occurrence-cubes
https://www.gbif-uat.org/occurrence/download/sql#about

  • write tool text
  • write cube article

Known API bugs

  • the field naming for order is different between environments
  • downloads do not work in UAT, not sure about other environments

Download pages
Arriving at a download page is confusing if you come from a cube download format. You configured a cube via a UI, and then arrive at an SQL string. It is a requirement to display this better. One way to go about it could be to add a new feature to downloads generally.

  • API option to attach a human readable description of a download when doing a download.
  • Auto generate a human readable description for cube downloads.
  • Always show available descriptions on download pages.
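
As a rough illustration of the auto-generation idea, the following Python sketch builds one sentence from the cube form values. The parameter names (`taxonomy`, `temporal`, `grid`, `resolution`, `min_coordinate_uncertainty`) and the `describe_cube` helper are assumptions made for this example and are not the actual form or API fields.

```python
def describe_cube(params):
    """Turn hypothetical cube form parameters into one readable sentence."""
    dims = []
    if params.get("taxonomy"):
        dims.append(params["taxonomy"].lower())
    if params.get("temporal"):
        dims.append(params["temporal"].lower())
    if params.get("grid"):
        dims.append(f"{params['grid']} grid cell ({params['resolution']})")

    # Occurrence count is always part of a cube, per the discussion above.
    measures = ["occurrence count"]
    if params.get("min_coordinate_uncertainty"):
        measures.append("minimum coordinate uncertainty")

    return (
        "Occurrence cube aggregated by "
        + (", ".join(dims) or "no dimensions")
        + "; measures: "
        + ", ".join(measures)
        + "."
    )


description = describe_cube(
    {
        "taxonomy": "FAMILY",
        "temporal": "MONTH",
        "grid": "EEA reference",
        "resolution": "1 km",
        "min_coordinate_uncertainty": True,
    }
)
print(description)
```

A string like this could be sent along with the download request and shown on the download page next to the SQL, so the user sees what they configured and not only the generated query.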

That is just one idea. Other ideas for how to make the transition easier for users are welcome

Other

  • review grid resolution translation names
  • Ask coms and data products to provide feedback, refine styling and text.

@timrobertson100
Member

Thanks. I think the text helps in guiding the user.

I think adding the ability to give it a human-readable type or description would be good. Alternatively, we could introduce a cube download format in the API itself, which takes the form parameters but does the SQL conversion in the backend. The reason to do that would be to display the parameters used on the download page, which is shown from the DOI. A user could still "open this in the SQL builder" before submitting to do more complicated queries, but it would hide SQL completely for anyone who didn't. I don't know which would be the more scalable option.

@peterdesmet
Member

  1. @MortenHofft I have reviewed the modal and the help text. See my updated Occurrence cube downloads #1978 (comment). Two inputs from @MattBlissett needed.

  2. Is the functionality ready for testing?

  3. Do we need a separate (stable) help page at https://www.gbif-uat.org/occurrence-cubes describing the functionality or is it sufficient to refer to https://techdocs.gbif.org/en/data-use/data-cubes? To be assessed by the communication team.

  4. @timrobertson100 having an endpoint "which takes the form parameters" would indeed be better documentation of the dimensions in the recorded metadata, which in turn would help conversion to e.g. EBV Cubes. Just having the SQL statement doesn't tell us anything about the dimensions that were selected, since the columns can be named however they want.

@timrobertson100
Member

On 4. please see our proposed approach here. A JSON object (the context) would hold the submitted form parameters.
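
For readers following along, here is a hedged Python sketch of what such a context object and a backend-side SQL conversion might look like. The JSON shape, the `DIMENSION_SQL` mapping, the `occurrence` table name and the `context_to_sql` helper are invented for illustration; the linked proposal defines the real structure.

```python
import json

# Hypothetical mapping from form values to SQL column names; not the GBIF schema.
DIMENSION_SQL = {
    "FAMILY": "family",
    "SPECIES": "species",
    "YEAR": "year",
    "MONTH": "month",
}


def context_to_sql(context_json):
    """Convert a stored form context into the SQL the backend would run."""
    ctx = json.loads(context_json)
    if not ctx.get("dimensions"):
        raise ValueError("a cube needs at least one dimension")
    # Looking values up in a fixed mapping rejects anything not offered by the form.
    columns = [DIMENSION_SQL[d] for d in ctx["dimensions"]]
    select = columns + ["COUNT(*) AS occurrences"]
    return (
        "SELECT " + ", ".join(select)
        + " FROM occurrence"
        + " GROUP BY " + ", ".join(columns)
    )


# The context is what would be stored with the download and shown on its page.
context = json.dumps({"format": "CUBE", "dimensions": ["FAMILY", "YEAR"]})
sql = context_to_sql(context)
print(sql)
```

Keeping the context next to the generated SQL means the selected dimensions stay recoverable from the download metadata even if the column names in the SQL are later changed.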
