Enable server-side csv download for all charts #3648

Closed
11 of 15 tasks
danyx23 opened this issue May 26, 2024 · 2 comments · Fixed by #3613

danyx23 commented May 26, 2024

Context

Our mission is to make the data we need to make progress on the world’s biggest problems accessible. For most of our pro users, accessibility means working with data tools – but getting data from our website to those tools is currently quite annoying.

It’s currently hard to quickly reuse data from a chart in an analytics environment. This is because the current CSV download is created in the browser, making it impossible to offer a single-line code snippet for analytics environments.

Solution

We want to make it possible to download the data for a chart directly from the server. Technically, this will reuse the infrastructure we built for rendering dynamic thumbnails, this time to render a CSV.

The URL scheme should be
https://ourworldindata.org/grapher/life-expectancy.csv - for the data
https://ourworldindata.org/grapher/life-expectancy.metadata.json - for the metadata
https://ourworldindata.org/grapher/life-expectancy.zip - for a zip file of csv, metadata.json and a README.md

and maybe
https://ourworldindata.org/grapher/life-expectancy.xlsx - for an Excel file with 3 sheets: one with description text, one with metadata and one with the data
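
To illustrate the proposed URL scheme, here is a minimal Python sketch that derives the download URLs for a chart slug. The helper name `download_urls` and the `EXTENSIONS` tuple are made up for this example; only the URL patterns come from the list above, and the optional `.xlsx` variant is left out because it is still a "maybe".

```python
from urllib.parse import quote

BASE_URL = "https://ourworldindata.org/grapher"

# File extensions from the proposed URL scheme (the optional xlsx is omitted).
EXTENSIONS = ("csv", "metadata.json", "zip")


def download_urls(slug: str, base_url: str = BASE_URL) -> dict[str, str]:
    """Map each file extension to the download URL for one chart slug.

    Hypothetical helper for illustration; not part of grapher.
    """
    safe_slug = quote(slug, safe="")  # percent-encode anything unusual in the slug
    return {ext: f"{base_url}/{safe_slug}.{ext}" for ext in EXTENSIONS}


if __name__ == "__main__":
    for ext, url in download_urls("life-expectancy").items():
        print(f"{ext}: {url}")
```

With stable URLs like these, a single line such as `pandas.read_csv(url)` would be enough to load a chart's data in an analytics environment.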

Additionally, we want to offer two download options:

  • the full data download (the input table in grapher)
  • and the data as it is used for the current visualization (i.e. filtered to the visible date range, taking the current selection into account, maybe taking other query params into account as well)
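
As a rough sketch of the difference between the two options, the filtered download could be thought of as the full table with an entity selection and a time range applied. The Python below is only an illustration under assumptions: the column names `Entity` and `Year` and the function `filter_rows` are hypothetical, and the real filtering would happen in grapher's table transformations, not in code like this.

```python
from typing import Iterable, Optional


def filter_rows(
    rows: Iterable[dict],
    selected_entities: Optional[set] = None,
    min_time: Optional[int] = None,
    max_time: Optional[int] = None,
) -> list:
    """Keep only rows matching the current selection and visible time range.

    Passing None for a criterion disables it, so calling the function with
    no criteria returns the full table unchanged.
    """
    result = []
    for row in rows:
        # "Entity" and "Year" are assumed column names for this sketch.
        if selected_entities is not None and row["Entity"] not in selected_entities:
            continue
        if min_time is not None and row["Year"] < min_time:
            continue
        if max_time is not None and row["Year"] > max_time:
            continue
        result.append(row)
    return result


if __name__ == "__main__":
    table = [
        {"Entity": "France", "Year": 1990, "value": 76.6},
        {"Entity": "France", "Year": 2020, "value": 82.2},
        {"Entity": "Japan", "Year": 2020, "value": 84.6},
    ]
    print(filter_rows(table, selected_entities={"France"}, min_time=2000))
```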

This issue is only about creating the CF functions necessary to fetch this data. The UI changes will be handled in the follow-up issue #4015.

We also want to have two options for the column names:

  • the default should be to use the verbose long names of the indicators
  • with a flag, it should instead use the columnShortName where one is available. This was introduced with the ETL and is meant to be a column name that is easier to work with in code, if perhaps a bit harder for a human to read
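
The fallback behaviour described above could look roughly like the following Python sketch. The `Column` dataclass, its fields, and the `use_short_names` flag are placeholder names for this example; the issue does not specify how the flag is passed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str                         # verbose long indicator name (the default)
    short_name: Optional[str] = None  # code-friendly name, if one exists


def csv_header(columns: list, use_short_names: bool = False) -> list:
    """Pick one header per column, falling back to the long name
    whenever short names are requested but none is available."""
    return [
        col.short_name if use_short_names and col.short_name else col.name
        for col in columns
    ]


if __name__ == "__main__":
    cols = [
        Column("Life expectancy at birth (years)", "life_expectancy"),
        Column("Population (historical estimates)"),  # no short name available
    ]
    print(csv_header(cols))
    print(csv_header(cols, use_short_names=True))
```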

Must have

  • a cf worker that serves the csv data - in full (input table)
  • a cf worker that serves the csv data - only the visible subset (transformedTable)
  • a way to switch between long verbose column names and short names
  • a cf worker that serves the metadata (a sanitized subset of the metadata, roughly what the sources panel shows)
  • a cf worker that serves a zip file of both the csv and the metadata and a readme. The readme should explain what is in the zip file and contain roughly the information of the sources tab
  • code samples in the download tab to show how to access the data above in different languages
  • Only data that we are allowed to re-share may be accessible this way (i.e. obey the is_protected flag)
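
To make the zip requirement concrete, here is a small Python sketch of bundling the three files in memory and refusing to do so for protected data. It is not the CF worker itself: `build_zip`, `NotRedistributableError`, and the file names inside the archive are assumptions for illustration.

```python
import io
import zipfile


class NotRedistributableError(Exception):
    """Raised when a chart's data may not be re-shared."""


def build_zip(
    slug: str,
    csv_text: str,
    metadata_json: str,
    readme_md: str,
    is_protected: bool = False,
) -> bytes:
    """Bundle CSV, metadata and README into one in-memory zip archive."""
    if is_protected:
        # Protected data must never leave through the download endpoints.
        raise NotRedistributableError(f"data for {slug!r} cannot be re-shared")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{slug}.csv", csv_text)
        archive.writestr(f"{slug}.metadata.json", metadata_json)
        archive.writestr("README.md", readme_md)
    return buffer.getvalue()


if __name__ == "__main__":
    payload = build_zip("life-expectancy", "Entity,Year\n", "{}", "# Readme\n")
    print(zipfile.ZipFile(io.BytesIO(payload)).namelist())
```

Checking the protection flag before any bytes are written keeps the refusal in one place, regardless of which file type was requested.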

Can have

  • a cf worker that serves an xlsx file with three tabs (data, metadata, "readme")
  • store download counters per chart and file type (maybe in D1? Or can we add them to GA? Should we try to add per country?)

Checklist before publishing

  • Verify that csv download is rejected if nonRedistributable is set
  • Check the filtered csv for all chart types and see if it makes sense
    • Check why some chart types, such as scatters, contain several time columns that are all named just "time"
  • Check that filtered csv with tolerance looks ok
  • Check that filtered csv with day as year works ok (consider outputting days in ISO format)
  • Make sure CORS is handled correctly
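
For the "day as year" item, one possible way to output ISO dates is sketched below in Python. It assumes that day values are stored as integer offsets from a chart-specific zero day; the default zero day used here is purely a placeholder, and `day_to_iso` is a hypothetical helper.

```python
from datetime import date, timedelta


def day_to_iso(day_offset: int, zero_day: str = "2020-01-21") -> str:
    """Convert an integer day offset into an ISO 8601 date string.

    The zero day is an assumption for this sketch; the real value would
    come from the chart's or indicator's configuration.
    """
    return (date.fromisoformat(zero_day) + timedelta(days=day_offset)).isoformat()


if __name__ == "__main__":
    for offset in (0, 10, -21):
        print(offset, day_to_iso(offset))
```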

danyx23 commented Oct 3, 2024

When implementing Marwa's new designs make sure to also fix the style leak described here: #3872


danyx23 commented Oct 3, 2024

@danyx23 to create a new follow-up issue for Marcel for the next cycle

danyx23 linked a pull request Oct 5, 2024 that will close this issue
danyx23 added a commit that referenced this issue Oct 10, 2024
This PR adds functionality to generate a CSV file on the server, in a CF worker, for the data of any chart. It also allows downloading a metadata.json file, a README.md, and a zip file containing all three.

This is in preparation for surfacing these files in the download UI of grapher in the upcoming cycle 2024.6 (#4015). This PR does not use the new CF Functions endpoints yet, and the download UI is unchanged.

This PR implements #3648 

## Testing

To test this, send HTTP GET requests to `/grapher/SLUG.zip`, `/grapher/SLUG.csv`, and `/grapher/SLUG.metadata.json` at localhost:8788, or at the staging server linked below.
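
As a convenience, the requests above could be scripted with a small Python smoke test like the one below. This is only a sketch: it assumes the local dev server speaks plain `http` on localhost:8788, and the helpers `endpoint_urls` and `probe` are made up for this example.

```python
import urllib.request

ENDPOINT_SUFFIXES = (".zip", ".csv", ".metadata.json")


def endpoint_urls(slug: str, base: str = "http://localhost:8788") -> list:
    """Build the three endpoint URLs to test for one chart slug."""
    return [f"{base}/grapher/{slug}{suffix}" for suffix in ENDPOINT_SUFFIXES]


def probe(url: str, timeout: float = 5.0) -> tuple:
    """Send a GET request and return (HTTP status, Content-Type header)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Type", "")


if __name__ == "__main__":
    for url in endpoint_urls("life-expectancy"):
        try:
            status, content_type = probe(url)
            print(f"{url}: {status} {content_type}")
        except OSError as error:  # covers URLError, HTTPError and timeouts
            print(f"{url}: failed ({error})")
```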

There is also an Observable notebook that makes it easier to browse the generated readme files: https://observablehq.com/d/d410e9b2d2b7c330