Enable server-side csv download for all charts #3648

danyx23 · 2024-05-26T09:59:51Z

Context

Our mission is to make the data we need to make progress on the world’s biggest problems accessible. For most of our pro users, accessibility means working with data tools – but getting data from our website to those tools is currently quite annoying.

It’s currently hard to quickly reuse data from a chart in an analytics environment. This is because the current CSV download is created in the browser, making it impossible to offer a single-line code snippet for analytics environments.

Solution

We want to make the data for a chart downloadable directly from a server-side URL. Technically, this will reuse the infrastructure that we built for rendering dynamic thumbnails, this time to render a CSV.

The URL scheme should be
https://ourworldindata.org/grapher/life-expectancy.csv - for the data
https://ourworldindata.org/grapher/life-expectancy.metadata.json - for the metadata
https://ourworldindata.org/grapher/life-expectancy.zip - for a zip file of csv, metadata.json and a README.md

and maybe
https://ourworldindata.org/grapher/life-expectancy.xlsx - for an Excel file with 3 sheets: one with description text, one with metadata and one with the data
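
As a rough illustration of the URL scheme above, a worker could split the request path into a chart slug and a download kind before deciding what to render. This is only a sketch: `DownloadKind`, `DownloadRequest` and `parseDownloadPath` are hypothetical names, not existing grapher code, and the `.xlsx` suffix is included even though it is only a "maybe".

```typescript
// Hypothetical helper: maps a request path onto the URL scheme from this issue.
type DownloadKind = "csv" | "metadata" | "zip" | "xlsx"

interface DownloadRequest {
    slug: string
    kind: DownloadKind
}

// Each file suffix from the URL scheme and the kind of download it requests.
const SUFFIXES: [string, DownloadKind][] = [
    [".metadata.json", "metadata"],
    [".csv", "csv"],
    [".zip", "zip"],
    [".xlsx", "xlsx"],
]

function parseDownloadPath(pathname: string): DownloadRequest | undefined {
    const prefix = "/grapher/"
    if (!pathname.startsWith(prefix)) return undefined
    const rest = pathname.slice(prefix.length)
    for (const [suffix, kind] of SUFFIXES) {
        // Require a non-empty slug in front of the suffix.
        if (rest.endsWith(suffix) && rest.length > suffix.length) {
            const slug = rest.slice(0, rest.length - suffix.length)
            // Nested paths are not part of the scheme, so reject them.
            if (slug.includes("/")) return undefined
            return { slug, kind }
        }
    }
    // No known suffix: this is not a download request.
    return undefined
}
```

For example, `parseDownloadPath("/grapher/life-expectancy.metadata.json")` yields the slug `life-expectancy` with kind `metadata`, while a path without one of the suffixes yields `undefined`.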

Additionally, we want to offer two download options:

the full data download (the input table in grapher)
and the data as it is used for the current visualization (i.e. filtered to the visible date range, taking the current selection into account, maybe taking other query params into account as well)
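
To make the difference between the two options concrete, here is a minimal sketch of the filtered variant, assuming the data is already available as flat rows. `Row`, `ViewFilter` and `filterRows` are hypothetical names for illustration; the real grapher tables and query parameters are richer than this.

```typescript
// Hypothetical flat row shape; the real grapher table structure differs.
interface Row {
    entity: string
    time: number
    value: number
}

// Hypothetical description of what the current view shows.
interface ViewFilter {
    selectedEntities?: string[]
    minTime?: number
    maxTime?: number
}

// An empty filter returns everything, which corresponds to the full download.
function filterRows(rows: Row[], filter: ViewFilter): Row[] {
    const selected =
        filter.selectedEntities && filter.selectedEntities.length > 0
            ? new Set(filter.selectedEntities)
            : undefined
    const min = filter.minTime ?? -Infinity
    const max = filter.maxTime ?? Infinity
    return rows.filter(
        (row) =>
            (selected === undefined || selected.has(row.entity)) &&
            row.time >= min &&
            row.time <= max
    )
}
```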

This issue is only about creating the CF functions necessary to fetch this data. The UI changes will be handled in the follow-up issue #4015.

We also want to have two options for the column names:

the default should be to use the verbose long names of the indicators
with a flag, it should be possible to use the columnShortName instead, where one is available. This was introduced with the ETL and is meant to be a column name that is easier to work with in code, if perhaps a bit harder for a human to read
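
The fallback between the two naming options could look roughly like this. The `ColumnInfo` shape and the `useShortNames` flag name are assumptions made for this sketch; only the idea of falling back to the long name when no short name exists comes from the description above.

```typescript
// Hypothetical column shape: `name` is the verbose long name,
// `shortName` stands in for columnShortName and may be missing.
interface ColumnInfo {
    name: string
    shortName?: string
}

// Use the short name only when the flag is set and a short name exists.
function csvColumnName(column: ColumnInfo, useShortNames: boolean): string {
    if (useShortNames && column.shortName) return column.shortName
    return column.name
}

function csvHeader(columns: ColumnInfo[], useShortNames: boolean): string[] {
    return columns.map((column) => csvColumnName(column, useShortNames))
}
```

With the flag off, every column keeps its long name; with the flag on, columns without a short name still fall back to the long one, so the header never contains empty names.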

Must have

a cf worker that serves the csv data - in full (input table)
a cf worker that serves the csv data - only the visible subset (transformedTable)
a way to switch between long verbose column names and short names
a cf worker that serves the metadata (a sanitized subset of the metadata, roughly what the sources panel shows)
a cf worker that serves a zip file of both the csv and the metadata and a readme. The readme should explain what is in the zip file and contain roughly the information of the sources tab
code samples in the download tab to show how to access the data above in different languages
Only data that we are allowed to re-share may be accessible this way (i.e. obey the is_protected flag)
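
The re-sharing requirement could be enforced with a small guard that runs before any CSV or zip is produced. This is a sketch under assumptions: the `nonRedistributable` field on each variable, the `checkRedistribution` name and the 403 status are all illustrative choices, not confirmed behaviour of the workers.

```typescript
// Hypothetical variable shape; the flag name is an assumption for this sketch.
interface SourceVariable {
    name: string
    nonRedistributable?: boolean
}

type DownloadDecision =
    | { allowed: true }
    | { allowed: false; status: number; reason: string }

// Refuse the whole download if any variable in the chart may not be re-shared.
function checkRedistribution(variables: SourceVariable[]): DownloadDecision {
    const blocked = variables.filter(
        (variable) => variable.nonRedistributable === true
    )
    if (blocked.length === 0) return { allowed: true }
    const names = blocked.map((variable) => variable.name).join(", ")
    return {
        allowed: false,
        status: 403, // assumed status code, not taken from the issue
        reason: `Data cannot be redistributed: ${names}`,
    }
}
```

Rejecting the whole request, rather than silently dropping the protected columns, keeps a partial file from being mistaken for the complete dataset; whether that is the desired behaviour is a design decision this sketch does not settle.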

Can have

a cf worker that serves an xlsx file with three tabs (data, metadata, "readme")
store download counters per chart and file type (maybe in D1? Or can we add them to GA? Should we try to add per country?)

Checklist before publishing

Verify that csv download is rejected if nonRedistributable is set
Check the filtered csv for all chart types and see if it makes sense
- Check why some chart types, such as scatters, output several time columns that are all named just "time"
Check that filtered csv with tolerance looks ok
Check that filtered csv with day as year works ok (consider outputting days in ISO format)
Make sure CORS is handled correctly
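
For the "day as year" item, converting a day offset into an ISO date could be sketched as follows. The zero day is passed in as a parameter because this issue does not state it; the value in the usage note is purely illustrative, and `dayOffsetToIsoDate` is a hypothetical helper name.

```typescript
// Converts a day offset relative to `zeroDay` (YYYY-MM-DD) into an ISO date.
// All arithmetic is done in UTC so local time zones and DST cannot shift the day.
function dayOffsetToIsoDate(dayOffset: number, zeroDay: string): string {
    const base = Date.parse(`${zeroDay}T00:00:00Z`)
    if (isNaN(base)) throw new Error(`Invalid zero day: ${zeroDay}`)
    const msPerDay = 24 * 60 * 60 * 1000
    // Keep only the date part of the ISO timestamp.
    return new Date(base + dayOffset * msPerDay).toISOString().slice(0, 10)
}
```

Assuming a zero day of `2020-01-21`, `dayOffsetToIsoDate(10, "2020-01-21")` gives `2020-01-31`, and negative offsets count backwards from the zero day.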

danyx23 · 2024-10-03T13:15:45Z

When implementing Marwa's new designs make sure to also fix the style leak described here: #3872

danyx23 · 2024-10-03T14:38:29Z

@danyx23 to create a new follow-up issue for Marcel for the next cycle

This PR adds functionality to generate a CSV file for the data of any chart on the server, in a CF worker. It also allows downloading a metadata.json file, a readme.md and a zip file of all three. This prepares for surfacing these downloads in the download UI of grapher in the upcoming cycle 2024.6 (#4015). This PR does not yet make any use of the new CF functions endpoints, and the download UI is unchanged.

This PR implements #3648.

Testing

To test this, send HTTP GET requests to `/grapher/SLUG.zip`, `/grapher/SLUG.csv` and `/grapher/SLUG.metadata.json` at localhost:8788, or at the staging server linked below. There is also an Observable notebook that makes it easier to browse the generated readme files: https://observablehq.com/d/d410e9b2d2b7c330

github-actions bot added the needs triage label May 26, 2024

danyx23 added priority 2 - important and removed needs triage labels May 26, 2024

danyx23 self-assigned this May 26, 2024

ikesau added site viz and removed site labels Jul 9, 2024

ikesau mentioned this issue Jul 9, 2024

Cycle 2024.4 Friendly Data Catalog #3781

Closed

12 tasks

danyx23 added priority 3 - nice to have and removed priority 2 - important labels Oct 3, 2024

This was referenced Oct 4, 2024

Cycle 2024.6: Refreshed Grapher download UI that uses server side csv generation #4015

Closed

✨ generate csv and zip file server side #3613

Merged

danyx23 linked a pull request Oct 5, 2024 that will close this issue

✨ generate csv and zip file server side #3613

Merged

danyx23 closed this as completed in #3613 Oct 10, 2024
