The SDG CSV data filler is the first script in a pipeline that converts SDG data from CSV format to CSVW format, a W3C standard.
The script will become part of a pipeline that may be integrated into the build scripts for the UK SDG site.
Later, it may be integrated into the build scripts for the Open SDG platform, so that countries and cities using the platform can choose to offer a CSVW export function on their sites.
The script functions as follows:
- It scrapes the SDG site's UK SDG data repository for links to CSV files.
- It downloads the CSV data from each URL.
- It checks the settings in the overrides YAML file and makes up to three data transformations, specific to each dataset and to each column, as follows:
  - If the parameter 'fill_gaps' is True for the dataset, it fills any gaps, `nan`, `NaN` or `Null` values with the gap filler value for that column.
  - If the parameter 'fix_headers' is True, it standardises the headers by replacement. This is currently not used and is set to False, but may be needed in the future.
  - If the parameter 'standardise_cells' is True, it replaces any specified non-standard values with a standard value, e.g. it may replace 'male', 'Males' and 'M' with the standard value 'Male'.
- It outputs the transformed data in CSV format to a folder called "out".
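
The gap-filling and cell-standardising steps above can be sketched as follows. This is only an illustration using the standard library, not the script's actual code: the `settings` dictionary layout, the `gap_fillers` and `replacements` keys, the `transform` function and the sample column names are all assumptions made for the example, and the real overrides file may be structured differently.

```python
import csv
import io

# Values treated as gaps: empty cells plus the markers named in the list above.
GAP_MARKERS = {"", "nan", "NaN", "Null"}


def fill_gaps(rows, gap_fillers):
    """Replace gap markers with the filler configured for each column."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for column, filler in gap_fillers.items():
            if new_row.get(column) in GAP_MARKERS:
                new_row[column] = filler
        filled.append(new_row)
    return filled


def standardise_cells(rows, replacements):
    """Map non-standard cell values to the standard value for each column."""
    standardised = []
    for row in rows:
        new_row = dict(row)
        for column, mapping in replacements.items():
            if column in new_row:
                value = new_row[column]
                new_row[column] = mapping.get(value, value)
        standardised.append(new_row)
    return standardised


def transform(csv_text, settings):
    """Apply the enabled transformations and return the result as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if settings.get("fill_gaps"):
        rows = fill_gaps(rows, settings.get("gap_fillers", {}))
    if settings.get("standardise_cells"):
        rows = standardise_cells(rows, settings.get("replacements", {}))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical settings for one dataset (layout assumed for this sketch).
settings = {
    "fill_gaps": True,
    "standardise_cells": True,
    "gap_fillers": {"Sex": "All"},
    "replacements": {"Sex": {"male": "Male", "Males": "Male", "M": "Male"}},
}

raw = "Year,Sex,Value\n2015,M,10\n2015,,12\n2016,Males,14\n"
print(transform(raw, settings))
```

In this sketch gaps are filled before cells are standardised, so a filler value would itself be standardised if it appeared in the replacement mapping; whether the real script uses that order is not stated here.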
The following work remains to be done:
- Code the main 'entry point' function.
- Code a fix_headers function.
- Code unit tests for each function in modules.py.
- Use a Python GitHub library to get data from GitHub instead of scraping.
  - PyGithub has been suggested.
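
As a rough idea of what the PyGithub approach might look like, the sketch below lists the CSV download URLs in one folder of a repository. It is an assumption-laden illustration rather than a planned design: the function names `select_csv_urls` and `fetch_csv_urls` are hypothetical, and the repository name and folder must be supplied by the caller. It only looks at a single folder and does not recurse into subfolders.

```python
def select_csv_urls(entries):
    """Return the download URLs of the entries that are CSV files."""
    return [
        entry.download_url
        for entry in entries
        if entry.type == "file" and entry.name.lower().endswith(".csv")
    ]


def fetch_csv_urls(repo_full_name, folder):
    """List CSV download URLs in one folder of a GitHub repository."""
    # Imported here so select_csv_urls can be used without PyGithub installed.
    from github import Github

    client = Github()  # unauthenticated, so GitHub's lower rate limits apply
    repo = client.get_repo(repo_full_name)  # e.g. "owner/repository"
    entries = repo.get_contents(folder)
    # get_contents returns a single object when the path is a file.
    if not isinstance(entries, list):
        entries = [entries]
    return select_csv_urls(entries)
```

Calling `fetch_csv_urls` needs network access and the PyGithub package; the filtering step is kept separate so it can be unit tested without either.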
SDG CSV data filler is under an MIT licence.