A general template for SDD/PDH projects, incorporating some good practices for GitHub-based development.
To use this template, create a new repository from it: click the green "Use this template" button in the top-right corner of this page and follow the instructions. This creates a new repository with the same structure as this one. Then clone the new repository to your local machine and start working on your project.
The code is functional. `src/script.R` contains a usage example that compares the staging and production versions of a variety of tables.

Known limitations and possible improvements:
- The current version compares tables across different instances of .Stat (e.g., the base and new data versions are reached through different .Stat URLs) rather than across different spaces (i.e., validate and disseminate). Comparing across spaces can be achieved by changing the agency field.
- The current version performs $|\text{dataflows}| \times |\text{indicators}| \times |\text{geographies}|$ API calls, which is a lot if you are trying to compare many big, dense dataflows. It can be improved by reducing the number of calls and performing the groupings in a second stage (eventually, the count can be brought down to $|\text{dataflows}|$ API calls).
- Changes in the DSD schema are not handled, and I suspect they won't be handled nicely if the dimensions differ between the base and new data updates.
- It might be nice to offer the option of directly generating `.pdf` or `.md` versions of the diff tables. This should be possible thanks to `{kableExtra}`, but Windows is not playing nicely with it.
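
To put the API call count mentioned in the list above into numbers, here is a back-of-the-envelope sketch. For simplicity it assumes that every dataflow has the same number of indicators $|I|$ and geographies $|G|$, with $|D|$ the number of dataflows compared; the figures used are hypothetical, not taken from an actual run.

$$
N_{\text{current}} = |D| \times |I| \times |G|, \qquad N_{\text{grouped}} = |D|
$$

With a hypothetical $|D| = 10$, $|I| = 20$ and $|G| = 50$, the current approach issues $10 \times 20 \times 50 = 10{,}000$ calls, whereas fetching each dataflow once and grouping afterwards needs only $10$. If the number of indicators and geographies varies by dataflow $d$, the first count becomes $\sum_{d} |I_d|\,|G_d|$.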
There are four main folders in this repository:

- `docs`: Contains the documentation of the project.
- `src`: Contains the source code of the project.
- `raw_data`: Contains temporary local copies of the raw data used in the project. This folder won't be uploaded to the repository.
- `output`: Contains the temporary output files generated by the project (PNGs, PDFs, small data units). This folder won't be uploaded to the repository.
The `.gitignore` file is configured to ignore the most common temporary development files for Python, R, and Stata. It also ignores most file formats in the `/temp/` subdirectories.