When we perform big data updates, we need a way to report the changes at the indicator level, comparing before and after. This repository contains prototype code that tries to solve this problem.

PacificCommunity/dotstat-compare-tables

PDH SDD GitHub's template

A general template for SDD/PDH projects, incorporating some good practices for GitHub-based development.

Usage

To use this template, create a new repository using this repository as a template: click the green "Use this template" button in the top right corner of this page and follow the instructions. This creates a new repository with the same structure as this one. Then clone the new repository to your local machine and start working on your project.

Current status

The code is functional. In src/script.R you can find a usage example that compares the staging and production versions of a variety of tables.

Known limitations

  • The current version compares tables across different instances of .Stat (e.g., the base and new data versions are reached through different .Stat URLs) rather than across different spaces of the same instance (i.e., validate and disseminate). Comparing across spaces is possible to achieve by changing the agency field.
  • The current version performs |dataflows| * |indicators| * |geographies| API calls, which is a lot if you are trying to compare many big, dense dataflows. This can be improved by reducing the number of calls and performing the groupings at a second stage (eventually, it can be brought down to |dataflows| API calls).
  • Changes in the DSD schema are not handled, and I suspect they will not be handled gracefully if the dimensions differ between the base and new data versions.
  • It might be nice to offer the possibility of directly generating .pdf or .md versions of the diff tables. This should be possible thanks to {kableExtra}, but Windows is not playing nicely.
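
The second limitation can be illustrated with a small sketch. It is written in Python for brevity and is not the code in src/script.R; the record layout, the sample values, and the function names (`group_by_indicator_geography`, `diff_tables`, `api_calls`) are all hypothetical. It assumes each dataflow has already been fetched in full with a single call, so the grouping by indicator and geography happens locally as a second stage.

```python
from collections import defaultdict

# Hypothetical flat records, as if one call had returned a whole dataflow.
# Each record is (indicator, geography, time_period, value).
base = [
    ("POP", "FJ", "2020", 896.0),
    ("POP", "FJ", "2021", 902.0),
    ("POP", "WS", "2020", 198.0),
    ("GDP", "FJ", "2020", 4.5),
]
new = [
    ("POP", "FJ", "2020", 896.0),
    ("POP", "FJ", "2021", 905.0),  # revised value
    ("POP", "WS", "2020", 198.0),
    ("GDP", "FJ", "2020", 4.5),
    ("GDP", "WS", "2020", 0.8),    # new observation
]


def group_by_indicator_geography(records):
    """Second-stage grouping: (indicator, geography) -> {period: value}."""
    groups = defaultdict(dict)
    for indicator, geography, period, value in records:
        groups[(indicator, geography)][period] = value
    return groups


def diff_tables(base_records, new_records):
    """Report added, removed, and changed periods per indicator and geography."""
    base_groups = group_by_indicator_geography(base_records)
    new_groups = group_by_indicator_geography(new_records)
    summary = {}
    for key in sorted(set(base_groups) | set(new_groups)):
        old = base_groups.get(key, {})
        cur = new_groups.get(key, {})
        added = sorted(set(cur) - set(old))
        removed = sorted(set(old) - set(cur))
        changed = sorted(p for p in set(old) & set(cur) if old[p] != cur[p])
        if added or removed or changed:
            summary[key] = {"added": added, "removed": removed, "changed": changed}
    return summary


def api_calls(n_dataflows, n_indicators, n_geographies):
    """Call counts: one call per cell versus one call per dataflow."""
    return n_dataflows * n_indicators * n_geographies, n_dataflows


summary = diff_tables(base, new)
for key, changes in summary.items():
    print(key, changes)
```

With, say, 3 dataflows, 10 indicators, and 20 geographies, `api_calls(3, 10, 20)` gives 600 calls for the per-cell approach against 3 for the per-dataflow approach. The trade-off is that each response is larger and the grouping logic moves into the client.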

Folder structure

There are four main folders in this repository:

  • docs: Contains the documentation of the project.
  • src: Contains the source code of the project.
  • raw_data: Contains temporary local copies of the raw data used in the project. This folder won't be uploaded to the repository.
  • output: Contains the temporary output files generated by the project (png, pdfs, small data units). This folder won't be uploaded to the repository.

gitignore

The .gitignore file is configured to ignore the most common temporary development files for Python, R, and Stata. It also ignores most file formats in the /temp/ subdirectories.
