R scripts for tidying income tax data sourced from Statistics Canada

bcgov/statscan-taxdata-tidying

statscan-taxdata-tidying

A set of R scripts to read, tidy, merge & write aggregated annual Statistics Canada income tax data for British Columbia.

The scripts in this repository tidy aggregated annual Statistics Canada data similar to 'Tax filers and dependants with income by source of income' Table: 11-10-0007-01. The aggregated annual data are provided as sheets in .xls format under the Statistics Canada Open Licence, one .xls file per year for aggregated individual tables and aggregated family tables. The Technical Reference Guide for the Annual Income Estimates for Census Families, Individuals and Seniors is available on the Statistics Canada website: T1 Family File, Final Estimates, 2016.

Usage

Raw Data

The source .xls files (one per year, or one per table in the case of Individual Table 13) must be placed manually in the appropriate subfolders: /data-raw/fam, /data-raw/ind, and /data-raw/ind13. Table 13 is handled separately to accommodate its variety of data structures.
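A minimal sketch of preparing the folder layout described above before copying the files in. The folder names come from this README; the helper name check_raw_folders() and its root argument are hypothetical and not part of the repository's scripts.

```r
# Hypothetical helper (not part of this repository): create the expected
# raw-data subfolders and report how many .xls files each one contains.
check_raw_folders <- function(root = ".") {
  subfolders <- c("fam", "ind", "ind13")
  paths <- file.path(root, "data-raw", subfolders)

  # Create any missing folders; existing folders are left untouched
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }

  # Count the files ending in .xls found in each folder
  counts <- vapply(
    paths,
    function(p) length(list.files(p, pattern = "\\.xls$", ignore.case = TRUE)),
    integer(1)
  )
  names(counts) <- subfolders
  counts
}
```

Calling check_raw_folders() from the repository root returns a named integer vector, so an empty subfolder shows up as a zero before any of the cleaning scripts are run.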

Code

There are three core scripts, one for each table type (Individuals, Families and Individual Table 13):

  • fam-clean.R
  • ind-clean.R
  • ind-clean-13.R

Source the run-all.R script to run all three scripts at once. The setup.R and functions.R scripts in the /R folder are sourced programmatically.
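For example, from an R session started in the repository root. This is a sketch under two assumptions the README does not confirm: that run-all.R sits in the repository root, and that the scripts expect the root as the working directory.

```r
# Assumes the working directory is the repository root and that run-all.R
# lives there; adjust the path if the script is located elsewhere.
script <- "run-all.R"

if (file.exists(script)) {
  source(script)
} else {
  message("Could not find ", script, " in ", getwd())
}
```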

All packages used in the analysis can be installed from CRAN using install.packages().
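One way to install only what is missing is sketched below. The helper missing_packages() is hypothetical, and because this README does not list the required packages, the names in the usage comment are placeholders to be replaced with the packages actually loaded by the scripts.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# requireNamespace() returns FALSE when a package cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage (placeholder package names, not taken from this repository):
# to_install <- missing_packages(c("pkgA", "pkgB"))
# if (length(to_install) > 0) install.packages(to_install)
```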

Tidy Data Outputs

Tidied .csv equivalents of each table type (individuals and families) are written to the /data-output folder.

Testing

The test scripts in the /test subfolder can be used to check the integrity of the data after tidying. These scripts compare the tidied outputs for individuals and families with the raw data, ensuring that the cleanup does not change the original values.
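The idea behind such a comparison can be illustrated with a toy example. This is not the code in /test: the column names and values are made up, and the tidy table is built by hand in base R rather than by the repository's scripts.

```r
# Toy stand-in for one raw sheet: one row per area, one column per measure
raw <- data.frame(
  area = c("A", "B"),
  count = c(10, 20),
  total_income = c(500, 900),
  stringsAsFactors = FALSE
)

# Toy tidy (long) version: one row per area-measure pair
measures <- setdiff(names(raw), "area")
tidy <- data.frame(
  area = rep(raw$area, times = length(measures)),
  measure = rep(measures, each = nrow(raw)),
  value = unlist(raw[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Check 1: no cells were gained or lost during tidying
stopifnot(nrow(tidy) == nrow(raw) * length(measures))

# Check 2: every tidy value equals the raw cell it came from
raw_value <- mapply(
  function(a, m) raw[raw$area == a, m],
  tidy$area, tidy$measure
)
values_match <- all(raw_value == tidy$value)
stopifnot(values_match)
```

Looking each tidy value up in the raw table, rather than comparing column totals alone, also catches values that were assigned to the wrong area or measure.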

Getting Help or Reporting an Issue

To report bugs/issues/feature requests, please file an issue.

How to Contribute

If you would like to contribute, please see our CONTRIBUTING guidelines.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

Copyright 2019 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.

This repository is maintained by the Data Science Partnerships Team (OCIO).


This project was created using the bcgovr package.
