
Data Generation and Visualization

Mary Richards edited this page Sep 13, 2023 · 14 revisions

This page describes how to go from raw data source to data visualization. If there is any confusion, please reach out to Mary for clarification. Suggestions to make the instructions clearer are always welcome.


1. Set up the network folders and subfolders (file explorer)

We are organizing the 'Y:\Equity Indicators\tracker-webpage-content' folder into thematic subfolders. They are ordered by letters to reflect the same sequence as in VISION 2050.

For these next steps you should have permission to write to 'Y:\Equity Indicators' - if you don't have permission, you will be unable to create the necessary subfolders, will receive an error message, and will need to contact Piset.

Within each thematic folder, you will have:

  1. subfolders for each indicator
  2. one thematic webpage review document (already created)

1. subfolder for each indicator

You will need to create a subfolder for your particular indicator, with the following format: X##-indicator-name. The first part of the folder name is the letter of the corresponding theme, followed by a two-digit number starting at 01, and then the indicator name spelled out for clarity, with hyphens between words if it is more than one word (for example, f01-median-income). The indicators will be loosely ordered based on their relevance. Within each of the indicator folders you will need to create 2 or 3 subfolders, depending on the indicator/data source:

  1. rda-data: This subfolder is an intermediate location for the R data files, which will be generated from the 'data-gen' script and loaded/used in the 'vis' script. There will be two .rda files after running the 'data-gen' script - one for creating the map and the other for creating the facet column and facet line charts.
  2. webpage-html-outputs: This subfolder is where the HTML versions of the map and 2 charts will be saved. It is necessary to have a separate folder for these outputs so that they can be copied to the website development folder (outside the agency's firewall) after internal review, from which they will directly feed the webpage visuals.
  3. raw-data [OPTIONAL]: This subfolder is necessary for tract-level or external data sets that are not available through the agency's Elmer database or accessible through an API. It provides a space to download/save data that may be unique to this project or from a source that requires downloading in the form of CSVs or spreadsheets. This raw data will be loaded into the 'data-gen' script and explored/cleaned as needed.
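
The subfolders listed above can also be created from R instead of by hand. The following is a minimal sketch, not part of the equity-tracker repository: the function name `create_indicator_folders` and the name check are assumptions made for illustration, and the `webpage-html-outputs` spelling follows the folder name used in the later steps of this page.

```r
# Hypothetical helper (not part of the equity-tracker repository): creates the
# indicator folder and its subfolders inside a given thematic folder.
create_indicator_folders <- function(theme_dir, indicator, raw_data = FALSE) {
  # Assumed reading of the X##-indicator-name convention: one lowercase letter,
  # two digits, then hyphen-separated lowercase words (e.g. f01-median-income).
  if (!grepl("^[a-z][0-9]{2}(-[a-z0-9]+)+$", indicator)) {
    stop("Indicator folder name should look like 'f01-median-income': ", indicator)
  }

  subfolders <- c("rda-data", "webpage-html-outputs")
  if (raw_data) subfolders <- c(subfolders, "raw-data")  # optional subfolder

  paths <- file.path(theme_dir, indicator, subfolders)
  for (p in paths) {
    # recursive = TRUE also creates the indicator folder itself if needed
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Example call (commented out; it needs write permission on the Y drive):
# create_indicator_folders(
#   "Y:/Equity Indicators/tracker-webpage-content/f-economy",
#   "f01-median-income",
#   raw_data = TRUE
# )
```

If the call stops with an error about the folder name, check the letter, the two-digit number, and the hyphens before trying again.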

In addition to creating the subfolders, you will also need to include the form that the planning reviewer will use to review the draft indicator webpage (much later in the process). The easiest way to do this is to copy the X##-webpage-review.docx file in the 'Y:\Equity Indicators\tracker-webpage-content' folder and paste it into your indicator folder. Rename the file to reflect the correct alpha-numeric code for your theme and indicator. In the document, you can fill in the theme, indicator, and analyst information at the top of the form.

2. one thematic webpage review document (already created)

This review document will be used later on in the planning review process for receiving feedback on the general thematic landing page. It will be named using a similar code as mentioned above. However, because it is at the thematic level, it only requires the letter and not the number (example: Y:\Equity Indicators\tracker-webpage-content\f-economy\f-thematic-webpage-review.docx).

This document has already been created for each thematic folder so you do not need to do anything.

2. Clone GitHub repository (GitHub and file explorer)

Please navigate to the Equity Tracker (https://github.com/psrc/equity-tracker) GitHub repository and clone it to your local directory of choice. There are multiple ways to do this - through the webpage or through GitHub Desktop. You will need GitHub Desktop, so download and install it if you don't already have it on your machine. Reach out if you have any questions about this step.

Through your file explorer window, navigate to your newly set up local GitHub directory. Set up your directory structure for the code by creating (if not already existing) the correctly formatted thematic and indicator folders - similar to the one you set up in the Y drive (Y:\Equity Indicators\tracker-webpage-content). For example, C:\GitHub\equity-tracker\data-visualization\tracker-webpage-content\f-economy\f01-median-income. This does not require any of the internal folders you created in the Y drive, like 'rda-data' because we will only be storing the code in GitHub.

3. Copy/rename RMarkdown template scripts (file explorer)

Within the local GitHub\equity-tracker\data-visualization folder, copy 2 of the 4 template files to the indicator sub-folder you just created (X-theme\X##-indicator-name). There are 4 main RMarkdown template scripts to choose from:

  • data-gen-pums-template.Rmd
  • data-gen-tract-template.Rmd
  • vis-pums-template.Rmd
  • vis-tract-template.Rmd

Based on the indicator/data source, you will need to decide which template scripts are most applicable. You will ultimately need to choose one of the 'data-gen' scripts and one of the 'vis' scripts - depending on if the indicator/data source is PUMS (person-based) or at the tract level (location-based).

When renaming the scripts, please follow the subfolder naming system and include the indicator name (for example, f01-data-gen-median-income.Rmd and f01-vis-median-income.Rmd).
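
The copy-and-rename step can be sketched in R as well. This is only an illustration under stated assumptions: `copy_templates` and its arguments are hypothetical names, and the sketch assumes the four templates sit directly in the data-visualization folder, as described above.

```r
# Hypothetical helper: copies one 'data-gen' and one 'vis' template into the
# indicator folder and renames them to the convention described above.
copy_templates <- function(vis_root, theme, indicator,
                           source_type = c("pums", "tract")) {
  source_type <- match.arg(source_type)

  # Split "f01-median-income" into the code ("f01") and name ("median-income")
  code <- sub("-.*$", "", indicator)
  name <- sub("^[^-]+-", "", indicator)

  dest_dir <- file.path(vis_root, "tracker-webpage-content", theme, indicator)
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)

  kinds <- c("data-gen", "vis")
  from <- file.path(vis_root, sprintf("%s-%s-template.Rmd", kinds, source_type))
  to <- file.path(dest_dir, sprintf("%s-%s-%s.Rmd", code, kinds, name))

  # overwrite = FALSE so an already edited script is never replaced
  ok <- file.copy(from, to, overwrite = FALSE)
  if (!all(ok)) {
    warning("Some templates were not copied (missing or already present).")
  }
  invisible(to)
}

# Example call (commented out; adjust the path to your local clone):
# copy_templates("C:/GitHub/equity-tracker/data-visualization",
#                "f-economy", "f01-median-income", source_type = "pums")
```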

4. Set up and run/knit RMarkdown files (RStudio)

Open RStudio and navigate: File > New Project. Select 'Existing Directory' and browse/set to equity-tracker\data-visualization.

4a. data-gen script (RStudio)

In RStudio, navigate to your renamed 'data-gen' script and open it. There are 2 purposes to this script:

  1. Become familiar with the data source to understand how the data set is formatted and what information is available. In addition to exploring the data, this code is also included to check for any issues or concerns by generating basic descriptive stats, visualizing the spread of the values, and examining potential outliers.
  2. Generate the final data sets used to create the visualizations for your indicator. This may require transforming the data (cleaning, aggregating, calculating, pivoting) so that it is in its final version and ready to use in the 'vis' script.

Please go through the script, editing information and inputs as necessary - this includes renaming titles, pulling in the raw data, checking/transforming data as needed, reassigning variables, and renaming folder paths. It may be easiest to go through the template, editing and running each chunk as you go so that you can make changes one step at a time. The templates use an indicator/data source as an example and some of the steps included in the template may not be necessary depending on the format of your data. For example, the life expectancy values are weighted by population numbers. If the indicator/data source already normalizes the values in some way, this step is unnecessary and the chunks related to weighting can be commented out/removed. Commenting out is safer so that you can re-add it later if needed. If you are unsure whether a step is needed or not for your indicator/data source, then please check with Brian or Mary.

The eventual output of this script will be 2 .rda files - one that will help create the map and one that will help create the two facet charts. Once all of the inputs correspond to your indicator/data source (make sure the directories and the variable names are correct), you can run the code (Ctrl+Alt+R). I would recommend running the code rather than knitting it. Make sure that there are 2 .rda files in the 'Y:\Equity Indicators\X-theme\X##-indicator-name\rda-data' folder. If the .rda files are not in this location, check to make sure the directories are correct and reach out to Mary if you have any questions.
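
As a rough illustration of what the end of a 'data-gen' script does, the sketch below saves two placeholder data frames as .rda files and then checks that both files exist. The object names, file names, and values are made up for the example, and a temporary folder stands in for the rda-data folder on the Y drive so that the code runs anywhere.

```r
# Placeholder objects standing in for the cleaned data sets (values are made up)
data_clean <- data.frame(year = c(2020, 2021), value = c(1.5, 2.5))
map_clean <- data.frame(tract = c("tract-a", "tract-b"), value = c(1.5, 2.5))

# In practice this would be the indicator's rda-data folder on the Y drive
rda_dir <- file.path(tempdir(), "rda-data")
dir.create(rda_dir, recursive = TRUE, showWarnings = FALSE)

# Hypothetical file names; use the names set in your own script
save(data_clean, file = file.path(rda_dir, "f01-median-income-data.rda"))
save(map_clean, file = file.path(rda_dir, "f01-median-income-map-data.rda"))

# Check that exactly two .rda files ended up in the folder
rda_files <- list.files(rda_dir, pattern = "\\.rda$")
if (length(rda_files) != 2) {
  warning("Expected 2 .rda files, found ", length(rda_files))
}

# The 'vis' script can then restore an object under its saved name;
# load() returns the names of the objects it restored
loaded <- load(file.path(rda_dir, "f01-median-income-data.rda"))
```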

4b. vis script (RStudio)

In RStudio, navigate to your renamed 'vis' script and open it. There are 2 purposes to this script:

  1. Generate the html outputs for the indicator webpages. These will be 3 separate html outputs (map, facet column chart, facet line chart) that will be stored in the Y:\Equity Indicators\X-theme\X##-indicator-name\webpage-html-outputs folder.
  2. Generate 2 files that will help others within the project team review the visuals and the 'Data call outs' and 'Insights & Analysis' sections. These will be in the form of an .html and a .docx - the .html will allow reviewers to interact with the interactive elements of the map and charts, while the .docx will allow reviewers to leave comments/suggestions for the analyst.

Please go through the script, editing information and inputs as necessary - this includes renaming the title, pulling in the .rda files, checking/transforming data as needed (hopefully minimally, if any at this step), reassigning variables, and renaming folder paths. It may be easiest to go through the template, editing and running each chunk as you go so that you can make changes one step at a time. Don't forget to add a general explanation around line 60 of the vis script to provide a basic definition of the indicator and its significance to the project. The templates use an indicator/data source as an example and some of the steps included in the template may not be necessary depending on the format of your data.

Draft the 'Data call outs' and 'Insights & Analysis' sections by looking through the output map/charts and exploring the data. This may require some filtering, calculations, and research. The data call outs will be values that stand out or are interesting, while the insights/analysis may include more context or detail. There are places in the RMarkdown to calculate and note these findings. Use the template text as an example of the types of things you can draft, but the content should be whatever you think is most interesting after exploring your indicator/data source. These may take a while to generate while you explore the data frames and the visuals.

Once all of the inputs correspond to your indicator/data source (make sure the directories and the variable names are correct), you can knit the code. This needs to happen twice - once to generate the .html file and once to generate the .docx file.

  • HTML: around line 50 comment out the code: output_type <- "word" and then 'Knit to HTML'
  • DOCX: around line 51 comment out the code: output_type <- "html" and then 'Knit to Word'
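
For orientation, the two assignments described above might look like the sketch below when the script is set up for the HTML knit. How the template actually uses `output_type` further down is not documented on this page, so the branch shown here is an assumption for illustration only.

```r
# Both assignments sit next to each other near the top of the 'vis' script;
# exactly one of them should be active (uncommented) for each knit.
output_type <- "html"
# output_type <- "word"

# Assumed usage, for illustration only: later code can branch on the value.
if (output_type == "html") {
  message("Set up for the interactive review file: use 'Knit to HTML'")
} else if (output_type == "word") {
  message("Set up for the comment/suggestion file: use 'Knit to Word'")
} else {
  stop("output_type should be 'html' or 'word'")
}
```

For the second knit, swap which of the two assignment lines is commented out.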

Make sure that these 2 output files are located in the GitHub\equity-tracker\data-visualization\tracker-webpage-content\X-theme\X##-indicator-name folder.

5. Organize files (RStudio)

Once the .html and .docx files have been generated you need to organize the various outputs to ensure that the files are in the correct locations. This will help reviewers and others find them easily and keep the various locations (GitHub, Y drive) organized. This requires two steps.

The chunks for each step are located at the end of the 'vis' script:

  1. Copy files from GitHub to the Y drive. The first part of this code is for copying the .html and .docx files from GitHub where they were knitted, to the Y drive so that reviewers can access them on the network. The second part of this code is removing all of the files from GitHub so that the only things that remain are the .Rmd files. This second part is mostly to keep GitHub clean so that the various outputs are saved in distinct places.
  2. Copy files from Y drive to the website development folder (outside the agency's firewall). This step ensures that the visuals can be embedded in the webpages. This additional step is necessary because the webpages can't access items on the network (behind the firewall), and it provides a way to control what is visible on the webpage. Once the visuals are in the website development folder, they will automatically update on the webpages because the file paths are directly embedded in iframes.
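
The first of these two steps can be pictured with the sketch below. It is not the code from the 'vis' template: `move_knit_outputs` is a hypothetical helper that only handles the knitted .html and .docx files and assumes both folder paths are known.

```r
# Hypothetical helper: copies knitted .html/.docx outputs from the GitHub
# indicator folder to the matching Y drive folder, then removes the originals
# so that the .Rmd scripts are left untouched in GitHub.
move_knit_outputs <- function(github_dir, y_dir) {
  outputs <- list.files(github_dir, pattern = "\\.(html|docx)$",
                        full.names = TRUE)
  if (length(outputs) == 0) {
    return(invisible(character(0)))
  }

  dir.create(y_dir, recursive = TRUE, showWarnings = FALSE)
  copied <- file.copy(outputs, y_dir, overwrite = TRUE)

  # Only delete files whose copy succeeded, so nothing is lost on failure
  file.remove(outputs[copied])

  invisible(file.path(y_dir, basename(outputs[copied])))
}

# Example call (commented out; adjust both paths to your indicator):
# move_knit_outputs(
#   "C:/GitHub/equity-tracker/data-visualization/tracker-webpage-content/f-economy/f01-median-income",
#   "Y:/Equity Indicators/tracker-webpage-content/f-economy/f01-median-income"
# )
```

Deleting only after a successful copy is a deliberate choice here: if the Y drive is unavailable, the knitted files stay where they were.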

These RMarkdown code chunks have include=FALSE, eval=FALSE set in their chunk headers so that they do not automatically run when knitting to .html and .docx. As a result, these two code chunks will need to be run separately once the script has been knit.

To make sure that the first step (GitHub > Y drive) worked as it should, please check two locations.

  • GitHub\equity-tracker\data-visualization\tracker-webpage-content\X-theme\X##-indicator-name - should have the two RMarkdown files (data-gen and vis)
  • Y:\Equity Indicators\X-theme\X##-indicator-name - should have .html and .docx files with the same naming structure as the RMarkdown file

Unfortunately, without access to the website development folder (outside the agency's firewall), you will be unable to check that the second step worked (Y drive > website development folder) because the permissions are limited. The folder structure in this directory is similar to the one on the Y drive, so as long as the theme_dir folder is set up correctly (as described above), the process should run smoothly. If you want to confirm, please contact Mary with the indicator that you're working on.


5.5. OPTIONAL: Check folders/files (file explorer)

If you would like to make sure that the files are saved/stored in the correct places, you can run one more check of the two primary file locations. Names are generalized for simplicity but should have the correct alpha-numeric theme/indicator information in the names:

GitHub\equity-tracker\data-visualization\tracker-webpage-content\X-theme\X##-indicator-name

  • data-gen.Rmd
  • vis.Rmd
  • files folder

Y:\Equity Indicators\X-theme\X##-indicator-name

  • raw-data folder (depending on indicator/data source)
    • .csv(s)
  • rda-data folder
    • data.rda
    • map-data.rda
  • webpage-html-outputs folder
    • column.html
    • line.html
    • map.html
    • files folder
  • vis.docx
  • vis.html
  • webpage-review.docx
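
The Y drive part of this check can also be scripted. The sketch below is a hypothetical helper, not part of the templates: it counts files by extension rather than by exact name, since the real file names carry the theme/indicator code.

```r
# Hypothetical helper: checks an indicator folder on the Y drive against the
# layout listed above and returns one TRUE/FALSE flag per check.
check_indicator_files <- function(y_indicator_dir, raw_data = FALSE) {
  expected <- c("rda-data", "webpage-html-outputs")
  if (raw_data) expected <- c(expected, "raw-data")
  present <- dir.exists(file.path(y_indicator_dir, expected))

  # list.files() returns an empty vector for a missing folder, so counts are 0
  n_rda <- length(list.files(file.path(y_indicator_dir, "rda-data"),
                             pattern = "\\.rda$"))
  n_html <- length(list.files(file.path(y_indicator_dir, "webpage-html-outputs"),
                              pattern = "\\.html$"))
  n_review <- length(list.files(y_indicator_dir, pattern = "\\.(html|docx)$"))

  list(
    folders_present = all(present),
    rda_ok = n_rda == 2,       # data and map-data
    html_ok = n_html == 3,     # column, line, map
    review_ok = n_review >= 3  # vis.docx, vis.html, webpage-review.docx
  )
}

# Example call (commented out; adjust the path to your indicator):
# check_indicator_files(
#   "Y:/Equity Indicators/tracker-webpage-content/f-economy/f01-median-income"
# )
```

Any flag that comes back FALSE points to the part of the folder that needs a second look.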

Once the outputs are generated and saved to the right locations, please update the last tab (Production & Review) in the spreadsheet on Teams (Teams: Equity Tracker > General > Files > Indicator Selection Process > IndicatorResearch_08222023) by adding an 'X' in the appropriate columns (B-H).

You are ready to move on to the review phase as described in the Review Process and Webpage Development instructions.