This assignment focuses on extracting data from web APIs, processing JSON responses, and analyzing COVID-19 data across US states. You'll learn to interact with the US Census and CDC APIs, wrangle complex datasets, and create visualizations to explore relationships between population demographics and COVID-19 outcomes.
- Accept the assignment via the GitHub Classroom link
 - Clone this repository to your local machine
 - Open the repository folder in RStudio as a project
 - Complete all problems in the 
pset-04-wrangling.qmdfile 
pset-04-wrangling.qmd- Main assignment file with all 15 problemscensus-key.R- Census API key file (you'll create this)README.md- This instruction file
- httr2 - HTTP requests and API interactions
 - tidyverse - Data manipulation, visualization, and analysis
 - janitor - Data cleaning and formatting
 - jsonlite - JSON data parsing
 - lubridate - Date and time manipulation
 
- US Census API Key - Sign up at https://api.census.gov/data/key_signup.html
 - Internet connection - Required for API calls throughout assignment
 - File management - Store your API key in 
census-key.R(never commit this file) 
- HTTP requests - Using 
httr2to interact with web APIs - JSON parsing - Converting JSON responses to R data structures
 - Data extraction - Pulling specific data from complex API responses
 - Data cleaning - Using 
janitorfor header management and formatting - Date parsing - Converting strings to proper Date formats with 
lubridate - API parameters - Handling limits, filters, and query parameters
 - Data visualization - Creating time series and summary plots
 
By the end of this assignment, your repository should have this structure:
Root Directory:
pset-04-wrangling.qmd- Main assignment file with all 15 problemspset-04-wrangling.html- Rendered HTML outputcensus-key.R- Your Census API key file (Problem 1) [DO NOT COMMIT]README.md- This instruction file
Important Security Note:
- Add 
census-key.Rto your.gitignorefile to prevent accidentally committing your API key - Your API key should never appear in your Git history or be visible on GitHub
 
15 Problems covering:
- Problems 1-2: US Census API setup and request construction
 - Problems 3-5: HTTP response handling and content extraction
 - Problems 6-7: Data cleaning and visualization with Census data
 - Problems 8-9: JSON parsing and regional data integration
 - Problems 10-11: CDC API data extraction with parameter handling
 - Problems 12-13: Time series analysis and date manipulation
 - Problems 14-15: Additional CDC data and summary visualizations
 
- Obtain a Census API key and store it securely in 
census-key.R - Never commit your API key - add 
census-key.Rto.gitignore - Handle API responses properly - check status codes and content types
 - Clean data thoroughly - APIs often return messy, nested data
 - Create informative visualizations - properly label all plots
 - Show both code and output in your rendered document
 
- API rate limits: Be mindful of how frequently you call APIs
 - Error handling: Check response status codes before processing data
 - Data types: Convert strings to appropriate types (dates, numbers)
 - Missing data: Handle 
NAvalues that may come from API responses - URL construction: Use 
httr2functions rather than manual string concatenation - JSON structure: Examine JSON responses carefully before parsing
 
US Census Population Estimates API
- Provides state population data for 2020-2021
 - Requires API key for access
 - Returns data in JSON format with nested structure
 
CDC COVID-19 Data APIs
- Case data: Daily COVID-19 cases by state
 - Death data: COVID-19 deaths with demographic breakdowns
 - Large datasets requiring limit parameter adjustments
 - Public APIs with no authentication required
 
Additional Data
- Regional classifications from GitHub JSON file
 - State abbreviation mappings using R's built-in data
 
Your submission is your final committed and pushed repository. Make sure to:
- Complete all 15 problems in the assignment file
 - Render your document to HTML successfully
 - Include both .qmd and .html files in your submission
 - DO NOT commit your 
census-key.Rfile - Commit your changes with meaningful messages
 - Push your final work to GitHub
 - Verify that all code and output are visible in your GitHub repository
 
October 5, 2025, 11:59 PM
- Post questions on our class Slack in the #pset-04 channel
 - Attend office hours for API and data extraction help
 - Review course materials on web scraping and APIs
 - Check package documentation using 
?function_name - Consult API documentation for parameter details
 
- Committing your API key (serious security issue)
 - Not checking API response status before processing
 - Ignoring data type conversions (keeping everything as character)
 - Poor error handling when APIs return unexpected responses
 - Not adjusting API limits (getting incomplete data)
 - Forgetting to handle missing values in API responses
 
| Criteria | Excellent (A) | Good (B) | Satisfactory (C) | Needs Improvement (D-F) | 
|---|---|---|---|---|
| API Integration & HTTP Requests (30 points) | 27-30: Perfect use of httr2, proper error handling, all API calls successful. Demonstrates mastery of web API integration. | 24-26: Good API usage with minor issues in parameter handling or response processing. | 18-23: Basic API functionality working but missing some error handling or parameter optimization. | 0-17: Poor API usage, failed requests, major errors in HTTP handling or missing functionality. | 
| Data Extraction & JSON Parsing (25 points) | 23-25: Excellent JSON parsing, perfect data extraction from complex nested structures. Clean conversion to tidy data formats. | 20-22: Good data extraction with minor issues in JSON parsing or data structure handling. | 15-19: Basic data extraction working but some issues with complex nested data or inefficient parsing. | 0-14: Poor data extraction, major JSON parsing errors, or inability to handle API response structures. | 
| Data Cleaning & Transformation (20 points) | 18-20: Masterful data cleaning, proper type conversions, excellent use of janitor and tidyverse tools. Perfect date parsing. | 16-17: Good data cleaning with minor issues in type conversion or date handling. | 12-15: Basic cleaning completed but some issues with data types, dates, or missing value handling. | 0-11: Poor data cleaning, major type conversion errors, or improper handling of messy API data. | 
| Visualization & Analysis (15 points) | 14-15: Beautiful, informative visualizations with proper labels, titles, and formatting. Clear insights from data. | 12-13: Good visualizations with minor aesthetic or labeling issues. Plots convey intended information. | 9-11: Basic plots created but missing some labels, poor aesthetics, or unclear presentation. | 0-8: Poor or missing visualizations, major labeling problems, or plots don't convey meaningful information. | 
| Code Organization & Documentation (5 points) | 5: Exceptionally clean, well-documented code. Clear logic flow, excellent error handling, easy to follow. | 4: Good code organization with minor documentation gaps. Code is readable and well-structured. | 3: Basic code organization but some unclear sections or missing comments where needed. | 0-2: Poor code organization, unclear logic, missing documentation, or difficult to follow. | 
| Technical Requirements & Security (5 points) | 5: Perfect API key handling, document renders flawlessly, all technical requirements met, excellent GitHub practices. | 4: Meets technical requirements with minor issues. Good security practices maintained. | 3: Some technical issues but basic requirements met, minor security oversights. | 0-2: Major technical problems, security issues (committed API keys), or rendering failures. | 
Specific Things Graders Will Look For:
- Secure API key handling: Never commits sensitive credentials
 - Proper HTTP request construction: Correct use of httr2 functions
 - Robust error handling: Checks response status before processing
 - Accurate data extraction: Successfully parses complex JSON structures
 - Clean data transformation: Proper type conversions and date parsing
 - Informative visualizations: Well-labeled, publication-quality plots
 
Common Deductions:
- Committing API keys or other sensitive information (-15 points)
 - Not checking API response status (-5 points per instance)
 - Poor JSON parsing or data extraction (-5 points per problem)
 - Incorrect data type handling (-3 points per problem)
 - Missing or poor plot labels (-3 points per plot)
 - Document rendering failures (-10 points)
 
This rubric emphasizes API integration skills and data extraction techniques essential for modern data science workflows involving web data sources.
- httr2 package documentation: 
help(package = "httr2") - US Census API documentation: https://www.census.gov/data/developers/data-sets.html
 - CDC Data Portal: https://data.cdc.gov/
 - JSON structure exploration: Use 
str()to examine parsed JSON objects - HTTP status codes: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
 - lubridate cheat sheet: RStudio cheat sheets for date/time manipulation