From b25246a98b0604f85ef99753d0db112bf2225681 Mon Sep 17 00:00:00 2001 From: Madeleine Bonsma-Fisher Date: Mon, 28 Nov 2022 09:39:41 -0500 Subject: [PATCH] faculty notes --- faculty_notes.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) create mode 100644 faculty_notes.md diff --git a/faculty_notes.md b/faculty_notes.md new file mode 100644 index 0000000..e02e3ab --- /dev/null +++ b/faculty_notes.md @@ -0,0 +1,73 @@ +--- +title: "Faculty notes - Working with data in Python" +output: + pdf_document: default + html_notebook: default +urlcolor: blue +--- + +## Overview + +This 4-hour workshop takes learners through the basics of programming in Python via the Jupyter Lab interface and culminates with exploration and visualization of real-world bicycle count data from the City of Toronto. +This material focuses on using the package `pandas` for working with spreadsheet-type data and the packages `matplotlib` and `seaborn` for data visualization. +The material is designed to be delivered as a participatory live-coding workshop where the instructor projects their computer screen while coding and learners follow along on their own computers. +This material is based on [workshops](https://uoftcoders.github.io/2018-07-12-utoronto/) hosted by [UofT Coders](https://uoftcoders.github.io), inspired by the [Data Carpentry Ecology Python lesson](https://datacarpentry.org/python-ecology-lesson/). + +* **Prerequisites:** This material assumes no background knowledge of programming. +* **Target audience:** Undergraduates, graduate students, faculty, or staff in any discipline. + +## Learning objectives + +#### Part 1: introduction to programming in Python +- Overview of the capabilities of Python and how to use + JupyterLab for exploratory data analyses. +- Learn about some differences between Python and Excel. +- Learn basic Python commands. +- Learn about the Markdown syntax and how to use it within the Jupyter Notebook. + +#### Part 2: working with data in Python +- Describe what a data frame is +- Load external data from a .csv file into a data frame with `pandas` +- Summarize the contents of a data frame with `pandas`. +- Learn to use data frame attributes `loc[]`, `head()`, `info()`, `describe()`, `shape`, `columns`, `index`. +- Understand the split-apply-combine concept for data analysis. +- Use `groupby()`, `sum()`, `agg()` and `size()` to apply this technique. + +#### Part 3: visualizing data +- Produce scatter plots, line plots, and histograms using `seaborn` and `matplotlib`. +- Understand how to graphically explore relationships between variables. +- Apply grids for faceting in `seaborn`. +- Set universal plot settings. +- Use `seaborn` grids with `matplotlib` functions + +## Lesson outline + +- Communicating with computers (5 min) + - Advantages of text-based communication (5 min) + - Speaking Python (5 min) + - Natural and formal languages (5 min) +- The Jupyter Notebook (10 min) +- Data analysis in Python (5 min) + - Packages (5 min) + - How to get help (5 min) +- Manipulating and analyzing data with pandas + - Data set background (10 min) + - What are data frames (15 min) + - Data wrangling with pandas (40 min) +- Split-apply-combine techniques in `pandas` + - Using `sum()` and `mean()` to summarize categorical data (20 min) + - Using `size()` to summarize categorical data (10 min) +- Data visualization with `matplotlib` and `seaborn` (10 min) + - Visualizing one quantitative variable with multiple categorical variables (40 min) + - Visualizing the relationship of two quantitative variable with multiple categorical variables (40min) + - Using any plotting function with `seaborn` grids (10 min) + +## Data description + +Parts 2 and 3 of this material use data from the City of Toronto [Open Data Catalogue](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/), a great resource with lots of publicly available data. The dataset used is [counts of bicycles](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#7e3a3b94-92d8-2932-2c59-2c88a6cc0f3f) from the College St. bikelanes in September 2010 and September 2017. +I cleaned the data and processed it into a long spreadsheet format that is used in this lesson. The cleaned data can be downloaded at this link: https://bit.ly/2Cs1Mq1 or https://gist.githubusercontent.com/mbonsma/be7482639d7a2d5cfc52505aadb9b53e/raw/1f68fce4a127fdd3b2313728dd84cf21e86e7df3/college_spadina_2010_2017.csv + +## Challenges to address + +Participatory live-coding is suitable for small to medium-sized groups of learners. This workshop was presented to a group of about 35 people, and I would recommend a group no larger than 40. +It is very helpful to have several helpers present that can move around the room and help learners debug while the workshop progresses. I recommend using a [sticky note system](https://dynamicecology.wordpress.com/2015/01/13/sticky-notes-as-a-teaching-and-lab-meeting-tool/) for monitoring the pace and for letting learners flag when they need help. \ No newline at end of file