From b25246a98b0604f85ef99753d0db112bf2225681 Mon Sep 17 00:00:00 2001
From: Madeleine Bonsma-Fisher <madeleine.bonsma@gmail.com>
Date: Mon, 28 Nov 2022 09:39:41 -0500
Subject: [PATCH] faculty notes

---
 faculty_notes.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 faculty_notes.md

diff --git a/faculty_notes.md b/faculty_notes.md
new file mode 100644
index 0000000..e02e3ab
--- /dev/null
+++ b/faculty_notes.md
@@ -0,0 +1,73 @@
+---
+title: "Faculty notes - Working with data in Python"
+output:
+  pdf_document: default
+  html_notebook: default
+urlcolor: blue
+---
+
+## Overview
+
+This 4-hour workshop takes learners through the basics of programming in Python via the JupyterLab interface and culminates in the exploration and visualization of real-world bicycle count data from the City of Toronto.
+This material focuses on using the package `pandas` for working with spreadsheet-type data and the packages `matplotlib` and `seaborn` for data visualization. 
+The material is designed to be delivered as a participatory live-coding workshop where the instructor projects their computer screen while coding and learners follow along on their own computers.
+This material is based on [workshops](https://uoftcoders.github.io/2018-07-12-utoronto/) hosted by [UofT Coders](https://uoftcoders.github.io), inspired by the [Data Carpentry Ecology Python lesson](https://datacarpentry.org/python-ecology-lesson/). 
+
+* **Prerequisites:** This material assumes no background knowledge of programming.
+* **Target audience:** Undergraduates, graduate students, faculty, or staff in any discipline.
+
+## Learning objectives
+
+#### Part 1: introduction to programming in Python
+- Get an overview of the capabilities of Python and learn how to use
+  JupyterLab for exploratory data analyses.
+- Learn about some differences between Python and Excel.
+- Learn basic Python commands.
+- Learn about Markdown syntax and how to use it within the Jupyter Notebook.
+
+#### Part 2: working with data in Python
+- Describe what a data frame is.
+- Load external data from a .csv file into a data frame with `pandas`.
+- Summarize the contents of a data frame with `pandas`.
+- Learn to use the data frame methods and attributes `loc[]`, `head()`, `info()`, `describe()`, `shape`, `columns`, and `index`.
+- Understand the split-apply-combine concept for data analysis.
+- Use `groupby()`, `sum()`, `agg()`, and `size()` to apply this technique.
+
+#### Part 3: visualizing data
+- Produce scatter plots, line plots, and histograms using `seaborn` and `matplotlib`.
+- Understand how to graphically explore relationships between variables.
+- Apply grids for faceting in `seaborn`.
+- Set universal plot settings.
+- Use `seaborn` grids with `matplotlib` functions.
+
+## Lesson outline
+
+- Communicating with computers (5 min)
+    - Advantages of text-based communication (5 min)
+    - Speaking Python (5 min)
+    - Natural and formal languages (5 min)
+- The Jupyter Notebook (10 min)
+- Data analysis in Python (5 min)
+    - Packages (5 min)
+    - How to get help (5 min)
+- Manipulating and analyzing data with pandas
+    - Data set background (10 min)
+    - What are data frames (15 min)
+    - Data wrangling with pandas (40 min)
+- Split-apply-combine techniques in `pandas`
+    - Using `sum()` and `mean()` to summarize categorical data (20 min)
+    - Using `size()` to summarize categorical data (10 min)
+- Data visualization with `matplotlib` and `seaborn` (10 min)
+    - Visualizing one quantitative variable with multiple categorical variables (40 min)
+    - Visualizing the relationship between two quantitative variables with multiple categorical variables (40 min)
+    - Using any plotting function with `seaborn` grids (10 min)
+
+## Data description
+
+Parts 2 and 3 of this material use data from the City of Toronto [Open Data Catalogue](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/), a great resource with lots of publicly available data. The dataset used is [counts of bicycles](https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/#7e3a3b94-92d8-2932-2c59-2c88a6cc0f3f) from the College St. bike lanes in September 2010 and September 2017.
+I cleaned the data and processed it into a long spreadsheet format that is used in this lesson. The cleaned data can be downloaded at this link: https://bit.ly/2Cs1Mq1 or https://gist.githubusercontent.com/mbonsma/be7482639d7a2d5cfc52505aadb9b53e/raw/1f68fce4a127fdd3b2313728dd84cf21e86e7df3/college_spadina_2010_2017.csv
+
+## Challenges to address
+
+Participatory live-coding is suitable for small to medium-sized groups of learners. This workshop was presented to a group of about 35 people, and I would recommend a group no larger than 40. 
+It is very helpful to have several helpers present who can move around the room and help learners debug as the workshop progresses. I recommend using a [sticky note system](https://dynamicecology.wordpress.com/2015/01/13/sticky-notes-as-a-teaching-and-lab-meeting-tool/) for monitoring the pace and for letting learners flag when they need help.
\ No newline at end of file