Home

"Take Off with Stats in Python," or _EngComp2_ is the second learning module of the Engineering Computations collection.

From February 2018 until April 2025, the module was complemented by an open online course on a self-hosted instance of the Open edX learning platform. After seven years, this platform is being taken offline. Here, we reproduce the contents of the "About" page of the online course.

Course Description

This learning module in engineering computations (EngComp2) builds from a foundation in Python programming to develop data practices and computational problem-solving. You will learn to handle data programmatically: reading data from files, cleaning and organizing data, and performing exploratory data analysis. You will use real data, learn to make pretty data visualizations, and gain insight from data.

The target audience is first- or second-year science and engineering students, but only high-school-level mathematics background is assumed.

What You'll Learn

- Exploratory data analysis with real data (canned craft beers in the US, lead exposure from cosmetics, life expectancy and wealth).
- Handling labeled data with the pandas library: data frames and series.
- Visualizing quantitative and categorical data.
- Getting insights from data using data-frame and series methods, various plots and interactive widgets.

Instructor

Lorena A. Barba, Professor of Mechanical and Aerospace Engineering, The George Washington University

Course Outline

About this course
- Overview
- Installing Python or using cloud
- Copyright notice
Cheers! Stats with beers
- Read the data file
- Explore the data
- Ready, stats, go!
- Distribution plots
- What we've learned
- Graded HW
- Graded HW2
- Graded HW3
Seeing stats in a new light
- Read the data and explore the data
- Visualizing quantitative data
- Visualizing categorical data
- Visualizing multiple data
- What we've learned
Lead in lipstick
- Setting the scene
- Lead exposure from lipstick
- References
Life expectancy and wealth
- The best stats you've ever seen
- Grouping data for analysis
- Visualizing the data
- Dig deeper and get insights from the data
- Using widgets to visualize interactively
- References
- Graded HW4
- Final challenge

About This Course

This is the second in a series of course modules on computation for engineering students. It focuses on the foundations of data analysis, descriptive statistics, and data visualization. The module introduces the pandas library for data analysis in Python.

This learning module uses the Python programming language and Jupyter notebooks. All of the learning tools are free and open source, and all the materials are open and free.

The module consists of four lessons:

- Cheers! Stats with beers
- Seeing stats in a new light
- Lead in lipstick (a full worked-out example)
- Life expectancy and wealth

Requirements

This module assumes a foundation in Python programming, as provided by completing our first module, "Get Data Off the Ground with Python."

Lessons

Follow the links below to see each lesson rendered by the nbviewer service as a static webpage.

You can also launch an interactive session with the course's Jupyter notebooks using the free Binder service. Try it!

Launch in Binder

After Binder launches, you should see a Jupyter dashboard (file navigator). Select the folder notebooks_en to access the five lessons of this course as fully executable Jupyter notebooks.

Please note that Binder is a free service from Project Jupyter. Depending on demand, it can be a bit slow. But it's free!

Lesson 1: Cheers! Stats with beers

Exploratory analysis using a data set of canned craft beers in the US. Introduces the pandas library and its data types: Data Frames and Series. Use pandas to read a data file, extract selected columns, and remove null values. Descriptive statistics: measures of central tendency and variability. Distribution plots: histograms with Matplotlib. Comparing with a normal distribution.
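As a rough illustration of the workflow this lesson describes, here is a minimal sketch of reading tabular data with pandas, selecting a column, dropping null values, and computing a few descriptive statistics. The column names (`abv`, `ibu`, `style`) and all values are made up for this example and are not taken from the course data set; the CSV text is read from memory so that the snippet runs without any data file.

```python
import io

import pandas as pd

# Made-up sample standing in for a data file; names and values are illustrative only.
csv_text = """name,abv,ibu,style
Beer A,0.050,35,American IPA
Beer B,0.065,,American Pale Ale
Beer C,0.072,60,American IPA
Beer D,,20,Saison
Beer E,0.045,15,Saison
"""

# read_csv returns a DataFrame; empty fields become NaN (null) values.
beers = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column gives a Series.
abv = beers["abv"]

# Remove null values before computing statistics.
abv_clean = abv.dropna()

# Measures of central tendency and variability.
abv_mean = abv_clean.mean()
abv_median = abv_clean.median()
abv_std = abv_clean.std()  # sample standard deviation (ddof=1 by default)

print(f"rows: {len(abv)}, rows without nulls: {len(abv_clean)}")
print(f"mean: {abv_mean:.4f}, median: {abv_median:.4f}, std: {abv_std:.4f}")
```

With a real data file, you would pass the file path to `pd.read_csv` instead of the in-memory text.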

Lesson 2: Seeing stats in a new light

Continuing with the data set of canned craft beers, this lesson focuses on visualizing statistics. For quantitative data: histograms and box plots; for categorical data: bar plots. Visualizing multiple data with scatter plots and bubble charts.
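The plot types named above can be sketched with Matplotlib as follows. This is only an illustration under stated assumptions: the data frame is hypothetical (made-up column names and values, not the course data set), and the layout of four panels in one figure is a choice made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; names and values are made up for illustration.
beers = pd.DataFrame({
    "abv": [0.050, 0.065, 0.072, 0.045, 0.058, 0.090],
    "ibu": [35, 45, 60, 15, 30, 85],
    "style": ["IPA", "Pale Ale", "IPA", "Saison", "Pale Ale", "IPA"],
})

# Count how many rows fall in each category (used for the bar plot).
style_counts = beers["style"].value_counts()

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Quantitative data: histogram and box plot.
axes[0, 0].hist(beers["abv"], bins=5)
axes[0, 0].set_title("Histogram of abv")

axes[0, 1].boxplot(beers["ibu"])
axes[0, 1].set_title("Box plot of ibu")

# Categorical data: bar plot of counts per category.
axes[1, 0].bar(style_counts.index, style_counts.values)
axes[1, 0].set_title("Count per style")

# Two variables at once: scatter plot of abv against ibu.
axes[1, 1].scatter(beers["abv"], beers["ibu"])
axes[1, 1].set_xlabel("abv")
axes[1, 1].set_ylabel("ibu")
axes[1, 1].set_title("abv vs. ibu")

fig.tight_layout()
```

In a Jupyter notebook the figure renders inline after the cell runs. Passing an array of marker sizes through the `s` argument of `scatter` turns the last panel into a bubble chart.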

Lesson 3: Lead in lipstick

A full worked example that applies what you learned in lessons 1 and 2: with data from studies by the US Food and Drug Administration on the lead content in lipstick, we fact-check alarming news headlines. Based on Prof. Kristin Sainani's lecture, "Exploring real data: lead in lipstick," of her Stanford Online course "Statistics in Medicine."

Lesson 4: Life expectancy and wealth

A deeper dive into pandas for data analysis, using data on life expectancy and gross domestic product (income) per capita over time, for various countries across the world. Grouping data for analysis and dataframe manipulation.
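Grouping data for analysis can be sketched with the pandas `groupby` method, as below. The table is hypothetical: the country and continent labels, the column names, and every value are invented for this example and do not come from the lesson's data.

```python
import pandas as pd

# Hypothetical table; labels and values are made up for illustration.
data = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "continent": ["X", "X", "X", "X", "Y", "Y"],
    "year": [2000, 2010, 2000, 2010, 2000, 2010],
    "life_exp": [60.0, 64.0, 70.0, 72.0, 50.0, 56.0],
})

# Group rows by year, then average life expectancy within each group.
mean_by_year = data.groupby("year")["life_exp"].mean()

# Group by two keys: the result is a Series with a two-level index.
mean_by_continent_year = data.groupby(["continent", "year"])["life_exp"].mean()

print(mean_by_year)
print(mean_by_continent_year)
```

Each `groupby` call splits the rows by the key column (or columns), applies the aggregation to every group, and combines the results into a new labeled Series.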

Frequently Asked Questions

Why are you using Python?

Python is free. Python is a complete programming solution, with excellent interactive options and visualization tools. Python is a good learning language: it has easy syntax, it is interpreted and it has dynamic typing. Python has a large community: people post and answer each other's questions about Python all the time. For numerical computing, Python libraries can do everything you need to do. Python is exploding in popularity and is used for teaching programming at the top schools. Python is used in industry; it can help you get a job.

I'm having problems viewing the course. Could it be my browser?

The Open edX platform works best with current versions of Chrome, Firefox or Safari, or with Internet Explorer version 9 and above. See the list of supported browsers for the most up-to-date information.

What software do I need for this course?

At first, you can work on this course without any new software: just your browser. We will guide you to use free online resources for interactive computing with Python and Jupyter. For example, you can follow along with our lessons using the free website jupyter.org/try. Bear in mind that it doesn't give you permanent storage: if you create a Jupyter document on this service, be sure to download it before leaving the website.

What is Jupyter?

Jupyter is a set of open-source tools for interactive computing. At the center of the Jupyter world is the Notebook: a document that combines text and multi-media content with executable code. It is a powerful platform to learn computing because it lets you chunk a program into small, digestible portions, and intermix these with narration and explanation. It is also becoming the staple environment to develop ideas and present finished analyses in data science and engineering.

What does it mean that the course materials are open?

It means that the authors of all the materials used in this course give everyone in the world a license to use the material in any way, to redistribute, modify and essentially do whatever they like with it. The only condition is that we are given attribution. Content is under a Creative Commons CC-BY 4.0 International license and code is under a BSD 3-clause license.

Is there a required textbook for this course?

There is no required textbook. We have written up original materials and share them with everyone completely free. You can even download a PDF version for printing. You can cite the typeset document as:

Barba, Lorena A.; Clementi, Natalia C. (2017): Engineering Computations Module 2: Take off with stats. figshare. doi:10.6084/m9.figshare.5673499

I'm an instructor at another institution. Can I adopt these materials for my course?

Feel free! We're happy if you use the materials in any way. All we ask is that you attribute the materials to us.
