layout	title	description
default	Lectures and Class Material	Links to the pre-recorded lectures and material

Lectures and Class Material

Lecture material: Link to the parent GitHub Repository.

Back to home QLS612 website

Slack workspace QLS612 slack

1. Reproducibility in Life Science

Instructor: JB Poline

Outline

With this lecture, you will get a general introduction to reproducible - or irreproducible - life sciences. Specifically, you will

learn what is meant by reproducibility of research results in the life sciences
understand the main causes for irreproducible results
learn the possible collective and individual actions for curbing irreproducibility

Material: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Lecture Resources

Canonical paper: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript

Questions you will be able to answer after taking this module:

Is the term “replicability” generally applied to obtaining the same results with another (new) dataset?
Is the root cause of irreproducibility the publication incentive?
What is the term for obtaining a similar result with the same methodology or pipeline but different data?

2. Introduction to the Terminal and Bash

Instructor: Jacob Sanz-Robinson

Outline

To follow most of the other modules you will need a basic understanding of the command line. In this module we'll take a look at the Bourne Again SHell (Bash), the default command-line shell on most Linux systems. You will learn how to:

move around on your computer with the command line, create and open directories and files
find things with the command line (files and programs, PATH variables)
run useful command line programs and find help (find, grep, ls, and man / documentation)
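
If you already know a little Python, the ideas of a search path and environment variables can also be explored from Python's standard library. The sketch below is only an illustration and is not taken from the lecture material; the variable name `QLS612_DEMO` is made up for the example.

```python
import os
import shutil

# PATH is an environment variable listing the directories searched for programs.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print(f"PATH contains {len(path_dirs)} entries")

# shutil.which looks a program name up along PATH, much like the shell does.
# It returns None when the program is not found.
print("ls resolves to:", shutil.which("ls"))

# Setting a variable here only affects this process and its children,
# similar to `export` in a shell session.
os.environ["QLS612_DEMO"] = "hello"  # hypothetical variable name
print(os.environ["QLS612_DEMO"])
```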

Materials:

GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Questions you will be able to answer after taking this module:

What is a command line shell?
How would you copy thousands of files with file names starting with "my_good_file..." to a different directory on your computer?
Among thousands of files and directories you know there is one where you wrote down "location of my thesis backup". How do you find this file?
What is an environment variable and how can you change it?

3. Introduction to Python

Instructors: Jacob Sanz-Robinson and Michelle Wang

Outline

This lecture is designed to get students up and running with Python. It is expected that Python 3 (preferably 3.7 or later) is installed, and that the students have some basic previous experience in a scripting language.
It will guide students through the fundamental syntax, concepts, and data structures required to code in Python 3.
Topics include: Running your code, commenting, variables, arithmetic, logic, strings, lists, tuples, dictionaries, functions, libraries, if statements, loops, exceptions, and classes.
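
As a rough preview of several of these topics, here is a minimal sketch (not taken from the lecture notebooks) that touches lists, tuples, dictionaries, functions, loops, if statements, and exceptions. All names and values are made up for illustration.

```python
# A small tour of some data structures and control flow.
scores = [72, 88, 95]            # list: ordered and mutable
scores.append(60)

point = (3, 4)                   # tuple: ordered and immutable
student = {"name": "Ada", "scores": scores}  # dictionary: key -> value


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# A loop combined with an if statement
passed = []
for s in student["scores"]:
    if s >= 70:
        passed.append(s)

# Exceptions: averaging an empty list divides by zero
try:
    mean([])
except ZeroDivisionError:
    print("cannot average an empty list")

print(mean(scores), passed)
```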

Material: GitHub Link

Pre-recorded lecture video: YouTube Link

Questions you will be able to answer after taking this module:

(1) How does the use of a ‘break’ statement alter the flow of a loop in Python?

(2) What happens if you attempt to append new elements to a Tuple?

(3) Without running the code on your machine, what is the printed output when the following code is run?

my_dictionary = {"a" : 1, "b" : {"c" : {"d" : [4,5,6,4]}}, "c" : [1,2,3]}
x = my_dictionary["b"]["c"]["d"].append(my_dictionary["c"][-3])
print(my_dictionary.values())

a) [1, {'c': {'d': [4, 5, 6, 4]}}, [1, 2, 3]]
b) [1, {'c': {'d': [4, 5, 6, 4, 1]}}, [1, 2, 3]]
c) [1, [4,5,6,4,1], [1,2,3]]
d) [1, [4,5,6,4], [1,2,3]]

(4) Without running the code on your machine, which string is returned by my_function when called with the specified parameters?

def my_function(x, y, z):
    result = ""
    if len(z) <= 6 and len(z) > 2:
        result = z[-2] + y
    else:
        result = x + y
    return x + x + result

my_function("111", "abc", "0100")

a) ‘1111110abc’
b) ‘0abc111111’
c) ‘111111bca0’
d) ‘1111111110’

4. NumPy, SciPy, and Pandas: The Python Toolbox for Data Analyses

Instructor: Tristan Glatard

Outline

This lecture will introduce NumPy, Pandas, and SciPy, three of the main libraries in the scientific Python ecosystem. At the end of the lecture, participants will be able to:

Manipulate arrays of numbers with NumPy
Manipulate data frames with Pandas
Apply numerical methods from the scientific Python ecosystem
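
To give a flavour of the first two goals, here is a minimal sketch, assuming NumPy and pandas are installed. The data and column names are invented for illustration and do not come from the lecture material.

```python
import numpy as np
import pandas as pd

# NumPy: an ndarray supports element-wise arithmetic and axis-wise reductions.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
col_means = a.mean(axis=0)   # mean of each column
centered = a - col_means     # broadcasting subtracts the means from every row

# Pandas: a DataFrame is a 2D labelled table; each column is a Series.
df = pd.DataFrame({
    "subject": ["s1", "s2", "s3"],   # hypothetical subject IDs
    "age": [25, 31, 47],
    "group": ["ctl", "pat", "pat"],
})
patients = df[df["group"] == "pat"]            # boolean filtering
mean_age = df.groupby("group")["age"].mean()   # split-apply-combine

print(col_means, centered, patients, mean_age, sep="\n")
```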

Materials: GitHub Link

Lecture Resources

A Visual Intro to NumPy and Data Representation by Jay Alammar, up to "Transposing and Reshaping".
Pandas DataFrame introduction
Pandas read-write tutorial
Scipy introduction
Scipy IO tutorial

Questions you will be able to answer after taking this module:

(1) NumPy's main data structure is a Python list

True
False

(2) Pandas's main data structure is a 2D table

True
False

(3) A Pandas Series is a one-dimensional array

True
False

5. Introduction to Git and GitHub

Instructor: Kendra Oudyk

Outline

Git and GitHub are key tools for doing version control in both academia and industry. These tools can help students do more efficient, open, and reproducible research. Further, knowing these tools can help prepare students for careers in academia and industry. In this lecture, students will learn:

What is version control and why has it become so important in science and industry;
How to track and share their own work using Git and GitHub; and
How to collaborate and contribute to open projects using Git and GitHub.

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Questions you will be able to answer after taking this module:

In a ________ version control system, individuals have the entire repository and its history in their local repository.

a) Centralized
b) Distributed

What is the basic workflow for tracking a change and sharing it on GitHub?

a) git commit, git add, git push
b) git pull, git add, git push
c) git add, git commit, git push

How do you start a parallel line of development, in order to do nonlinear version control?

a) make a tag
b) start a new branch
c) create a remote repository

How do you make a copy of another GitHub repo on your GitHub account?

a) git clone <repo address>
b) go to the repo's GitHub page and click "fork"
c) go to the repo's GitHub page and open an issue to ask for a copy
d) go to the repo's GitHub page and do a pull request

6. Data Preprocessing in Python

Instructor: Nadia Blostein

This module is designed to introduce students to data preprocessing (i.e., preparation) in Python. Data preprocessing is a critical prerequisite to any data analysis or machine learning application. Students will be preprocessing .csv and .png data from the following repository and the session will cover the topics below:

Outline

Load and examine your data
Data reformatting
Data filtering
Data transforms
Data visualization
Examining and manipulating 2D images with scikit-image and SciPy
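
Three common data transforms (one-hot encoding, min-max normalization, and standardization) can be sketched with the standard library alone. This is an illustrative sketch with made-up toy values, not the code used in the session, where pandas or scikit-learn would typically do the same job.

```python
import statistics

# Toy data (hypothetical values, for illustration only)
colours = ["red", "green", "red", "blue"]
heights = [150.0, 160.0, 170.0, 180.0]

# One-hot encoding: one 0/1 column per category.
categories = sorted(set(colours))  # ['blue', 'green', 'red']
one_hot = [[int(c == cat) for cat in categories] for c in colours]

# Min-max normalization rescales values to the [0, 1] range.
lo, hi = min(heights), max(heights)
normalized = [(h - lo) / (hi - lo) for h in heights]

# Standardization rescales values to zero mean and unit standard deviation.
mu = statistics.mean(heights)
sigma = statistics.pstdev(heights)  # population standard deviation
standardized = [(h - mu) / sigma for h in heights]

print(categories)
print(one_hot)
print(normalized)
print(standardized)
```

Note that one-hot encoding adds one column per category, so the table grows quickly when a feature has many distinct values.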

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Lecture Resources

One-hot encoding
10 Python image manipulation tools
6 Different Ways to Compensate for Missing Values In a Dataset
Imputation of mixed data with multilevel singular value decomposition
Understanding the Difference Between Normalization vs. Standardization

Questions you will be able to answer after taking this module:

What is a problem that can arise when you one-hot encode a feature with a lot of categories?
What Python library can you use to generate histograms?
If you are using a Gaussian filter to blur an image, which of the following sigma values will blur your image the most: 0.1, 2, 4, 5, 6 ?
Which Python package is faster for matrix computations: Pandas or NumPy?

7. Introduction to Machine Learning part 1: supervised learning

Instructor: Nikhil Bhagwat

Outline

Define machine-learning nomenclature
Describe basics of the “learning” process
Explain model design choices and performance trade-offs
Introduce model selection and validation frameworks
Explain model performance metrics
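
As a small illustration of performance metrics for a binary classifier, the sketch below counts the four outcomes of a confusion matrix and derives a few metrics from them. The labels are invented for the example; in practice a library such as scikit-learn would usually compute these.

```python
# Hypothetical binary labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives (type-1 errors)
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives (type-2 errors)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)  # also called sensitivity

print(tp, tn, fp, fn)
print(accuracy, precision, recall)
```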

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Lecture Resources

Linear Algebra Review and Reference
Review of Probability Theory

Questions you will be able to answer after taking this module:

Model training - what is under/over-fitting?
Model selection - what is (nested) cross-validation?
Model evaluation - what are type-1 and type-2 errors?

8. Introduction to Machine Learning part 2: Model selection & validation; dimensionality reduction

Instructor: Jérôme Dockès

Outline

Learn how to properly select a machine-learning model, set hyperparameters, and evaluate prediction performance.
Understand the challenges of learning from high-dimensional data and learn about tools to mitigate the issue.
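
The structure of nested cross-validation can be sketched in plain Python. The helper `k_fold_indices` below is a hypothetical function written for this illustration (it uses contiguous, unshuffled folds for readability); real analyses would typically rely on scikit-learn's splitters and shuffle the data.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous k-fold splits."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Nested cross-validation: the outer loop estimates performance, the inner
# loop (run on the outer training set only) is used to choose hyperparameters.
for outer_train, outer_test in k_fold_indices(10, 5):
    for inner_train_pos, inner_val_pos in k_fold_indices(len(outer_train), 2):
        # Positions index into outer_train, so outer test samples are never used.
        inner_train = [outer_train[i] for i in inner_train_pos]
        inner_val = [outer_train[i] for i in inner_val_pos]
        assert not set(inner_val) & set(outer_test)
    print("outer test fold:", outer_test)
```

The key design point is that hyperparameters are chosen using only the outer training samples, so the outer test fold stays untouched until the final evaluation.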

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Link, PDF

Questions you will be able to answer after taking this module:

I am predicting continuous cognitive scores of 1,000 participants using 20,000 brain imaging features. I use least-squares regression. What is regularization and why do I need it?
I decide to use ridge regression (l2 regularization). How can I set the regularization hyperparameter?
I also add a dimensionality reduction step to my model: PCA. I do 5-fold cross-validation, and I perform a full grid-search, using 3 folds for the inner validation loop. I use a grid of 3 options for the number of PCA components and 6 options for the ridge hyperparameter. How many times (at least) will I need to fit a PCA?

9. Introduction to Data Visualization in Python

Instructor: Jonathan Armoza

Outline

This module will teach students fundamental concepts of data visualization and familiarize them with several graphing libraries in Python (Matplotlib, Seaborn, Plot.ly, Bokeh) with the goals of using visualizations as a tool to understand data and creating graphics for multiple science contexts.
It will guide students through the process of familiarizing themselves with graphing libraries, and choosing plots that display the information accurately and clearly.
It will provide students with a perspective on best practices for visualization design.

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Questions you will be able to answer after taking this module:

Which plot types are best to visualize scalar, categorical, or distributional data? How does the answer to that question change if the data relationship being plotted is univariate vs multivariate?
What are a few best practices for visualization design that balance clarity and consideration for audience?
Why would I choose to generate static visualizations vs interactive ones?
Which Python graphing libraries are most efficient to do so? And what are some of the capabilities of each?
Is a data visualization an objective research output?

10. Virtualization of computing environments

Instructor: Sebastian Urchs

Outline

Learn why containerization and virtualization are important for research projects.
Have an overview of different solutions to create isolated environments.
Get some basic hands-on experience with Python virtual environments and Docker.
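
As a small taste of Python virtual environments, the standard-library `venv` module can create one programmatically, which is similar to running `python -m venv` on the command line. This is a minimal sketch rather than the workflow shown in the lecture; the directory name `demo-env` is arbitrary.

```python
import tempfile
import venv
from pathlib import Path

# Create the environment in a throwaway directory (hypothetical location).
env_dir = Path(tempfile.mkdtemp()) / "demo-env"

# with_pip=False keeps the example fast and offline; set it to True to
# bootstrap pip into the new environment.
venv.create(env_dir, with_pip=False)

# pyvenv.cfg marks the directory as a virtual environment.
print((env_dir / "pyvenv.cfg").read_text())
```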

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Lecture Resources

The Visual Display of Quantitative Information by Edward R. Tufte
Gapminder
Lev Manovich

Questions you will be able to answer after taking this module: (to check your understanding of the pre-recorded materials)

When working with the file system inside a Docker container, which statements are true?
- I cannot see files on the host system from inside the container
- files written into the container file system are lost with the container
- I can mount paths on the host system into the container to expose their contents to it
What is an advantage of Docker over a Virtual Machine?
- a Docker container can run any operating system, independently of the host operating system
- Docker is a good choice for shared systems because of its high level of security
- Docker containers are easier to specify, build, and manage and have better sharing infrastructure
What is the difference between a Docker container and a Docker image?
- A Docker container is a registry service to store and share Docker images
- A Docker image is a read-only snapshot and a Docker container is a running instance of it
- A Docker container is a read-only snapshot that can be easily shared (e.g. on Dockerhub) and from it, many live Docker images can be spawned
What is an advantage conda has over pip for Python environments?
- conda is usually prepackaged with Python, so you don't have to install anything
- conda has more Python packages than pip because of the Anaconda distribution
- conda can resolve non-Python dependencies and can also create virtual environments

11. High Performance Computing (HPC)

Instructor: Brent McPherson

Outline

Learn the key facts about High Performance Computing (HPC) and Cloud computing
Understand the advantages and the constraints of HPC
Learn the key concepts and practical bash commands to get started on the Compute Canada HPC

Materials: GitHub Link

Pre-recorded lecture video: YouTube Link

Slides: Slides

Questions you will be able to answer after taking this module:

Choose the area that Advanced Research Computing traditionally does not include

a) HPC/Clusters
b) Research Data Management
c) Cloud Computing
d) Video Games

Choose all components that are part of an HPC Compute Node

a) Processor/Core
b) Display/Monitor
c) Memory
d) Mouse
e) Local Disk

Choose all ways to access an HPC Cluster

a) Secure shell to a Login Node
b) Secure shell to a Compute Node
c) Secure transfer to a Data Transfer Node

