This repository contains materials for a 3-week course designed to introduce high school students (9th grade or higher) to data analysis using R with a focus on studies involving genetic data.
- Instructor: Gustavo de los Campos ( [email protected] )
- Modality: Synchronous distance learning (via Discord)
- Required background: High school student, 9th grade or higher.
- Expected time commitment: 10-15 hr/week. We will meet as a group on Mondays for a lecture and an introduction to the week's task, have a short lecture and office hours on Wednesdays, and meet as a group on Fridays to present and discuss results.
- Other requirements: Students need access to a computer and personal Discord and GitHub accounts.
- Disclaimer:
  - The course is offered by Gustavo de los Campos.
  - This is not a Michigan State University course.
  - Upon completion, you will receive a certificate signed by the instructor.
- Week 1: Introduction to GitHub, R, and RStudio
  - Introduction to R: types, conditional statements, loops, plots, arrays, and importing/exporting data.
  - Introduction to GitHub.
  - Reporting using RMarkdown and RStudio.
  - Task 1:
    - Read a genomic data set into R.
    - Produce summary statistics for phenotypes and genotypes.
    - Report results.
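
As a rough illustration of Task 1, the sketch below simulates a small data set, writes it to a temporary CSV file, reads it back, and summarizes the phenotype and the genotypes. The actual course data set and its file format are not described here, so the variable names (`height`, `snp1`, ...), the 0/1/2 genotype coding, and the CSV format are assumptions made only for this example.

```r
# Simulated stand-in for a genomic data set (all names and values are hypothetical)
set.seed(1)
n <- 100  # number of individuals
p <- 5    # number of SNPs

# Genotypes coded as the number of copies (0, 1, or 2) of one allele
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n, ncol = p)
colnames(X) <- paste0("snp", 1:p)
DATA <- data.frame(id = 1:n, height = rnorm(n, mean = 170, sd = 8), X)

# Export to a temporary CSV file, then import it back
tmpFile <- tempfile(fileext = ".csv")
write.csv(DATA, file = tmpFile, row.names = FALSE)
DATA2 <- read.csv(tmpFile)

# Summary statistics for the phenotype
print(summary(DATA2$height))
print(sd(DATA2$height))

# Summary statistics for the genotypes: allele frequency per SNP
snpCols <- grep("^snp", colnames(DATA2))
alleleFreq <- colMeans(DATA2[, snpCols]) / 2
print(round(alleleFreq, 3))

# Genotype counts for the first SNP
print(table(DATA2$snp1))
```

With real data, the simulation and export steps would be replaced by a single call that reads the file provided in class.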
- Week 2: Descriptive statistics and association analysis
  - Variance, covariance, and correlation.
  - Simple linear regression.
  - Introduction to Genome-Wide Association (GWA) analysis.
  - Task 2:
    - Produce a Manhattan plot.
    - Identify SNPs significantly associated with a trait.
    - Report results.
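
Task 2 could be sketched as follows: regress the trait on one SNP at a time, collect the p-values, and plot them on a `-log10` scale. The data are simulated, with SNP 10 given an effect on purpose. The Bonferroni threshold used to flag SNPs is one common choice and an assumption here, not necessarily the criterion used in the course.

```r
# Simulated genotypes and trait (hypothetical data, for illustration only)
set.seed(2)
n <- 200  # number of individuals
p <- 50   # number of SNPs
X <- matrix(rbinom(n * p, size = 2, prob = 0.4), nrow = n, ncol = p)

# Assumption: only SNP 10 affects the trait
y <- 0.8 * X[, 10] + rnorm(n)

# Single-marker regression: one simple linear regression per SNP
pValues <- rep(NA, p)
for (j in 1:p) {
  fit <- lm(y ~ X[, j])
  # Row 2 holds the slope; column 4 holds its p-value
  pValues[j] <- summary(fit)$coefficients[2, 4]
}

# Bonferroni-corrected significance threshold (an assumed choice)
threshold <- 0.05 / p

# Manhattan plot: SNP position versus -log10(p-value)
plot(1:p, -log10(pValues), pch = 19, col = "steelblue",
     xlab = "SNP", ylab = "-log10(p-value)", main = "Manhattan plot")
abline(h = -log10(threshold), col = "red", lty = 2)

# SNPs that pass the threshold
significant <- which(pValues < threshold)
print(significant)
```

Real genotype data may contain SNPs with no variation, for which the slope cannot be estimated; such SNPs would need to be removed before running the loop.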
- Week 3: Beyond single-marker-phenotype analysis
  - Multiple linear regression (models, estimation, and goodness of fit).
  - Training versus testing accuracy.
  - The curse of dimensionality.
  - Task 3:
    - Fit a multiple regression model to a training set (we will consider various approaches).
    - Evaluate prediction accuracy in the training and testing sets.
    - Report results.
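
A minimal sketch of Task 3, assuming ordinary least squares via `lm()` as the fitting approach (the course mentions several approaches; this is only one of them). The data, the 200/100 split, and the use of the squared correlation between observed and predicted values as the accuracy measure are all assumptions made for this example.

```r
# Simulated genotypes and trait (hypothetical data, for illustration only)
set.seed(3)
n <- 300  # number of individuals
p <- 20   # number of SNPs
X <- matrix(rbinom(n * p, size = 2, prob = 0.5), nrow = n, ncol = p)
colnames(X) <- paste0("snp", 1:p)

# Assumption: the first 5 SNPs have an effect, the rest have none
b <- c(rep(0.5, 5), rep(0, p - 5))
y <- as.vector(X %*% b + rnorm(n))
DATA <- data.frame(y = y, X)

# Random split into training (200) and testing (100) sets
trn <- sample(1:n, size = 200)
TRN <- DATA[trn, ]
TST <- DATA[-trn, ]

# Multiple linear regression fitted to the training set only
fit <- lm(y ~ ., data = TRN)

# Predictions for both sets
yHatTRN <- predict(fit, newdata = TRN)
yHatTST <- predict(fit, newdata = TST)

# Prediction accuracy: squared correlation between observed and predicted
r2TRN <- cor(TRN$y, yHatTRN)^2
r2TST <- cor(TST$y, yHatTST)^2
print(c(training = r2TRN, testing = r2TST))
```

The training value is typically the more optimistic of the two, because the model was fitted to those same observations; the gap tends to widen as the number of SNPs grows relative to the number of individuals.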
- Introduction to R:
  - RIntro in GitHub
  - R for Data Science
  - Too many YouTube videos to be listed here!
- RMarkdown:
- Introduction to GWAS:
  - Will add a couple of references here...

Date | Time | Activity | Materials |
---|---|---|---|
Mon., July 19, 2021 | 5:00pm-6:00pm | Lecture | |
Wed., July 21, 2021 | 5:00pm-6:00pm | Q&A + short lecture | |
Fri., July 23, 2021 | 5:00pm-6:00pm | Presentation of Reports | |
Mon., July 26, 2021 | 5:00pm-6:00pm | Lecture | |
Wed., July 28, 2021 | 5:00pm-6:00pm | Q&A + short lecture | |
Fri., July 30, 2021 | 5:00pm-6:00pm | Presentation of Reports | |
Mon., August 2, 2021 | 5:00pm-6:00pm | Lecture | |
Wed., August 4, 2021 | 5:00pm-6:00pm | Q&A + short lecture | |
Fri., August 6, 2021 | 5:00pm-6:00pm | Presentation of Reports | |