RJuro/CALDISS-EDA

CALDISS course: Exploratory Data Analysis in Python

Roman Jurowetzki, 19/3 - 2019

Building on a basic knowledge of Python and statistics, this workshop series teaches you how to do exploratory data analysis with Python in Jupyter Notebooks. The purpose of exploratory data analysis (EDA) is to take an inductive approach and gain insights from data without necessarily working from a pre-defined hypothesis.

In this 2-day workshop we will briefly review the use of the Python pandas package for tabular data (which was the focus of an earlier CALDISS workshop).

Contents and methods

We will be working with computational notebooks (Jupyter Notebooks), and I will be using Google Colab (free, Google Docs-style Python notebooks running on hosted instances). You can do the same with your own Anaconda installation or any Jupyter installation of your choice, although you may need to install some packages yourself.

Day one will cover various ways to explore larger datasets with a time dimension and to visualize some of the results. We will be working with the dataset from Stanford's Open Policing Project: https://openpolicing.stanford.edu/.
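As a rough illustration of the kind of time-based exploration meant here, the sketch below counts events per month and per hour of day with pandas. It is not taken from the workshop notebook: the data is synthetic, and the column names `stop_datetime` and `violation` are assumptions rather than the actual schema of the Open Policing data.

```python
import pandas as pd

# Synthetic stand-in for a table of traffic stops. The column names
# (stop_datetime, violation) are hypothetical, not the real dataset schema.
stops = pd.DataFrame(
    {
        "stop_datetime": [
            "2015-01-03 08:15", "2015-01-17 23:40", "2015-02-02 08:05",
            "2015-02-14 17:30", "2015-02-20 23:10", "2015-03-09 08:45",
        ],
        "violation": [
            "Speeding", "DUI", "Speeding", "Equipment", "DUI", "Speeding",
        ],
    }
)

# Parse the timestamps and use them as a sorted DatetimeIndex.
stops["stop_datetime"] = pd.to_datetime(stops["stop_datetime"])
stops = stops.set_index("stop_datetime").sort_index()

# Number of stops per calendar month ("MS" = month start frequency).
monthly_counts = stops.resample("MS").size()

# Number of stops per hour of the day, pooled over all dates.
hourly_counts = stops.groupby(stops.index.hour).size()

# Relative frequency of each violation type.
violation_share = stops["violation"].value_counts(normalize=True)

print(monthly_counts)
print(hourly_counts)
print(violation_share)
```

With matplotlib installed, calling `monthly_counts.plot()` in a notebook cell would draw the monthly series as a line chart.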

Notebook for Day 1

Day two will go deeper, exploring unsupervised machine learning techniques for exploratory analysis: dimensionality reduction (PCA, NMF, t-SNE, UMAP) and clustering (K-means, hierarchical, (H)DBSCAN).
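A minimal sketch of how two of these techniques can be combined, assuming scikit-learn is installed: standardize the features, project them to two dimensions with PCA, then cluster the projection with K-means. The data is synthetic, and the number of clusters is chosen to match how it was generated. In a real exploratory setting that number would be unknown, and this is not necessarily the workflow used in the Day 2 notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Three well-separated synthetic groups of 50 points each in 5 dimensions.
centers = np.array(
    [
        [0, 0, 0, 0, 0],
        [8, 8, 8, 8, 8],
        [-8, 8, -8, 8, -8],
    ],
    dtype=float,
)
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 5)) for c in centers])

# Standardize so every feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: keep the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Clustering on the 2-D projection. n_clusters=3 is an assumption that
# matches the synthetic data, not something known in advance in practice.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("cluster sizes:", np.bincount(labels))
```

The 2-D coordinates in `X_2d` can be drawn as a scatter plot colored by `labels` to inspect the grouping visually.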

Notebook for Day 2

Some other resources

  • https://mlcourse.ai/: Notebook-based course in machine learning (we covered Topics 1 and 7 in the past 2 days)
  • https://www.fast.ai/: If you want to go (much) further. Consider also the Computational Linear Algebra course they offer.
  • Get access to DataCamp or Dataquest (free trial)
  • http://datasciencemasters.org/: Curriculum of many resources to consider if you want to really get skilled, including maths
