CALDISS course: Exploratory Data Analysis in Python

Roman Jurowetzki, 19/3 - 2019

Building on a basic knowledge of Python and statistics, this workshop series teaches you how to do exploratory data analysis with Python in Jupyter Notebooks. The purpose of exploratory data analysis (EDA) is to take an inductive approach to data and gain insights without necessarily starting from a predefined hypothesis.

In this 2-day workshop we will briefly review the use of the Python pandas package for tabular data (which was the focus of an earlier CALDISS workshop).

Contents and methods

We will be working with computational notebooks (Jupyter Notebooks), and I will be using Google Colab (a free service, similar to Google Docs, that runs Python notebooks on hosted instances). You can do the same with your own Anaconda installation or any Jupyter installation of your choice, although you may need to install some packages yourself.

Day one will cover various ways to explore larger datasets with a time dimension and to visualize some of the results. We will be working with the dataset from Stanford’s Open Policing Project: https://openpolicing.stanford.edu/.
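As a rough illustration of what exploring a dataset along its time dimension can look like in pandas, here is a minimal sketch. It does not use the Open Policing data: the table is synthetic, and the column names `stop_datetime` and `violation` are assumptions made up for this example, not the schema of the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a table of events with timestamps.
# Column names are hypothetical, not taken from the Open Policing data.
rng = np.random.default_rng(42)
n = 1000
hours_offset = rng.integers(0, 365 * 24, size=n)
stops = pd.DataFrame({
    "stop_datetime": pd.Timestamp("2018-01-01") + pd.to_timedelta(hours_offset, unit="h"),
    "violation": rng.choice(["speeding", "equipment", "other"], size=n),
})

# Use the timestamp as a sorted index so time-based grouping is available.
stops = stops.set_index("stop_datetime").sort_index()

# Number of events per calendar month.
monthly_counts = stops.resample("MS").size()

# Number of events per hour of day (0-23), pooled over the whole year.
hourly_profile = stops.groupby(stops.index.hour).size()

# Monthly counts split by category, one column per violation type.
monthly_by_violation = (
    stops.groupby([pd.Grouper(freq="MS"), "violation"]).size().unstack(fill_value=0)
)

print(monthly_counts.head())
print(monthly_by_violation.head())
```

In a notebook, calling `monthly_by_violation.plot()` would draw one line per category (this requires matplotlib), which is often a quick first look at trends over time.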

Notebook for Day 1

Day two will go deeper, exploring unsupervised machine learning techniques for exploratory analysis: dimensionality reduction (PCA, NMF, t-SNE, UMAP) and clustering (k-means, hierarchical, (H)DBSCAN).
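To give a feel for how dimensionality reduction and clustering combine in an exploratory workflow, here is a small sketch using scikit-learn. It covers only PCA and k-means from the list above, and it runs on synthetic data generated for this example; the course notebooks may set things up differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: three groups of 100 points each in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 5.0, 0.0, 0.0],
])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 5)) for c in centers])

# Standardize features so that no single column dominates the PCA.
X_scaled = StandardScaler().fit_transform(X)

# Reduce to two principal components for inspection and plotting.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Cluster the reduced data; the number of clusters is chosen by hand here.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

A scatter plot of `X_2d` colored by `labels` is a common next step. With real data the number of clusters is rarely known in advance, which is one reason to compare several of the methods listed above.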

Notebook for Day 2

Some other resources

https://mlcourse.ai/: Notebook-based course in machine learning (we covered Topics 1 and 7 in the past 2 days)
https://www.fast.ai/: If you want to go (much) further. Consider also the Computational Linear Algebra course they offer.
Get access to Datacamp or Dataquest (free trial)
http://datasciencemasters.org/: Curriculum of many resources to consider if you want to really get skilled, including maths
