Building on a basic knowledge of Python and statistics, this workshop series teaches you how to do exploratory data analysis with Python in Jupyter Notebooks. The purpose of exploratory data analysis (EDA) is to approach data inductively and gain insights from it without necessarily working from a predefined hypothesis.
In this 2-day workshop we will briefly review the pandas package for working with tabular data in Python, which was the focus of an earlier CALDISS workshop.
We will be working with computational notebooks (Jupyter Notebooks), and I will be using Google Colab (think of it as a free Google Docs for Python, running on hosted instances). You can do the same with your own Anaconda installation or any Jupyter installation of your choice, although you may need to install some packages yourself.
Day one will cover various ways to explore larger datasets with a time dimension and to visualize some of the results. We will be working with the dataset from Stanford’s Open Policing Project: https://openpolicing.stanford.edu/.
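To give a feel for what exploring data along a time dimension looks like in pandas, here is a minimal sketch. It uses a tiny synthetic table, not the actual Open Policing data, and the column names `stop_datetime` and `violation` are assumptions made purely for illustration.

```python
import pandas as pd

# Synthetic stand-in records; the column names are hypothetical,
# not the real schema of the Open Policing dataset.
stops = pd.DataFrame({
    "stop_datetime": [
        "2015-01-03 08:15",
        "2015-01-17 23:40",
        "2015-02-02 14:05",
        "2015-02-20 09:30",
        "2015-02-28 22:10",
    ],
    "violation": ["speeding", "equipment", "speeding", "speeding", "equipment"],
})

# Parse the strings into timestamps so that date components become available.
stops["stop_datetime"] = pd.to_datetime(stops["stop_datetime"])

# Count the number of stops per calendar month.
stops_per_month = stops.groupby(stops["stop_datetime"].dt.to_period("M")).size()

# Cross-tabulate hour of day against violation type.
by_hour = pd.crosstab(stops["stop_datetime"].dt.hour, stops["violation"])

print(stops_per_month)
print(by_hour)
```

On a real dataset the same pattern applies after loading the file with `pd.read_csv`; a monthly count like `stops_per_month` can then be plotted with its `.plot()` method if matplotlib is installed.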
Day two will go deeper, exploring unsupervised machine learning techniques for exploratory analysis: dimensionality reduction (PCA, NMF, t-SNE, UMAP) and clustering (k-means, hierarchical, (H)DBSCAN).
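As a rough illustration of how dimensionality reduction and clustering can be combined, the sketch below projects synthetic data onto two principal components and then clusters the projection with k-means. It assumes scikit-learn and NumPy are installed; the data, the number of components, and the number of clusters are all arbitrary choices made for the example, not the workshop's actual material.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): two well-separated
# groups of 50 points each in a 5-dimensional feature space.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
group_b = rng.normal(loc=6.0, scale=1.0, size=(50, 5))
X = np.vstack([group_a, group_b])

# Standardize the features so that each contributes equally to PCA.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the five features to two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Cluster the 2-D projection into two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

With real data the number of clusters is usually unknown, so in practice you would compare several values of `n_clusters` or try a density-based method such as DBSCAN, which does not require that number up front.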
- https://mlcourse.ai/: Notebook-based course in machine learning (we covered Topics 1 and 7 over the past 2 days)
- https://www.fast.ai/: If you want to go (much) further. Consider also the Computational Linear Algebra course they offer.
- Get access to DataCamp or Dataquest (free trial)
- http://datasciencemasters.org/: A curriculum of many resources to consider if you want to become really skilled, including in maths