This repository contains the source files for the course Python for Data Science
taught in the second year (Master 1) at ENSAE.
The course website is available here:
🌐 https://pythonds.linogaliana.fr/
This course is suitable for both beginners and advanced learners.
The syllabus below is fully clickable and collapsible.
1. Getting started: why Python for data science?
🔗 https://pythonds.linogaliana.fr/en/content/getting-started/
- Getting a functional Python environment for data science
- How to deal with a data set
- Python basics
2. Data wrangling
🔗 https://pythonds.linogaliana.fr/en/content/manipulation/
- Numpy, the foundation of data science
- Introduction to Pandas
- Data wrangling with Pandas
- Spatial data with GeoPandas
- Webscraping with Python
- Retrieving data with APIs
- Mastering regular expressions
- Importing data from Parquet and S3
3. Data visualisation and communication
🔗 https://pythonds.linogaliana.fr/en/content/visualisation/
- Building graphics with Python
- Introduction to cartography
4. Modeling
🔗 https://pythonds.linogaliana.fr/en/content/modelisation/
- Why preprocessing matters
- Evaluating model quality
- Introduction to classification
- Introduction to regression
- Feature selection
- Clustering
5. Natural Language Processing (NLP)
🔗 https://pythonds.linogaliana.fr/en/content/nlp/
- Cleaning and structuring texts
- Bag-of-words approach
- Text embeddings
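To give a flavour of the NLP part, here is a minimal sketch of the bag-of-words idea using only the Python standard library. It is not taken from the course material: the function name `bag_of_words`, the tokenization rule and the two toy sentences are assumptions made for illustration, and the course chapters may rely on dedicated libraries instead.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens in `text`, ignoring word order.

    Hypothetical helper: the regex-based tokenization is a simplifying
    assumption, not the course's method.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Toy documents invented for this example.
docs = [
    "Python is great for data science.",
    "Data science needs clean data.",
]

# One bag (word -> count) per document.
bags = [bag_of_words(doc) for doc in docs]

# Shared vocabulary: every distinct word seen in any document, sorted.
vocabulary = sorted(set().union(*bags))

# Document-term matrix: one row per document, one column per vocabulary word.
# A Counter returns 0 for words that are absent from a document.
matrix = [[bag[word] for word in vocabulary] for bag in bags]

for doc, row in zip(docs, matrix):
    print(row, "<-", doc)
```

Each row describes a document only by how often each vocabulary word occurs in it, so word order is discarded. That is the trade-off the bag-of-words representation makes.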
The course content relies heavily on open data, including French datasets (from data.gouv
and Insee) and American datasets.
Complementary course with Romain Avouac (@avouacr):
https://ensae-reproductibilite.github.io/website/
Tip
Run examples instantly on SSP Cloud or Google Colab. Here is an example for the Pandas
chapter:
I welcome contributions!