
linogaliana/python-datascientist


Data Science with Python


Note

This is the English 🇬🇧🇺🇸 version of the README.
To see the French 🇫🇷 version, click here:
fr


📚 About

This repository contains the source files for the course Python for Data Science taught in the second year (Master 1) at ENSAE.

The course website is available here:
🌐 https://pythonds.linogaliana.fr/


🎨 Gallery

Some visualizations produced during the course:

[Figures 1 to 16: course visualizations, images not reproduced here]


📖 Course content

This course is suitable for both beginners and advanced learners.
Each part of the syllabus below links to the corresponding section of the course website.

1. Getting started: why Python for data science?

🔗 https://pythonds.linogaliana.fr/en/content/getting-started/

  • Getting a functional Python environment for data science
  • How to deal with a data set
  • Python basics
2. Data wrangling

🔗 https://pythonds.linogaliana.fr/en/content/manipulation/

  • NumPy, the foundation of data science
  • Introduction to Pandas
  • Data wrangling with Pandas
  • Spatial data with GeoPandas
  • Web scraping with Python
  • Retrieving data with APIs
  • Mastering regular expressions
  • Importing data from Parquet and S3
3. Data visualisation and communication

🔗 https://pythonds.linogaliana.fr/en/content/visualisation/

  • Building graphics with Python
  • Introduction to cartography
4. Modeling

🔗 https://pythonds.linogaliana.fr/en/content/modelisation/

  • Why preprocessing matters
  • Evaluating model quality
  • Introduction to classification
  • Introduction to regression
  • Feature selection
  • Clustering
5. Natural Language Processing (NLP)

🔗 https://pythonds.linogaliana.fr/en/content/nlp/

  • Cleaning and structuring texts
  • Bag-of-words approach
  • Text embeddings

🔗 Resources

The course content relies heavily on open data, including French datasets (from data.gouv and Insee) and American datasets.

A complementary course on reproducibility, taught with Romain Avouac (@avouacr):
https://ensae-reproductibilite.github.io/website/


🚀 Accessing the course in Jupyter Notebooks

Tip

Examples can be run instantly on SSP Cloud (with VSCode or Jupyter) or on Google Colab, as shown here for the Pandas chapter.


🤝 Contributing

I welcome contributions!

Note

See the guide for contributors:
CONTRIBUTING.md

About

Repository for the course Python for Data Scientists (ENSAE, second year)
