This repository contains the code for the lecture/seminar "Learning from Data" at Leuphana University Lüneburg, taking place in the winter term 2022/23. The course is part of the master's program Management & Data Science and is taught by Prof. Burkhardt Funk and Jonas Scharfenberger. The repository will be updated with the course material during the semester; the material will be provided as .py files and Jupyter notebooks.
The course will cover
- Theoretical foundation of statistical learning
- Learning settings and frameworks
- Linear models
- Regularization and feature selection
- Model evaluation
- Neural nets, SVMs and their applications
The lectures cover the following topics:
- Introduction, Confusion Matrix & ROC Curve
- Learning Problem, Types of Machine Learning & Perceptron Learning Algorithm
- Linear Regression, Gradient Descent & Stochastic Gradient Descent
- Logistic Regression, Error Functions & Gaussian Discriminant Analysis
- Decision Trees, Entropy Measurement & Gain Ratio
- Learnability, Concept Learning & Hoeffding's Inequality
- Generalization Theory, Breakpoints of Hypothesis Sets & VC Dimension
- Bias and Variance, Model Complexity & Choice of Hypothesis Set
- Support-Vector-Machines, Lagrange Formulation & Quadratic Programming
- Neural Networks, Impact of Layers & Backpropagation
- Feature Space, Transformations & Radial-Basis Functions
- Overfitting, Regularization & Cross-Validation
- Bagging, Boosting & Random Forests
- Clustering, K-Means & Expectation-Maximization
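As a small illustration of one of the topics listed above, a minimal perceptron learning algorithm might look like the following (this is an illustrative sketch, not the course's reference implementation; the function name and data layout are made up):

```python
import numpy as np

def perceptron(X, y, max_iter=1000):
    """Perceptron learning algorithm for linearly separable data.
    X: (n, d) inputs, y: (n,) labels in {-1, +1}.
    Returns the weight vector w, with the bias stored in w[0].
    """
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a constant-1 bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iter):
        preds = np.sign(Xb @ w)
        misclassified = np.where(preds != y)[0]
        if misclassified.size == 0:
            return w                   # every point classified correctly -> converged
        i = misclassified[0]
        w = w + y[i] * Xb[i]           # PLA update on one misclassified point
    return w
```

On linearly separable data the loop is guaranteed to terminate; `max_iter` is only a safeguard.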
Within the file lecture_12_overfitting.py, some experiments from the slides are replicated: overfitting is studied depending on the noise level, the target complexity, and the number of data points. The code provides the option to use multiprocessing in order to speed up the calculations. If you use the multiprocessing version, please choose a sensible number of processes — otherwise your computer might run out of resources. If you want to use the single-process version, set the variable USE_MULTIPROCESSING to False.
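The toggle described above can be sketched roughly as follows (a minimal illustration only; `run_experiment` and `run_all` are placeholder names, not the actual functions in lecture_12_overfitting.py):

```python
import multiprocessing as mp

# Set to False to run everything in a single process.
USE_MULTIPROCESSING = True
N_PROCESSES = 4  # choose a number your machine can handle

def run_experiment(noise_level):
    """Placeholder for one overfitting experiment at a given noise level."""
    return noise_level ** 2  # dummy computation standing in for the real study

def run_all(noise_levels, use_multiprocessing=USE_MULTIPROCESSING):
    if use_multiprocessing:
        # Worker processes are created here; keep N_PROCESSES moderate.
        with mp.Pool(processes=N_PROCESSES) as pool:
            return pool.map(run_experiment, noise_levels)
    return [run_experiment(n) for n in noise_levels]

if __name__ == "__main__":
    print(run_all([0.0, 0.1, 0.2, 0.5, 1.0]))
```

Guarding the entry point with `if __name__ == "__main__":` is required for multiprocessing on platforms that spawn rather than fork worker processes.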
The assignments cover the following topics:
- Perceptron Learning Algorithm
- Linear Regression
- Logistic Regression
- Decision Trees
- Hoeffding's Inequality
- VC Dimension & Growth Functions
- Bias-Variance-Tradeoff
- Lagrange-Formulation & SVM
- MLPs & Backpropagation
- Non-Linear Transformations & RBFs
- Regularization & Overfitting
Within the implementation of the support-vector machine in assignment_08.py, the cvxopt package is used to solve the Lagrange formulation in order to obtain the alphas. Since the standard solver does not always yield the correct solution depending on the input features, the Mosek solver is used instead (the mosek package is included in the requirements file). However, a licence is required to use the Mosek solver. To get your own licence file, please request an academic licence on the Mosek website.
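As a rough sketch of the step described above, the hard-margin SVM dual can be assembled into cvxopt's standard QP form (the function name is illustrative and the actual assignment code may differ, e.g. by using a soft margin or a non-linear kernel):

```python
import numpy as np

def build_svm_dual_qp(X, y):
    """Assemble the hard-margin SVM dual as a QP in cvxopt's standard form:
        minimize (1/2) a^T P a + q^T a  s.t.  G a <= h,  A a = b.
    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}.
    """
    n = X.shape[0]
    K = X @ X.T                        # linear kernel (Gram matrix)
    P = np.outer(y, y) * K             # P_ij = y_i * y_j * <x_i, x_j>
    q = -np.ones(n)                    # maximizing sum(alpha) -> minimizing -sum(alpha)
    G = -np.eye(n)                     # -alpha_i <= 0, i.e. alpha_i >= 0
    h = np.zeros(n)
    A = y.reshape(1, n).astype(float)  # equality constraint: sum_i alpha_i * y_i = 0
    b = np.zeros(1)
    return P, q, G, h, A, b

# With cvxopt installed (and a Mosek licence for solver="mosek"),
# the solve step would look like:
#   from cvxopt import matrix, solvers
#   sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h),
#                    matrix(A), matrix(b), solver="mosek")
#   alphas = np.array(sol["x"]).ravel()
```

Passing `solver="mosek"` makes cvxopt delegate the QP to Mosek instead of its built-in cone solver.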
General remark: When committing your changes, please follow the guidelines on how to write commit messages!
If you want to add your solutions, remarks or anything else to this repository, choose one of the following ways:
- Create a fork of this repository, commit your changes and open a pull request to the master branch.
- Send me a message so that I can add you as a contributor to the repository. Then create a new branch from master named "dev/your_name", commit your changes, and open a pull request to the master branch. Please do not commit directly to the master branch!
If you want to see any specific content in the repo, feel free to open a new issue on GitHub and briefly describe what you would like included in the repository. If you have any questions, please send me a short message.