diff --git a/1-introduction.md b/1-introduction.md new file mode 100644 index 00000000..f8489275 --- /dev/null +++ b/1-introduction.md @@ -0,0 +1,474 @@ +--- +title: 'Introduction' +teaching: 40 +exercises: 15 +--- + +:::::::::::::::::::::::::::::::::::::: questions + +- "What is Deep Learning?" +- "When does it make sense to use and not use Deep Learning?" +- "When is it successful?" +- "What are the tools involved?" +- "What is the workflow for Deep Learning?" +- "Why did we choose to use Keras in this lesson?" +- "How do neural networks learn?" + +:::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::: objectives + +- "Recall the sort of problems for which Deep Learning is a useful tool" +- "List some of the available tools for Deep Learning" +- "Recall the steps of a Deep Learning workflow" +- "Identify the inputs and outputs of a deep neural network." +- "Explain the operations performed in a single neuron" +- "Test that you have correctly installed the Keras, Seaborn and Sklearn libraries" +- "Describe what a loss function is" + +:::::::::::::::::::::::::::::::::::::::::::::::: + + +## What is Deep Learning? + + +### Deep Learning, Machine Learning and Artificial Intelligence + +Deep Learning (DL) is just one of many techniques collectively known as machine learning. Machine learning (ML) refers to techniques where a computer can "learn" patterns in data, usually by being shown numerous examples to train it. People often talk about machine learning being a form of artificial intelligence (AI). Definitions of artificial intelligence vary, but usually involve having computers mimic the behaviour of intelligent biological systems. Since the 1950s many works of science fiction have dealt with the idea of an artificial intelligence which matches (or exceeds) human intelligence in all areas. 
Although there have been great advances in AI and ML research recently we can only come close to human like intelligence in a few specialist areas and are still a long way from a general purpose AI. +The image below shows some differences between artificial intelligence, Machine Learning and Deep Learning. + + +![ +Image credit: Tukijaaliwa, CC BY-SA 4.0, via Wikimedia Commons, +[original source]( https://en.wikipedia.org/wiki/File:AI-ML-DL.svg) +](fig/01_AI_ML_DL_differences.png){ +alt='An infographic showing the relation of AI, ML, NN and DL. NN are methods in DL which is a subset of ML algorithms that falls within the umbrella of AI' +} + + +#### Neural Networks + +A neural network is an artificial intelligence technique loosely based on the way neurons in the brain work. + +##### A single neuron +A neural network consists of connected computational units called **neurons**. Each neuron ... + +- has one or more inputs ($x_1, x_2, ...$), e.g. input data expressed as floating point numbers +- most of the time, each neuron conducts 3 main operations: + + take the weighted sum of the inputs where ($w_1, w_2, ... $) indicate weights + + add an extra constant weight (i.e. a bias term) to this weighted sum + + apply a non-linear function to the output so far (using a predefined activation function such as the ReLU function) +- return one output value, again a floating point number. +- one example equation to calculate the output for a neuron is: $output = ReLU(\sum_{i} (x_i*w_i) + bias)$ + + +![](fig/01_neuron.png){alt='A diagram of a single artificial neuron combining inputs and weights using an activation function.' width='600px'} + +##### Combining multiple neurons into a network +Multiple neurons can be joined together by connecting the output of one to the input of another. These connections are associated with weights that determine the 'strength' of the connection, the weights are adjusted during training. 
In this way, the combination of neurons and connections describe a computational graph, an example can be seen in the image below. In most neural networks neurons are aggregated into layers. Signals travel from the input layer to the output layer, possibly through one or more intermediate layers called hidden layers. +The image below shows an example of a neural network with three layers, each circle is a neuron, each line is an edge and the arrows indicate the direction data moves in. + +![ +Image credit: Glosser.ca, CC BY-SA 3.0 , via Wikimedia Commons, +[original source](https://commons.wikimedia.org/wiki/File:Colored_neural_network.svg) +](fig/01_neural_net.png){ +alt='A diagram of a three layer neural network with an input layer, one hidden layer, and an output layer.' +} + +::: challenge +## Neural network calculations +. + +#### 1. Calculate the output for one neuron +Suppose we have: + +- Input: X = (0, 0.5, 1) +- Weights: W = (-1, -0.5, 0.5) +- Bias: b = 1 +- Activation function _relu_: `f(x) = max(x, 0)` + +What is the output of the neuron? + +_Note: You can use whatever you like: brain only, pen&paper, Python, Excel..._ + +#### 2. (optional) Calculate outputs for a network + +Have a look at the following network: + +![](fig/01_xor_exercise.png){alt='A diagram of a neural network with 2 inputs, 2 hidden layer neurons, and 1 output.' width='400px'} + +a. Calculate the output of the network for the following combinations of inputs: + +| x1 | x2 | y | +|----|----|---| +| 0 | 0 | ..| +| 0 | 1 | ..| +| 1 | 0 | ..| +| 1 | 1 | ..| + +b. What logical problem does this network solve? + +:::: solution +## Solution + +#### 1: calculate the output for one neuron + +You can calculate the output as follows: + +* Weighted sum of input: `0 * (-1) + 0.5 * (-0.5) + 1 * 0.5 = 0.25` +* Add the bias: `0.25 + 1 = 1.25` +* Apply activation function: `max(1.25, 0) = 1.25` + +So, the neuron's output is `1.25` + +#### 2: Calculate outputs for a network +a. 
+| x1 | x2 | y | +|----|----|---| +| 0 | 0 | **0** | +| 0 | 1 | **1** | +| 1 | 0 | **1** | +| 1 | 1 | **0** | +b. This solves the XOR logical problem: the output is 1 if only one of the two inputs is 1. + +:::: +::: + +::: challenge +## Activation functions +Look at the following activation functions: + +![A. Sigmoid activation function](fig/01_sigmoid.svg){alt='Plot of the sigmoid function' width='200px'} + + +![B. ReLU activation function](fig/01_relu.svg){alt='Plot of the ReLU function' width='200px'} + + +![C. Identity (or linear) activation function](fig/01_identity_function.svg){alt='Plot of the Identity function' width='200px'} + +Match each of the following statements to the correct activation function: + +1. This function enforces the activation of a neuron to be between 0 and 1 +2. This function is useful in regression tasks when applied to an output neuron +3. This function is the most popular activation function in hidden layers, since it introduces non-linearity in a computationally efficient way. +4. This function is useful in classification tasks when applied to an output neuron +5. (optional) For positive values this function results in the same activations as the identity function. +6. (optional) This function is not differentiable at 0 +7. (optional) This function is the default for Dense layers (search the Keras documentation!) + +*Activation function plots by Laughsinthestocks - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=44920411, +https://commons.wikimedia.org/w/index.php?curid=44920600, https://commons.wikimedia.org/w/index.php?curid=44920533* + +:::: solution +## Solution +1. A +2. C +3. B +4. A +5. B +6. B +7. C +:::: +::: + + +##### What makes deep learning deep learning? +Neural networks aren't a new technique; they have been around since the late 1940s. But until around 2010 neural networks tended to be quite small, consisting of only tens or perhaps hundreds of neurons. 
This limited them to only solving quite basic problems. Around 2010 improvements in computing power and the algorithms for training the networks made much larger and more powerful networks practical. These are known as deep neural networks or Deep Learning. + +Deep Learning requires extensive training using example data which shows the network what output it should produce for a given input. One common application of Deep Learning is classifying images. Here the network will be trained by being "shown" a series of images and told what they contain. Once the network is trained it should be able to take another image and correctly classify its contents. But we are not restricted to just using images, any kind of data can be learned by a Deep Learning neural network. This makes them able to appear to learn a set of complex rules only by being shown what the inputs and outputs of those rules are instead of being taught the actual rules. Using these approaches Deep Learning networks have been taught to play video games and even drive cars. The data on which networks are trained usually has to be quite extensive, typically including thousands of examples. For this reason they are not suited to all applications and should be considered just one of many machine learning techniques which are available. + +While traditional "shallow" networks might have had between three and five layers, deep networks often have tens or even hundreds of layers. This leads to them having millions of individual weights. +The image below shows a diagram of all the layers (there are too many neurons to draw them all) on a Deep Learning network designed to detect pedestrians in images. +The input (left most) layer of the network is an image and the final (right most) layer of the network outputs a zero or one to determine if the input data belongs to the class of data we are interested in. 
+This image is from the paper ["An Efficient Pedestrian Detection Method Based on YOLOv2" by Zhongmin Liu, Zhicai Chen, Zhanming Li, and Wenjin Hu published in Mathematical Problems in Engineering, Volume 2018](https://doi.org/10.1155/2018/3518959) + +![](fig/01_deep_network.png){alt='An example of a deep neural network'} + +### How do neural networks learn? +What happens in a neural network during the training process? +The ultimate goal is of course to find a model that makes predictions that are as close to the target value as possible. +In other words, the goal of training is to find the best set of parameters (weights and biases) +that brings the error between prediction and expected value to a minimum. +The total error between prediction and expected value is quantified in a loss function (also called cost function). +There are lots of loss functions to pick from, and it is important that you pick one that matches your problem definition well. +We will look at an example of a loss function in the next exercise. + +::: challenge +## Exercise: Loss function +. + +#### 1. Compute the Mean Squared Error +One of the simplest loss functions is the Mean Squared Error: $MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $y_i$ is the expected value and $\hat{y}_i$ is the predicted value for sample $i$. +It is the mean of all squared errors, where the error is the difference between the predicted and expected value. +In the following table, fill in the missing values in the 'squared error' column. What is the MSE loss +for the predictions on these 4 samples? + +| **Prediction** | **Expected value** | **Squared error** | +|----------------|--------------------|-------------------| +| 1 | -1 | 4 | +| 2 | -1 | .. | +| 0 | 0 | .. | +| 3 | 2 | .. | +| | **MSE:** | .. | + +#### 2. (optional) Huber loss +A more complicated and less frequently used loss function for regression is the [Huber loss](https://keras.io/api/losses/regression_losses/#huber-class). 
+ +Below you see the Huber loss (green, delta = 1) and Squared error loss (blue) +as a function of `y_true - y_pred`. + +![](fig/01_huber_loss.png){alt='Huber loss (green, delta = 1) and squared error loss (blue) +as a function of y_true - y_pred' width='400px'} + +Which loss function is more sensitive to outliers? + +:::: solution +## Solution +#### 1. 'Compute the Mean Squared Error' +| **Prediction** | **Expected value** | **Squared error** | +|----------------|--------------------|-------------------| +| 1 | -1 | 4 | +| 2 | -1 | 9 | +| 0 | 0 | 0 | +| 3 | 2 | 1 | +| | **MSE:** | 3.5 | + +#### 2. 'Huber loss' +The squared error loss is more sensitive to outliers. Errors between -1 and 1 result in the same loss value +for both loss functions. But, larger errors (in other words: outliers) result in quadratically larger losses for +the Mean Squared Error, while for the Huber loss they only increase linearly. +:::: +::: + +So, a loss function quantifies the total error of the model. +The process of adjusting the weights in such a way as to minimize the loss function is called 'optimization'. +We will dive further into how optimization works in episode 3. +For now, it is enough to understand that during training the weights in the network are adjusted so that the loss decreases through the process of optimization. +This ultimately results in a low loss, and this, generally, implies predictions that are closer to the expected values. + +### What sort of problems can Deep Learning solve? + +* Pattern/object recognition +* Segmenting images (or any data) +* Translating between one set of data and another, for example natural language translation. +* Generating new data that looks similar to the training data, often used to create synthetic datasets, art or even "deepfake" videos. + * This can also be used to give the illusion of enhancing data, for example making images look sharper, video look smoother or adding colour to black and white images. 
But beware of this, it is not an accurate recreation of the original data, but a recreation based on something statistically similar, effectively a digital imagination of what that data could look like. + +#### Examples of Deep Learning in Research + +Here are just a few examples of how Deep Learning has been applied to some research problems. Note: some of these articles might be behind paywalls. + +* [Detecting COVID-19 in chest X-ray images](https://arxiv.org/abs/2003.09871) +* [Forecasting building energy load](https://ieeexplore.ieee.org/document/7793413) +* [Protein function prediction](https://pubmed.ncbi.nlm.nih.gov/29039790/) +* [Simulating Chemical Processes](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401) +* [Help to restore ancient murals](https://heritagesciencejournal.springeropen.com/articles/10.1186/s40494-020-0355-x) + + +### What sort of problems can Deep Learning not solve? + +* Any case where only a small amount of training data is available. +* Tasks requiring an explanation of how the answer was arrived at. +* Classifying things which are nothing like their training data. + +### What sort of problems can Deep Learning solve, but should not be used for? + +Deep Learning needs a lot of computational power, for this reason it often relies on specialised hardware like graphical processing units (GPUs). Many computational problems can be solved using less intensive techniques, but could still technically be solved with Deep Learning. + +The following could technically be achieved using Deep Learning, but it would probably be a very wasteful way to do it: + +* Logic operations, such as computing totals, averages, ranges etc. (see [this example](https://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow) applying Deep Learning to solve the "FizzBuzz" problem often used for programming interviews) +* Modelling well defined systems, where the equations governing them are known and understood. 
+* Basic computer vision tasks such as edge detection, decreasing colour depth or blurring an image. + +::: challenge +## Deep Learning Problems Exercise + +Which of the following would you apply Deep Learning to? + +1. Recognising whether or not a picture contains a bird. +2. Calculating the median and interquartile range of a dataset. +3. Identifying MRI images of a rare disease when only one or two example images are available for training. +4. Identifying people in pictures after being trained only on cats and dogs. +5. Translating English into French. + +:::: solution +## Solution +1. and 5 are the sort of tasks often solved with Deep Learning. +2. is technically possible but solving this with Deep Learning would be extremely wasteful; you could do the same with much less computing power using traditional techniques. +3. will probably fail because there is not enough training data. +4. will fail because the Deep Learning system only knows what cats and dogs look like; it might accidentally classify the people as cats or dogs. +:::: +::: + +## How much data do you need for Deep Learning? +The rise of Deep Learning is partially due to the increased availability of very large datasets. +But how much data do you actually need to train a Deep Learning model? +Unfortunately, this question is not easy to answer. It depends, among other things, on the +complexity of the task (which you often do not know beforehand), the quality of the available dataset and the complexity of the network. For complex tasks with large neural networks, we often see that adding more data continues to improve performance. However, this is not always true: if the data you add is too similar to the data you already have, it will not give much new information to the neural network. + +::: callout + +## What if I do not have enough data? 
+ +In case you have too little data available to train a complex network from scratch, it is sometimes possible to use a pretrained network that was trained on a similar problem. Another trick is data augmentation, where you expand the dataset with artificial data points that could be real. An example of this is mirroring images when trying to classify cats and dogs. A horizontally mirrored animal retains the label, but exposes a different view. +::: + +## Deep Learning workflow + +To apply Deep Learning to a problem there are several steps we need to go through: + +### 1. Formulate/Outline the problem + +Firstly we must decide what it is we want our Deep Learning system to do. Is it going to classify some data into one of a few categories? For example if we have an image of some handwritten characters, the neural network could classify which character it is being shown. Or is it going to perform a prediction? For example trying to predict what the price of something will be tomorrow given some historical data on pricing and current trends. + +[//]: # "What about pattern association tasks like language translation?" + +### 2. Identify inputs and outputs + +Next we need to identify what the inputs and outputs of the neural network will be. This might require looking at our data and deciding what features of the data we can use as inputs. If the data is images then the inputs could be the individual pixels of the images. + +For the outputs we will need to look at what we want to identify from the data. If we are performing a classification problem then typically we will have one output for each potential class. + + +### 3. Prepare data + +Many datasets are not ready for immediate use in a neural network and will require some preparation. Neural networks can only really deal with numerical data, so any non-numerical data (for example words) will have to be somehow converted to numerical data. + +Next we will need to divide the data into multiple sets. 
+One of these will be used by the training process and we will call it the training set. +Another will be used to evaluate the accuracy of the training and we will call that one the test set. +Sometimes we will also use a third set known as a validation set to refine the model. + +### 4. Choose a pre-trained model or build a new architecture from scratch + +Often we can use an existing neural network instead of designing one from scratch. Training a network can take a lot of time and computational resources. There are a number of well publicised networks which have been shown to perform well at certain tasks. If you know of one which already does a similar task well then it makes sense to use it. + +If instead we decide we do want to design our own network then we need to think about how many input neurons it will have, how many hidden layers and how many outputs, and what types of layers we use (we will explore the different types later on). This will probably need some experimentation and we might have to try tweaking the network design a few times before we see acceptable results. + + +### 5. Choose a loss function and optimizer + +The loss function tells the training algorithm how far away the predicted value was from the true value. We will look at choosing a loss function in more detail later on. + +The optimizer is responsible for taking the output of the loss function and then applying some changes to the weights within the network. It is through this process that the "learning" (adjustment of the weights) is achieved. + + +### 6. Train the model + +We can now go ahead and start training our neural network. We will probably keep doing this for a given number of iterations through our training dataset (referred to as _epochs_) or until the loss function gives a value under a certain threshold. The graph below shows the loss against the number of _epochs_. Generally the loss will go down with each _epoch_, but occasionally it will see a small rise. 
+ +![](fig/training-0_to_1500.svg){alt='A graph showing an exponentially decreasing loss over the first 1500 epochs of training an example network.'} + +### 7. Perform a Prediction/Classification + +After training the network we can use it to perform predictions. This is the mode you would +use the network in after you have fully trained it to a satisfactory performance. Predictions +on a special hold-out set are used in the next step to measure the performance +of the network. + +### 8. Measure Performance + +Once we have trained the network we want to measure its performance. To do this we use some additional data that was not part of the training; this is known as a test set. There are many different methods available for measuring performance and which one is best depends on the type of task we are attempting. These metrics are often published as an indication of how well our network performs. + +### 9. Refine the model + +We refine the model further. We can for example slightly change the architecture of the model, or change the number of nodes in a layer. +Hyperparameters are all the parameters set by the person configuring the machine learning instead of those learned by the algorithm itself. +The hyperparameters include the number of epochs or the parameters for the optimizer. +It might be necessary to adjust these and re-run the training many times before we are happy with the result. This is often done automatically and is referred to as hyperparameter tuning. + +### 10. Share Model + +Now that we have a trained network that performs at a level we are happy with we can go and use it on real data to perform a prediction. At this point we might want to consider publishing a file with both the architecture of our network and the weights which it has learned (assuming we did not use a pre-trained network). This will allow others to use it as a pre-trained network for their own purposes and for them to (mostly) reproduce our result. 
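
The workflow steps above can be sketched end to end on a toy problem. The following is a minimal illustration in plain Python, not the Keras code used later in this lesson. It assumes synthetic data lying exactly on the line `y = 2x + 1`, a "network" made of a single neuron with an identity activation, the Mean Squared Error as loss function and plain gradient descent as optimizer. The names `predict` and `mse`, the learning rate and the number of epochs are choices made for this sketch only.

```python
import random

random.seed(0)  # make the shuffle reproducible

# Step 3: prepare data. Synthetic samples on the line y = 2x + 1
# (an assumption for illustration), split into a training and a test set.
data = [(i / 10, 2 * (i / 10) + 1) for i in range(-20, 21)]
random.shuffle(data)
train, test = data[:30], data[30:]

# Step 4: the "architecture" is one neuron with one weight, one bias
# and an identity activation function.
w, b = 0.0, 0.0


def predict(x, w, b):
    """Output of the single neuron for input x."""
    return w * x + b


# Step 5: the loss function is the Mean Squared Error.
def mse(samples, w, b):
    """Mean of the squared differences between expected and predicted values."""
    return sum((y - predict(x, w, b)) ** 2 for x, y in samples) / len(samples)


# Step 6: train with gradient descent. In every epoch we compute the
# gradient of the MSE with respect to w and b on the training set and
# move both parameters a small step in the opposite direction.
learning_rate = 0.1
losses = []
for epoch in range(200):
    grad_w = sum(-2 * (y - predict(x, w, b)) * x for x, y in train) / len(train)
    grad_b = sum(-2 * (y - predict(x, w, b)) for x, y in train) / len(train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    losses.append(mse(train, w, b))

# Steps 7 and 8: predict on data that was not used for training
# and measure the performance with the same loss function.
test_loss = mse(test, w, b)
print(f"learned weight: {w:.3f}, learned bias: {b:.3f}")
print(f"training loss after first epoch: {losses[0]:.4f}")
print(f"training loss after last epoch: {losses[-1]:.2e}")
print(f"test loss: {test_loss:.2e}")
```

Because the synthetic data contains no noise, the learned weight and bias should end up very close to 2 and 1, and the loss on the held-out test set should be close to zero. With real data you would not expect such a perfect fit.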
+ + +::: challenge +## Deep Learning workflow exercise + +Think about a problem you would like to use Deep Learning to solve. + +1. What do you want a Deep Learning system to be able to tell you? +2. What data inputs and outputs will you have? +3. Do you think you will need to train the network or will a pre-trained network be suitable? +4. What data do you have to train with? What preparation will your data need? Consider both the data you are going to predict/classify from and the data you will use to train the network. + +:::: solution +Discuss your answers with the group or the person next to you. +:::: +::: + + +## Deep Learning Libraries + +There are many software libraries available for Deep Learning including: + +### TensorFlow + +[TensorFlow](https://www.tensorflow.org/) was developed by Google and is one of the older Deep Learning libraries, ported across many languages since it was first released to the public in 2015. It is very versatile and capable of much more than Deep Learning but as a result it often takes a lot more lines of code to write Deep Learning operations in TensorFlow than in other libraries. It offers (almost) seamless integration with GPU accelerators and Google's own TPU (Tensor Processing Unit) chips that are built specially for machine learning. + +### PyTorch + +[PyTorch](https://pytorch.org/) was developed by Facebook in 2016 and is a popular choice for Deep Learning applications. It was developed for Python from the start and feels a lot more "pythonic" than TensorFlow. Like TensorFlow it was designed to do more than just Deep Learning and offers some very low level interfaces. [PyTorch Lightning](https://www.pytorchlightning.ai/) offers a higher level interface to PyTorch to set up experiments. Like TensorFlow it is also very easy to integrate PyTorch with a GPU. In many benchmarks it outperforms the other libraries. 
+ +### Keras + +[Keras](https://keras.io/) is designed to be easy to use and usually requires fewer lines of code than other libraries. We have chosen it for this workshop for that reason. Keras can actually work on top of TensorFlow (and several other libraries), hiding away the complexities of TensorFlow while still allowing you to make use of its features. + +The performance of Keras is sometimes not as good as other libraries and if you are going to move on to create very large networks using very large datasets then you might want to consider one of the other libraries. But for many applications the performance difference will not be enough to worry about and the time you will save with simpler code will exceed what you will save by having the code run a little faster. + +Keras also benefits from a very good set of [online documentation](https://keras.io/guides/) and a large user community. You will find that most of the concepts from Keras translate very well across to the other libraries if you wish to learn them at a later date. + +### Installing Keras and other dependencies + +Follow the instructions in the [setup]({{ page.root }}//setup) document to install Keras, Seaborn and Sklearn. + +## Testing Keras Installation +Let's check you have a suitable version of Keras installed. +Open up a new Jupyter notebook or interactive python console and run the following commands: +```python +from tensorflow import keras +print(keras.__version__) +``` +```output +2.12.0 +``` +You should get a version number reported. At the time of writing 2.12.0 is the latest version. + +## Testing Seaborn Installation +Let's check you have a suitable version of seaborn installed. +In your Jupyter notebook or interactive python console run the following commands: +```python +import seaborn +print(seaborn.__version__) +``` +```output +0.12.2 +``` +You should get a version number reported. At the time of writing 0.12.2 is the latest version. 
+ +## Testing Sklearn Installation +Let's check you have a suitable version of sklearn installed. +In your Jupyter notebook or interactive python console run the following commands: +```python +import sklearn +print(sklearn.__version__) +``` +```output +1.2.2 +``` +You should get a version number reported. At the time of writing 1.2.2 is the latest version. + + +:::::::::::::::::::::::::::::::::::::: keypoints + +- "Machine learning is the process where computers learn to recognise patterns in data." +- "Artificial neural networks are a machine learning technique based on a model inspired by groups of neurons in the brain." +- "Artificial neural networks can be trained on example data." +- "Deep Learning is a machine learning technique based on using many artificial neurons arranged in layers." +- "Neural networks learn by minimizing a loss function." +- "Deep Learning is well suited to classification and prediction problems such as image recognition." +- "To use Deep Learning effectively we need to go through a workflow of: defining the problem, identifying inputs and outputs, preparing data, choosing the type of network, choosing a loss function, training the model, refining the model, measuring performance before we can classify data." +- "Keras is a Deep Learning library that is easier to use than many of the alternatives such as TensorFlow and PyTorch." + +:::::::::::::::::::::::::::::::::::::::::::::::: diff --git a/2-keras.md b/2-keras.md new file mode 100644 index 00000000..fc6c26bd --- /dev/null +++ b/2-keras.md @@ -0,0 +1,785 @@ +--- +title: "Classification by a neural network using Keras" +teaching: 45 +exercises: 50 +--- + +::: questions +- "What is a neural network?" +- "How do I compose a Neural Network using Keras?" +- "How do I train this network on a dataset?" +- "How do I get insight into the learning process?" +- "How do I measure the performance of the network?" 
+::: + +::: objectives +- "Use the deep learning workflow to structure the notebook" +- "Explore the dataset using pandas and seaborn" +- "Use one-hot encoding to prepare data for classification in Keras" +- "Describe a fully connected layer" +- "Implement a fully connected layer with Keras" +- "Use Keras to train a small fully connected network on prepared data" +- "Interpret the loss curve of the training process" +- "Use a confusion matrix to measure the trained network's performance on a test set" +::: + + +## Introduction +In this episode we will learn how to create and train a Neural Network using Keras to solve a simple classification task. + +The goal of this episode is to quickly get your hands dirty in actually defining and training a neural network, without going into depth on how neural networks work on a technical or mathematical level. +We want you to go through the most commonly used deep learning workflow that was covered +in the introduction. +As a reminder, below are the steps of the deep learning workflow: + +1. Formulate / Outline the problem +2. Identify inputs and outputs +3. Prepare data +4. Choose a pretrained model or start building architecture from scratch +5. Choose a loss function and optimizer +6. Train the model +7. Perform a Prediction/Classification +8. Measure performance +9. Refine the model +10. Save model + +In this episode we will focus on a minimal example for each of these steps. Later episodes will build on this knowledge to go into greater depth for some or all of these steps. + +::: instructor +This episode really aims to go through the whole process once, as quickly as possible. +In episode 3 we will expand on all the concepts that are lightly introduced in episode 2. Some concepts like monitoring the training progress, optimization and learning rate are explained in detail in episode 3. 
+It is good to stress this a few times, because learners will usually have a lot of questions like: +'Why don't we normalize our features' or 'Why do we choose Adam optimizer?'. +It can be a good idea to park some of these questions for discussion in episode 3 and 4. +::: + +::: callout +## GPU usage +For this lesson having a GPU (graphics card) available is not needed. +We specifically use very small toy problems so that you do not need one. +However, Keras will use your GPU automatically when it is available. +Using a GPU becomes necessary when tackling larger datasets or complex problems which +require a more complex Neural Network. +::: + +## 1. Formulate/outline the problem: penguin classification +In this episode we will be using the [penguin dataset](https://zenodo.org/record/3960218). This dataset was published in 2020 by Allison Horst and contains data on three different species of penguins. + +We will use the penguin dataset to train a neural network which can classify which species a +penguin belongs to, based on its physical characteristics. + +::: callout +## Goal +The goal is to predict a penguin's species using the attributes available in this dataset. +::: + +The `palmerpenguins` data contains size measurements for three penguin species observed on three islands in the Palmer Archipelago, Antarctica. +The physical attributes measured are flipper length, beak length, beak width, body mass, and sex. + +![*Artwork by @allison_horst*][palmer-penguins] + + +![*Artwork by @allison_horst*][penguin-beaks] + + +These data were collected from 2007 - 2009 by Dr. Kristen Gorman with the [Palmer Station Long Term Ecological Research Program](https://pal.lternet.edu/), part of the [US Long Term Ecological Research Network](https://lternet.edu/). 
The data were imported directly from the [Environmental Data Initiative](https://environmentaldatainitiative.org/) (EDI) Data Portal, and are available for use by CC0 license ("No Rights Reserved") in accordance with the [Palmer Station Data Policy](https://pal.lternet.edu/data/policies). + +## 2. Identify inputs and outputs +To identify the inputs and outputs that we will use to design the neural network we need to familiarize +ourselves with the dataset. This step is sometimes also called data exploration. + +We will start by importing the [Seaborn](https://seaborn.pydata.org/) library that will help us get the dataset and visualize it. +Seaborn is a powerful library with many visualizations. Keep in mind it requires the data to be in a +pandas dataframe, luckily the datasets available in seaborn are already in a pandas dataframe. + +```python +import seaborn as sns +``` + +We can load the penguin dataset using +```python +penguins = sns.load_dataset('penguins') +``` + +This will give you a pandas dataframe which contains the penguin data. + +### Inspecting the data +Using the pandas `head` function gives us a quick look at the data: +```python +penguins.head() +``` + + | | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | + |------:|---------------:|--------------:|------------------:|------------:|------------:|------------:|------------:| + | 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male | + | 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female | + | 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female | + | 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN | + | 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female | + + All columns but the 'species' columns are features that we can use. 
+ +Let's look at the shape of the dataset: + + ```python + penguins.shape + ``` + +There are 344 samples and 7 columns, so 6 features + +### Visualization +Looking at numbers like this usually does not give a very good intuition about the data we are +working with, so let us create a visualization. + +#### Pair Plot +One nice visualization for datasets with relatively few attributes is the Pair Plot. +This can be created using `sns.pairplot(...)`. It shows a scatterplot of each attribute plotted against each of the other attributes. +By using the `hue='species'` setting for the pairplot the graphs on the diagonal are layered kernel density estimate plots for the different values of the `species` column. + +```python +sns.pairplot(penguins, hue="species") +``` + +![][pairplot] + +::: challenge + +## Pairplot + +Take a look at the pairplot we created. Consider the following questions: + +* Is there any class that is easily distinguishable from the others? +* Which combination of attributes shows the best separation for all 3 class labels at once? +* (optional) Create a similar pairplot, but with `hue="sex"`. Explain the patterns you see. +Which combination of features distinguishes the two sexes best? + +:::: solution +## Solution +* The plots show that the green class, Gentoo is somewhat more easily distinguishable from the other two. +* The other two seem to be separable by a combination of bill length and bill +depth (other combinations are also possible such as bill length and flipper length). + +Answer to optional question: + +```python +sns.pairplot(penguins, hue='sex') +``` + +![][sex_pairplot] + +You see that for each species females have smaller bills and flippers, as well as a smaller body mass. +You would need a combination of the species and the numerical features to successfully distinguish males from females. +The combination of `bill_depth_mm` and `body_mass_g` gives the best separation. 
+ +:::: +::: + +### Input and Output Selection +Now that we have familiarized ourselves with the dataset we can select the data attributes to use +as input for the neural network and the target that we want to predict. + +In the rest of this episode we will use the `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g` attributes. +The target for the classification task will be the `species`. + +::: callout +## Data Exploration +Exploring the data is an important step to familiarize yourself with the problem and to help you +determine the relevant inputs and outputs. +::: + +## 3. Prepare data +The input data and target data are not yet in a format that is suitable to use for training a neural network. + + +For now we will use only the numerical features `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`, +so let's drop the categorical columns: +```python +# Drop categorical columns +penguins_filtered = penguins.drop(columns=['island', 'sex']) +``` + +### Clean missing values +During the exploration phase you may have noticed that some rows in the dataset have missing (NaN) +values. Leaving such values in the input data will ruin the training, so we need to deal with them. +There are many ways to deal with missing values, but for now we will just remove the offending rows by adding a call to `dropna()`: +```python +# Drop the rows that have NaN values in them +penguins_filtered = penguins_filtered.dropna() +``` + +Finally, we select only the features: +```python +# Extract columns corresponding to features +features = penguins_filtered.drop(columns=['species']) +``` + +### Prepare target data for training +The target data is also in a format that cannot be used directly in training. +A neural network can only take numerical inputs and outputs, and learns by +calculating how "far away" the species predicted by the neural network is +from the true species. 
+When the target is a string category column as we have here it is very difficult to determine this "distance" or error. +Therefore we will transform this column into a more suitable format. +Again there are many ways to do this, however we will be using the one-hot encoding. +This encoding creates multiple columns, as many as there are unique values, and +puts a 1 in the column with the corresponding correct class, and 0's in +the other columns. +For instance, for a penguin of the Adelie species the one-hot encoding would be 1 0 0 + +Fortunately pandas is able to generate this encoding for us. +```python +import pandas as pd + +target = pd.get_dummies(penguins_filtered['species']) +target.head() # print out the top 5 to see what it looks like. +``` + +::: challenge +## One-hot encoding +How many output neurons will our network have now that we one-hot encoded the target class? + +* A: 1 +* B: 2 +* C: 3 + +:::: solution +## Solution +C: 3, one for each output variable class + +:::: +::: + +### Split data into training and test set +Finally, we will split the dataset into a training set and a test set. +As the names imply we will use the training set to train the neural network, +while the test set is kept separate. +We will use the test set to assess the performance of the trained neural network +on unseen samples. +In many cases a validation set is also kept separate from the training and test sets (i.e. the dataset is split into 3 parts). +This validation set is then used to select the values of the parameters of the neural network and the training methods. +For this episode we will keep it at just a training and test set however. + +To split the cleaned dataset into a training and test set we will use a very convenient +function from sklearn called `train_test_split`. 
+ +This function takes a number of parameters which are extensively explained in [the scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) : +- The first two parameters are the dataset (in our case `features`) and the corresponding targets (i.e. defined as target). +- Next is the named parameter `test_size` this is the fraction of the dataset that is +used for testing, in this case `0.2` means 20% of the data will be used for testing. +- `random_state` controls the shuffling of the dataset, setting this value will reproduce +the same results (assuming you give the same integer) every time it is called. +- `shuffle` which can be either `True` or `False`, it controls whether the order of the rows of the dataset is shuffled before splitting. It defaults to `True`. +- `stratify` is a more advanced parameter that controls how the split is done. By setting it to `target` the train and test sets the function will return will have roughly the same proportions (with regards to the number of penguins of a certain species) as the dataset. + +```python +from sklearn.model_selection import train_test_split + +X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0, shuffle=True, stratify=target) +``` + +## 4. Build an architecture from scratch or choose a pretrained model + +### Keras for neural networks +We will now build our first neural network from scratch. Although this sounds like a daunting task, you will experience that with [Keras](https://keras.io/) it is actually surprisingly straightforward. + +Keras is a machine learning framework with ease of use as one of its main features. +It is part of the tensorflow python package and can be imported using `from tensorflow import keras`. + +Keras includes functions, classes and definitions to define deep learning models, cost functions and optimizers (optimizers are used to train a model). 
+ +Before we move on to the next section of the workflow we need to make sure we have Keras imported. +We do this as follows: +```python +from tensorflow import keras +``` + +For this class it is useful if everyone gets the same results from their training. +Keras uses a random number generator at certain points during its execution. +Therefore we will need to set two random seeds, one for numpy and one for tensorflow: +```python +from numpy.random import seed +seed(1) +keras.utils.set_random_seed(2) +``` + +### Build a neural network from scratch + +Now we will build a neural network from scratch, and although this sounds like +a daunting task, with Keras it is actually surprisingly straightforward. + +With Keras you compose a neural network by creating layers and linking them +together. For now we will only use one type of layer called a fully connected +or Dense layer. In Keras this is defined by the `keras.layers.Dense` class. + +A dense layer has a number of neurons, which is a parameter you can choose when +you create the layer. +When connecting the layer to its input and output layers every neuron in the dense +layer gets an edge (i.e. connection) to ***all*** of the input neurons and ***all*** of the output neurons. +The hidden layer in the image in the introduction of this episode is a Dense layer. + +The input in Keras also gets special treatment. Keras automatically calculates the number of inputs +and outputs a layer needs and therefore how many edges need to be created. +This means we need to inform Keras how big our input is going to be. We do this by instantiating the `keras.Input` class and telling it how big our input is, that is, the number of columns it contains. Note that the shape is passed as a tuple with a single entry. + +```python +inputs = keras.Input(shape=(X_train.shape[1],)) +``` + +We store a reference to this input class in a variable so we can pass it to the creation of +our hidden layer. 
+Creating the hidden layer can then be done as follows: +```python +hidden_layer = keras.layers.Dense(10, activation="relu")(inputs) +``` + +The instantiation here has 2 parameters and a seemingly strange combination of parentheses, so +let us take a closer look. +The first parameter `10` is the number of neurons we want in this layer; this is one of the +hyperparameters of our system and needs to be chosen carefully. We will get back to this in the section +on refining the model. +The second parameter is the activation function to use; here we choose relu, which is 0 +for inputs that are 0 and below and the identity function (returning the same value) +for inputs above 0. +This is a commonly used activation function in deep neural networks that has been shown to work well in practice. +Next we see an extra set of parentheses with inputs in them; this means that after creating an +instance of the Dense layer we call it as if it were a function. +This tells the Dense layer to connect the layer passed as a parameter, in this case the inputs. +Finally we store a reference so we can pass it to the output layer in a minute. + +Now we create another layer that will be our output layer. +Again we use a Dense layer and so the call is very similar to the previous one. +```python +output_layer = keras.layers.Dense(3, activation="softmax")(hidden_layer) +``` + +Because we chose the one-hot encoding, we use `3` neurons for the output layer. + +The softmax activation ensures that the three output neurons produce values in the range +(0, 1) and they sum to 1. +We can interpret this as a kind of 'probability' that the sample belongs to a certain +species. + +Now that we have defined the layers of our neural network we can combine them into +a Keras model which facilitates training the network. +```python +model = keras.Model(inputs=inputs, outputs=output_layer) +model.summary() +``` + +The model summary here can show you some information about the neural network we have defined. 
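As a small aside, the behaviour of the softmax activation described above can be checked by hand. The sketch below is a plain NumPy illustration, not how Keras implements it internally, and the three input scores are made up for the example.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into values in (0, 1) that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow
    shifted = scores - np.max(scores)
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum()

# Made-up raw outputs of three output neurons for one penguin
raw_scores = np.array([2.0, 1.0, 0.1])
probabilities = softmax(raw_scores)

print(probabilities)        # three values between 0 and 1
print(probabilities.sum())  # sums to 1 (up to floating point rounding)
```

The largest raw score ends up with the largest 'probability', which is why we can later pick the predicted species by looking for the highest output.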
+ +::: callout +## Trainable and non-trainable parameters +Keras distinguishes between two types of weights, namely: + +- trainable parameters: these are weights of the neurons that are modified when we train the model in order to minimize our loss function (we will learn about loss functions shortly!). + +- non-trainable parameters: these are weights of the neurons that are not changed when we train the model. These could be for many reasons - using a pre-trained model, choice of a particular filter for a convolutional neural network, and statistical weights for batch normalization are some examples. + +If these reasons are not clear right away, don't worry! In later episodes of this course, we will touch upon a couple of these concepts. +::: + +::: challenge +## Create the neural network +With the code snippets above, we defined a Keras model with 1 hidden layer with +10 neurons and an output layer with 3 neurons. + +1. How many parameters does the resulting model have? +2. What happens to the number of parameters if we increase or decrease the number of neurons + in the hidden layer? + +#### (optional) Keras Sequential vs Functional API +So far we have used the [Functional API](https://keras.io/guides/functional_api/) of Keras. +You can also implement neural networks using [the Sequential model](https://keras.io/guides/sequential_model/). +As you can read in the documentation, the Sequential model is appropriate for **a plain stack of layers** +where each layer has **exactly one input tensor and one output tensor**. + +3. 
(optional) Use the Sequential model to implement the same network + +:::: solution +## Solution +Have a look at the output of `model.summary()`: +```python +model.summary() +``` + +```output +Model: "model_1" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input_1 (InputLayer) [(None, 4)] 0 +_________________________________________________________________ +dense (Dense) (None, 10) 50 +_________________________________________________________________ +dense_1 (Dense) (None, 3) 33 +================================================================= +Total params: 83 +Trainable params: 83 +Non-trainable params: 0 +_________________________________________________________________ +``` +The model has 83 trainable parameters: the hidden layer has 4 x 10 weights plus 10 biases (50), and the output layer has 10 x 3 weights plus 3 biases (33). + +If you increase or decrease the number of neurons in the hidden layer, the number of +trainable parameters in both the hidden and output layer increases or +decreases in accordance with the number of neurons added or removed. +Each extra neuron has 4 weights connected to the input layer, 1 bias term, and 3 weights connected to the output layer. +So each extra neuron adds 8 parameters in total. + +*The name in quotes within the string `Model: "model_1"` may be different in your view; this detail is not important.* + +#### (optional) Keras Sequential vs Functional API +3. This implements the same model using the Sequential API: +```python +model = keras.Sequential( + [ + keras.Input(shape=(X_train.shape[1],)), + keras.layers.Dense(10, activation="relu"), + keras.layers.Dense(3, activation="softmax"), + ] +) +``` + +We will use the Functional API for the remainder of this course, since it is more flexible and more explicit. +:::: +::: + + +::: callout +## How to choose an architecture? +Even for this small neural network, we had to make a choice on the number of hidden neurons. +Other choices to be made are the number of layers and type of layers (as we will see later). 
+You might wonder how you should make these architectural choices. +Unfortunately, there are no clear rules to follow here, and it often boils down to a lot of +trial and error. However, it is recommended to look at what others have done with similar datasets and problems. +Another best practice is to start with a relatively simple architecture. Once it is running, start adding layers and tweaking the network to see if performance increases. +::: + +### Choose a pretrained model +If your data and problem are very similar to what others have done, you can often use a *pretrained network*. +Even if your problem is different, but the data type is common (for example images), you can use a pretrained network and finetune it for your problem. +A large number of openly available pretrained networks can be found in the [Model Zoo](https://modelzoo.co/), [pytorch hub](https://pytorch.org/hub/) or [tensorflow hub](https://www.tensorflow.org/hub/). + + +## 5. Choose a loss function and optimizer +We have now designed a neural network that in theory we should be able to +train to classify penguins. +However, we first need to select an appropriate loss +function that we will use during training. +This loss function tells the training algorithm how wrong, or how 'far away' from the true +value the predicted value is. + +For the one-hot encoding that we selected before, a fitting loss function is the Categorical Crossentropy loss. +In Keras this is implemented in the `keras.losses.CategoricalCrossentropy` class. +This loss function works well in combination with the `softmax` activation function +we chose earlier. +The Categorical Crossentropy works by comparing the probabilities that the +neural network predicts with 'true' probabilities that we generated using the one-hot encoding. +It measures how closely the distribution of the three neural network outputs matches the distribution of the three values in the one-hot encoding. 
+It is lower if the distributions are more similar. + +For more information on the available loss functions in Keras you can check the +[documentation](https://www.tensorflow.org/api_docs/python/tf/keras/losses). + +Next we need to choose which optimizer to use and, if this optimizer has parameters, what values +to use for those. Furthermore, we need to specify how many times to show the training samples to the optimizer. + +Once more, Keras gives us plenty of choices all of which have their own pros and cons, +but for now let us go with the widely used [Adam optimizer](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam). +Adam has a number of parameters, but the default values work well for most problems. +So we will use it with its default parameters. + +Combining this with the loss function we decided on earlier we can now compile the +model using `model.compile`. +Compiling the model prepares it to start the training. + +```python +model.compile(optimizer='adam', loss=keras.losses.CategoricalCrossentropy()) +``` + +## 6. Train model +We are now ready to train the model. + +Training the model is done using the `fit` method, it takes the input data and +target data as inputs and it has several other parameters for certain options +of the training. +Here we only set a different number of `epochs`. +One training epoch means that every sample in the training data has been shown +to the neural network and used to update its parameters. + +```python +history = model.fit(X_train, y_train, epochs=100) +``` + +The fit method returns a history object that has a history attribute with the training loss and +potentially other metrics per training epoch. +It can be very insightful to plot the training loss to see how the training progresses. 
+Using seaborn we can do this as follows: +```python +sns.lineplot(x=history.epoch, y=history.history['loss']) +``` +![][training_curve] + +This plot can be used to identify whether the training is well configured or whether there +are problems that need to be addressed. + +::: challenge +## The Training Curve +Look at the training curve we have just made. + +1. How does the training progress? + * Does the training loss increase or decrease? + * Does it change quickly or slowly? + * Does the graph look very jittery? +2. Do you think the resulting trained network will work well on the test set? + +When the training process does not go well: + +3. (optional) Something went wrong here during training. What could be the problem, and how do you see that in the training curve? +Also compare the range on the y-axis with the previous training curve. +![](../fig/02_bad_training_history_1.png){alt='Very jittery training curve with the loss value jumping back and forth between 2 and 4. The range of the y-axis is from 2 to 4, whereas in the previous training curve it was from 0 to 2. The loss seems to decrease a little bit, but not as much as compared to the previous plot where it dropped to almost 0. The minimum loss in the end is somewhere around 2.'} + +:::: solution +## Solution +1. The training loss decreases quickly. It drops in a smooth line with little jitter. +This is ideal for a training curve. +2. The results of the training give very little information on its performance on a test set. + You should be careful not to use it as an indication of a well trained network. +3. (optional) The loss does not go down at all, or only very slightly. This means that the model is not learning anything. +It could be that something went wrong in the data preparation (for example the labels are not attached to the right features). +In addition, the graph is very jittery. 
This means that for every update step, +the weights in the network are updated in such a way that the loss sometimes increases a lot and sometimes decreases a lot. +This could indicate that the weights are updated too much at every learning step and you need a smaller learning rate +(we will go into more details on this in the next episode). +Or there is a high variation in the data, leading the optimizer to change the weights in different directions at every learning step. +This could be addressed by presenting more data at every learning step (or in other words increasing the batch size). +In this case the graph was created by training on nonsense data, so this is a training curve for a problem where nothing can really be learned. + +We will take a closer look at training curves in the next episode. Some of the concepts touched upon here will also be further explained there. + +:::: +::: + +## 7. Perform a prediction/classification +Now that we have a trained neural network, we can use it to predict the species of new +penguin samples using the `predict` function. + +We will use the neural network to predict the species of the test set +using the `predict` function. +We will be using this prediction in the next step to measure the performance of our +trained network. +This will return a `numpy` matrix, which we convert +to a pandas dataframe to easily see the labels. +```python +y_pred = model.predict(X_test) +prediction = pd.DataFrame(y_pred, columns=target.columns) +prediction +``` +| | Adelie | Chinstrap | Gentoo | +| --: | -------: | --------: | -------: | +| 0 | 0.304484 | 0.192893 | 0.502623 | +| 1 | 0.527107 | 0.095888 | 0.377005 | +| 2 | 0.373989 | 0.195604 | 0.430406 | +| 3 | 0.493643 | 0.154104 | 0.352253 | +| 4 | 0.309051 | 0.308646 | 0.382303 | +| ... | ... | ... | ... 
| +| 64 | 0.406074 | 0.191430 | 0.402496 | +| 65 | 0.645621 | 0.077174 | 0.277204 | +| 66 | 0.356284 | 0.185958 | 0.457758 | +| 67 | 0.393868 | 0.159575 | 0.446557 | +| 68 | 0.509837 | 0.144219 | 0.345943 | + + +Remember that the output of the network uses the `softmax` activation function and has three +outputs, one for each species. This dataframe shows this nicely. + +We now need to transform this output to one penguin species per sample. +We can do this by looking for the index of highest valued output and converting that +to the corresponding species. +Pandas dataframes have the `idxmax` function, which will do exactly that. + +```python +predicted_species = prediction.idxmax(axis="columns") +predicted_species +``` + +```output +0 Gentoo +1 Adelie +2 Gentoo +3 Adelie +4 Gentoo + ... +64 Adelie +65 Adelie +66 Gentoo +67 Gentoo +68 Adelie +Length: 69, dtype: object +``` + +## 8. Measuring performance +Now that we have a trained neural network it is important to assess how well it performs. +We want to know how well it will perform in a realistic prediction scenario, measuring +performance will also come back when refining the model. + +We have created a test set (i.e. y_test) during the data preparation stage which we will use +now to create a confusion matrix. + +### Confusion matrix +With the predicted species we can now create a confusion matrix and display it +using seaborn. +To create a confusion matrix we will use another convenient function from sklearn +called `confusion_matrix`. +This function takes as a first parameter the true labels of the test set. +We can get these by using the `idxmax` method on the y_test dataframe. +The second parameter is the predicted labels which we did above. 
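To make the row and column convention concrete before running this on the penguins, here is a minimal hand-rolled sketch on made-up labels. It is an illustration only, not sklearn's implementation: each row is a true species and each column a predicted species, both in sorted (alphabetical) order, which is also the default layout of sklearn's `confusion_matrix`.

```python
import numpy as np

species = ["Adelie", "Chinstrap", "Gentoo"]  # sorted, alphabetical order

# Made-up labels for five penguins, for illustration only
toy_true = ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"]
toy_pred = ["Adelie", "Gentoo", "Adelie", "Gentoo", "Gentoo"]

# Count how often each (true, predicted) combination occurs
toy_matrix = np.zeros((len(species), len(species)), dtype=int)
for true_label, predicted_label in zip(toy_true, toy_pred):
    row = species.index(true_label)          # rows: true species
    column = species.index(predicted_label)  # columns: predicted species
    toy_matrix[row, column] += 1

print(toy_matrix)
```

The diagonal holds the correctly classified samples. In this toy example the Chinstrap row has no count on the diagonal, because the single toy Chinstrap was predicted as Adelie.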
+ +```python +from sklearn.metrics import confusion_matrix + +true_species = y_test.idxmax(axis="columns") + +matrix = confusion_matrix(true_species, predicted_species) +print(matrix) +``` +```output +[[22 0 8] + [ 5 0 9] + [ 6 0 19]] +``` + +Unfortunately, this matrix is rather hard to read. It's not clear which column and which row +correspond to which species. +So let's convert it to a pandas dataframe with its index and columns set to the species +as follows: + +```python +# Convert to a pandas dataframe +confusion_df = pd.DataFrame(matrix, index=y_test.columns.values, columns=y_test.columns.values) + +# Set the names of the x and y axis, this helps with the readability of the heatmap. +confusion_df.index.name = 'True Label' +confusion_df.columns.name = 'Predicted Label' +``` + +We can then use the `heatmap` function from seaborn to create a nice visualization of +the confusion matrix. +The `annot=True` parameter here will put the numbers from the confusion matrix in +the heatmap. + +```python +sns.heatmap(confusion_df, annot=True) +``` +![][confusion_matrix] + +::: challenge +## Confusion Matrix +Measure the performance of the neural network you trained and +visualize a confusion matrix. + +- Did the neural network perform well on the test set? +- Did you expect this from the training loss you saw? +- What could we do to improve the performance? + +:::: solution +## Solution +The confusion matrix shows that the predictions for Adelie and Gentoo +are decent, but could be improved. However, Chinstrap is never +predicted. +The training loss was very low, so from that perspective this may be +surprising. +But this illustrates very well why a test set is important when training +neural networks. +We can try many things to improve the performance from here. +One of the first things we can try is to balance the dataset better. 
+Other options include: changing the network architecture or changing the +training parameters + +Note that the outcome you have might be slightly different from what is shown in this tutorial. +:::: +::: + +## 9. Refine the model +As we discussed before the design and training of a neural network comes with +many hyperparameter and model architecture choices. +We will go into more depth of these choices in later episodes. +For now it is important to realize that the parameters we chose were +somewhat arbitrary and more careful consideration needs to be taken to +pick hyperparameter values. + + +## 10. Share model +It is very useful to be able to use the trained neural network at a later +stage without having to retrain it. +This can be done by using the `save` method of the model. +It takes a string as a parameter which is the path of a directory where the model is stored. + +```python +model.save('my_first_model') +``` + +This saved model can be loaded again by using the `load_model` method as follows: +```python +pretrained_model = keras.models.load_model('my_first_model') +``` + +This loaded model can be used as before to predict. + +```python +# use the pretrained model here +y_pretrained_pred = pretrained_model.predict(X_test) +pretrained_prediction = pd.DataFrame(y_pretrained_pred, columns=target.columns.values) + +# idxmax will select the column for each row with the highest value +pretrained_predicted_species = pretrained_prediction.idxmax(axis="columns") +print(pretrained_predicted_species) +``` + +```output +0 Adelie +1 Gentoo +2 Adelie +3 Gentoo +4 Gentoo + ... 
+64 Gentoo +65 Gentoo +66 Adelie +67 Adelie +68 Gentoo +Length: 69, dtype: object +``` + + +[palmer-penguins]: fig/palmer_penguins.png "Palmer Penguins" +{alt='Illustration of the three species of penguins found in the Palmer Archipelago, Antarctica: Chinstrap, Gentoo and Adelie'} + +[penguin-beaks]: fig/culmen_depth.png "Culmen Depth" +{alt='Illustration of the beak dimensions called culmen length and culmen depth in the dataset'} + +[pairplot]: fig/pairplot.png "Pair Plot" +{alt='Pair plot showing the separability of the three species of penguin for combinations of dataset attributes'} + +[sex_pairplot]: fig/02_sex_pairplot.png "Pair plot grouped by sex" +{alt='Pair plot showing the separability of the two sexes of penguin for combinations of dataset attributes'} + +[training_curve]: fig/02_training_curve.png "Training Curve" +{alt='Training loss curve of the neural network training which depicts exponential decrease in loss before a plateau from ~10 epochs'} + +[confusion_matrix]: fig/confusion_matrix.png "Confusion Matrix" +{alt='Confusion matrix of the test set with high accuracy for Adelie and Gentoo classification and no correctly predicted Chinstrap'} + + +:::: keypoints +- The deep learning workflow is a useful tool to structure your approach; it helps to make sure you do not forget any important steps. +- Exploring the data is an important step to familiarize yourself with the problem and to help you determine the relevant inputs and outputs. +- One-hot encoding is a preprocessing step to prepare labels for classification in Keras. +- A fully connected layer is a layer which has connections to all neurons in the previous and subsequent layers. +- keras.layers.Dense is an implementation of a fully connected layer; you can set the number of neurons in the layer and the activation function used. +- To train a neural network with Keras we need to first define the network using layers and the Model class. Then we can train it using the model.fit function. 
+- Plotting the loss curve can be used to identify and troubleshoot problems in the training process. +- The loss curve on the training set does not provide any information on how well a network performs in a real setting. +- Creating a confusion matrix with results from a test set gives better insight into the network's performance. +:::: diff --git a/3-monitor-the-model.md b/3-monitor-the-model.md new file mode 100644 index 00000000..be1376db --- /dev/null +++ b/3-monitor-the-model.md @@ -0,0 +1,995 @@ +--- +title: "Monitor the training process" +teaching: 135 +exercises: 80 +--- + +::: questions +- "How do I create a neural network for a regression task?" +- "How does optimization work?" +- "How do I monitor the training process?" +- "How do I detect (and avoid) overfitting?" +- "What are common options to improve the model performance?" +::: + +::: objectives +- "Explain the importance of keeping your test set clean, by validating on the validation set instead of the test set" +- "Use the data splits to plot the training process" +- "Explain how optimization works" +- "Design a neural network for a regression task" +- "Measure the performance of your deep neural network" +- "Interpret the training plots to recognize overfitting" +- "Use normalization as preparation step for Deep Learning" +- "Implement basic strategies to prevent overfitting" +::: + +::: instructor +## Copy-pasting code +In this episode we first introduce a simple approach to the problem, +then we iterate on that a few times, +working step by step towards a more complex solution. +Unfortunately this involves reusing the same code over and over again, +only slightly adapting it. + +To avoid too much typing, it can help to copy-paste code from higher up in the notebook. +Be sure to make it clear where you are copying from +and what you are actually changing in the copied code. +It can for example help to add a comment to the lines that you added. 
+::: + +In this episode we will explore how to monitor the training progress, evaluate the model predictions and fine-tune the model to avoid overfitting. For that we will use a more complicated weather dataset. + +## 1. Formulate / Outline the problem: weather prediction + +Here we want to work with the *weather prediction dataset* (the light version) which can be +[downloaded from Zenodo](https://doi.org/10.5281/zenodo.5071376). +It contains daily weather observations from 11 different European cities or places through the +years 2000 to 2010. For all locations the data contains the variables ‘mean temperature’, ‘max temperature’, and ‘min temperature’. In addition, for multiple locations, the following variables are provided: 'cloud_cover', 'wind_speed', 'wind_gust', 'humidity', 'pressure', 'global_radiation', 'precipitation', 'sunshine', but not all of them are provided for every location. A more extensive description of the dataset including the different physical units is given in the accompanying metadata file. The full dataset comprises 10 years (3654 days) of collected weather data across Europe. + +![European locations in the weather prediction dataset](fig/03_weather_prediction_dataset_map.png){alt='18 European locations in the weather prediction dataset'} + + A very common task with weather data is to make a prediction about the weather sometime in the future, say the next day. In this episode, we will try to predict tomorrow's sunshine hours, a challenging-to-predict feature, using a neural network with the available weather data for one location: BASEL. + +## 2. Identify inputs and outputs + +### Import Dataset +We will now import and explore the weather dataset: + +```python +import pandas as pd + +filename_data = "weather_prediction_dataset_light.csv" +data = pd.read_csv(filename_data) +data.head() +``` + +| | DATE | MONTH | BASEL_cloud_cover | BASEL_humidity | BASEL_pressure | ...
| +|------:|------:|---------------:|--------------:|------------------:|------------:|------------:| +|0| 20000101 |1 |8 |0.89 |1.0286|... | +|1| 20000102 |1 |8 |0.87 |1.0318|... | +|2| 20000103 |1 |5 |0.81 |1.0314|... | +|3| 20000104 |1 |7 |0.79 |1.0262|... | +|4| 20000105 |1 |5 |0.90 |1.0246|... | + + +::: callout +## Load the data +If you have not downloaded the data yet, you can also load it directly from Zenodo: +```python +data = pd.read_csv("https://zenodo.org/record/5071376/files/weather_prediction_dataset_light.csv?download=1") +``` +::: + + +### Brief exploration of the data +Let us start with a quick look at the type of features that we find in the data. +```python +data.columns +``` + +```output +Index(['DATE', 'MONTH', 'BASEL_cloud_cover', 'BASEL_humidity', + 'BASEL_pressure', 'BASEL_global_radiation', 'BASEL_precipitation', + 'BASEL_sunshine', 'BASEL_temp_mean', 'BASEL_temp_min', 'BASEL_temp_max', + ... + 'SONNBLICK_temp_min', 'SONNBLICK_temp_max', 'TOURS_humidity', + 'TOURS_pressure', 'TOURS_global_radiation', 'TOURS_precipitation', + 'TOURS_temp_mean', 'TOURS_temp_min', 'TOURS_temp_max'], + dtype='object') +``` +There is a total of 9 different measured variables (global_radiation, humidity, etcetera) + + +Let's have a look at the shape of the dataset: +```python +data.shape +``` +```output +(3654, 91) +``` +This will give both the number of samples (3654) and the number of features (89 + month + +date). + +## 3. Prepare data + +### Select a subset and split into data (X) and labels (y) +The full dataset comprises of 10 years (3654 days) from which we will select only the first 3 years. The present dataset is sorted by "DATE", so for each row `i` in the table we can pick a corresponding feature and location from row `i+1` that we later want to predict with our model. As outlined in step 1, we would like to predict the sunshine hours for the location: BASEL. 
+ +```python +nr_rows = 365*3 # 3 years +# data +X_data = data.loc[:nr_rows] # Select first 3 years +X_data = X_data.drop(columns=['DATE', 'MONTH']) # Drop date and month column + +# labels (sunshine hours the next day) +y_data = data.loc[1:(nr_rows + 1)]["BASEL_sunshine"] +``` + +In general, it is important to check if the data contains any unexpected values such as `9999` or `NaN` or `NoneType`. You can use the pandas `data.describe()` or `data.isnull()` function for this. If so, such values must be removed or replaced. +In the present case the data is luckily well prepared and shouldn't contain such values, so that this step can be omitted. + +### Split data and labels into training, validation, and test set + +As with classical machine learning techniques, it is required in deep learning to split off a hold-out *test set* which remains untouched during model training and tuning. It is later used to evaluate the model performance. On top, we will also split off an additional *validation set*, the reason of which will hopefully become clearer later in this lesson. + +To make our lives a bit easier, we employ a trick to create these 3 datasets, `training set`, `test set` and `validation set`, by calling the `train_test_split` method of `scikit-learn` twice. + +First we create the training set and leave the remainder of 30 % of the data to the two hold-out sets. + +```python +from sklearn.model_selection import train_test_split + +X_train, X_holdout, y_train, y_holdout = train_test_split(X_data, y_data, test_size=0.3, random_state=0) +``` + +Now we split the 30 % of the data in two equal sized parts. + +```python +X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=0.5, random_state=0) +``` + +Setting the `random_state` to `0` is a short-hand at this point. Note however, that changing this seed of the pseudo-random number generator will also change the composition of your data sets. 
For the sake of reproducibility, this is one example of a parameter that should not change at all. + +## 4. Choose a pretrained model or start building architecture from scratch + +### Regression and classification + +In episode 2 we trained a dense neural network on a *classification task*. For this, one-hot encoding was used together with a `Categorical Crossentropy` loss function. +This measured how closely the distribution of the neural network outputs corresponds to the distribution of the three values in the one-hot encoding. +Now we want to work on a *regression task*, thus not predicting a class label (or integer number) for a datapoint. In regression, we want to predict one (and sometimes many) values of a feature. This is typically a floating point number. + +::: challenge +## Exercise: Architecture of the network +As we want to design a neural network architecture for a regression task, +see if you can first come up with the answers to the following questions: + +1. What must be the dimension of our input layer? +2. We want to output the prediction of a single number. The output layer of the NN hence cannot be the same as for the classification task earlier. This is because the `softmax` activation being used had a concrete meaning with respect to the class labels which is not needed here. What output layer design would you choose for regression? +Hint: A layer with `relu` activation, with `sigmoid` activation or no activation at all? +3. (Optional) How would we change the model if we would like to output a prediction of the precipitation in Basel in *addition* to the sunshine hours? + +:::: solution +## Solution +1. The shape of the input layer has to correspond to the number of features in our data: 89 +2. The output is a single value per prediction, so the output layer can consist of a dense layer with only one node.
 The *softmax* activation function works well for a classification task, but here we do not want to restrict the possible outcomes to the range of zero and one. In fact, we can omit the activation in the output layer. +3. The output layer should have 2 neurons, one for each number that we try to predict. Our y_train (and val and test) then becomes a (n_samples, 2) matrix. +:::: +::: + + +In our example we want to predict the sunshine hours in Basel (or any other place in the dataset) for tomorrow based on the weather data of all 18 locations today. `BASEL_sunshine` is a floating point value (i.e. `float64`). The network should hence output a single float value which is why the last layer of our network will only consist of a single node. + +We compose a network of two hidden layers to start off with something. We go by a scheme with 100 neurons in the first hidden layer and 50 neurons in the second layer. As activation function we settle on the `relu` function, as it has proved very robust and is widely used. To make our lives easier later, we wrap the definition of the network in a function called `create_nn`. + +```python +from tensorflow import keras + +def create_nn(): + # Input layer + inputs = keras.Input(shape=(X_data.shape[1],), name='input') + + # Dense layers + layers_dense = keras.layers.Dense(100, 'relu')(inputs) + layers_dense = keras.layers.Dense(50, 'relu')(layers_dense) + + # Output layer + outputs = keras.layers.Dense(1)(layers_dense) + + return keras.Model(inputs=inputs, outputs=outputs, name="weather_prediction_model") + +model = create_nn() +``` + +The shape of the input layer has to correspond to the number of features in our data: `89`. We use `X_data.shape[1]` to obtain this value dynamically. + +The output layer here is a dense layer with only 1 node. Here we have chosen to use *no activation function*. +While we might use *softmax* for a classification task, here we do not want to restrict the possible outcomes for a start.
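
To make the effect of the output activation concrete, here is a minimal NumPy sketch. It is not part of the Keras model in this lesson, and the pre-activation values in `z` are made up purely for illustration. It compares what a single output node would return with no activation, with `relu` and with `sigmoid`.

```python
import numpy as np

# Hypothetical pre-activation values of a single output node (made up for illustration)
z = np.array([-3.0, -0.5, 0.0, 2.0, 11.5])

linear = z                           # no activation: values pass through unchanged
relu = np.maximum(z, 0.0)            # relu: negative values are clipped to 0
sigmoid = 1.0 / (1.0 + np.exp(-z))   # sigmoid: values are squeezed into (0, 1)

print("linear :", linear)
print("relu   :", relu)
print("sigmoid:", sigmoid.round(3))
```

A `sigmoid` output can never exceed 1, so it could not reproduce labels such as several hours of sunshine unless the labels were rescaled first. Omitting the activation leaves the output range unrestricted.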
+ +In addition, we have here chosen to write the network creation as a function so that we can use it later again to initiate new models. + +Let us check how our model looks like by calling the `summary` method. + +```python +model.summary() +``` +```output +Model: "weather_prediction_model" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input (InputLayer) [(None, 89)] 0 +_________________________________________________________________ +dense (Dense) (None, 100) 9000 +_________________________________________________________________ +dense_1 (Dense) (None, 50) 5050 +_________________________________________________________________ +dense_2 (Dense) (None, 1) 51 +================================================================= +Total params: 14,101 +Trainable params: 14,101 +Non-trainable params: 0 +``` + + +When compiling the model we can define a few very important aspects. We will discuss them now in more detail. + +## Intermezzo: How do neural networks learn? +In the introduction we learned about the loss function: it quantifies the total error of the predictions made by the model. +During model training we aim to find the model parameters that minimize the loss. +This is called optimization, but how does optimization actually work? + +### Gradient descent +Gradient descent is a widely used optimization algorithm, most other optimization algorithms are based on it. +It works as follows: Imagine a neural network with only one neuron. +Take a look at the figure below. The plot shows the loss as a function of the weight of the neuron. +As you can see there is a global loss minimum, we would like to find the weight at this point in the parabola. +To do this, we initialize the model weight with some random value. Then we compute the gradient of the loss function with respect +to the weight. 
This tells us how much the loss function will change if we change the weight by a small amount. +Then, we update the weight by taking a small step in the direction of the negative gradient, so down the slope. +This will slightly decrease the loss. This process is repeated until the loss function reaches a minimum. +The size of the step that is taken in each iteration is called the 'learning rate'. + +![](fig/03_gradient_descent.png){alt='Plot of the loss as a function of the weights. Through gradient descent the global loss minimum is found'} + +### Batch gradient descent +You could use the entire training dataset to perform one learning step in gradient descent, +which would mean that one epoch equals one learning step. +In practice, in each learning step we only use a subset of the training data to compute the loss and the gradients. +This subset is called a 'batch', the number of samples in one batch is called the 'batch size'. + +::: challenge + +## Exercise: Gradient descent + +Answer the following questions: + +### 1. What is the goal of optimization? + +- A. To find the weights that maximize the loss function +- B. To find the weights that minimize the loss function + +### 2. What happens in one gradient descent step? + +- A. The weights are adjusted so that we move in the direction of the gradient, so up the slope of the loss function +- B. The weights are adjusted so that we move in the direction of the gradient, so down the slope of the loss function +- C. The weights are adjusted so that we move in the direction of the negative gradient, so up the slope of the loss function +- D. The weights are adjusted so that we move in the direction of the negative gradient, so down the slope of the loss function + +### 3. When the batch size is increased: +(multiple answers might apply) + +- A. The number of samples in an epoch also increases +- B. The number of batches in an epoch goes down +- C. 
 The training progress is more jumpy, because more samples are consulted in each update step (one batch). +- D. The memory load (memory as in computer hardware) of the training process is increased + +:::: solution + +## Solution + +1. Correct answer: B. To find the weights that minimize the loss function. + The loss function quantifies the total error of the network. We want the smallest error possible, hence we minimize the loss. + +2. Correct answer: D. The weights are adjusted so that we move in the direction of the negative gradient, so down the slope of the loss function. + We want to move towards the global minimum, so in the opposite direction of the gradient. + +3. Correct answer: B & D + - A. The number of samples in an epoch also increases (**incorrect**, an epoch is always defined as passing through the training data for one cycle) + - B. The number of batches in an epoch goes down (**correct**, the number of batches is the samples in an epoch divided by the batch size) + - C. The training progress is more jumpy, because more samples are consulted in each update step (one batch). (**incorrect**, more samples are consulted in each update step, but this makes the progress less jumpy since you get a more accurate estimate of the loss in the entire dataset) + - D. The memory load (memory as in computer hardware) of the training process is increased (**correct**, the data is being loaded one batch at a time, so more samples means more memory usage) + +:::: +::: + +## 5. Choose a loss function and optimizer +### Loss function +The loss is what the neural network will be optimized on during training, so choosing a suitable loss function is crucial for training neural networks. +In the given case we want to ensure that the predicted values are as close as possible to the true values. This is commonly done by using the *mean squared error* (mse) or the *mean absolute error* (mae), both of which should work OK in this case.
 Often, mse is preferred over mae because it "punishes" large prediction errors more severely. +In Keras this is implemented in the `keras.losses.MeanSquaredError` class (see Keras documentation: https://keras.io/api/losses/). This can be passed to the `model.compile` method by setting the `loss` parameter to `mse`, e.g. + + +```python +model.compile(loss='mse') +``` + + +### Optimizer + +Somewhat coupled to the loss function is the *optimizer* that we want to use. +The *optimizer* here refers to the algorithm with which the model learns to optimize on the provided loss function. A basic example for such an optimizer would be *stochastic gradient descent*. For now, we can largely skip this step and pick one of the most common optimizers that works well for most tasks: the *Adam optimizer*. Similar to activation functions, the choice of optimizer depends on the problem you are trying to solve, your model architecture and your data. *Adam* is a good starting point though, which is why we chose it. + + +```python +model.compile(optimizer='adam', + loss='mse') +``` + +### Metrics + +In our first example (episode 2) we plotted the progression of the loss during training. +That is indeed a good first indicator if things are working alright, i.e. if the loss is indeed decreasing as it should with the number of epochs. +However, when models become more complicated, the loss functions often become less intuitive as well. +That is why it is good practice to monitor the training process with additional, more intuitive metrics. +They are not used to optimize the model, but are simply recorded during training. +With Keras such additional metrics can be added via the `metrics=[...]` parameter, which can contain one or multiple metrics of interest. +Here we could for instance choose to use `'mae'`, the mean absolute error, or the *root mean squared error* (RMSE) which unlike the *mse* has the same units as the predicted values. For the sake of units, we choose the latter.
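
The three error measures mentioned above can also be computed by hand. The following is a minimal NumPy sketch with made-up true and predicted values (they are not taken from the weather dataset), meant only to show how mae, mse and RMSE relate to each other.

```python
import numpy as np

# Made-up true and predicted sunshine hours, for illustration only
y_true = np.array([0.0, 2.0, 5.0, 8.0])
y_pred = np.array([1.0, 2.0, 3.0, 12.0])

errors = y_pred - y_true           # [1, 0, -2, 4]
mae = np.mean(np.abs(errors))      # mean absolute error (units: hours)
mse = np.mean(errors ** 2)         # mean squared error (units: hours squared)
rmse = np.sqrt(mse)                # root mean squared error (units: hours)

print(f"MAE: {mae:.2f}, MSE: {mse:.2f}, RMSE: {rmse:.2f}")
# prints: MAE: 1.75, MSE: 5.25, RMSE: 2.29
```

In this toy example the single largest error (4 hours) contributes 16 of the 21 units in the sum of squared errors, which illustrates how the mse "punishes" large errors. In Keras, the RMSE is available as the `keras.metrics.RootMeanSquaredError` class.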
+ +```python +model.compile(optimizer='adam', + loss='mse', + metrics=[keras.metrics.RootMeanSquaredError()]) +``` + +Let's create a `compile_model` function to easily compile the model throughout this lesson: +```python +def compile_model(model): + model.compile(optimizer='adam', + loss='mse', + metrics=[keras.metrics.RootMeanSquaredError()]) +compile_model(model) +``` + +With this, we complete the compilation of our network and are ready to start training. + +## 6. Train the model + +Now that we created and compiled our dense neural network, we can start training it. +One additional concept we need to introduce though, is the `batch_size`. +This defines how many samples from the training data will be used to estimate the error gradient before the model weights are updated. +Larger batches will produce better, more accurate gradient estimates but also less frequent updates of the weights. +Here we are going to use a batch size of 32 which is a common starting point. +```python +history = model.fit(X_train, y_train, + batch_size=32, + epochs=200, + verbose=2) +``` + +We can plot the training process using the `history` object returned from the model training. +We will create a function for it, because we will make use of this more often in this lesson! +```python +import seaborn as sns +import matplotlib.pyplot as plt + +def plot_history(history, metrics): + """ + Plot the training history + + Args: + history (keras History object that is returned by model.fit()) + metrics (str, list): Metric or a list of metrics to plot + """ + history_df = pd.DataFrame.from_dict(history.history) + sns.lineplot(data=history_df[metrics]) + plt.xlabel("epochs") + plt.ylabel("metric") + +plot_history(history, 'root_mean_squared_error') +``` + +![](fig/03_training_history_1_rmse.png){alt='Plot of the RMSE over epochs for the trained model that shows a decreasing error metric'} + +This looks very promising! 
 Our metric ("RMSE") is dropping nicely and while it keeps fluctuating a bit, it does end up at fairly low *RMSE* values. +But the *RMSE* is just the root *mean* squared error, so we might want to look in a bit more detail at how well our freshly trained model predicts the sunshine hours. + +## 7. Perform a Prediction/Classification +Now that we have our model trained, we can make a prediction with the model before measuring the performance of our neural network. + +```python +y_train_predicted = model.predict(X_train) +y_test_predicted = model.predict(X_test) +``` + +## 8. Measure performance + +There is not a single way to evaluate how a model performs. But there are at least two very common approaches. For a *classification task* that is to compute a *confusion matrix* for the test set which shows how often particular classes were predicted correctly or incorrectly. + +For the present *regression task*, it makes more sense to compare true and predicted values in a scatter plot. + +So, let's look at how the predicted sunshine hours compare with their ground truth values.
 + +```python +# We define a function that we will reuse in this lesson +def plot_predictions(y_pred, y_true, title): + plt.style.use('ggplot') # optional, that's only to define a visual style + plt.scatter(y_pred, y_true, s=10, alpha=0.5) + plt.xlabel("predicted sunshine hours") + plt.ylabel("true sunshine hours") + plt.title(title) + +plot_predictions(y_train_predicted, y_train, title='Predictions on the training set') +``` + +![](fig/03_regression_predictions_trainset.png){alt='Scatter plot between predictions and true sunshine hours in Basel on the train set showing a concise spread'} + +```python +plot_predictions(y_test_predicted, y_test, title='Predictions on the test set') +``` +![](fig/03_regression_predictions_testset.png){alt='Scatter plot between predictions and true sunshine hours in Basel on the test set showing a wide spread'} + +::: challenge +## Exercise: Reflecting on our results +* Is the performance of the model as you expected (or better/worse)? +* Is there a notable difference between training set and test set? And if so, any idea why? +* (Optional) When developing a model, you will often vary different aspects of your model like + which features you use, model parameters and architecture. It is important to settle on a + single-number evaluation metric to compare your models. + * What single-number evaluation metric would you choose here and why? + +:::: solution +## Solution +While the performance on the train set seems reasonable, the performance on the test set is much worse. +This is a common problem called **overfitting**, which we will discuss in more detail later. + +#### Optional exercise: +The metric that we are using, RMSE, would be a good one. You could also consider Mean Squared Error, which punishes large errors more (because large errors create even larger squared errors).
+It is important that if the model improves in performance on the basis of this metric then that should also lead you a step closer to reaching your goal: to predict tomorrow's sunshine hours. +If you feel that improving the metric does not lead you closer to your goal, then it would be better to choose a different metric +:::: +::: + +The accuracy on the training set seems fairly good. +In fact, considering that the task of predicting the daily sunshine hours is really not easy it might even be surprising how well the model predicts that +(at least on the training set). Maybe a little too good? +We also see the noticeable difference between train and test set when calculating the exact value of the RMSE: + +```python +train_metrics = model.evaluate(X_train, y_train, return_dict=True) +test_metrics = model.evaluate(X_test, y_test, return_dict=True) +print('Train RMSE: {:.2f}, Test RMSE: {:.2f}'.format(train_metrics['root_mean_squared_error'], test_metrics['root_mean_squared_error'])) +``` +```output +24/24 [==============================] - 0s 442us/step - loss: 0.7092 - root_mean_squared_error: 0.8421 +6/6 [==============================] - 0s 647us/step - loss: 16.4413 - root_mean_squared_error: 4.0548 +Train RMSE: 0.84, Test RMSE: 4.05 +``` + +For those experienced with (classical) machine learning this might look familiar. +The plots above expose the signs of **overfitting** which means that the model has to some extent memorized aspects of the training data. +As a result, it makes much more accurate predictions on the training data than on unseen test data. + + +Overfitting also happens in classical machine learning, but there it is usually interpreted as the model having more parameters than the training data would justify (say, a decision tree with too many branches for the number of training instances). As a consequence one would reduce the number of parameters to avoid overfitting. +In deep learning the situation is slightly different. 
It can - as for classical machine learning - also be a sign of having a *too big* model, meaning a model with too many parameters (layers and/or nodes). However, in deep learning higher number of model parameters are often still considered acceptable and models often perform best (in terms of prediction accuracy) when they are at the verge of overfitting. So, in a way, training deep learning models is always a bit like playing with fire... + +### Set expectations: How difficult is the defined problem? + +Before we dive deeper into handling overfitting and (trying to) improving the model performance, let us ask the question: How well must a model perform before we consider it a good model? + +Now that we defined a problem (predict tomorrow's sunshine hours), it makes sense to develop an intuition for how difficult the posed problem is. Frequently, models will be evaluated against a so called **baseline**. A baseline can be the current standard in the field or if such a thing does not exist it could also be an intuitive first guess or toy model. The latter is exactly what we would use for our case. + +Maybe the simplest sunshine hour prediction we can easily do is: Tomorrow we will have the same number of sunshine hours as today. +(sounds very naive, but for many observables such as temperature this is already a fairly good predictor) + + +We can take the `BASEL_sunshine` column of our data, because this contains the sunshine hours from one day before what we have as a label. +```python +y_baseline_prediction = X_test['BASEL_sunshine'] +plot_predictions(y_baseline_prediction, y_test, title='Baseline predictions on the test set') +``` + +![](fig/03_regression_test_5_naive_baseline.png){alt="Scatter plot of predicted vs true sunshine hours in Basel for the test set where today's sunshine hours is considered as the true sunshine hours for tomorrow"} + +It is difficult to interpret from this plot whether our model is doing better than the baseline. 
 +We can also have a look at the RMSE: +```python +from sklearn.metrics import mean_squared_error +# Take the square root of the MSE ourselves, so this works whether or not the installed scikit-learn version supports `squared=False` +rmse_baseline = mean_squared_error(y_test, y_baseline_prediction) ** 0.5 +print('Baseline:', rmse_baseline) +print('Neural network: ', test_metrics['root_mean_squared_error']) +``` +```output +Baseline: 3.877323350410224 +Neural network: 4.077792167663574 +``` + +Judging from the numbers alone, our neural network prediction would be performing worse than the baseline. + +::: challenge +## Exercise: Baseline +1. Looking at this baseline: Would you consider this a simple or a hard problem to solve? +2. (Optional) Can you think of other baselines? + +:::: solution +## Solution +1. This really depends on your definition of hard! The baseline gives a more accurate prediction than just +randomly predicting a number, so the problem is not impossible to solve with machine learning. However, given the structure of the data and our expectations with respect to quality of prediction, it may remain hard to find a good algorithm which exceeds our baseline by orders of magnitude. +2. There are a lot of possible answers. A slightly more complicated baseline would be to take the average over the last couple of days. +:::: +::: + +## 9. Refine the model + +### Watch your model training closely + +As we saw when comparing the predictions for the training and the test set, deep learning models are prone to overfitting. Instead of iterating through countless cycles of model trainings and subsequent evaluations with a reserved test set, it is common practice to work with a second split-off dataset to monitor the model during training. +This is the *validation set* which can be regarded as a second test set. As with the test set, the datapoints of the *validation set* are not used for the actual model training itself. Instead, we evaluate the model with the *validation set* after every epoch during training, for instance to stop if we see signs of clear overfitting.
+Since we are adapting our model (tuning our hyperparameters) based on this validation set, it is *very* important that it is kept separate from the test set. If we used the same set, we would not know whether our model truly generalizes or is only overfitting. + +::: callout +## Test vs. validation set +Not everybody agrees on the terminology of test set versus validation set. You might find +examples in literature where these terms are used the other way around. +We are sticking to the definition that is consistent with the Keras API. In there, the validation +set can be used during training, and the test set is reserved for afterwards. +::: + +Let's give this a try! + +We need to initiate a new model -- otherwise Keras will simply assume that we want to continue training the model we already trained above. +```python +model = create_nn() +compile_model(model) +``` + +But now we train it with the small addition of also passing it our validation set: +```python +history = model.fit(X_train, y_train, + batch_size=32, + epochs=200, + validation_data=(X_val, y_val)) +``` + +With this we can plot both the performance on the training data and on the validation data! + +```python +plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error']) +``` + +![](fig/03_training_history_2_rmse.png){alt='Plot of RMSE vs epochs for the training set and the validation set which depicts a divergence between the two around 10 epochs.'} + +::: challenge +## Exercise: plot the training progress. +1. Is there a difference between the training curves of training versus validation data? And if so, what would this imply? +2. (Optional) Take a pen and paper, draw the perfect training and validation curves. + (This may seem trivial, but it will trigger you to think about what you actually would like to see) + +:::: solution +## Solution +The difference in the two curves shows that something is not completely right here. 
 +The error for the model predictions on the validation set quickly seems to reach a plateau while the error on the training set keeps decreasing. +That is a common signature of *overfitting*. + +Optional: + +Ideally you would like the training and validation curves to be identical and slope down steeply +to 0. After that the curves will just consistently stay at 0. +:::: +::: + +### Counteract model overfitting + +Overfitting is a very common issue and there are many strategies to handle it. +Most similar to classical machine learning might be to **reduce the number of parameters**. + +::: challenge +## Exercise: Try to reduce the degree of overfitting by lowering the number of parameters +We can keep the network architecture unchanged (2 dense layers + a one-node output layer) and only play with the number of nodes per layer. +Try to lower the number of nodes in one or both of the two dense layers and observe the changes to the training and validation losses. +If time is short: a suggestion is to run one network with only 10 and 5 nodes in the first and second layer. + +1. Is it possible to get rid of overfitting this way? +2. Does the overall performance suffer or does it mostly stay the same? +3. (optional) How low can you go with the number of parameters without notable effect on the performance on the validation set?
+ +:::: solution +## Solution + +Let's first adapt our `create_nn` function so that we can tweak the number of nodes in the 2 layers +by passing arguments to the function: + +```python +def create_nn(nodes1=100, nodes2=50): + # Input layer + inputs = keras.layers.Input(shape=(X_data.shape[1],), name='input') + # Dense layers + layers_dense = keras.layers.Dense(nodes1, 'relu')(inputs) + layers_dense = keras.layers.Dense(nodes2, 'relu')(layers_dense) + # Output layer + outputs = keras.layers.Dense(1)(layers_dense) + return keras.Model(inputs=inputs, outputs=outputs, name="model_small") +``` + +Let's see if it works by creating a much smaller network with 10 nodes in the first layer, +and 5 nodes in the second layer: + +```python +model = create_nn(10, 5) +model.summary() +``` +``` +Model: "model_small" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input (InputLayer) [(None, 89)] 0 +_________________________________________________________________ +dense_9 (Dense) (None, 10) 900 +_________________________________________________________________ +dense_10 (Dense) (None, 5) 55 +_________________________________________________________________ +dense_11 (Dense) (None, 1) 6 +================================================================= +Total params: 961 +Trainable params: 961 +Non-trainable params: 0 +``` + +Let's compile and train this network: +```python +compile_model(model) +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 200, + validation_data=(X_val, y_val)) +plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error']) +``` + +![](fig/03_training_history_3_rmse_smaller_model.png){alt='Plot of RMSE vs epochs for the training set and the validation set with similar performance across the two sets.'} + +1. 
With this smaller model we have reduced overfitting a bit, since the training and validation loss are now closer to each other, and the validation loss now reaches a plateau and no longer increases. +We have not completely avoided overfitting though. +2. In the case of this small example model, the validation RMSE seems to end up around 3.2, which is much better than the 4.08 we had before. Note that you can double check the actual score by calling `model.evaluate()` on the test set. +3. In general, it quickly becomes a complicated search for the right "sweet spot", i.e. the settings for which overfitting will be (nearly) avoided but the model still performs equally well. A model with 3 neurons in both layers seems to be around this spot, reaching an RMSE of 3.1 on the validation set. +Reducing the number of nodes further increases the validation RMSE again. +:::: +::: + +We saw that reducing the number of parameters can be a strategy to avoid overfitting. +In practice, however, this is usually not the (main) way to go when it comes to deep learning. +One reason is that finding the sweet spot can be really hard and time-consuming. And it has to be repeated every time the model is adapted, e.g. when more training data becomes available. + +### Early stopping: stop when things are looking best +Arguably **the** most common technique to avoid (severe) overfitting in deep learning is called **early stopping**. +As the name suggests, this technique just means that you stop the model training if things do not seem to improve anymore. +More specifically, this usually means that the training is stopped if the validation loss does not (notably) improve anymore. +Early stopping is both intuitive and effective to use, so it has become a standard addition for model training. + +To better study the effect, we can now safely go back to models with many (too many?) 
parameters: +```python +model = create_nn() +compile_model(model) +``` + +To apply early stopping during training it is easiest to use the Keras `EarlyStopping` class. +This allows us to define the condition for when to stop training. In our case we want to stop when the validation loss is lowest. +However, since we have seen quite some fluctuation of the losses during training above, we will also set `patience=10`, which means that the model will stop training if the validation loss has not gone down for 10 epochs. +```python +from tensorflow.keras.callbacks import EarlyStopping + +earlystopper = EarlyStopping( + monitor='val_loss', + patience=10 + ) + +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 200, + validation_data=(X_val, y_val), + callbacks=[earlystopper]) +``` + +As before, we can plot the losses during training: +```python +plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error']) +``` + +![](fig/03_training_history_3_rmse_early_stopping.png){alt='Plot of RMSE vs epochs for the training set and the validation set displaying similar performance across the two sets.'} + +This still seems to reveal the onset of overfitting, but the training stops before the discrepancy between training and validation loss can grow further. +Besides avoiding severe cases of overfitting, early stopping has the additional advantage that the number of training epochs will be regulated automatically. +Instead of comparing training runs for different numbers of epochs, early stopping allows us to simply set the number of epochs to a desired maximum value. + +What might be a bit unintuitive is that the training runs might now end very rapidly. +This might spark the question: have we really reached an optimum yet? +And often the answer to this is "no", which is why early stopping frequently is combined with other approaches to avoid overfitting. +Overfitting means that a model (seemingly) performs better on seen data compared to unseen data. 
One then often also says that it does not "generalize" well. +Techniques to avoid overfitting, or to improve model generalization, are termed **regularization techniques** and we will come back to this in **episode 4**. + + +### BatchNorm: the "standard scaler" for deep learning + +A very common step in classical machine learning pipelines is to scale the features, for instance by using scikit-learn's `StandardScaler`. +This can in principle also be done for deep learning. +An alternative, more common approach, is to add **BatchNormalization** layers ([documentation of the batch normalization layer](https://keras.io/api/layers/normalization_layers/batch_normalization/)) which will learn how to scale the input values. +Similar to dropout, batch normalization is available as a network layer in Keras and can be added to the network in a similar way. +It does not require any additional parameter setting. + +The `BatchNormalization` layer can be inserted as yet another layer into the architecture. + +```python +def create_nn(): + # Input layer + inputs = keras.layers.Input(shape=(X_data.shape[1],), name='input') + + # Dense layers + layers_dense = keras.layers.BatchNormalization()(inputs) # This is new! + layers_dense = keras.layers.Dense(100, 'relu')(layers_dense) + layers_dense = keras.layers.Dense(50, 'relu')(layers_dense) + + # Output layer + outputs = keras.layers.Dense(1)(layers_dense) + + # Defining the model (it is compiled separately below) + return keras.Model(inputs=inputs, outputs=outputs, name="model_batchnorm") + +model = create_nn() +compile_model(model) +model.summary() +``` + +This new layer appears in the model summary as well. 
+ +```output +Model: "model_batchnorm" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input_1 (InputLayer) [(None, 89)] 0 +_________________________________________________________________ +batch_normalization (BatchNo (None, 89) 356 +_________________________________________________________________ +dense (Dense) (None, 100) 9000 +_________________________________________________________________ +dense_1 (Dense) (None, 50) 5050 +_________________________________________________________________ +dense_2 (Dense) (None, 1) 51 +================================================================= +Total params: 14,457 +Trainable params: 14,279 +Non-trainable params: 178 +``` + +We can train the model again as follows: +```python +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 1000, + validation_data=(X_val, y_val), + callbacks=[earlystopper]) + +plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error']) +``` + +![](fig/03_training_history_5_rmse_batchnorm.png){alt='Output of plotting sample'} + +::: callout +## Batchnorm parameters +You may have noticed that the number of parameters of the Batchnorm layer corresponds to +4 parameters per input node. +These are the moving mean, the moving variance, an additional scaling factor (gamma) and an offset factor (beta). +Only gamma and beta are learned by the optimizer; the two moving statistics are updated during training but count as non-trainable, which explains the 178 non-trainable parameters in the summary above. +There is a difference in behavior for Batchnorm between training and prediction time. +During training time, the data is scaled with the mean and standard deviation of the batch. +During prediction time, the moving mean and moving variance collected during training are used instead. +The additional parameters gamma and beta are introduced to allow for more flexibility in output values, and are used in both training and prediction. 
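
The difference between training and prediction behaviour can be sketched in plain NumPy. This is a simplified illustration under stated assumptions, not Keras' actual implementation: the function names are made up for this example, the `momentum` and `eps` values are illustrative choices, and the moving statistics are tracked here as a mean and a variance.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, moving_mean, moving_var, momentum=0.99, eps=1e-3):
    """Training mode: normalize with the statistics of the current batch
    and update the moving statistics (hypothetical helper)."""
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    x_hat = (x - batch_mean) / np.sqrt(batch_var + eps)
    new_mean = momentum * moving_mean + (1 - momentum) * batch_mean
    new_var = momentum * moving_var + (1 - momentum) * batch_var
    return gamma * x_hat + beta, new_mean, new_var

def batchnorm_predict(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Prediction mode: normalize with the stored moving statistics (hypothetical helper)."""
    x_hat = (x - moving_mean) / np.sqrt(moving_var + eps)
    return gamma * x_hat + beta

n_features = 89  # number of input features in the weather model above
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=2.0, size=(32, n_features))  # stand-in data

gamma = np.ones(n_features)        # trainable scaling factor
beta = np.zeros(n_features)        # trainable offset
moving_mean = np.zeros(n_features) # non-trainable, updated during training
moving_var = np.ones(n_features)   # non-trainable, updated during training

out, moving_mean, moving_var = batchnorm_train(batch, gamma, beta, moving_mean, moving_var)
pred = batchnorm_predict(batch, gamma, beta, moving_mean, moving_var)

n_trainable = gamma.size + beta.size
n_non_trainable = moving_mean.size + moving_var.size
print(n_trainable, n_non_trainable, n_trainable + n_non_trainable)
```

The three printed numbers are 178, 178 and 356: two trainable and two non-trainable values per input feature, matching the counts in the model summary.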
+::: + +### Run on test set and compare to naive baseline + +It seems that no matter what we add, the overall loss does not decrease much further (we at least avoided overfitting though!). +Let us again plot the results on the test set: +```python +y_test_predicted = model.predict(X_test) +plot_predictions(y_test_predicted, y_test, title='Predictions on the test set') +``` + +![](fig/03_regression_test_5_dropout_batchnorm.png){alt='Scatter plot between predictions and true sunshine hours for Basel on the test set'} + +Well, the above is certainly not perfect. But how good or bad is this? Maybe not good enough to plan your picnic for tomorrow. +But let's compare it to the naive baseline we created in the beginning. What would you say, did we improve on that? + +::: challenge +## Exercise: Simplify the model and add data +You may have been wondering why we are including weather observations from +multiple cities to predict sunshine hours only in Basel. The weather is +a complex phenomenon with correlations over large distances and time scales, +but what happens if we limit ourselves to only one city? + +1. Since we will be reducing the number of features quite significantly, + we could afford to include more data. Instead of using only 3 years, use + 8 or 9 years! +2. Only use the features in the dataset that are for Basel, remove the data for other cities. + You can use something like: + ```python + cols = [c for c in X_data.columns if c[:5] == 'BASEL'] + X_data = X_data[cols] + ``` +3. Now rerun the last model we defined which included the BatchNorm layer. + Recreate the scatter plot comparing your predictions with the true values, + and evaluate the model by computing the RMSE on the test set. + Note that even though we will use many more observations than previously, + the network should still train quickly because we reduce the number of + features (columns). + Is the prediction better compared to what we had before? +4. 
(Optional) Try to train a model on all years that are available, + and all features from all cities. How does it perform? + + +:::: solution +## Solution +### 1. Use 9 years out of the dataset +```python +nr_rows = 365*9 +# data +X_data = data.loc[:nr_rows].drop(columns=['DATE', 'MONTH']) + +# labels (sunshine hours the next day) +y_data = data.loc[1:(nr_rows + 1)]["BASEL_sunshine"] +``` + +### 2. Only use features for Basel +```python +# only use columns with 'BASEL' +cols = [c for c in X_data.columns if c[:5] == 'BASEL'] +X_data = X_data[cols] +``` +### 3. Rerun the model and evaluate it +Do the train-test-validation split: +```python +X_train, X_holdout, y_train, y_holdout = train_test_split(X_data, y_data, test_size=0.3, random_state=0) +X_val, X_test, y_val, y_test = train_test_split(X_holdout, y_holdout, test_size=0.5, random_state=0) +``` + +Create the network. We can re-use the `create_nn` that we already have. Because we have reduced the number of input features +the number of parameters in the network goes down from 14457 to 6137. 
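
The two parameter counts can be checked with a little arithmetic. The sketch below assumes the `create_nn` architecture with the BatchNorm layer (BatchNorm, Dense(100), Dense(50), Dense(1)); the helper name `count_parameters` is made up for this illustration, and the value of 9 input features for the Basel-only data is inferred from the quoted total of 6137 rather than read from the dataset.

```python
def count_parameters(n_features, nodes1=100, nodes2=50):
    """Count the parameters of BatchNorm -> Dense(nodes1) -> Dense(nodes2) -> Dense(1)."""
    batchnorm = 4 * n_features               # gamma, beta, moving mean, moving variance
    dense1 = n_features * nodes1 + nodes1    # weights + biases
    dense2 = nodes1 * nodes2 + nodes2        # weights + biases
    output = nodes2 * 1 + 1                  # weights + bias of the single output node
    return batchnorm + dense1 + dense2 + output

print(count_parameters(89))  # all cities: 14457
print(count_parameters(9))   # Basel only (assumed 9 features): 6137
```

Only the BatchNorm layer and the first dense layer depend on the number of input features, which is why the two later layers contribute the same 5101 parameters in both cases.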
+```python +# create the network and view its summary +model = create_nn() +compile_model(model) +model.summary() +``` + +Fit with early stopping and output showing performance on validation set: +```python +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 1000, + validation_data=(X_val, y_val), + callbacks=[earlystopper], + verbose = 2) +plot_history(history, ['root_mean_squared_error', 'val_root_mean_squared_error']) +``` + +Create a scatter plot to compare with true observations: +```python +y_test_predicted = model.predict(X_test) +plot_predictions(y_test_predicted, y_test, title='Predictions on the test set') +``` +![](fig/03_scatter_plot_basel_model.png){alt='Scatterplot of predictions and true number of sunshine hours'} + + +Compute the RMSE on the test set: +```python +test_metrics = model.evaluate(X_test, y_test, return_dict=True) +print(f'Test RMSE: {test_metrics["root_mean_squared_error"]}') +``` +```output +Test RMSE: 3.3761725425720215 +``` + +This RMSE is already a lot better compared to what we had before and certainly better than the baseline. +Additionally, it could be further improved with hyperparameter tuning. + +Note that because we ran `train_test_split()` again, we are evaluating on a different test set than before. +In the real world it is important to always compare results on the exact same test set. + +### 4. (optional) Train a model on all years and all features available. 
+You can tweak the above code to use all years and all features: +```python +# We cannot take all rows, because we need to be able to take the sunshine hours of the next day +nr_rows = len(data) - 2 + +# data +X_data = data.loc[:nr_rows].drop(columns=['DATE', 'MONTH']) + +# labels (sunshine hours the next day) +y_data = data.loc[1:(nr_rows + 1)]["BASEL_sunshine"] +``` +For the rest you can use the same code as above to train and evaluate the model. + +This results in an RMSE on the test set of 3.23 (your result can be different, but should be in the same range). +From this we can conclude that adding more training data results in even better performance! +:::: +::: + +::: callout +## Tensorboard +If we run many different experiments with different architectures, +it can be difficult to keep track of these different models or compare the achieved performance. +We can use *tensorboard*, a framework that keeps track of our experiments and shows graphs like we plotted above. +Tensorboard is included in our tensorflow installation by default. +To use it, we first need to add a *callback* to our (compiled) model that saves the progress of training performance in a logs directory: +```python +from tensorflow.keras.callbacks import TensorBoard +import datetime +log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") # You can adjust this to add a more meaningful model name +tensorboard_callback = TensorBoard(log_dir=log_dir, histogram_freq=1) +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 200, + validation_data=(X_val, y_val), + callbacks=[tensorboard_callback], + verbose = 2) +``` +You can launch the tensorboard interface from a Jupyter notebook, showing all trained models: + +``` +%load_ext tensorboard +%tensorboard --logdir logs/fit +``` +This will show an interface that looks something like this: +![](fig/03_tensorboard.png){alt='Screenshot of tensorboard'} +::: + +## 10. 
Save model + +Now that we have a somewhat acceptable model, let us not forget to save it for future users to benefit from our explorative efforts! + +```python +model.save('my_tuned_weather_model') +``` + +## Outlook +Correctly predicting tomorrow's sunshine hours is apparently not that simple. +Our models get the general trends right, but still predictions vary quite a bit and can even be far off. + +::: challenge +## Open question: What could be next steps to further improve the model? + +With unlimited options to modify the model architecture or to play with the training parameters, deep learning can trigger very extensive hunting for better and better results. +Usually models are "well behaving" in the sense that small changes to the architectures also only result in small changes of the performance (if any). +It is often tempting to hunt for some magical settings that will lead to much better results. But do those settings exist? +Applying common sense is often a good first step to make a guess of how much better results *could* be. +In the present case we might certainly not expect to be able to reliably predict sunshine hours for the next day with 5-10 minute precision. +But how much better our model could be exactly, often remains difficult to answer. + +* What changes to the model architecture might make sense to explore? +* Ignoring changes to the model architecture, what might notably improve the prediction quality? + +:::: solution +## Solution +This is an open question. And we don't actually know how far one could push this sunshine hour prediction (try it out yourself if you like! We're curious!). +But there are a few things that might be worth exploring. +Regarding the model architecture: + +* In the present case we do not see a magical silver bullet to suddenly boost the performance. But it might be worth testing if *deeper* networks do better (more layers). 
+ +Other changes that might impact the quality notably: + +* The most obvious answer here would be: more data! Even this will not always work (e.g. if data is very noisy and uncorrelated, more data might not add much). +* Related to more data: use data augmentation. By creating realistic variations of the available data, the model might improve as well. +* More data can mean more data points (you can test it yourself by taking more than the 3 years we used here!) +* More data can also mean more features! What about adding the month? +* The labels we used here (sunshine hours) are highly biased, many days with no or nearly no sunshine but a few with >10 hours. Techniques such as oversampling or undersampling might handle such biased labels better. + +Another alternative would be to not only look at data from one day, but use the data of a longer period such as a full week. +This will turn the data into time series data which in turn might also make it worth to apply different model architectures... +:::: +::: + + +::: keypoints +- "Separate training, validation, and test sets allows monitoring and evaluating your model." +- "Batchnormalization scales the data as part of the model." +::: diff --git a/4-advanced-layer-types.md b/4-advanced-layer-types.md new file mode 100644 index 00000000..054b6195 --- /dev/null +++ b/4-advanced-layer-types.md @@ -0,0 +1,799 @@ +--- +title: "Advanced layer types" +teaching: 35 +exercises: 70 +--- + +::: questions +- "Why do we need different types of layers?" +- "What are good network designs for image data?" +- "What is a convolutional layer?" +- "How can we use different types of layers to prevent overfitting?" 
+::: + +::: objectives +- "Understand why convolutional and pooling layers are useful for image data" +- "Implement a convolutional neural network on an image dataset" +- "Use a drop-out layer to prevent overfitting" +::: + + +## Different types of layers +Networks are like onions: a typical neural network consists of many layers. In fact, the word *deep* in *Deep Learning* +refers to the many layers that make the network deep. + +So far, we have seen one type of layer, namely the **fully connected**, or **dense** layer. This layer is called fully connected, because all input neurons are taken into account by each output neuron. The number of parameters that need to be learned by the network, is thus in the order of magnitude of the number of input neurons times the number of hidden neurons. + +However, there are many different types of layers that perform different calculations and take different inputs. In this episode we will take a look at **convolutional layers** and **dropout layers**, which are useful in the context of image data, but also in many other types of (structured) data. + +## 1. Formulate / Outline the problem: Image classification +Keras comes with a few prepared datasets. We have a look at the [CIFAR10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html), +which is a widely known dataset for image classification. 
+```python +from tensorflow import keras +(train_images, train_labels), (val_images, val_labels) = keras.datasets.cifar10.load_data() +``` + +::: callout +## CERTIFICATE_VERIFY_FAILED error when downloading CIFAR-10 dataset +When loading the CIFAR-10 dataset, you might get the following error: +``` +[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1125) +``` +You can solve this error by adding this to your notebook: +```python +import ssl +ssl._create_default_https_context = ssl._create_unverified_context +``` +::: + +## CIFAR-10 + +The CIFAR-10 dataset consists of images of 10 different classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. +It is widely used as a benchmark dataset for image classification. The low resolution of the images in the dataset allows for quick loading and testing of models. + +For more information about this dataset and how it was collected you can check out +[Learning Multiple Layers of Features from Tiny Images by Alex Krizhevsky, 2009](https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf). + + +![Sample images from the CIFAR-10 data-set. Each image is labelled with a category, for example: 'frog' or 'horse'](fig/04_cifar10.png){alt="A 5 by 5 grid of 25 sample images from the CIFAR-10 data-set. Each image is labelled with a category, for example: 'frog' or 'horse'."} + +We take a small sample of the data as training set for demonstration purposes. +```python +n = 5000 +train_images = train_images[:n] +train_labels = train_labels[:n] +``` + + +## 2. Identify inputs and outputs + +## Explore the data + +Let's do a quick exploration of the dimensions of the data: +```python +train_images.shape +``` +```output +(5000, 32, 32, 3) +``` + +The first value, `5000`, is the number of training images that we have selected. +The remainder of the shape, namely `(32, 32, 3)`, denotes +the dimension of one image. 
The last value 3 is typical for color images, +and stands for the three color channels **R**ed, **G**reen, **B**lue. + +::: challenge + +## Number of features CIFAR-10 + +How many features does one image in the CIFAR-10 dataset have? + +- A. 32 +- B. 1024 +- C. 3072 +- D. 5000 + + +:::: solution +The correct solution is C: 3072. There are 1024 pixels in one image (32 * 32), +each pixel has 3 channels (RGB). So 1024 * 3 = 3072. +:::: +::: + + +We can find out the range of values of our input data as follows: +```python +train_images.min(), train_images.max() +``` +```output +(0, 255) +``` +So the values of the three channels range between `0` and `255`. +Lastly, we inspect the dimension of the labels: +```python +train_labels.shape +``` + +```output +(5000, 1) +``` +So we have, for each image, a single value denoting the label. +To find out what the possible values of these labels are: +```python +train_labels.min(), train_labels.max() +``` + +```output +(0, 9) +``` + +The values of the labels range between `0` and `9`, denoting 10 different classes. + +## 3. Prepare data + +The training set that we selected consists of 5000 images of `32x32` pixels and 3 channels (RGB values). The RGB values are between `0` and `255`. For input of neural networks, it is better to have small input values. So we normalize our data between `0` and `1`: + + +```python +train_images = train_images / 255.0 +val_images = val_images / 255.0 +``` + +## 4. Choose a pretrained model or start building architecture from scratch + +## Convolutional layers +In the previous episodes, we used 'fully connected layers', which connect all input values of a layer to all outputs of a layer. +This results in many connections, and thus many weights to be learned, in the network. +Note that our input dimension is now quite high (even with small pictures of `32x32` pixels): we have 3072 features. 
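
To make the number 3072 concrete, the sketch below flattens a small batch of image-shaped arrays into one feature vector per image, which is the form a fully connected layer would need as input. The array holds random stand-in values, not the actual CIFAR-10 images, and the variable names are chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for 5 RGB images of 32x32 pixels with values between 0 and 255
fake_images = rng.integers(0, 256, size=(5, 32, 32, 3))

# Scale to the range [0, 1], as done for the real images above
fake_images = fake_images / 255.0

# Flatten every image into a single vector of 32 * 32 * 3 values
flat_images = fake_images.reshape(len(fake_images), -1)
print(flat_images.shape)
```

The resulting shape is `(5, 3072)`: each image becomes one row of 3072 input features.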
+ +::: challenge +## Number of parameters +Suppose we create a single Dense (fully connected) layer with 100 hidden units that connect to the input pixels, how many parameters does this layer have? + +- A. 307200 +- B. 307300 +- C. 100 +- D. 3072 + +:::: solution +## Solution +The correct answer is B: Each entry of the input dimensions, i.e. the `shape` of one single data point, is connected with 100 neurons of our hidden layer, and each of these neurons has a bias term associated to it. So we have `307300` parameters to learn. +```python +width, height = (32, 32) +n_hidden_neurons = 100 +n_bias = 100 +n_input_items = width * height * 3 +n_parameters = (n_input_items * n_hidden_neurons) + n_bias +n_parameters +``` +```output +307300 +``` +We can also check this by building the layer in Keras: +```python +inputs = keras.Input(shape=(n_input_items,)) +outputs = keras.layers.Dense(100)(inputs) +model = keras.models.Model(inputs=inputs, outputs=outputs) +model.summary() +``` +```output +Model: "model" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input_1 (InputLayer) [(None, 3072)] 0 +_________________________________________________________________ +dense (Dense) (None, 100) 307300 +================================================================= +Total params: 307,300 +Trainable params: 307,300 +Non-trainable params: 0 +_________________________________________________________________ +``` +:::: +::: + +We can decrease the number of units in our hidden layer, but this also decreases the number of patterns our network can remember. Moreover, if we increase the image size, the number of weights will 'explode', even though the task of recognizing large images is not necessarily more difficult than the task of recognizing small images. + +The solution is that we make the network learn in a 'smart' way. 
The features that we learn should be similar both for small and large images, and similar features (e.g. edges, corners) can appear anywhere in the image (in mathematical terms: *translation invariant*). We do this by making use of a concept from image processing that precedes Deep Learning. + +A **convolution matrix**, or **kernel**, is a matrix transformation that we 'slide' over the image to calculate features at each position of the image. For each pixel, we multiply the kernel element-wise with the pixel and its surroundings, and sum the results. A kernel is typically small, between 3x3 and 7x7 pixels. We can for example think of the 3x3 kernel: +```output +[[-1, -1, -1], + [0, 0, 0], + [1, 1, 1]] +``` +This kernel will give a high value to a pixel if it is on a horizontal border between dark and light areas. +Note that for RGB images, the kernel should also have a depth of 3. + +In the following image, we see the effect of such a kernel on the values of a single-channel image. The red cell in the output matrix is the result of multiplying and summing the values of the red square in the input, and the kernel. Applying this kernel to a real image shows that it indeed detects horizontal edges. + +![](fig/04_conv_matrix.png){alt='Example of a convolution matrix calculation' style='width:90%'} + +![](fig/04_conv_image.png){alt='Convolution example on an image of a cat to extract features' style='width:100%'} + +In our **convolutional layer** our hidden units are a number of convolutional matrices (or kernels), where the values of the matrices are the weights that we learn in the training process. The output of a convolutional layer is an 'image' for each of the kernels, that gives the output of the kernel applied to each pixel. + +::: callout +## Playing with convolutions +Convolutions applied to images can be hard to grasp at first. 
Fortunately there are resources out +there that enable users to interactively play around with images and convolutions: + +- [Image kernels explained](https://setosa.io/ev/image-kernels/) shows how different convolutions can achieve certain effects on an image, like sharpening and blurring. +- [The convolutional neural network cheat sheet](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks#) + shows animated examples of the different components of convolutional neural nets +::: + +::: challenge +## Border pixels +What, do you think, happens to the border pixels when applying a convolution? + +:::: solution +## Solution +There are different ways of dealing with border pixels. +You can ignore them, which means that your output image is slightly smaller than your input. +It is also possible to 'pad' the borders, e.g. with the same value or with zeros, so that the convolution can also be applied to the border pixels. +In that case, the output image will have the same size as the input image. + +[This callout in the Data Carpentry: Image Processing with Python curriculum](https://datacarpentry.org/image-processing/06-blurring.html#callout4) +provides more detail about convolution at the boundaries of an image, +in the context of applying a _Gaussian blur_. +:::: +::: + +::: challenge +## Number of model parameters +Suppose we apply a convolutional layer with 100 kernels of size 3 * 3 * 3 (the last dimension applies to the RGB channels) to our images of 32 * 32 * 3 pixels. How many parameters do we have? Assume, for simplicity, that the kernels do not use bias terms. Compare this to the answer of the previous exercise. + +:::: solution +## Solution +We have 100 matrices with 3 * 3 * 3 = 27 values each so that gives 27 * 100 = 2700 weights. This is roughly a factor of 100 fewer than the fully connected layer with 100 units! Nevertheless, as we will see, convolutional networks work very well for image data. 
This illustrates the expressiveness of convolutional layers. +:::: +::: + +So let us look at a network with a few convolutional layers. We need to finish with a Dense layer to connect the output cells of the convolutional layer to the outputs for our classes. + +```python +inputs = keras.Input(shape=train_images.shape[1:]) +x = keras.layers.Conv2D(50, (3, 3), activation='relu')(inputs) +x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) +x = keras.layers.Flatten()(x) +outputs = keras.layers.Dense(10)(x) + +model = keras.Model(inputs=inputs, outputs=outputs, name="cifar_model_small") + +model.summary() +``` + +::: challenge +## Convolutional Neural Network + +Inspect the network above: + +* What do you think is the function of the `Flatten` layer? +* Which layer has the most parameters? Do you find this intuitive? +* (optional) Pick a model from https://paperswithcode.com/sota/image-classification-on-cifar-10 . Try to understand how it works. + +:::: solution +## Solution +* The Flatten layer converts the 28x28x50 output of the convolutional layer into a single one-dimensional vector, that can be used as input for a dense layer. +* The last dense layer has the most parameters. This layer connects every single output 'pixel' from the convolutional layer to the 10 output classes. +That results in a large number of connections, so a large number of parameters. This undermines a bit the expressiveness of the convolutional layers, that have much fewer parameters. +:::: +::: + +::: callout +## Search for existing architectures or pretrained models +So far in this course we have built neural networks from scratch, because we want you to fully understand the basics of Keras. +In the real world however, you would first search for existing solutions to your problem. +The CIFAR10 dataset lends itself extremely well for using existing models, since it is a standard +machine learning problem that is often used in deep learning research. 
+ +You could for example search for 'CIFAR10 state-of-the-art Keras', and see if you can find any Keras implementations +of more advanced architectures that you could reuse. +A lot of the best-performing architectures for the CIFAR10 problem are convolutional neural networks or at least have some elements in common. +Therefore, we will introduce convolutional neural networks here, and the best way to teach you is by +developing a neural network from scratch! +::: + +::: instructor +## Demonstrate searching for existing architectures +At this point it can be nice to apply above callout box and demonstrate searching for state-of-the-art implementations. +If you google for 'CIFAR10 state-of-the-art Keras' one of the top search results links to [a GitHub repository](https://github.com/Adeel-Intizar/CIFAR-10-State-of-the-art-Model) +containing [a Jupyter notebook containing an implementation](https://github.com/Adeel-Intizar/CIFAR-10-State-of-the-art-Model/blob/master/CIFAR-10%20Best.ipynb). + +It can be a nice learning opportunity to go through the notebook and show that the learners should +already be familiar with a lot of the syntax (for example Conv2D, Dense, BatchNorm layers, adam optimizer, the deep learning workflow). +You can show that even though the model is much deeper, the input and output layer are still the same. +The aim is to demonstrate that what we are learning is really the basis for more complex models, +and you do not need to reinvent the wheel. + +Later in this episode when we evaluate the model it can be interesting to show how well +this more complex model performs on this dataset (93.3% accuracy). +::: + +## Pooling layers +Often in convolutional neural networks, the convolutional layers are intertwined with **Pooling layers**. As opposed to the convolutional layer, the pooling layer actually alters the dimensions of the image and reduces it by a scaling factor. It is basically decreasing the resolution of your picture. 
The rationale behind this is that higher layers of the network should focus on higher-level features of the image. By introducing a pooling layer, the subsequent convolutional layer has a broader 'view' of the original image. + +Let's put it into practice. We compose a convolutional network with two convolutional layers and two pooling layers. + + +```python +def create_nn(): + inputs = keras.Input(shape=train_images.shape[1:]) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(inputs) + x = keras.layers.MaxPooling2D((2, 2))(x) # a new maxpooling layer + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) + x = keras.layers.MaxPooling2D((2, 2))(x) # a second maxpooling layer + x = keras.layers.Flatten()(x) + x = keras.layers.Dense(50, activation='relu')(x) # a new Dense layer + outputs = keras.layers.Dense(10)(x) + + model = keras.Model(inputs=inputs, outputs=outputs, name="cifar_model") + return model + +model = create_nn() +model.summary() +``` +```output +Model: "cifar_model" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + input_6 (InputLayer) [(None, 32, 32, 3)] 0 + + conv2d_13 (Conv2D) (None, 30, 30, 50) 1400 + + max_pooling2d_8 (MaxPooling (None, 15, 15, 50) 0 + 2D) + + conv2d_14 (Conv2D) (None, 13, 13, 50) 22550 + + max_pooling2d_9 (MaxPooling (None, 6, 6, 50) 0 + 2D) + + flatten_5 (Flatten) (None, 1800) 0 + + dense_9 (Dense) (None, 50) 90050 + + dense_10 (Dense) (None, 10) 510 + +================================================================= +Total params: 114,510 +Trainable params: 114,510 +Non-trainable params: 0 +_________________________________________________________________ +``` +## 5. Choose a loss function and optimizer + +We compile the model using the adam optimizer (other optimizers could also be used here!).
+Similar to the penguin classification task, we will use the crossentropy function to calculate the model's loss. +This loss function is appropriate to use when the data has two or more label classes. + +Remember that our target class is represented by a single integer, whereas the output of our network has 10 nodes, one for each class. +So, we should have actually one-hot encoded the targets and used a softmax activation for the neurons in our output layer! +Luckily, there is a quick fix to calculate crossentropy loss for data that +has its classes represented by integers, the `SparseCategoricalCrossentropy()` function. +Adding the argument `from_logits=True` accounts for the fact that the output has a linear activation instead of softmax. +This is what is often done in practice, because it spares you from having to worry about one-hot encoding. + + +```python +def compile_model(model): + model.compile(optimizer='adam', + loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), + metrics=['accuracy']) +compile_model(model) +``` + +## 6. Train the model + +We then train the model for 10 epochs: + +```python +history = model.fit(train_images, train_labels, epochs=10, + validation_data=(val_images, val_labels)) +``` + +## 7. Perform a Prediction/Classification +Here we skip performing a prediction, and continue to measuring the performance. +In practice, you will only do this step once in a while when you actually need to have the individual predictions, +often you know enough based on the evaluation metric scores. +Of course, behind the scenes whenever you measure performance you have to make predictions and compare them to the ground truth. + +## 8. 
Measure performance + +We can plot the training process using the history: + +```python +import seaborn as sns +import matplotlib.pyplot as plt +import pandas as pd + +def plot_history(history, metrics): + """ + Plot the training history + + Args: + history (keras History object that is returned by model.fit()) + metrics(str, list): Metric or a list of metrics to plot + """ + history_df = pd.DataFrame.from_dict(history.history) + sns.lineplot(data=history_df[metrics]) + plt.xlabel("epochs") + plt.ylabel("metric") +plot_history(history, ['accuracy', 'val_accuracy']) +``` +![](fig/04_training_history_1.png){alt='Plot of training accuracy and validation accuracy vs epochs for the trained model'} + +```python +plot_history(history, ['loss', 'val_loss']) +``` + +![](fig/04_training_history_loss_1.png){alt='Plot of training loss and validation loss vs epochs for the trained model'} + +It seems that the model is overfitting somewhat, because the validation accuracy and loss stagnate while the training metrics keep improving. + +::: instructor +## Comparison with a network with only dense layers +The callout box below compares the CNN approach with a network with only dense layers. +Depending on time, the following discussion can be extended as far as you like. You have several options: + +1. It can be used as a good recap exercise. The exercise question is then: +'How does this simple CNN compare to a neural network with only dense layers? +Implement a dense neural network and compare its performance to that of the CNN'. +This will take 30-45 minutes and might shift the focus away from CNNs. +2. You can demonstrate (no typing along), just to show what the network would look like and make the comparison. +3. You can just mention that a simple network with only dense layers reaches 35% accuracy, considerably worse than our simple CNN. +::: + +::: callout +## Comparison with a network with only dense layers +How does this simple CNN compare to a neural network with only dense layers?
+ +We can define a neural network with only dense layers: +```python +def create_dense_model(): + inputs = keras.Input(shape=train_images.shape[1:]) + x = keras.layers.Flatten()(inputs) + x = keras.layers.Dense(50, activation='relu')(x) + x = keras.layers.Dense(50, activation='relu')(x) + outputs = keras.layers.Dense(10)(x) + return keras.models.Model(inputs=inputs, outputs=outputs, + name='dense_model') + +dense_model = create_dense_model() +dense_model.summary() +``` +```output +Model: "dense_model" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + input_9 (InputLayer) [(None, 32, 32, 3)] 0 + + flatten_7 (Flatten) (None, 3072) 0 + + dense_21 (Dense) (None, 50) 153650 + + dense_22 (Dense) (None, 50) 2550 + + dense_23 (Dense) (None, 10) 510 + +================================================================= +Total params: 156710 (612.15 KB) +Trainable params: 156710 (612.15 KB) +Non-trainable params: 0 (0.00 Byte) +_________________________________________________________________ +``` +As you can see, this model has considerably more parameters than our simple CNN. Let's train and evaluate it! + +```python +compile_model(dense_model) +history = dense_model.fit(train_images, train_labels, epochs=30, + validation_data=(test_images, test_labels)) +plot_history(history, ['accuracy', 'val_accuracy']) +``` +![](fig/04_dense_model_training_history.png){alt="Plot of training accuracy and validation accuracy vs epochs for a model with only dense layers"} + +As you can see the validation accuracy only reaches about 35%, whereas the CNN reached about 55% accuracy. + +This demonstrates that convolutional layers are a big improvement over dense layers for this kind of dataset. +::: + +## 9. Refine the model + +::: challenge +## Network depth +What do you think will be the effect of adding a convolutional layer to your model? Will this model have more or fewer parameters?
+Try it out. Create a `model` that has an additional `Conv2D` layer with 50 filters after the last MaxPooling2D layer. Train it for 20 epochs and plot the results. + +**HINT**: +The model definition that we used previously needs to be adjusted as follows: +```python +inputs = keras.Input(shape=train_images.shape[1:]) +x = keras.layers.Conv2D(50, (3, 3), activation='relu')(inputs) +x = keras.layers.MaxPooling2D((2, 2))(x) +x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) +x = keras.layers.MaxPooling2D((2, 2))(x) +# Add your extra layer here +x = keras.layers.Flatten()(x) +x = keras.layers.Dense(50, activation='relu')(x) +outputs = keras.layers.Dense(10)(x) +``` + +:::: solution + +## Solution +We add an extra Conv2D layer after the second pooling layer: +```python +def create_nn_extra_layer(): + inputs = keras.Input(shape=train_images.shape[1:]) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(inputs) + x = keras.layers.MaxPooling2D((2, 2))(x) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) + x = keras.layers.MaxPooling2D((2, 2))(x) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) # extra layer + x = keras.layers.Flatten()(x) + x = keras.layers.Dense(50, activation='relu')(x) # a new Dense layer + outputs = keras.layers.Dense(10)(x) + + model = keras.Model(inputs=inputs, outputs=outputs, name="cifar_model") + return model + +model = create_nn_extra_layer() +``` + +With the model defined above, we can inspect the number of parameters: +```python +model.summary() +``` +```output +Model: "cifar_model" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input_7 (InputLayer) [(None, 32, 32, 3)] 0 +conv2d_16 (Conv2D) (None, 30, 30, 50) 1400 +max_pooling2d_10 (MaxPoolin (None, 15, 15, 50) 0 +g2D) +conv2d_17 (Conv2D) (None, 13, 13, 50) 22550 +max_pooling2d_11 (MaxPoolin (None, 6, 6, 50) 0 +g2D) +conv2d_18 (Conv2D)
(None, 4, 4, 50) 22550 +flatten_6 (Flatten) (None, 800) 0 +dense_11 (Dense) (None, 50) 40050 +dense_12 (Dense) (None, 10) 510 +================================================================= +Total params: 87,060 +Trainable params: 87,060 +Non-trainable params: 0 +_________________________________________________________________ +``` +The number of parameters has decreased by adding this layer. +We can see that the conv layer decreases the resolution from 6x6 to 4x4, +so the Flatten layer outputs 4x4x50 = 800 values instead of 6x6x50 = 1800. As a result, the input of the Dense layer is smaller than in the previous network. +To train the network and plot the results: +```python +compile_model(model) +history = model.fit(train_images, train_labels, epochs=20, + validation_data=(val_images, val_labels)) +plot_history(history, ['accuracy', 'val_accuracy']) +``` +![](fig/04_training_history_2.png){alt="Plot of training accuracy and validation accuracy vs epochs for the trained model"} +```python +plot_history(history, ['loss', 'val_loss']) +``` + +![](fig/04_training_history_loss_2.png){alt="Plot of training loss and validation loss vs epochs for the trained model"} + +:::: +::: + +::: callout +## Other types of data +Convolutional and Pooling layers are also applicable to different types of +data than image data. Whenever the data is ordered in a (spatial) dimension, +and *translation invariant* features are expected to be useful, convolutions +can be used. Think for example of time series data from an accelerometer, +audio data for speech recognition, or 3D structures of chemical compounds. +::: + +::: challenge +## Why and when to use convolutional neural networks +1. Would it make sense to train a convolutional neural network (CNN) on the penguins dataset and why? +2. Would it make sense to train a CNN on the weather dataset and why? +3. (Optional) Can you think of a different machine learning task that would benefit from a + CNN architecture? + +:::: solution +## Solution +1. No, that would not make sense.
Convolutions only work when the features of the data can be ordered + in a meaningful way. Pixels for example are ordered in a spatial dimension. + This kind of order cannot be applied to the features of the penguin dataset. + If we had pictures or audio recordings of the penguins as input data + it would make sense to use a CNN architecture. +2. It would make sense, but only if we approach the problem from a different angle than we did before. + Namely, 1D convolutions work quite well on sequential data such as timeseries. If we have as our input a matrix + of the different weather conditions over time in the past x days, a CNN would be suited to quickly grasp + the temporal relationship over days. +3. Some example domains in which CNNs are applied: + - Text data + - Timeseries, specifically audio + - Molecular structures +:::: +::: + +## Dropout + +Note that the training loss continues to decrease, while the validation loss stagnates, and even starts to increase over the course of the epochs. Similarly, the accuracy for the validation set does not improve anymore after some epochs. This means we are overfitting on our training data set. + +Techniques to avoid overfitting, or to improve model generalization, are termed **regularization techniques**. +One of the most versatile regularization techniques is **dropout** ([Srivastava et al., 2014](https://jmlr.org/papers/v15/srivastava14a.html)). +Dropout means that during each training cycle (one forward pass of the data through the model) a random fraction of neurons in a dense layer is turned off. +This is controlled by the dropout rate, a value between 0 and 1 that determines the fraction of nodes to silence at a time. + +![](fig/neural_network_sketch_dropout.png){alt='A sketch of a neural network with and without dropout'} + +The intuition behind dropout is that it enforces redundancies in the network by constantly removing different elements of a network.
The model can no longer rely on individual nodes and instead must create multiple "paths". In addition, the model has to make predictions with far fewer nodes and weights (connections between the nodes). +As a result, it becomes much harder for a network to memorize particular features. At first this might appear quite a drastic approach which affects the network architecture strongly. +In practice, however, dropout is computationally a very elegant solution which does not affect training speed. And it frequently works very well. + +**Important to note:** Dropout layers will only randomly silence nodes during training! During a prediction step, all nodes remain active (dropout is off). During training, the set of nodes that is silenced is different for each training instance, to give all nodes a chance to observe enough training data to learn their weights. + +Let us add one dropout layer towards the end of the network that randomly drops 80% of the nodes. + +```python +def create_nn_with_dropout(): + inputs = keras.Input(shape=train_images.shape[1:]) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(inputs) + x = keras.layers.MaxPooling2D((2, 2))(x) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) + x = keras.layers.MaxPooling2D((2, 2))(x) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) + x = keras.layers.Dropout(0.8)(x) # This is new!
+ x = keras.layers.Flatten()(x) + x = keras.layers.Dense(50, activation='relu')(x) + outputs = keras.layers.Dense(10)(x) + model = keras.Model(inputs=inputs, outputs=outputs, name="cifar_model") + return model + +model_dropout = create_nn_with_dropout() +model_dropout.summary() +``` +```output +Model: "cifar_model" +_________________________________________________________________ + Layer (type) Output Shape Param # +================================================================= + input_8 (InputLayer) [(None, 32, 32, 3)] 0 + + conv2d_19 (Conv2D) (None, 30, 30, 50) 1400 + + max_pooling2d_12 (MaxPoolin (None, 15, 15, 50) 0 + g2D) + + conv2d_20 (Conv2D) (None, 13, 13, 50) 22550 + + max_pooling2d_13 (MaxPoolin (None, 6, 6, 50) 0 + g2D) + + conv2d_21 (Conv2D) (None, 4, 4, 50) 22550 + + dropout_2 (Dropout) (None, 4, 4, 50) 0 + + flatten_7 (Flatten) (None, 800) 0 + + dense_13 (Dense) (None, 50) 40050 + + dense_14 (Dense) (None, 10) 510 + +================================================================= +Total params: 87,060 +Trainable params: 87,060 +Non-trainable params: 0 +_________________________________________________________________ +``` + +We can see that the dropout does not alter the dimensions of the image, and has zero parameters. + +We again compile and train the model. 
+```python +compile_model(model_dropout) + +history = model_dropout.fit(train_images, train_labels, epochs=20, + validation_data=(val_images, val_labels)) +``` + +And inspect the training results: +```python +plot_history(history, ['accuracy', 'val_accuracy']) + +val_loss, val_acc = model_dropout.evaluate(val_images, val_labels, verbose=2) +``` +```output +313/313 - 2s - loss: 1.4683 - accuracy: 0.5307 +``` + +![](fig/04_training_history_3.png){alt="Plot of training accuracy and validation accuracy vs epochs for the trained model"} + +```python +plot_history(history, ['loss', 'val_loss']) +``` + +![](fig/04_training_history_loss_3.png){alt="Plot of training loss and validation loss vs epochs for the trained model"} + + +Now we see that the gap between the training accuracy and validation accuracy is much smaller, and that the final accuracy on the validation set is higher than without dropout. +Nevertheless, there is still some difference between the training loss and validation loss, so we could experiment with regularization even more. + +::: challenge +## Vary dropout rate +1. What do you think would happen if you lower the dropout rate? Try it out, and + see how it affects the model training. +2. You are varying the dropout rate and checking its effect on the model performance, + what is the term associated to this procedure? + +:::: solution +## Solution +### 1. Varying the dropout rate +The code below instantiates and trains a model with varying dropout rates. +You can see from the resulting plot that the ideal dropout rate in this case is around 0.45. +This is where the val loss is lowest. + +Note that it can take a while to train these 5 networks. 
+ +```python +def create_nn_with_dropout(dropout_rate): + inputs = keras.Input(shape=train_images.shape[1:]) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(inputs) + x = keras.layers.MaxPooling2D((2, 2))(x) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) + x = keras.layers.MaxPooling2D((2, 2))(x) + x = keras.layers.Conv2D(50, (3, 3), activation='relu')(x) + x = keras.layers.Dropout(dropout_rate)(x) + x = keras.layers.Flatten()(x) + x = keras.layers.Dense(50, activation='relu')(x) + outputs = keras.layers.Dense(10)(x) + model = keras.Model(inputs=inputs, outputs=outputs, name="cifar_model") + return model + +dropout_rates = [0.15, 0.3, 0.45, 0.6, 0.75] +val_losses = [] +for dropout_rate in dropout_rates: + model_dropout = create_nn_with_dropout(dropout_rate) + compile_model(model_dropout) + model_dropout.fit(train_images, train_labels, epochs=20, + validation_data=(val_images, val_labels)) + + val_loss, val_acc = model_dropout.evaluate(val_images, val_labels) + val_losses.append(val_loss) + +loss_df = pd.DataFrame({'dropout_rate': dropout_rates, 'val_loss': val_losses}) + +sns.lineplot(data=loss_df, x='dropout_rate', y='val_loss') +``` + +![](fig/04_vary_dropout_rate.png){alt="Plot of val loss vs dropout rate used in the model. The val loss varies between 1.26 and 1.40 and is lowest with a dropout_rate around 0.45."} + + +### 2. Term associated with this procedure +This is called hyperparameter tuning. +:::: +::: + +## 10. Share model +Let's save our model: + +```python +model.save('cnn_model') +``` + +::: keypoints +- "Convolutional layers make efficient reuse of model parameters."
+- "Pooling layers decrease the resolution of your input" +- "Dropout is a way to prevent overfitting" +::: diff --git a/5-outlook.md b/5-outlook.md new file mode 100644 index 00000000..1c46c1aa --- /dev/null +++ b/5-outlook.md @@ -0,0 +1,161 @@ +--- +title: "Outlook" +teaching: 15 +exercises: 15 +--- + +::: questions +- "How does what I learned in this course translate to real-world problems?" +- "How do I organise a deep learning project?" +- "What are next steps to take after this course?" +::: + +::: objectives +- "Understand that what we learned in this course can be applied to real-world problems" +- "Use best practices for organising a deep learning project" +- "Identify next steps to take after this course" +::: + +You have come to the end of this course. +In this episode we will look back at what we have learned so far, how to apply that to real-world problems, and identify +next steps to take to start applying deep learning in your own projects. + +## Real-world application +To introduce the core concepts of deep learning we have used quite simple machine learning problems. +But how does what we learned so far apply to real-world applications? + +To illustrate that what we learned is actually the basis of successful applications in research, +we will have a look at an example from the field of cheminformatics. + +We will have a look at [this notebook](https://github.com/matchms/ms2deepscore/blob/0.4.0/notebooks/MS2DeepScore_tutorial.ipynb). +It is part of the codebase for [this paper](https://doi.org/10.1186/s13321-021-00558-4). + +In short, the deep learning problem is that of finding out how similar two molecules are in terms of their molecular properties, +based on their mass spectra. +You can compare this to comparing two pictures of animals, and predicting how similar they are. +A siamese neural network is used to solve the problem. +In a siamese neural network you have two input vectors, let's say two images of animals or two mass spectra.
+They pass through a base network. Instead of outputting a class or number with one or a few output neurons, the output layer +of the base network is a whole vector of for example 100 neurons. After passing through the base network, you end up with two of these +vectors representing the two inputs. The goal of the base network is to output a meaningful representation of the input (this is called an embedding). +The next step is to compute the cosine similarity between these two output vectors. +Cosine similarity is a measure for how similar two vectors are to each other; for vectors with non-negative entries it ranges from 0 (completely different) to 1 (pointing in the same direction). +This cosine similarity is compared to the actual similarity between the two inputs, and the difference between the two is the error used to update the weights in the network. + +Don't worry if you do not fully understand the deep learning problem and the approach that is taken here. +We just want you to appreciate that you already learned enough to be able to do this yourself in your own domain. + +::: instructor +You don't have to use this project as an example. +It works best to use a suitable deep learning project that you know well and are passionate about. +::: +::: challenge +## Exercise: A real-world deep learning application + +1. Looking at the 'Model training' section of the notebook, what do you recognize from what you learned in this course? +2. Can you identify the different steps of the deep learning workflow in this notebook? +3. (Optional): Try to understand the neural network architecture from the first figure of [the paper](https://doi.org/10.1186/s13321-021-00558-4). + a. Why are there 10,000 neurons in the input layer? + b. What do you think would happen if you decreased the size of the spectral embedding layer drastically, to for example 5 neurons? + +:::: solution +## Solution +1. The model summary for the Siamese model is more complex than what we have seen so far, +but it is basically a repetition of Dense, BatchNorm, and Dropout layers.
+The syntax for training and evaluating the model is the same as what we learned in this course. +Both EarlyStopping and the Adam optimizer are used. +2. The different steps are not as clearly defined as in this course, but you should be able to identify '3: Data preparation', +'4: Choose a pretrained model or start building architecture from scratch', '5: Choose a loss function and optimizer', '6: Train the model', +'7: Make predictions' (which is called 'Model inference' in this notebook), and '10: Save model'. +3. (optional) + a. Because the shape of the input is 10,000. More specifically, the spectrum is binned into a vector of size 10,000; + apparently this is a good size to represent the mass spectrum. + b. This would force the neural network to have a representation of the mass spectrum in only 5 numbers. + This representation would probably be more generic, but might fail to capture all the characteristics found in the spectrum. + This would likely result in underfitting. +:::: +::: + +Hopefully you can appreciate that what you learned in this course can be applied to real-world problems as well. + +::: callout +## Extensive data preparation +You might have noticed that the data preparation for this example is much more extensive than what we have done so far +in this course. This is quite common for applied deep learning projects. It is said that 90% of the time in a +deep learning problem is spent on data preparation, and only 10% on modeling! +::: + +::: discussion +## Discussion: Large Language Models and prompt engineering +Large Language Models (LLMs) are deep learning models that are able to perform general-purpose language generation. +They are trained on large amounts of text, such as all pages of Wikipedia. +In recent years the quality of LLMs' language understanding and generation has increased tremendously, and since the launch of the generative chatbot ChatGPT in 2022 the power of LLMs is now appreciated by the general public.
+ +It is becoming more and more feasible to unleash this power in scientific research. For example, the authors of [Zheng et al. (2023)](https://doi.org/10.1021/jacs.3c05819) guided ChatGPT in the automation of extracting chemical information from a large amount of research articles. The authors did not implement a deep learning model themselves, but instead they designed the right input for ChatGPT (called a 'prompt') that would produce optimal outputs. This is called prompt engineering. A highly simplified example of such a prompt would be: "Given compounds X and Y and context Z, what are the chemical details of the reaction?" + +Developments in LLM research are moving fast, at the end of 2023 the newest ChatGPT version [could take images and sound as input](https://openai.com/blog/chatgpt-can-now-see-hear-and-speak). +In theory, this means that you can solve the Cifar-10 image classification problem from the previous episode by prompt engineering, with prompts similar to "Which out of these categories: [LIST OF CATEGORIES] is depicted in the image". + +**Discuss the following statement with your neighbors:** + +_In a few years most machine learning problems in scientific research can be solved with prompt engineering._ +::: + +## Organising deep learning projects +As you might have noticed already in this course, deep learning projects can quickly become messy. +Here follow some best practices for keeping your projects organized: + +### 1. Organise experiments in notebooks +Jupyter notebooks are a useful tool for doing deep learning experiments. +You can very easily modify your code bit by bit, and interactively look at the results. +In addition you can explain why you are doing things in markdown cells. +- As a rule of thumb do one approach or experiment in one notebook. 
+- Give consistent and meaningful names to notebooks, such as: `01-all-cities-simple-cnn.ipynb` +- Add a rationale on top and a conclusion on the bottom of each notebook + +[_Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks_](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007007) provides further advice on how to maximise the usefulness and reproducibility of experiments captured in a notebook. + +### 2. Use Python modules +Code that is repeatedly used should live in a Python module and not be copied to multiple notebooks. +You can import functions and classes from the module(s) in the notebooks. +This way you can remove a lot of code definition from your notebooks and have a focus on the actual experiment. + +### 3. Keep track of your results in a central place +Always evaluate your experiments in the same way, on the exact same test set. +Document the results of your experiments in a consistent and meaningful way. +You can use a simple spreadsheet such as this: + +| MODEL NAME | MODEL DESCRIPTION | RMSE | TESTSET NAME | GITHUB COMMIT | COMMENTS | +|-------------------------|--------------------------------------------|------|---------------|---------------|----------| +| weather_prediction_v1.0 | Basel features only, 10 years. nn: 100-50 | 3.21 | 10_years_v1.0 | ed28d85 | | +| weather_prediction_v1.1 | all features, 10 years. nn: 100-50 | 3.35 | 10_years_v1.0 | 4427b78 | | + +You could also use a tool such as [Weights and Biases](https://wandb.ai/site) for this. + +::: callout +## Cookiecutter data science +If you want to get more pointers for organising deep learning, or data science projects in general, +we recommend [Cookiecutter data science](https://drivendata.github.io/cookiecutter-data-science/). +It is a template for initiating an organized data science project folder structure +that you can adapt to your own needs. 
+::: +## Next steps +You now understand the basic principles of deep learning and are able to implement your own deep learning pipelines in Python. +But there is still so much to learn and do! + +Here are some suggestions for next steps you can take in your endeavor to become a deep learning expert: + +* Learn more by going through a few of [the learning resources we have compiled for you](learners/reference.md#external-references) +* Apply what you have learned to your own projects. Use the deep learning workflow to structure your work. +Start as simple as possible, and incrementally increase the complexity of your approach. +* Compete in a [Kaggle competition](https://www.kaggle.com/competitions) to practice what you have learned. +* Get access to a GPU. Your deep learning experiments will progress much quicker if you only have to wait +a few seconds for your network to train instead of hours (which is the order of magnitude of speedup you can expect from training on a GPU instead of CPU). +TensorFlow/Keras will automatically detect and use a GPU if it is available on your system without any code changes. +A simple and quick way to get access to a GPU is to use [Google Colab](https://colab.google/). + +::: keypoints +- "Although the data preparation and model architectures are somewhat more complex, +what we have learned in this course can directly be applied to real-world problems" +- "Use what you have learned in this course as a basis for your own learning trajectory in the world of deep learning" +::: diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000..f19b8049 --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,13 @@ +--- +title: "Contributor Code of Conduct" +--- + +As contributors and maintainers of this project, +we pledge to follow [The Carpentries Code of Conduct][coc]. + +Instances of abusive, harassing, or otherwise unacceptable behavior +may be reported by following our [reporting guidelines][coc-reporting].
+ + +[coc-reporting]: https://docs.carpentries.org/topic_folders/policies/incident-reporting.html +[coc]: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html diff --git a/LICENSE.md b/LICENSE.md new file mode 100644 index 00000000..7632871f --- /dev/null +++ b/LICENSE.md @@ -0,0 +1,79 @@ +--- +title: "Licenses" +--- + +## Instructional Material + +All Carpentries (Software Carpentry, Data Carpentry, and Library Carpentry) +instructional material is made available under the [Creative Commons +Attribution license][cc-by-human]. The following is a human-readable summary of +(and not a substitute for) the [full legal text of the CC BY 4.0 +license][cc-by-legal]. + +You are free: + +- to **Share**---copy and redistribute the material in any medium or format +- to **Adapt**---remix, transform, and build upon the material + +for any purpose, even commercially. + +The licensor cannot revoke these freedoms as long as you follow the license +terms. + +Under the following terms: + +- **Attribution**---You must give appropriate credit (mentioning that your work + is derived from work that is Copyright (c) The Carpentries and, where + practical, linking to ), provide a [link to the + license][cc-by-human], and indicate if changes were made. You may do so in + any reasonable manner, but not in any way that suggests the licensor endorses + you or your use. + +- **No additional restrictions**---You may not apply legal terms or + technological measures that legally restrict others from doing anything the + license permits. With the understanding that: + +Notices: + +* You do not have to comply with the license for elements of the material in + the public domain or where your use is permitted by an applicable exception + or limitation. +* No warranties are given. The license may not give you all of the permissions + necessary for your intended use. For example, other rights such as publicity, + privacy, or moral rights may limit how you use the material. 
+ +## Software + +Except where otherwise noted, the example programs and other software provided +by The Carpentries are made available under the [OSI][osi]-approved [MIT +license][mit-license]. + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +## Trademark + +"The Carpentries", "Software Carpentry", "Data Carpentry", and "Library +Carpentry" and their respective logos are registered trademarks of [Community +Initiatives][ci]. 
+ +[cc-by-human]: https://creativecommons.org/licenses/by/4.0/ +[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode +[mit-license]: https://opensource.org/licenses/mit-license.html +[ci]: https://communityin.org/ +[osi]: https://opensource.org diff --git a/bonus-material.md b/bonus-material.md new file mode 100644 index 00000000..ad8296e8 --- /dev/null +++ b/bonus-material.md @@ -0,0 +1,199 @@ +--- +title: Bonus material +--- + +## ML Pipeline Visualisation + +To apply Deep Learning to a problem there are several steps we need to go through: + + +![A visualisation of the Machine Learning Pipeline](../episodes/fig/graphviz/pipeline.png){alt="A flow diagram illustrating the ten-step deep learning pipeline described in the lesson introduction."} + +Feel free to use this figure as [png](../episodes/fig/graphviz/pipeline.png). The figure is contained in `fig/graphviz/` of this repository. Use the `Makefile` there in order to reproduce it in different output formats. + +## Optional part - prediction uncertainty using Monte-Carlo Dropout +Depending on the data and the question asked, model predictions can be highly accurate. Or, as in the present case, they can show a high degree of error. +In both cases, however, it is often highly relevant to get not only model predictions but also an estimate of how reliable those predictions are. +In recent years, this has been a very dynamic, rapidly growing area and there are many different ways to do uncertainty evaluation in deep learning. +Here we want to present a very versatile and easy-to-implement method: **Monte-Carlo Dropout** (original reference: https://arxiv.org/abs/1506.02142). + +The name of the technique refers to a very common regularization technique: **Dropout**. So let's first introduce this: + +## Dropout: make it harder to memorize things + +One of the most versatile regularization techniques is **dropout**. 
+Dropout essentially means that during each training cycle a random fraction of the dense layer nodes is turned off. This is described by the dropout rate, a value between 0 and 1 which determines the fraction of nodes to silence at a time. + +![Dropout sketch](../episodes/fig/neural_network_sketch_dropout.png){alt='A sketch of a neural network with and without dropout'} + +The intuition behind dropout is that it enforces redundancies in the network by constantly removing different elements of a network. The model can no longer rely on individual nodes and instead must create multiple "paths". In addition, the model has to make predictions with far fewer nodes and weights (connections between the nodes). +As a result, it becomes much harder for a network to memorize particular features. At first this might appear to be quite a drastic approach which strongly affects the network architecture. +In practice, however, dropout is computationally a very elegant solution which does not affect training speed. And it frequently works very well. + +**Important to note:** Dropout layers will only randomly silence nodes during training! During a prediction step, all nodes remain active (dropout is off). + +Let's add dropout to our neural network, which we will do by using the Keras `Dropout` layer (documentation & reference: https://keras.io/api/layers/regularization_layers/dropout/). +One additional change that we will make here is to lower the learning rate because in the last training example the losses seemed to fluctuate a lot. 
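Before turning to the Keras code, the mechanism described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper `dropout`, its arguments, and the toy list of activations are hypothetical names introduced here, not part of Keras or of the lesson code. The sketch uses "inverted" dropout, where the activations that are kept get scaled by `1 / (1 - rate)` so that their expected sum stays the same.

```python
import random


def dropout(activations, rate, training, rng):
    """Illustrative inverted dropout on a plain list of activations (hypothetical helper)."""
    if not training or rate == 0.0:
        # During prediction all nodes stay active and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - rate
    # Each node is kept with probability keep_prob; kept values are scaled up,
    # silenced nodes output 0.0.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the example is repeatable
activations = [1.0] * 10  # toy output of a dense layer

train_out = dropout(activations, rate=0.2, training=True, rng=rng)
infer_out = dropout(activations, rate=0.2, training=False, rng=rng)

print("training:  ", train_out)
print("prediction:", infer_out)
```

In the training output some entries are zero and the remaining ones are slightly larger than the input, while the prediction output is identical to the input. With that picture in mind, the Keras version of the network with dropout looks as follows.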
+```python +def create_nn(n_features, n_predictions): + # Input layer + layers_input = keras.layers.Input(shape=(n_features,), name='input') + + # Dense layers + layers_dense = keras.layers.Dense(100, 'relu')(layers_input) + layers_dense = keras.layers.Dropout(rate=0.2)(layers_dense) + layers_dense = keras.layers.Dense(50, 'relu')(layers_dense) + layers_dense = keras.layers.Dropout(rate=0.2)(layers_dense) + + # Output layer + layers_output = keras.layers.Dense(n_predictions)(layers_dense) + + # Define the model (it is compiled below) + return keras.Model(inputs=layers_input, outputs=layers_output, name="model_dropout") + +model = create_nn(X_data.shape[1], 1) +model.compile(loss='mse', optimizer=keras.optimizers.Adam(1e-4), metrics=[keras.metrics.RootMeanSquaredError()]) +model.summary() +``` + +```output +Model: "model_dropout" +_________________________________________________________________ +Layer (type) Output Shape Param # +================================================================= +input (InputLayer) [(None, 163)] 0 +_________________________________________________________________ +dense_12 (Dense) (None, 100) 16400 +_________________________________________________________________ +dropout (Dropout) (None, 100) 0 +_________________________________________________________________ +dense_13 (Dense) (None, 50) 5050 +_________________________________________________________________ +dropout_1 (Dropout) (None, 50) 0 +_________________________________________________________________ +dense_14 (Dense) (None, 1) 51 +================================================================= +Total params: 21,501 +Trainable params: 21,501 +Non-trainable params: 0 +_________________________________________________________________ +``` + +Compared to the models above, this required only a few changes. We add two `Dropout` layers, one after each hidden dense layer, and specify the dropout rate. 
+Here we use `rate=0.2`, which means that at any training step on average 20% of the nodes in the preceding layer will be turned off. +You can also see that Dropout layers do not add additional parameters. +Now, let's train our new model and plot the losses: + +```python +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 1000, + validation_data=(X_val, y_val), + callbacks=[earlystopper], + verbose = 2) + +history_df = pd.DataFrame.from_dict(history.history) +sns.lineplot(data=history_df[['root_mean_squared_error', 'val_root_mean_squared_error']]) +plt.xlabel("epochs") +plt.ylabel("RMSE") +``` + +![Output of plotting sample](../episodes/fig/03_training_history_4_rmse_dropout.png){alt='Output of plotting sample'} + + +In this setting overfitting seems to be prevented successfully. The overall results, though, have not improved (at least not by much). +Above we have used dropout to randomly turn off network nodes during training. +When doing predictions, dropout is automatically deactivated and all nodes stay active. +Each time you run the same input data through the same trained model, the prediction will be exactly the same. + +Monte-Carlo Dropout relies on a simple change: dropout will remain active during prediction! +This means that each time a prediction step is done, the model will look different because a fraction of all nodes will be turned off randomly. +One can interpret all of those random variations as individual models. +Monte-Carlo Dropout now makes use of this fact and collects many different predictions instead of only one. +At the end this collection of predictions can be combined into a mean (or a median) prediction. +And the variation across all predictions can tell us something about the model's uncertainty. 
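The idea of collecting many stochastic predictions and summarising them can be sketched without any deep learning library. The following toy example is an assumption-laden illustration, not the lesson's model: `predict_with_dropout`, the hand-picked `weights`, and the input value `2.0` are all hypothetical. It treats a single linear neuron whose inputs are randomly dropped on every call, and then reports the mean and standard deviation over 100 such calls.

```python
import random
import statistics


def predict_with_dropout(x, weights, rate, rng):
    """One stochastic forward pass of a toy linear neuron (hypothetical helper)."""
    keep_prob = 1.0 - rate
    total = 0.0
    for w in weights:
        # Dropout stays active at prediction time: each connection is kept
        # with probability keep_prob and rescaled, otherwise it is skipped.
        if rng.random() < keep_prob:
            total += (w * x) / keep_prob
    return total


rng = random.Random(42)                  # fixed seed so the example is repeatable
weights = [0.5, -0.2, 0.8, 0.1, 0.3]     # made-up weights, for illustration only

# Collect an ensemble of predictions for the same input.
samples = [predict_with_dropout(2.0, weights, rate=0.2, rng=rng) for _ in range(100)]

mc_mean = statistics.mean(samples)   # combined prediction
mc_std = statistics.stdev(samples)   # spread, used as an uncertainty estimate

print(f"mean prediction: {mc_mean:.3f}, standard deviation: {mc_std:.3f}")
```

The same input gives a different output on each call, and the spread of those outputs is what Monte-Carlo Dropout reports as uncertainty. The Keras version below applies the same pattern to the full network.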
+ +A simple (and a bit hacky) way to force dropout layers to remain active is to pass `training=True` when calling each `Dropout` layer in the model: +```python +def create_nn(n_features, n_predictions): + # Input layer + layers_input = keras.layers.Input(shape=(n_features,), name='input') + + # Dense layers + layers_dense = keras.layers.BatchNormalization()(layers_input) + layers_dense = keras.layers.Dense(100, 'relu')(layers_dense) + layers_dense = keras.layers.Dropout(rate=0.2)(layers_dense, training=True) + layers_dense = keras.layers.Dense(50, 'relu')(layers_dense) + layers_dense = keras.layers.Dropout(rate=0.2)(layers_dense, training=True) + + # Output layer + layers_output = keras.layers.Dense(n_predictions)(layers_dense) + + # Define the model (it is compiled below) + return keras.Model(inputs=layers_input, outputs=layers_output, name="model_monte_carlo_dropout") + +model = create_nn(X_data.shape[1], 1) +model.compile(loss='mse', optimizer=keras.optimizers.Adam(1e-4), metrics=[keras.metrics.RootMeanSquaredError()]) +``` + +Model training remains entirely unchanged: +```python +history = model.fit(X_train, y_train, + batch_size = 32, + epochs = 1000, + validation_data=(X_val, y_val), + callbacks=[earlystopper], + verbose = 2) +``` + +But when we now do predictions, things will look different. +Let us do two predictions and compare the results. + +```python +y_test_predicted1 = model.predict(X_test) +y_test_predicted2 = model.predict(X_test) + +y_test_predicted1[:10], y_test_predicted2[:10] +``` + +This should give two arrays with different float numbers. 
+ +We can now compute predictions for a larger ensemble, say 100 random variations of the same model: +```python +from tqdm.notebook import tqdm # optional: to add progress bar + +n_ensemble = 100 +y_test_predicted_ensemble = np.zeros((X_test.shape[0], n_ensemble)) + +for i in tqdm(range(n_ensemble)): # or: for i in range(n_ensemble): + y_test_predicted_ensemble[:, i] = model.predict(X_test)[:,0] +``` + +This will give an array of predictions, with 100 different predictions for each datapoint in `X_test`. +We can inspect an example distribution, for instance by plotting a histogram: + +```python +plt.hist(y_test_predicted_ensemble[0,:], rwidth=0.9) +plt.xlabel("predicted sunshine hours") +``` + +![Output of plotting sample](../episodes/fig/03_monte_carlo_dropout_distribution_example.png){alt='bar plot summarising distribution of frequencies of predictions with different numbers of hours of sunshine'} + + +Instead of full distributions for every datapoint we might also just want to extract the mean and standard deviation. +```python +y_test_predicted_mean = np.mean(y_test_predicted_ensemble, axis=1) +y_test_predicted_std = np.std(y_test_predicted_ensemble, axis=1) +``` + +This can then be plotted again as a scatter plot, but now with added information on the model uncertainty. 
+```python +plt.figure(figsize=(5, 5), dpi=100) +plt.scatter(y_test_predicted_mean, y_test, s=40*y_test_predicted_std, + c=y_test_predicted_std, alpha=0.5) +plt.xlabel("predicted") +plt.ylabel("true values") +``` + +![Output of plotting sample](../episodes/fig/03_scatter_plot_model_uncertainty.png){alt='scatter plot of mean predicted hours of sunshine against true values, colored by standard deviation of the predictions, showing some correlation between the predictions from the model and the observed data'} + diff --git a/design.md b/design.md new file mode 100644 index 00000000..76df54d8 --- /dev/null +++ b/design.md @@ -0,0 +1,120 @@ +--- +title: Lesson design +--- + +This page documents the design process and motivation of this lesson material. + +**Lesson Title: An Introduction to Deep Learning** + +## Target audience + +The main audience of this carpentry lesson is PhD students that have little to no experience with deep learning. In addition, we expect them to know basics of statistics and machine learning. + +### Notes + +- Probably have overhyped expectations of deep learning. +- They don’t know if it’s the right tool for their situations. +- They have no idea what it takes to actually do deep learning. +- Want to quickly have some useful skills for their own data. 
+ +#### Required Pre-Knowledge + +- **Python** – Previous programming experience in Python is required (refer to the Python Data Carpentry lesson) +- **Pandas** – Knowledge of the Pandas Python package +- **Basic Machine Learning Knowledge** – Data cleaning, train & test split, overfitting & underfitting, metrics (accuracy, recall, etc.) + +## Learning objectives + +> ## Overview +> After following this lesson, learners will be able to: +> +> - Prepare input data for use for deep learning +> - Design and train a Deep Neural Network +> - Troubleshoot the learning process +> - Measure the performance of the network +> - Visualize data and results +> - Re-use existing network architectures with and without pre-trained weights +> + + +The following offers more details on each learning objective based on Bloom's Taxonomy. For hints on how to use this approach, see [lesson 15 of the instructor training](https://carpentries.github.io/instructor-training/15-lesson-study/index.html) + +### Prepare input data for use for deep learning + +This includes cleaning data, filling missing values, normalizing, and transforming categorical columns into dummy encoding. + +After this module, learners can ... + +- define a checklist for data analysis steps before applying Deep Learning to the data +- describe criteria by which to judge good or bad data, e.g. how a column's values should be distributed +- execute a min-max normalization on floating point data +- sketch how to insert missing timestamps or literal values (i.e. factors or non-numeric entries) +- implement a transformation of categorical values into a numerical encoding (`int8`) +- argue for or against strategies to normalize data +- formulate techniques to prepare (clean) data for training a deep learning network + +### Design and train a Deep Neural Network + +This includes knowledge of when to use different types of layers. + +After this module, learners can ... 
+ +- list/repeat the three ingredients to a feed forward network: input, hidden layers, output +- classify/categorize parts of a feed forward network when presented a network architecture (as from `keras.model.summary()`) +- describe a fully connected (dense) layer +- describe a convolutional layer +- describe a max pooling layer +- describe an activation function +- describe a softmax layer +- argue against abundant use of the sigmoid function (exploding/vanishing gradients) +- calculate the output data shape of an image when transformed by a fixed convolutional layer +- interpret errors with convolutional layers +- execute a 3 layer network on the MNIST data (or similar) +- differentiate a dense layer and a convolutional layer +- experiment with values of dense layer and a convolutional layer +- select a layer type depending on the input data +- develop a 5 layer network that comprises both layer types + +### Monitoring and Troubleshooting the learning process + +Often, when designing neural networks, training will not automatically work very well. This requires setting the parameters of the training algorithm correctly, modifying the design of the network or changing the data pre-processing. After training, the performance of the network should be checked to detect overfitting. + +After this module, learners can ... + +- define precision and recall/accuracy for a classification task +- state that cross-validation is used in Deep Learning too +- describe how to split a dataset into training/test/validation set +- describe how Drop-Out Layers work +- execute a plot to draw the loss per epoch for training and test set +- compare values of precision and recall +- differentiate an overfitting network from a well-behaved network +- detect when a network is underfitting or overfitting +- design countermeasures for overfitting (e.g. more dropout layers, reduce model size) +- design countermeasures for underfitting (e.g. 
larger model) +- critique a provided network design + +### Visualizing Data and Results + +This covers how to visualize data and results within each episode. + +After this module, learners can ... + +- identify important plots to create at the end of training (provide selected samples and their prediction) +- execute plotting of important variables during training (loss, ROC) +- use tensorboard and related callbacks during training +- examine the results of a partner's network +- critique the results of a partner's network + +### Re-use existing network architectures with and without pre-trained weights + +Re-use of architectures is common in deep learning. It can be especially powerful when pre-trained weights are used (transfer learning). + +After this module, learners can ... + +- describe what transfer learning stands for +- explain in what situations transfer learning is beneficial +- solve common issues of transfer learning (such as different resolutions of the original training data and the data at hand) +- test training under different data shape mitigation strategies +- relate training time of a de-novo network and a pretrained one +- relate prediction quality of a de-novo network and a pretrained one + diff --git a/fig/.gitkeep b/fig/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/fig/01-xor-exercise.svg b/fig/01-xor-exercise.svg new file mode 100644 index 00000000..31b132bb --- /dev/null +++ b/fig/01-xor-exercise.svg @@ -0,0 +1,560 @@ +[SVG markup not recoverable; surviving text labels: X1, X2, h1, h2, y1; b1 = 0 ReLU, b2 = -1 ReLU, b1 = 0 ReLU; numeric labels 1, 1, 1, 1, 1, -2] diff --git a/fig/01_AI_ML_DL_differences.png b/fig/01_AI_ML_DL_differences.png new file mode 100644 index 00000000..7daeb8a5 Binary files /dev/null and b/fig/01_AI_ML_DL_differences.png differ diff --git a/fig/01_AI_ML_DL_differences.svg b/fig/01_AI_ML_DL_differences.svg new file mode 100644 
index 00000000..915b4c8b Binary files /dev/null and b/fig/01_AI_ML_DL_differences.svg differ diff --git a/fig/01_deep_network.png b/fig/01_deep_network.png new file mode 100644 index 00000000..4ef50246 Binary files /dev/null and b/fig/01_deep_network.png differ diff --git a/fig/01_huber_loss.png b/fig/01_huber_loss.png new file mode 100644 index 00000000..8642ae44 Binary files /dev/null and b/fig/01_huber_loss.png differ diff --git a/fig/01_identity_function.svg b/fig/01_identity_function.svg new file mode 100644 index 00000000..4d8b2985 --- /dev/null +++ b/fig/01_identity_function.svg @@ -0,0 +1,97 @@ diff --git a/fig/01_neural_net.png b/fig/01_neural_net.png new file mode 100644 index 00000000..d1f93e07 Binary files /dev/null and b/fig/01_neural_net.png differ diff --git a/fig/01_neuron.png b/fig/01_neuron.png new file mode 100644 index 00000000..185a58b3 Binary files /dev/null and b/fig/01_neuron.png differ diff --git a/fig/01_relu.svg b/fig/01_relu.svg new file mode 100644 index 00000000..7116b4a5 --- /dev/null +++ b/fig/01_relu.svg @@ -0,0 +1,97 @@ diff --git a/fig/01_sigmoid.svg b/fig/01_sigmoid.svg new file mode 100644 index 00000000..a839e421 --- /dev/null +++ b/fig/01_sigmoid.svg @@ -0,0 +1,97 @@ diff --git a/fig/01_xor_exercise.png b/fig/01_xor_exercise.png new file mode 100644 index 00000000..bcc56a0b Binary files /dev/null and b/fig/01_xor_exercise.png differ diff --git 
a/fig/02_bad_training_history_1.png b/fig/02_bad_training_history_1.png new file mode 100644 index 00000000..3acf943a Binary files /dev/null and b/fig/02_bad_training_history_1.png differ diff --git a/fig/02_sex_pairplot.png b/fig/02_sex_pairplot.png new file mode 100644 index 00000000..18cf2ada Binary files /dev/null and b/fig/02_sex_pairplot.png differ diff --git a/fig/02_training_curve.png b/fig/02_training_curve.png new file mode 100644 index 00000000..6721254b Binary files /dev/null and b/fig/02_training_curve.png differ diff --git a/fig/03_exploration_basel_sunshine_graph.png b/fig/03_exploration_basel_sunshine_graph.png new file mode 100644 index 00000000..f2576836 Binary files /dev/null and b/fig/03_exploration_basel_sunshine_graph.png differ diff --git a/fig/03_gradient_descent.png b/fig/03_gradient_descent.png new file mode 100644 index 00000000..11957e5f Binary files /dev/null and b/fig/03_gradient_descent.png differ diff --git a/fig/03_monte_carlo_dropout_distribution_example.png b/fig/03_monte_carlo_dropout_distribution_example.png new file mode 100644 index 00000000..68fa4a5a Binary files /dev/null and b/fig/03_monte_carlo_dropout_distribution_example.png differ diff --git a/fig/03_regression_compare_training_test_5_dropout_batchnorm.png b/fig/03_regression_compare_training_test_5_dropout_batchnorm.png new file mode 100644 index 00000000..3b5e54f6 Binary files /dev/null and b/fig/03_regression_compare_training_test_5_dropout_batchnorm.png differ diff --git a/fig/03_regression_compare_training_test_naive_baseline.png b/fig/03_regression_compare_training_test_naive_baseline.png new file mode 100644 index 00000000..55a5528f Binary files /dev/null and b/fig/03_regression_compare_training_test_naive_baseline.png differ diff --git a/fig/03_regression_predictions_testset.png b/fig/03_regression_predictions_testset.png new file mode 100644 index 00000000..6a1e2d94 Binary files /dev/null and b/fig/03_regression_predictions_testset.png differ diff --git 
a/fig/03_regression_predictions_trainset.png b/fig/03_regression_predictions_trainset.png new file mode 100644 index 00000000..e141fdb0 Binary files /dev/null and b/fig/03_regression_predictions_trainset.png differ diff --git a/fig/03_regression_test_5_dropout_batchnorm.png b/fig/03_regression_test_5_dropout_batchnorm.png new file mode 100644 index 00000000..9bc4d7d6 Binary files /dev/null and b/fig/03_regression_test_5_dropout_batchnorm.png differ diff --git a/fig/03_regression_test_5_naive_baseline.png b/fig/03_regression_test_5_naive_baseline.png new file mode 100644 index 00000000..fea9ef75 Binary files /dev/null and b/fig/03_regression_test_5_naive_baseline.png differ diff --git a/fig/03_scatter_plot_basel_model.png b/fig/03_scatter_plot_basel_model.png new file mode 100644 index 00000000..f2f10fcc Binary files /dev/null and b/fig/03_scatter_plot_basel_model.png differ diff --git a/fig/03_scatter_plot_model_uncertainty.png b/fig/03_scatter_plot_model_uncertainty.png new file mode 100644 index 00000000..cf207367 Binary files /dev/null and b/fig/03_scatter_plot_model_uncertainty.png differ diff --git a/fig/03_tensorboard.png b/fig/03_tensorboard.png new file mode 100755 index 00000000..312137a6 Binary files /dev/null and b/fig/03_tensorboard.png differ diff --git a/fig/03_training_history_1_rmse.png b/fig/03_training_history_1_rmse.png new file mode 100644 index 00000000..fa6acb2e Binary files /dev/null and b/fig/03_training_history_1_rmse.png differ diff --git a/fig/03_training_history_2_rmse.png b/fig/03_training_history_2_rmse.png new file mode 100644 index 00000000..0e6becce Binary files /dev/null and b/fig/03_training_history_2_rmse.png differ diff --git a/fig/03_training_history_3_rmse_early_stopping.png b/fig/03_training_history_3_rmse_early_stopping.png new file mode 100644 index 00000000..eb8e92b7 Binary files /dev/null and b/fig/03_training_history_3_rmse_early_stopping.png differ diff --git a/fig/03_training_history_3_rmse_smaller_model.png 
b/fig/03_training_history_3_rmse_smaller_model.png new file mode 100644 index 00000000..e99e80c6 Binary files /dev/null and b/fig/03_training_history_3_rmse_smaller_model.png differ diff --git a/fig/03_training_history_4_rmse_dropout.png b/fig/03_training_history_4_rmse_dropout.png new file mode 100644 index 00000000..0041e6d6 Binary files /dev/null and b/fig/03_training_history_4_rmse_dropout.png differ diff --git a/fig/03_training_history_5_rmse_batchnorm.png b/fig/03_training_history_5_rmse_batchnorm.png new file mode 100644 index 00000000..95a3e78c Binary files /dev/null and b/fig/03_training_history_5_rmse_batchnorm.png differ diff --git a/fig/03_weather_prediction_dataset_map.png b/fig/03_weather_prediction_dataset_map.png new file mode 100644 index 00000000..ac0ef343 Binary files /dev/null and b/fig/03_weather_prediction_dataset_map.png differ diff --git a/fig/04_cifar10.png b/fig/04_cifar10.png new file mode 100644 index 00000000..c3a22621 Binary files /dev/null and b/fig/04_cifar10.png differ diff --git a/fig/04_conv_image.png b/fig/04_conv_image.png new file mode 100755 index 00000000..2f7cd2e4 Binary files /dev/null and b/fig/04_conv_image.png differ diff --git a/fig/04_conv_matrix.png b/fig/04_conv_matrix.png new file mode 100644 index 00000000..e3345647 Binary files /dev/null and b/fig/04_conv_matrix.png differ diff --git a/fig/04_dense_model_training_history.png b/fig/04_dense_model_training_history.png new file mode 100644 index 00000000..86c3545a Binary files /dev/null and b/fig/04_dense_model_training_history.png differ diff --git a/fig/04_training_history_1.png b/fig/04_training_history_1.png new file mode 100644 index 00000000..5aa32a3a Binary files /dev/null and b/fig/04_training_history_1.png differ diff --git a/fig/04_training_history_2.png b/fig/04_training_history_2.png new file mode 100644 index 00000000..893464b3 Binary files /dev/null and b/fig/04_training_history_2.png differ diff --git a/fig/04_training_history_3.png 
b/fig/04_training_history_3.png new file mode 100644 index 00000000..760b539f Binary files /dev/null and b/fig/04_training_history_3.png differ diff --git a/fig/04_training_history_loss_1.png b/fig/04_training_history_loss_1.png new file mode 100644 index 00000000..90bdc31a Binary files /dev/null and b/fig/04_training_history_loss_1.png differ diff --git a/fig/04_training_history_loss_2.png b/fig/04_training_history_loss_2.png new file mode 100644 index 00000000..1fb6c32e Binary files /dev/null and b/fig/04_training_history_loss_2.png differ diff --git a/fig/04_training_history_loss_3.png b/fig/04_training_history_loss_3.png new file mode 100644 index 00000000..e74e5c46 Binary files /dev/null and b/fig/04_training_history_loss_3.png differ diff --git a/fig/04_vary_dropout_rate.png b/fig/04_vary_dropout_rate.png new file mode 100644 index 00000000..b5b11308 Binary files /dev/null and b/fig/04_vary_dropout_rate.png differ diff --git a/fig/AI_ML_DL_bubble_square_draft.png b/fig/AI_ML_DL_bubble_square_draft.png new file mode 100644 index 00000000..2b8e3223 Binary files /dev/null and b/fig/AI_ML_DL_bubble_square_draft.png differ diff --git a/fig/ML_DL_draft.png b/fig/ML_DL_draft.png new file mode 100644 index 00000000..b12f534d Binary files /dev/null and b/fig/ML_DL_draft.png differ diff --git a/fig/confusion_matrix.png b/fig/confusion_matrix.png new file mode 100644 index 00000000..01766035 Binary files /dev/null and b/fig/confusion_matrix.png differ diff --git a/fig/culmen_depth.png b/fig/culmen_depth.png new file mode 100644 index 00000000..3fe2147f Binary files /dev/null and b/fig/culmen_depth.png differ diff --git a/fig/graphviz/Makefile b/fig/graphviz/Makefile new file mode 100644 index 00000000..7c8da39a --- /dev/null +++ b/fig/graphviz/Makefile @@ -0,0 +1,15 @@ +DOTFILES=$(wildcard *.dot) #for directed graphs +PDFOUTPUTS=$(DOTFILES:.dot=.pdf) +PNGOUTPUTS=$(DOTFILES:.dot=.png) +SVGOUTPUTS=$(DOTFILES:.dot=.svg) + +all : $(PDFOUTPUTS) $(SVGOUTPUTS) $(PNGOUTPUTS) + 
+%.pdf : %.dot + @dot -Tpdf -o$@ $< + +%.png : %.dot + @dot -Tpng -o$@ $< + +%.svg : %.dot + @dot -Tsvg -o$@ $< diff --git a/fig/graphviz/README.md b/fig/graphviz/README.md new file mode 100644 index 00000000..7c1c7de2 --- /dev/null +++ b/fig/graphviz/README.md @@ -0,0 +1,13 @@ +# Generating Infographics + +This folder contains the code to generate infographics. For the time being and lacking artistic talent, `graphviz` is used to render simple charts. For more information on graphviz, see [graphviz.org](https://graphviz.org/). + +# Building the charts + +I assume you have the `dot` utility available on your command line. If not, consider [installing it](https://graphviz.org/download/). To build all charts, do + +``` +$ make +``` + +This should produce 3 rendered versions of every chart: `png`, `svg` and `pdf`. I suggest using `png` in the rendered website. diff --git a/fig/graphviz/pipeline.dot b/fig/graphviz/pipeline.dot new file mode 100644 index 00000000..31cdbac8 --- /dev/null +++ b/fig/graphviz/pipeline.dot @@ -0,0 +1,22 @@ +digraph { + #configs + rankdir=LR; + node [shape=rect, style=rounded] + + #special nodes + formulate [label=<Formulate<br/>task>] + i_o [label=<Identify<br/>inputs and outputs>] + prepare [label=<Prepare<br/>data>] + create_model [label=<Create model<br/>or<br/>use pretrained model>] + loss [label=<Choose<br/>loss and optimizer>] + train [label=<Train<br/>the model>] + predict [label=<Perform<br/>Prediction>] + quality [label=<Measure<br/>Performance>] + refine [label=<Refine<br/>the model>] + share [label=<Share<br/>the model>] + + #the graph + formulate -> i_o -> prepare + prepare -> create_model -> loss + loss -> train -> predict -> quality -> refine -> share +} \ No newline at end of file diff --git a/fig/graphviz/pipeline.png b/fig/graphviz/pipeline.png new file mode 100644 index 00000000..2d199fa1 Binary files /dev/null and b/fig/graphviz/pipeline.png differ diff --git a/fig/neural_network_sketch_dropout.png b/fig/neural_network_sketch_dropout.png new file mode 100644 index 00000000..5b1a2694 Binary files /dev/null and b/fig/neural_network_sketch_dropout.png differ diff --git a/fig/pairplot.png b/fig/pairplot.png new file mode 100644 index 00000000..80793354 Binary files /dev/null and b/fig/pairplot.png differ diff --git a/fig/palmer_penguins.png b/fig/palmer_penguins.png new file mode 100644 index 00000000..736ae89b Binary files /dev/null and b/fig/palmer_penguins.png differ diff --git a/fig/plot_training.py b/fig/plot_training.py new file mode 100644 index 00000000..d5c8ea93 --- /dev/null +++ b/fig/plot_training.py @@ -0,0 +1,35 @@ +import seaborn +import pandas +import matplotlib.pyplot as plt +import matplotlib.ticker as ticker + + +def plot_graph(data, filename): + + res = seaborn.lineplot(x=data.Epoch, y=data.Loss) + + # stop the graph having extra whitespace around the edge + res.set(xlim=(data.Epoch.min()-1, data.Epoch.max())) + res.set(ylim=(0, data.Loss.max())) + + # uncomment these lines to get a logarithmic y axis + # this shows a lot more detail after the 250th epoch + + # res.set(yscale='log') + # res.set_yticklabels([1,1,10,100,1000]) + + res.yaxis.set_minor_locator(plt.NullLocator()) + + plt.grid() + plt.savefig(filename) + plt.show() + + +# load data +data = pandas.read_csv("training.csv") + +# draw graph of first 1500 epochs +plot_graph(data[0:1501], "training-0_to_1500.svg") + +# draw graph of the 500th to 1500th epoch +plot_graph(data[500:1501], "training-500_to_1500.svg") diff --git a/fig/training-0_to_1500.svg b/fig/training-0_to_1500.svg new file 
mode 100644 index 00000000..6f8e96ba --- /dev/null +++ b/fig/training-0_to_1500.svg @@ -0,0 +1,948 @@ diff --git a/fig/training-500_to_1500.svg b/fig/training-500_to_1500.svg new file mode 100644 index 00000000..533740ba --- /dev/null +++ b/fig/training-500_to_1500.svg @@ -0,0 +1,1516 @@ diff --git a/fig/training.csv b/fig/training.csv new file mode 100644 index 00000000..eee614b3 --- /dev/null +++ b/fig/training.csv @@ -0,0 +1,2001 @@ +Epoch,Loss +1,1374.498291 +2,1374.805054 +3,1374.668823 +4,1376.282349 +5,1376.962524 +6,1378.372803 +7,1379.375610 +8,1378.825684 +9,1379.147705 +10,1379.481445 
+11,1378.533203 +12,1379.392090 +13,1378.594849 +14,1379.686279 +15,1380.902100 +16,1380.133057 +17,1379.908203 +18,1378.894409 +19,1378.856201 +20,1378.561401 +21,1460.345215 +22,1531.905029 +23,1597.162354 +24,1656.811646 +25,1708.168579 +26,1754.009399 +27,1798.385010 +28,1836.449951 +29,1870.364746 +30,1903.006958 +31,1833.334229 +32,1768.718628 +33,1712.954102 +34,1661.302734 +35,1616.026123 +36,1574.836792 +37,1537.317383 +38,1506.763062 +39,1475.609253 +40,1448.720947 +41,1424.671143 +42,1403.087402 +43,1382.956787 +44,1364.006958 +45,1347.718628 +46,1332.555786 +47,1318.611206 +48,1305.488037 +49,1293.814331 +50,1282.539673 +51,1289.215698 +52,1294.881104 +53,1300.620483 +54,1305.744019 +55,1309.781494 +56,1312.948486 +57,1315.473633 +58,1317.665039 +59,1318.965942 +60,1318.611206 +61,1355.384766 +62,1387.143188 +63,1414.869385 +64,1438.412354 +65,1459.323853 +66,1476.125977 +67,1491.044678 +68,1503.160034 +69,1512.880981 +70,1520.581787 +71,1462.747925 +72,1411.716675 +73,1363.191528 +74,1319.301514 +75,1277.875977 +76,1239.381958 +77,1202.165405 +78,1169.285889 +79,1139.287476 +80,1112.086060 +81,1085.927124 +82,1061.319214 +83,1038.309326 +84,1016.530640 +85,996.860413 +86,977.170471 +87,958.045288 +88,941.321960 +89,925.165710 +90,908.458008 +91,940.979065 +92,968.551392 +93,992.615601 +94,1009.669312 +95,1022.747498 +96,1032.449097 +97,1037.604980 +98,1039.878906 +99,1038.505615 +100,1036.517578 +101,1010.776001 +102,982.999268 +103,957.574341 +104,931.642151 +105,904.730225 +106,878.916748 +107,853.350830 +108,827.332214 +109,801.553284 +110,777.908020 +111,725.151550 +112,678.500488 +113,634.799500 +114,594.414429 +115,557.280151 +116,523.269043 +117,492.824066 +118,463.720245 +119,438.861694 +120,413.826782 +121,406.923584 +122,398.885681 +123,389.183777 +124,379.633484 +125,371.040527 +126,360.820374 +127,353.005005 +128,343.402893 +129,334.365234 +130,324.525879 +131,310.246338 +132,297.056091 +133,284.712616 +134,272.226471 +135,260.921875 
+136,250.045181 +137,240.440323 +138,229.806305 +139,221.394058 +140,211.641617 +141,203.307953 +142,195.761642 +143,188.398285 +144,181.727600 +145,175.404602 +146,168.508301 +147,162.311722 +148,156.486038 +149,151.007706 +150,145.782715 +151,148.845184 +152,150.394760 +153,151.334824 +154,151.562088 +155,151.577713 +156,150.764160 +157,148.436234 +158,146.192490 +159,142.953033 +160,140.763687 +161,132.389343 +162,126.169678 +163,119.689163 +164,113.739464 +165,108.036644 +166,102.654427 +167,98.097099 +168,93.557777 +169,89.649605 +170,86.555557 +171,84.920151 +172,82.867126 +173,80.393433 +174,78.172470 +175,75.905998 +176,74.408112 +177,72.823318 +178,71.623451 +179,69.293526 +180,67.664047 +181,65.182831 +182,62.583530 +183,59.996063 +184,58.605797 +185,56.872402 +186,55.016266 +187,53.933186 +188,52.421051 +189,51.249424 +190,49.597233 +191,49.273193 +192,48.781319 +193,48.301090 +194,49.033428 +195,47.941723 +196,47.098293 +197,46.633453 +198,45.901489 +199,45.571350 +200,45.533916 +201,45.565178 +202,45.683563 +203,45.461559 +204,45.168282 +205,45.111237 +206,44.044651 +207,43.978157 +208,43.725018 +209,43.144035 +210,41.979675 +211,42.007618 +212,41.466545 +213,41.982185 +214,41.063316 +215,40.155483 +216,39.896965 +217,39.382252 +218,39.303196 +219,38.649967 +220,38.536839 +221,38.244492 +222,37.246223 +223,36.053814 +224,35.310074 +225,34.988579 +226,34.562027 +227,34.199844 +228,34.522869 +229,33.269436 +230,33.323406 +231,32.996712 +232,32.919292 +233,32.786148 +234,33.068642 +235,32.630966 +236,32.926125 +237,32.503227 +238,31.890472 +239,31.966631 +240,31.944262 +241,31.370384 +242,31.172215 +243,31.302299 +244,31.550468 +245,31.465483 +246,31.123970 +247,31.395992 +248,31.340839 +249,31.019901 +250,31.514215 +251,31.015234 +252,30.762934 +253,30.081236 +254,29.601322 +255,29.733448 +256,29.822102 +257,29.065500 +258,29.671911 +259,29.711288 +260,29.910002 +261,29.196280 +262,29.059269 +263,29.335377 +264,28.987417 +265,28.560158 +266,28.371004 
+267,28.169895 +268,27.838018 +269,27.923016 +270,27.899778 +271,28.601463 +272,28.446476 +273,28.403261 +274,29.167997 +275,29.186573 +276,28.940781 +277,29.294827 +278,28.672129 +279,28.630571 +280,28.466902 +281,28.614138 +282,28.484198 +283,28.527830 +284,28.698311 +285,28.835075 +286,28.122196 +287,28.012329 +288,27.585014 +289,27.815853 +290,28.000231 +291,27.899002 +292,28.199184 +293,28.145884 +294,27.672459 +295,28.129190 +296,27.768568 +297,26.962534 +298,26.699102 +299,26.796705 +300,26.661489 +301,26.118675 +302,25.868505 +303,25.762270 +304,25.425041 +305,25.074471 +306,25.134947 +307,25.044756 +308,25.086227 +309,25.326103 +310,24.872829 +311,24.546495 +312,24.227226 +313,24.879488 +314,25.009420 +315,24.515833 +316,24.201765 +317,23.530020 +318,23.720575 +319,23.949987 +320,24.110653 +321,24.582972 +322,24.414253 +323,24.689997 +324,24.450640 +325,24.618454 +326,24.974295 +327,25.141785 +328,24.892166 +329,25.133772 +330,24.902876 +331,24.505527 +332,24.309187 +333,24.138449 +334,23.985643 +335,23.640720 +336,23.424568 +337,23.152170 +338,22.839954 +339,22.674843 +340,22.684612 +341,22.798325 +342,22.942127 +343,23.167627 +344,23.062918 +345,22.740644 +346,22.266546 +347,22.022217 +348,21.440172 +349,21.533581 +350,21.026119 +351,21.059681 +352,21.446743 +353,21.783691 +354,22.000643 +355,21.863640 +356,21.890152 +357,21.908257 +358,21.873482 +359,21.750233 +360,21.374317 +361,21.818222 +362,21.195559 +363,21.110188 +364,21.406578 +365,21.415873 +366,21.386948 +367,20.961657 +368,21.093502 +369,20.686192 +370,20.474331 +371,20.120764 +372,20.181269 +373,20.006546 +374,19.907190 +375,19.947739 +376,19.985241 +377,20.405735 +378,20.091831 +379,20.206427 +380,20.534830 +381,20.196844 +382,19.944321 +383,20.103045 +384,19.742989 +385,19.369036 +386,19.259901 +387,19.370028 +388,18.926991 +389,18.723476 +390,18.661102 +391,18.509880 +392,18.457212 +393,18.778738 +394,18.719906 +395,18.732601 +396,18.959127 +397,19.234375 +398,19.257896 +399,18.876238 
+400,19.153244 +401,18.942047 +402,18.925713 +403,18.684181 +404,18.624861 +405,18.885509 +406,18.873135 +407,19.095974 +408,18.923347 +409,18.683167 +410,18.723104 +411,18.278463 +412,18.222200 +413,18.186836 +414,18.199221 +415,18.154808 +416,17.729277 +417,17.693501 +418,17.835802 +419,17.662325 +420,17.316414 +421,17.222221 +422,17.046955 +423,16.680691 +424,16.754385 +425,16.566854 +426,16.261059 +427,16.342993 +428,16.447071 +429,16.500809 +430,16.650827 +431,16.538673 +432,16.418261 +433,16.862494 +434,16.675779 +435,16.643827 +436,16.413420 +437,16.188934 +438,15.986749 +439,16.008276 +440,16.001711 +441,15.961498 +442,15.789307 +443,15.804173 +444,15.719963 +445,15.532331 +446,15.370521 +447,15.135298 +448,15.197640 +449,15.363064 +450,15.217176 +451,15.353881 +452,15.116059 +453,15.057023 +454,15.091691 +455,14.822784 +456,14.712196 +457,14.348763 +458,14.515583 +459,14.353411 +460,14.402632 +461,14.336987 +462,14.453767 +463,14.213535 +464,14.259311 +465,14.433374 +466,14.425079 +467,14.215776 +468,14.217493 +469,14.104543 +470,13.997130 +471,13.739892 +472,13.868729 +473,13.715973 +474,13.610991 +475,13.492097 +476,13.828811 +477,13.746676 +478,14.079518 +479,14.109861 +480,14.141515 +481,13.926465 +482,13.951665 +483,13.878917 +484,14.116443 +485,14.062672 +486,13.719117 +487,13.720631 +488,13.893667 +489,13.717352 +490,13.755810 +491,13.698488 +492,13.641547 +493,13.446980 +494,13.262247 +495,13.076581 +496,13.048693 +497,13.076493 +498,12.875031 +499,12.988921 +500,12.916889 +501,12.828332 +502,13.013668 +503,12.749817 +504,12.527393 +505,12.860335 +506,12.749287 +507,12.527930 +508,12.423266 +509,12.365322 +510,12.185971 +511,12.240110 +512,12.062885 +513,12.057194 +514,11.957329 +515,11.859437 +516,11.972334 +517,11.773843 +518,11.477535 +519,11.386943 +520,11.439286 +521,11.175791 +522,11.194227 +523,11.186645 +524,11.133003 +525,11.101112 +526,11.094332 +527,11.029557 +528,10.806835 +529,10.880412 +530,10.681334 +531,10.711309 +532,10.803456 
+533,10.559134 +534,10.732418 +535,10.770899 +536,10.865629 +537,11.012922 +538,10.910156 +539,11.021115 +540,10.842081 +541,10.836539 +542,10.751328 +543,10.750305 +544,10.732277 +545,10.539867 +546,10.435895 +547,10.272541 +548,10.235063 +549,10.264650 +550,10.264292 +551,10.578313 +552,10.626166 +553,10.744364 +554,10.836078 +555,10.649129 +556,10.546059 +557,10.538865 +558,10.621449 +559,10.650966 +560,10.504442 +561,10.492582 +562,10.241368 +563,10.300997 +564,10.334422 +565,10.454257 +566,10.320463 +567,10.048654 +568,10.088480 +569,10.241507 +570,10.157356 +571,10.232349 +572,10.134785 +573,9.946362 +574,10.125463 +575,10.066825 +576,9.871403 +577,9.754103 +578,10.073285 +579,9.994349 +580,9.978636 +581,9.909082 +582,10.040063 +583,10.102195 +584,9.966031 +585,10.105556 +586,10.087389 +587,9.968660 +588,9.682261 +589,9.829532 +590,9.895962 +591,10.093482 +592,10.028061 +593,9.856496 +594,9.800557 +595,9.657151 +596,9.692078 +597,9.631509 +598,9.623174 +599,9.518063 +600,9.290259 +601,9.301841 +602,9.342398 +603,9.332742 +604,9.257756 +605,9.342449 +606,9.174910 +607,9.133312 +608,9.200399 +609,9.135901 +610,9.060892 +611,9.130193 +612,9.116171 +613,9.266198 +614,9.202970 +615,9.226515 +616,9.215258 +617,9.096909 +618,8.941067 +619,8.975898 +620,9.026808 +621,8.872161 +622,8.921633 +623,8.877859 +624,8.978781 +625,8.875222 +626,8.860581 +627,8.990347 +628,8.886523 +629,8.962356 +630,8.910904 +631,9.012157 +632,9.041655 +633,9.155932 +634,9.200422 +635,9.144471 +636,9.130208 +637,8.966585 +638,8.791987 +639,8.775876 +640,8.748869 +641,8.894048 +642,8.764425 +643,8.579565 +644,8.526963 +645,8.797653 +646,8.896246 +647,8.790136 +648,8.907518 +649,8.929955 +650,8.801861 +651,8.745885 +652,8.633650 +653,8.749567 +654,8.724166 +655,8.740247 +656,8.781264 +657,8.730422 +658,8.633897 +659,8.767325 +660,8.599024 +661,8.609896 +662,8.529889 +663,8.315526 +664,8.347843 +665,8.249565 +666,8.194859 +667,8.084901 +668,8.051661 +669,8.011766 +670,7.784699 +671,7.894304 
+672,8.118559 +673,8.120572 +674,8.341699 +675,8.185135 +676,8.204810 +677,8.240892 +678,8.284090 +679,8.195371 +680,8.064952 +681,8.003163 +682,8.047774 +683,8.174196 +684,8.185165 +685,8.074470 +686,7.970840 +687,7.857278 +688,7.856229 +689,7.998568 +690,7.972342 +691,7.872748 +692,7.766281 +693,7.688918 +694,7.828521 +695,7.798302 +696,7.843188 +697,7.774496 +698,7.691062 +699,7.839015 +700,7.735585 +701,7.594492 +702,7.711720 +703,7.637129 +704,7.521418 +705,7.583173 +706,7.597631 +707,7.658422 +708,7.547267 +709,7.584037 +710,7.608601 +711,7.662730 +712,7.694701 +713,7.640910 +714,7.732150 +715,7.725509 +716,7.837871 +717,7.791322 +718,7.776506 +719,7.659534 +720,7.677261 +721,7.631284 +722,7.426704 +723,7.421527 +724,7.357550 +725,7.413136 +726,7.291620 +727,7.315535 +728,7.345714 +729,7.279267 +730,7.222077 +731,7.252364 +732,7.108936 +733,7.043839 +734,7.123646 +735,7.072635 +736,7.127377 +737,7.174259 +738,7.306606 +739,7.297363 +740,7.314019 +741,7.431134 +742,7.531954 +743,7.692874 +744,7.807797 +745,7.871971 +746,7.908401 +747,7.791269 +748,7.814354 +749,7.899908 +750,7.770968 +751,7.980874 +752,7.972983 +753,8.003383 +754,7.995797 +755,7.977086 +756,7.918373 +757,7.937500 +758,8.044216 +759,8.059059 +760,7.859106 +761,7.944158 +762,7.747785 +763,7.699263 +764,7.669347 +765,7.675321 +766,7.617654 +767,7.705381 +768,7.662820 +769,7.613462 +770,7.493739 +771,7.467026 +772,7.550239 +773,7.440277 +774,7.362132 +775,7.387891 +776,7.309035 +777,7.334690 +778,7.627427 +779,7.735224 +780,7.402503 +781,7.342384 +782,7.450618 +783,7.340502 +784,7.541285 +785,7.440575 +786,7.388396 +787,7.459248 +788,7.557502 +789,7.541560 +790,7.405540 +791,7.676413 +792,7.596228 +793,7.685439 +794,7.743281 +795,7.789150 +796,7.889060 +797,7.854471 +798,8.017610 +799,8.036789 +800,7.802950 +801,7.666759 +802,7.673119 +803,7.554038 +804,7.530649 +805,7.546625 +806,7.494461 +807,7.636127 +808,7.706562 +809,7.712176 +810,7.536484 +811,7.447600 +812,7.378068 +813,7.310967 
+814,7.197355 +815,7.088063 +816,6.946691 +817,7.025550 +818,6.993579 +819,7.034208 +820,7.050381 +821,6.935043 +822,6.944975 +823,6.965489 +824,6.839306 +825,6.697062 +826,6.705737 +827,6.769357 +828,6.679467 +829,6.588467 +830,6.492127 +831,6.473095 +832,6.631835 +833,6.677351 +834,6.579249 +835,6.634909 +836,6.555057 +837,6.650442 +838,6.822701 +839,6.809702 +840,6.745516 +841,6.799317 +842,7.090203 +843,6.996156 +844,6.967347 +845,7.015949 +846,6.967165 +847,6.920763 +848,7.018403 +849,6.893118 +850,6.943412 +851,6.928400 +852,6.830757 +853,6.818256 +854,6.772833 +855,6.739820 +856,6.757825 +857,6.605058 +858,6.466042 +859,6.620945 +860,6.460297 +861,6.489261 +862,6.413146 +863,6.516134 +864,6.486453 +865,6.551382 +866,6.771745 +867,6.850500 +868,6.932117 +869,6.863608 +870,6.824660 +871,6.842965 +872,6.873303 +873,6.874015 +874,6.739906 +875,6.770917 +876,6.766083 +877,6.739223 +878,6.664633 +879,6.771399 +880,6.674447 +881,6.577929 +882,6.413688 +883,6.498822 +884,6.415868 +885,6.426705 +886,6.437997 +887,6.499057 +888,6.570544 +889,6.640012 +890,6.707980 +891,6.880388 +892,6.954874 +893,6.799106 +894,6.710801 +895,6.774893 +896,6.823499 +897,6.750772 +898,6.802409 +899,6.779453 +900,7.155225 +901,7.009870 +902,6.919036 +903,6.763488 +904,6.835303 +905,6.814762 +906,6.812494 +907,6.942641 +908,7.004689 +909,6.810939 +910,6.866879 +911,6.795392 +912,6.707561 +913,6.812883 +914,6.826987 +915,6.651381 +916,6.795594 +917,6.904506 +918,6.792099 +919,6.747748 +920,6.623670 +921,6.624151 +922,6.660959 +923,6.562440 +924,6.549587 +925,6.674745 +926,6.742172 +927,6.778629 +928,6.978299 +929,6.873043 +930,6.996594 +931,6.943276 +932,6.844021 +933,6.883629 +934,6.812976 +935,6.784673 +936,6.698911 +937,6.641959 +938,6.726078 +939,6.704654 +940,6.665960 +941,6.571610 +942,6.576989 +943,6.732165 +944,6.614737 +945,6.684222 +946,6.553089 +947,6.561264 +948,6.506638 +949,6.497371 +950,6.433681 +951,6.389149 +952,6.484454 +953,6.415309 +954,6.592763 +955,6.670002 
+956,6.667997 +957,6.662915 +958,6.646025 +959,6.648377 +960,6.557814 +961,6.502997 +962,6.427775 +963,6.434495 +964,6.417837 +965,6.541040 +966,6.413414 +967,6.256880 +968,6.185068 +969,6.187174 +970,6.194600 +971,6.270979 +972,6.209398 +973,6.163664 +974,6.277663 +975,6.235686 +976,6.246117 +977,6.285762 +978,6.337676 +979,6.364928 +980,6.368982 +981,6.231222 +982,6.367335 +983,6.422050 +984,6.285488 +985,6.456542 +986,6.376166 +987,6.356477 +988,6.422524 +989,6.484308 +990,6.435282 +991,6.544049 +992,6.518953 +993,6.521383 +994,6.710225 +995,6.749399 +996,6.801138 +997,6.642703 +998,6.562694 +999,6.582812 +1000,6.442608 +1001,6.439344 +1002,6.435999 +1003,6.332999 +1004,6.354021 +1005,6.220318 +1006,6.237721 +1007,6.181149 +1008,6.251740 +1009,6.234731 +1010,6.107778 +1011,6.376264 +1012,6.379924 +1013,6.422470 +1014,6.495723 +1015,6.525430 +1016,6.576216 +1017,6.677105 +1018,6.811397 +1019,6.821353 +1020,6.730720 +1021,6.603624 +1022,6.521520 +1023,6.606088 +1024,6.567419 +1025,6.543751 +1026,6.376338 +1027,6.393572 +1028,6.474055 +1029,6.454451 +1030,6.472912 +1031,6.306080 +1032,6.304145 +1033,6.505245 +1034,6.552995 +1035,6.560799 +1036,6.465525 +1037,6.472528 +1038,6.480180 +1039,6.470748 +1040,6.592838 +1041,6.570863 +1042,6.515164 +1043,6.437633 +1044,6.420989 +1045,6.399369 +1046,6.347955 +1047,6.490945 +1048,6.363403 +1049,6.383363 +1050,6.330942 +1051,6.201466 +1052,6.147924 +1053,6.101267 +1054,6.157115 +1055,6.126856 +1056,6.119923 +1057,6.170587 +1058,6.376727 +1059,6.413621 +1060,6.438974 +1061,6.442010 +1062,6.557923 +1063,6.450252 +1064,6.316498 +1065,6.348899 +1066,6.313694 +1067,6.322994 +1068,6.226937 +1069,6.248522 +1070,6.260513 +1071,6.354286 +1072,6.349842 +1073,6.360824 +1074,6.268047 +1075,6.223223 +1076,6.232430 +1077,6.213148 +1078,6.180077 +1079,6.157039 +1080,6.229326 +1081,6.350971 +1082,6.530764 +1083,6.633931 +1084,6.619654 +1085,6.630130 +1086,6.684445 +1087,6.663589 +1088,6.558344 +1089,6.497303 +1090,6.605778 +1091,6.596481 
+1092,6.594203 +1093,6.477692 +1094,6.724203 +1095,6.831979 +1096,6.865751 +1097,6.935800 +1098,6.846954 +1099,6.911072 +1100,7.010683 +1101,6.989880 +1102,7.105876 +1103,7.132953 +1104,6.999532 +1105,6.918442 +1106,6.949057 +1107,6.903245 +1108,6.931241 +1109,6.774357 +1110,6.799541 +1111,6.766072 +1112,6.682360 +1113,6.644166 +1114,6.727942 +1115,6.685715 +1116,6.643539 +1117,6.603801 +1118,6.532522 +1119,6.397010 +1120,6.334980 +1121,6.251853 +1122,6.327714 +1123,6.277497 +1124,6.236387 +1125,6.310122 +1126,6.248084 +1127,6.163797 +1128,6.112802 +1129,5.963519 +1130,6.095540 +1131,6.142482 +1132,6.216326 +1133,6.188474 +1134,6.155334 +1135,6.159892 +1136,6.003009 +1137,5.929615 +1138,5.939716 +1139,5.970450 +1140,6.045334 +1141,6.142622 +1142,6.125344 +1143,6.170360 +1144,6.266078 +1145,6.335866 +1146,6.345912 +1147,6.354756 +1148,6.357787 +1149,6.384558 +1150,6.359938 +1151,6.374567 +1152,6.459209 +1153,6.459041 +1154,6.531491 +1155,6.596208 +1156,6.542371 +1157,6.656836 +1158,6.747847 +1159,6.711862 +1160,6.780706 +1161,6.704702 +1162,6.810393 +1163,6.791195 +1164,6.718200 +1165,6.588590 +1166,6.478579 +1167,6.532331 +1168,6.526577 +1169,6.527705 +1170,6.451031 +1171,6.387073 +1172,6.401379 +1173,6.114200 +1174,6.094125 +1175,6.138361 +1176,6.073714 +1177,6.050356 +1178,6.047432 +1179,6.072165 +1180,6.169246 +1181,6.250165 +1182,6.145898 +1183,6.213829 +1184,6.241041 +1185,6.306028 +1186,6.360470 +1187,6.205961 +1188,6.057460 +1189,6.098699 +1190,6.050977 +1191,5.988426 +1192,5.859860 +1193,5.831581 +1194,5.808954 +1195,5.807893 +1196,5.727350 +1197,5.649493 +1198,5.808966 +1199,5.789200 +1200,5.876521 +1201,5.839233 +1202,5.776556 +1203,5.816639 +1204,5.841223 +1205,5.684397 +1206,5.952810 +1207,5.943608 +1208,5.892297 +1209,6.046802 +1210,5.926857 +1211,5.890026 +1212,5.923345 +1213,5.817206 +1214,5.695252 +1215,5.797610 +1216,5.738883 +1217,5.802722 +1218,5.837785 +1219,5.811055 +1220,5.816777 +1221,5.798779 +1222,5.985903 +1223,6.051119 +1224,5.824876 
+1225,5.932167 +1226,5.926841 +1227,6.033635 +1228,5.922408 +1229,5.931894 +1230,5.873885 +1231,6.024476 +1232,6.028495 +1233,5.958850 +1234,5.868358 +1235,5.771854 +1236,5.826481 +1237,5.922488 +1238,6.009573 +1239,6.161847 +1240,6.133149 +1241,6.146017 +1242,6.151646 +1243,6.187539 +1244,6.072775 +1245,6.180627 +1246,6.306909 +1247,6.407207 +1248,6.364841 +1249,6.387785 +1250,6.393055 +1251,6.401000 +1252,6.334511 +1253,6.385494 +1254,6.442705 +1255,6.482635 +1256,6.555256 +1257,6.543581 +1258,6.542776 +1259,6.504472 +1260,6.334089 +1261,6.425186 +1262,6.377357 +1263,6.227343 +1264,6.127888 +1265,6.185518 +1266,6.271831 +1267,6.204370 +1268,6.159753 +1269,6.109406 +1270,6.108151 +1271,6.146109 +1272,6.099982 +1273,6.206445 +1274,6.176697 +1275,6.018995 +1276,6.071810 +1277,6.011763 +1278,5.955041 +1279,6.001135 +1280,5.914068 +1281,5.837190 +1282,5.904474 +1283,6.004179 +1284,5.973534 +1285,5.907937 +1286,5.821018 +1287,5.827989 +1288,5.857507 +1289,5.809466 +1290,5.783323 +1291,5.794741 +1292,5.864555 +1293,6.002592 +1294,6.188207 +1295,5.979492 +1296,6.030072 +1297,6.047961 +1298,6.096779 +1299,6.279390 +1300,6.305977 +1301,6.194992 +1302,6.234148 +1303,6.166954 +1304,6.250664 +1305,6.220945 +1306,6.373837 +1307,6.369489 +1308,6.380820 +1309,6.463583 +1310,6.406712 +1311,6.326350 +1312,6.356907 +1313,6.291983 +1314,6.172676 +1315,6.202286 +1316,6.201359 +1317,6.193679 +1318,6.195468 +1319,6.080903 +1320,5.994471 +1321,6.129896 +1322,5.957391 +1323,5.884926 +1324,5.770946 +1325,5.791914 +1326,5.914927 +1327,5.879824 +1328,5.767657 +1329,5.902514 +1330,6.015096 +1331,6.009869 +1332,6.104529 +1333,6.095973 +1334,6.080962 +1335,6.096925 +1336,6.135934 +1337,6.285238 +1338,6.280352 +1339,6.151881 +1340,6.119306 +1341,6.003889 +1342,5.875171 +1343,5.877693 +1344,5.907688 +1345,5.880562 +1346,6.054270 +1347,6.033473 +1348,5.996729 +1349,6.127472 +1350,6.187172 +1351,6.238533 +1352,6.347714 +1353,6.222658 +1354,6.105702 +1355,6.122635 +1356,6.017705 +1357,6.075583 
+1358,6.156390 +1359,6.140584 +1360,6.063806 +1361,5.982214 +1362,6.000641 +1363,5.896500 +1364,5.997882 +1365,5.955853 +1366,5.873896 +1367,5.867899 +1368,5.840592 +1369,5.899236 +1370,5.929643 +1371,6.061865 +1372,6.006273 +1373,6.030555 +1374,6.031890 +1375,5.909042 +1376,6.015301 +1377,6.035360 +1378,6.031718 +1379,6.030582 +1380,6.059337 +1381,6.092334 +1382,6.009163 +1383,6.037365 +1384,5.998651 +1385,6.022525 +1386,5.913095 +1387,5.897440 +1388,5.900122 +1389,5.837917 +1390,5.722423 +1391,5.911746 +1392,5.930686 +1393,5.946970 +1394,6.038640 +1395,6.065026 +1396,6.052888 +1397,6.206179 +1398,6.130066 +1399,6.182829 +1400,6.223921 +1401,6.196719 +1402,6.225604 +1403,6.136237 +1404,6.122355 +1405,6.008796 +1406,5.916007 +1407,5.966097 +1408,5.960837 +1409,5.984749 +1410,5.964914 +1411,6.123602 +1412,6.121427 +1413,6.189229 +1414,6.177895 +1415,6.422249 +1416,6.492997 +1417,6.429866 +1418,6.421455 +1419,6.393493 +1420,6.403382 +1421,6.381194 +1422,6.176427 +1423,6.135464 +1424,6.204777 +1425,6.014576 +1426,6.121116 +1427,6.121506 +1428,6.077289 +1429,5.987414 +1430,5.997005 +1431,6.065321 +1432,5.986619 +1433,5.986955 +1434,5.979443 +1435,6.097420 +1436,6.184395 +1437,6.184289 +1438,6.229539 +1439,6.108100 +1440,6.188603 +1441,6.127478 +1442,6.001185 +1443,5.980703 +1444,5.942537 +1445,5.853700 +1446,5.778754 +1447,5.790628 +1448,5.836023 +1449,5.752117 +1450,5.880812 +1451,5.771803 +1452,5.743523 +1453,5.729163 +1454,5.840360 +1455,5.805879 +1456,5.692481 +1457,5.821489 +1458,5.723386 +1459,5.676526 +1460,5.620245 +1461,5.761251 +1462,5.778191 +1463,5.996655 +1464,6.008436 +1465,5.976694 +1466,5.900143 +1467,6.016748 +1468,6.047198 +1469,5.977229 +1470,6.017324 +1471,5.868594 +1472,5.878687 +1473,5.774144 +1474,5.701383 +1475,5.723128 +1476,5.838493 +1477,5.784944 +1478,5.862370 +1479,5.904051 +1480,5.930784 +1481,5.952882 +1482,5.963614 +1483,5.836658 +1484,5.684737 +1485,5.687573 +1486,5.753400 +1487,5.725806 +1488,5.560956 +1489,5.530530 +1490,5.496478 
+1491,5.488739 +1492,5.429164 +1493,5.370187 +1494,5.329331 +1495,5.342934 +1496,5.535989 +1497,5.365108 +1498,5.418020 +1499,5.446450 +1500,5.441685 +1501,5.362073 +1502,5.313814 +1503,5.427585 +1504,5.362589 +1505,5.290816 +1506,5.246617 +1507,5.189040 +1508,5.280823 +1509,5.368432 +1510,5.338021 +1511,5.453434 +1512,5.485481 +1513,5.674140 +1514,5.717573 +1515,5.905109 +1516,5.858102 +1517,5.682548 +1518,5.617911 +1519,5.534393 +1520,5.569960 +1521,5.716780 +1522,5.716593 +1523,5.934406 +1524,6.144421 +1525,6.097580 +1526,6.098997 +1527,6.127030 +1528,6.097169 +1529,6.174995 +1530,6.169226 +1531,6.114409 +1532,5.933978 +1533,5.938186 +1534,5.903234 +1535,5.849997 +1536,5.991848 +1537,5.935248 +1538,5.881074 +1539,5.882042 +1540,5.785933 +1541,5.878244 +1542,5.950929 +1543,5.859207 +1544,5.757756 +1545,5.694890 +1546,5.697054 +1547,5.575569 +1548,5.696601 +1549,5.664536 +1550,5.683178 +1551,5.675872 +1552,5.722741 +1553,5.824469 +1554,5.884604 +1555,5.985712 +1556,5.875546 +1557,5.912907 +1558,5.988892 +1559,6.111245 +1560,6.270424 +1561,6.224355 +1562,6.269248 +1563,6.240141 +1564,6.311470 +1565,6.260812 +1566,6.226641 +1567,6.228118 +1568,6.250703 +1569,6.168064 +1570,6.070696 +1571,5.988645 +1572,5.914649 +1573,5.837356 +1574,5.782128 +1575,5.778143 +1576,5.871176 +1577,5.914878 +1578,5.814021 +1579,5.838972 +1580,5.838047 +1581,5.765857 +1582,5.637176 +1583,5.736560 +1584,5.744198 +1585,5.738245 +1586,5.710307 +1587,5.723628 +1588,5.752468 +1589,5.710629 +1590,5.684230 +1591,5.846174 +1592,5.820045 +1593,5.841832 +1594,5.921510 +1595,6.043101 +1596,6.052030 +1597,6.285580 +1598,6.221522 +1599,6.166491 +1600,6.193491 +1601,6.170745 +1602,6.070088 +1603,6.014682 +1604,6.172933 +1605,6.270771 +1606,6.249138 +1607,6.120873 +1608,6.132311 +1609,6.272882 +1610,6.207709 +1611,6.186254 +1612,6.115885 +1613,6.112662 +1614,6.116177 +1615,6.044075 +1616,6.134187 +1617,6.139128 +1618,6.028507 +1619,6.084663 +1620,6.082244 +1621,6.179062 +1622,6.018664 +1623,5.873857 
+1624,5.839235 +1625,5.895005 +1626,6.121657 +1627,6.086910 +1628,5.978684 +1629,5.972018 +1630,5.977770 +1631,6.127084 +1632,6.039453 +1633,6.023767 +1634,5.913238 +1635,5.817300 +1636,5.727566 +1637,5.633830 +1638,5.577467 +1639,5.428782 +1640,5.368309 +1641,5.401795 +1642,5.470629 +1643,5.461559 +1644,5.374981 +1645,5.298627 +1646,5.317070 +1647,5.349905 +1648,5.312775 +1649,5.296228 +1650,5.308448 +1651,5.413348 +1652,5.378646 +1653,5.462132 +1654,5.404710 +1655,5.605974 +1656,5.662704 +1657,5.626168 +1658,5.626152 +1659,5.763885 +1660,5.711644 +1661,5.681952 +1662,5.506419 +1663,5.550465 +1664,5.571020 +1665,5.635105 +1666,5.525033 +1667,5.453365 +1668,5.523139 +1669,5.392005 +1670,5.364464 +1671,5.430301 +1672,5.413244 +1673,5.453144 +1674,5.475481 +1675,5.571454 +1676,5.374273 +1677,5.371068 +1678,5.397971 +1679,5.389449 +1680,5.411558 +1681,5.305213 +1682,5.345002 +1683,5.405515 +1684,5.394529 +1685,5.348305 +1686,5.420692 +1687,5.382881 +1688,5.404040 +1689,5.547363 +1690,5.483240 +1691,5.295712 +1692,5.229867 +1693,5.209170 +1694,5.164886 +1695,5.180471 +1696,5.174586 +1697,5.121049 +1698,5.271905 +1699,5.248629 +1700,5.237916 +1701,5.223523 +1702,5.345480 +1703,5.402226 +1704,5.401584 +1705,5.363302 +1706,5.369137 +1707,5.268664 +1708,5.313021 +1709,5.385558 +1710,5.400934 +1711,5.379242 +1712,5.526114 +1713,5.544674 +1714,5.438145 +1715,5.464215 +1716,5.435156 +1717,5.333475 +1718,5.314496 +1719,5.358776 +1720,5.381500 +1721,5.341889 +1722,5.360727 +1723,5.356915 +1724,5.302460 +1725,5.282283 +1726,5.274886 +1727,5.331388 +1728,5.328461 +1729,5.343682 +1730,5.366178 +1731,5.340579 +1732,5.297452 +1733,5.393939 +1734,5.416774 +1735,5.435394 +1736,5.478847 +1737,5.509949 +1738,5.585136 +1739,5.612142 +1740,5.588289 +1741,5.524858 +1742,5.507807 +1743,5.438298 +1744,5.415031 +1745,5.435163 +1746,5.397722 +1747,5.573534 +1748,5.557884 +1749,5.549714 +1750,5.633351 +1751,5.615154 +1752,5.570358 +1753,5.564402 +1754,5.417658 +1755,5.478490 +1756,5.515805 
+1757,5.525962 +1758,5.572886 +1759,5.520725 +1760,5.458918 +1761,5.463400 +1762,5.601523 +1763,5.473835 +1764,5.522640 +1765,5.506366 +1766,5.601785 +1767,5.656073 +1768,5.621727 +1769,5.599124 +1770,5.593485 +1771,5.523515 +1772,5.676262 +1773,5.645109 +1774,5.728371 +1775,5.637874 +1776,5.659000 +1777,5.631021 +1778,5.579397 +1779,5.500447 +1780,5.497042 +1781,5.403808 +1782,5.309447 +1783,5.301871 +1784,5.420419 +1785,5.345658 +1786,5.285506 +1787,5.351684 +1788,5.305759 +1789,5.261954 +1790,5.320803 +1791,5.451636 +1792,5.465768 +1793,5.397100 +1794,5.436419 +1795,5.428020 +1796,5.491534 +1797,5.449684 +1798,5.496794 +1799,5.497958 +1800,5.369090 +1801,5.422168 +1802,5.365843 +1803,5.545600 +1804,5.555302 +1805,5.560935 +1806,5.493273 +1807,5.467402 +1808,5.500151 +1809,5.546596 +1810,5.600263 +1811,5.624579 +1812,5.579428 +1813,5.579465 +1814,5.568617 +1815,5.600099 +1816,5.605785 +1817,5.657260 +1818,5.705598 +1819,5.661956 +1820,5.743810 +1821,5.659265 +1822,5.750033 +1823,5.867268 +1824,5.960683 +1825,6.022107 +1826,6.071951 +1827,5.992725 +1828,5.947396 +1829,6.013046 +1830,5.893191 +1831,5.957982 +1832,5.971041 +1833,5.910211 +1834,5.924244 +1835,5.878286 +1836,5.887827 +1837,5.833025 +1838,5.814375 +1839,5.652466 +1840,5.638757 +1841,5.631778 +1842,5.599031 +1843,5.486864 +1844,5.390465 +1845,5.356829 +1846,5.458804 +1847,5.433914 +1848,5.345696 +1849,5.414965 +1850,5.491844 +1851,5.466870 +1852,5.518402 +1853,5.556911 +1854,5.643837 +1855,5.855062 +1856,6.010836 +1857,6.020306 +1858,6.025012 +1859,6.061854 +1860,6.118640 +1861,6.056462 +1862,6.029673 +1863,5.956877 +1864,5.948664 +1865,6.012575 +1866,5.845860 +1867,5.844109 +1868,5.801373 +1869,5.693962 +1870,5.574132 +1871,5.515520 +1872,5.527696 +1873,5.507592 +1874,5.478223 +1875,5.480880 +1876,5.473550 +1877,5.486089 +1878,5.434454 +1879,5.416177 +1880,5.423130 +1881,5.472458 +1882,5.416166 +1883,5.404234 +1884,5.335645 +1885,5.422615 +1886,5.327505 +1887,5.247088 +1888,5.218171 +1889,5.195150 
+1890,5.152994 +1891,5.091176 +1892,5.068766 +1893,5.192358 +1894,5.269700 +1895,5.336396 +1896,5.303112 +1897,5.375825 +1898,5.336978 +1899,5.440629 +1900,5.335108 +1901,5.311260 +1902,5.510517 +1903,5.484244 +1904,5.458576 +1905,5.564707 +1906,5.502676 +1907,5.455092 +1908,5.372478 +1909,5.366843 +1910,5.284214 +1911,5.273167 +1912,5.379169 +1913,5.295874 +1914,5.307658 +1915,5.214939 +1916,5.328482 +1917,5.339113 +1918,5.376632 +1919,5.403990 +1920,5.476057 +1921,5.574432 +1922,5.602716 +1923,5.534964 +1924,5.610819 +1925,5.509564 +1926,5.587098 +1927,5.512258 +1928,5.464959 +1929,5.494195 +1930,5.539137 +1931,5.597445 +1932,5.734981 +1933,5.684690 +1934,5.599648 +1935,5.545751 +1936,5.484200 +1937,5.357862 +1938,5.500627 +1939,5.630598 +1940,5.810221 +1941,5.608714 +1942,5.687624 +1943,5.724624 +1944,5.830440 +1945,5.849516 +1946,5.817165 +1947,5.821496 +1948,5.819975 +1949,5.974882 +1950,5.980151 +1951,5.852273 +1952,5.928349 +1953,5.938927 +1954,5.816403 +1955,5.778008 +1956,5.704109 +1957,5.729092 +1958,5.810228 +1959,5.821467 +1960,5.915904 +1961,5.895460 +1962,5.841891 +1963,5.833857 +1964,5.892910 +1965,5.905058 +1966,6.017112 +1967,5.923193 +1968,5.924405 +1969,5.868347 +1970,5.823358 +1971,5.794363 +1972,5.781208 +1973,5.794843 +1974,5.912558 +1975,5.917296 +1976,5.899738 +1977,5.794760 +1978,5.654480 +1979,5.643144 +1980,5.823927 +1981,5.864416 +1982,5.859811 +1983,5.916952 +1984,5.951710 +1985,5.946963 +1986,5.819739 +1987,5.865587 +1988,5.724421 +1989,5.609321 +1990,5.449010 +1991,5.480712 +1992,5.479604 +1993,5.567510 +1994,5.653727 +1995,5.694788 +1996,5.695266 +1997,5.748044 +1998,5.859313 +1999,5.800744 +2000,5.792133 diff --git a/fig/training_curve.png b/fig/training_curve.png new file mode 100644 index 00000000..d5070d59 Binary files /dev/null and b/fig/training_curve.png differ diff --git a/index.md b/index.md new file mode 100644 index 00000000..458d91af --- /dev/null +++ b/index.md @@ -0,0 +1,35 @@ +--- +site: sandpaper::sandpaper_site +--- 
+ +This is a hands-on introduction to the first steps in Deep Learning, intended for researchers who are familiar with (non-deep) Machine Learning. + +The use of Deep Learning has seen a sharp increase in popularity and applicability over the last decade. While Deep Learning can be a useful tool for researchers from a wide range of domains, taking the first steps in the world of Deep Learning can be somewhat intimidating. This introduction aims to cover the basics of Deep Learning in a practical and hands-on manner, so that upon completion, you will be able to train your first neural network and understand what next steps to take to improve the model. + +We start by explaining the basic concepts of neural networks, and then go through the different steps of a Deep Learning workflow. Learners will learn how to prepare data for deep learning, how to implement a basic Deep Learning model in Python with Keras, how to monitor and troubleshoot the training process, and how to implement different layer types such as convolutional layers. + +:::::::::::::::::: checklist + +## Prerequisites +Learners are expected to have the following knowledge: + +- [x] Basic Python programming skills and familiarity with the Pandas package. +- [x] Basic knowledge of machine learning, including the following concepts: data cleaning, train & test split, types of problems (regression, classification), overfitting & underfitting, metrics (accuracy, recall, etc.). + +:::::::::::::::::::::::::::: + +::: instructor + +## Looking for Beta Testers! 
+**We are currently looking for volunteers to test this lesson!** +If you would like to teach this lesson in a pilot workshop, +please let the lesson developers know by +[opening a new issue on the lesson repository](https://github.com/carpentries-incubator/deep-learning-intro/issues/new) +or posting to the [`#machine_learning` Slack channel](https://swcarpentry.slack.com/archives/CKLUYLY2F) +on [The Carpentries Slack](https://swc-slack-invite.herokuapp.com/). +We would love to help you prepare to teach the lesson and +receive feedback on how it could be further improved, +based on your experience in the workshop. + +::: + diff --git a/instructor-notes.md b/instructor-notes.md new file mode 100644 index 00000000..6013d030 --- /dev/null +++ b/instructor-notes.md @@ -0,0 +1,36 @@ +--- +title: Instructor Notes +--- + +## Setup before the lesson +The required Python packages for this lesson often result in installation issues, +so it is advisable to organize a pre-workshop setup session where learners can show their installation and get help with problems. + +Installations on learners' devices have the advantage of lowering the threshold to continue with the material beyond the workshop. Note, though, that this lesson can also be taught in a cloud environment such as [Google Colab](https://colab.research.google.com/) or [My Binder](https://github.com/carpentries/scaffolds/blob/master/instructions/workshop-coordination.md#my-binder). This can serve as a backup environment if local installations fail. Some cloud environments offer the possibility to run the code on a GPU, which significantly reduces the runtime of deep learning code. + +## Deep learning workflow +The episodes are quite long, because they cover a full cycle of the deep learning workflow. It really helps to structure your teaching by making it clear where in the 10-step deep learning workflow we are. You can, for example, use headers in your notebook for each of the steps in the workflow. 
+ +## Episode 3: Monitor the training process +When episode 3 is taught on a different day than episode 2, it is very useful to start with a recap of episode 2. The Key Points of episode 2 can be reiterated, and you can go through the code of the previous session (without actually running it). This will help learners in the big exercise on creating a neural network. + +If learners have not downloaded the data yet, they can also load the data directly from Zenodo (instead of first downloading and saving): +```python +import pandas as pd +data = pd.read_csv("https://zenodo.org/record/5071376/files/weather_prediction_dataset_light.csv?download=1") +``` + +The following exercises work well in groups / break-out rooms: +- Split data into training, validation, and test set +- Create the neural network. Note that this is a fairly challenging exercise, but learners should be able to do this based on their experiences in episode 2 (see also remark about recap). +- Predict the labels for both training and test set and compare to the true values +- Try to reduce the degree of overfitting by lowering the number of parameters +- Create a similar scatter plot for a reasonable baseline +- Open question: What could be next steps to further improve the model? +All other exercises are small and can be done individually. + +## Presentation slides +There are no official presentation slides for this workshop, but this material does include some example +slides from when this course was taught by different institutions. These slides can be found in +the +[slides](https://github.com/carpentries-incubator/deep-learning-intro/tree/main/instructors/slides) +folder. 
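For the group exercise on splitting the data into training, validation, and test sets listed under Episode 3, the following is a minimal sketch of the idea and not the lesson's own solution. It uses only the Python standard library; the function name `split_indices` and the 70/15/15 proportions are assumptions chosen for illustration. In practice, scikit-learn's `train_test_split` is the more common route.

```python
import random


def split_indices(n_rows, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Return three disjoint lists of row indices: train, validation, test.

    Hypothetical helper for illustration; the fractions are assumptions.
    """
    if not 0 <= val_fraction + test_fraction < 1:
        raise ValueError("validation and test fractions must sum to less than 1")
    indices = list(range(n_rows))
    # Shuffle with a fixed seed so that the split is reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    n_val = round(n_rows * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

With a pandas DataFrame such as `data` above, the returned lists could be passed to `data.iloc[...]` to select the rows of each subset.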
diff --git a/learner-profiles.md b/learner-profiles.md new file mode 100644 index 00000000..a17ab03d --- /dev/null +++ b/learner-profiles.md @@ -0,0 +1,19 @@ +--- +title: Learner Profiles +--- + +#### Ann from Meteorology + +Ann has collected 2-3 GB of structured image data from several autonomous microscopes on balloon expeditions into the atmosphere as part of her PhD programme. Each image has a time stamp which can be related to the height of the balloon at that point and the current weather conditions. The images are unstructured and she would like to detect from the images if the balloon traversed a cloud or not. She has tried to do that with standard image processing methods, but the image artifacts to discriminate are somewhat diverse. Ann has used machine learning on tabular data before and would like to use Deep Learning for the images at hand. She saw collaborators in another lab do that and would like to pick up this skill. + +#### Barbara from Material Science + +Barbara just started her PostDoc in Material Science. Her new group has stored a large number of scanning electron microscope images which show several metals exposed to a plasma. The team also made the effort to highlight solid deposits in these images and thus obtained 20,000 images with such annotations. Barbara has performed some image analysis before and hence has the feeling that Deep Learning may help her in this task. She saw her labmates use ML algorithms for this and is motivated to finally understand these approaches. + +#### Dan from Life Sciences + +Dan produced a large population of bacteria that were subject to genetic alterations resulting in 10 different phenotypes. The latter can be identified by different colors, shapes and movement speed under a fluorescence microscope. Dan does not have a lot of experience with image processing techniques to segment these different objects, but has used GUI-based tools like [fiji](https://fiji.sc) and others. 
He has recorded 50-60 movies of 30 minutes each. 10 of these movies have been produced with one type of phenotype only. Dan doesn't consider himself a strong coder, but needs to identify bacteria of the different phenotypes in the dataset. He is interested to learn if Deep Learning can help. + +#### Eric from Pediatrics Science + +Eric ran a large array of clinical trials in his hospital to improve pediatric pharmaceutics for treating a common (non-lethal) virus. He obtained a table that lists the progression of the treatment for each patient, the dose of the drug given, whether the patient was in the placebo group or not, etc. As the table has more than 100 000 rows, Eric is certain that he can use ML to cluster the rows in one column where the data collection was inconsistent. Eric has touched coding here and there where necessary, but never considered it necessary to learn coding. His cheat sheet is the core of his coding knowledge. So his supervisor invited him to take a course on ML, saying "this is the tech of these days!" 
diff --git a/links.md b/links.md new file mode 100644 index 00000000..4c5cd2f9 --- /dev/null +++ b/links.md @@ -0,0 +1,10 @@ + + +[pandoc]: https://pandoc.org/MANUAL.html +[r-markdown]: https://rmarkdown.rstudio.com/ +[rstudio]: https://www.rstudio.com/ +[carpentries-workbench]: https://carpentries.github.io/sandpaper-docs/ + diff --git a/md5sum.txt b/md5sum.txt new file mode 100644 index 00000000..b4707ae6 --- /dev/null +++ b/md5sum.txt @@ -0,0 +1,20 @@ +"file" "checksum" "built" "date" +"CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-01-24" +"LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-01-24" +"config.yaml" "e6eff4d2bc0f8b84f5735d6e6f3dfd38" "site/built/config.yaml" "2024-01-24" +"index.md" "300b6c379788fc91a9ff4aa886b1e96a" "site/built/index.md" "2024-01-24" +"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-01-24" +"paper.md" "bb2f0edaf5f729d9801fc2398650571b" "site/built/paper.md" "2024-01-24" +"episodes/1-introduction.Rmd" "e6cb5167810faaa801c598ceddd0e9c6" "site/built/1-introduction.md" "2024-01-24" +"episodes/2-keras.Rmd" "9e35ec651717f7323c01c1f4625bace1" "site/built/2-keras.md" "2024-01-24" +"episodes/3-monitor-the-model.Rmd" "65a1408b6774e38b951aaa50630ba08a" "site/built/3-monitor-the-model.md" "2024-01-24" +"episodes/4-advanced-layer-types.Rmd" "63dfc90c0b2ee952cf0a362c47000454" "site/built/4-advanced-layer-types.md" "2024-01-24" +"episodes/5-outlook.Rmd" "1f0e86b2cd274c5bbe7f4982e03c5733" "site/built/5-outlook.md" "2024-01-24" +"instructors/bonus-material.md" "d5b6aaee56986ab74e33bb95894cdc0e" "site/built/bonus-material.md" "2024-01-24" +"instructors/design.md" "6c13db77f9d69a294398a77da7e9883f" "site/built/design.md" "2024-01-24" +"instructors/instructor-notes.md" "b516f8e213b07224e85073bfe47ed3aa" "site/built/instructor-notes.md" "2024-01-24" +"instructors/survey-templates.md" "ea5d46e7b54d335f79e57a7bc31d1c5c" "site/built/survey-templates.md" 
"2024-01-24" +"learners/reference.md" "ae95aeca6d28f5f0f994d053dc10d67c" "site/built/reference.md" "2024-01-24" +"learners/setup.md" "53746145baf2b44786a48b001aeca69f" "site/built/setup.md" "2024-01-24" +"profiles/learner-profiles.md" "698c27136a1a320b0c04303403859bdc" "site/built/learner-profiles.md" "2024-01-24" +"renv/profiles/lesson-requirements/renv.lock" "2ad3064a33ab4898010b481abbf0ffdb" "site/built/renv.lock" "2024-01-24" diff --git a/paper.md b/paper.md new file mode 100644 index 00000000..a25a170f --- /dev/null +++ b/paper.md @@ -0,0 +1,200 @@ +--- +title: 'Introduction to deep learning: Carpentries-style hands-on lesson material for introducing researchers to deep learning' +tags: + - Python + - deep learning + - machine learning + - Keras + - neural networks +authors: + - name: Sven A. van der Burg + orcid: 0000-0003-1250-6968 + affiliation: 1 # (Multiple affiliations must be quoted, like "1, 2") + - name: Anne Fouilloux + orcid: 0000-0002-1784-2920 + affiliation: 2 + - name: Florian Huber + orcid: 0000-0002-3535-9406 + affiliation: "1, 3" + - name: Dafne van Kuppevelt + orcid: 0000-0002-2662-1994 + affiliation: 1 + - name: Peter Steinbach + orcid: 0000-0002-4974-230X + affiliation: 4 + - name: Berend Weel + orcid: 0000-0002-9693-9332 + affiliation: 1 + - name: Colin Sauze + orcid: 0000-0001-5368-9217 + affiliation: 5 + - name: Samantha Wittke + orcid: 0000-0002-9625-7235 + affiliation: "6,7" + - name: Djura Smits + orcid: 0000-0003-4096-0260 + affiliation: 1 + - name: Cunliang Geng + orcid: 0000-0002-1409-8358 + affiliation: 1 + - name: Pranav Chandramouli + orcid: 0000-0002-7896-2969 + affiliation: 1 + - name: Toby Hodges + orcid: 0000-0003-1766-456X + affiliation: 8 + + +affiliations: + - name: Netherlands eScience Center, Amsterdam, The Netherlands + index: 1 + - name: Simula Research Laboratory, Oslo, Norway + index: 2 + - name: Düsseldorf University of Applied Sciences, Düsseldorf, Germany + index: 3 + - name: Helmholtz-Zentrum Dresden-Rossendorf, 
Dresden, Germany + index: 4 + - name: National Oceanography Centre, Liverpool, Great-Britain + index: 5 + - name: CSC - IT center for Science, Espoo, Finland + index: 6 + - name: Aalto University, Espoo, Finland + index: 7 + - name: The Carpentries, USA + index: 8 +date: 8 August 2023 +bibliography: paper.bib + +--- + +# Summary +This article describes a hands-on introduction to the first steps in deep learning, +intended for researchers who are familiar with (non-deep) machine learning. + +The use of deep learning has seen a sharp increase in popularity and applicability over the last decade. +While deep learning can be a useful tool for researchers from a wide range of domains, +taking the first steps in the world of deep learning can be somewhat intimidating. +This introduction aims to cover the fundamentals of deep learning in a practical and hands-on manner. By the end of the course, students will be able to train their first neural network and understand the subsequent steps needed to improve the model. + +The lesson starts by explaining the basic concepts of neural networks, +and then guides learners through the different steps of a deep learning workflow. +After following this lesson, +learners will be able to prepare data for deep learning, +implement a basic deep learning model in Python with Keras, +monitor and troubleshoot the training process, and implement different layer types, +such as convolutional layers. + +# Statement of Need +There are many free online course materials on deep learning, +see for example: @noauthor_fastai_nodate; @noauthor_udemy_nodate; @noauthor_udemy_nodate-1; @noauthor_udemy_nodate-2; @noauthor_coursera_nodate; @noauthor_freecodecamporg_2022. + +Nonetheless, these resources are often not available open-source and can thus not be easily adapted to the students' needs. +Also, these resources are intended to use for self-study. 
Our material can be used for self-study, but it is primarily developed for instructors to use in a workshop. +In addition, although a diverse range of online courses already exists, few are targeted towards academic researchers. + +Many computing centers offer (local) deep learning courses, such as @noauthor_csc-_nodate. +But the lesson material, if it is available, is not easily adopted outside the course organisation. + +What works well for learners is both to familiarise them with the key concepts and to let them +practice implementing those concepts. This eventually results in increased confidence and the conviction that 'I can do this myself'. +The key to getting there is live coding: before the course, learners have to set up a working environment on their own computer. +During the course, learners type the commands explained by the instructor on their own computer. +This design is based on the Software Carpentry [@wilson_software_2006] philosophy. +Live coding ensures that learners master the programmatic implementation of deep learning at the end of the course. +We believe that this makes our lesson a unique and crucial resource. + +Researchers can often only free a limited amount of time (maximum 5 consecutive days), since they are so involved in their daily work. +To accommodate this, we created a lesson that can be taught in 2 consecutive days or 4 half days. + +Demand for our workshops and feedback gathered from students demonstrated the +need for a low-threshold lesson that lets researchers take the first steps in the field of deep learning. +This impression was validated by other instructors who taught the lesson independently to their own audiences and provided us with feedback on their experience. + +# Instructional design +This lesson material was designed using the concepts from The Carpentries Curriculum Development Handbook [@becker_carpentries_nodate]. 
+Most importantly, we used 'backward design': we started with identifying learning objectives, the core skills and concepts that learners should acquire as a result of the lesson. +Next, exercises were designed to assess whether these objectives are met. +Eventually, the content is written to teach the skills and concepts learners need to successfully complete the exercises and, it follows, meet the learning objectives. + +Live coding is central to this approach: +the lesson is built up of small blocks. In each block first the instructor demonstrates how to do something, +and students follow along on their own computer. Then, the students work independently on exercises individually +or in groups to test their skills. +This approach integrates opportunities for guided practice throughout the lesson, +promoting learning by +helping learners build up a functioning mental model of the domain and +transfer new knowledge from working memory to long-term memory. +This is in accordance with research-based successful teaching strategies [@lang_small_2021]. + +The lesson material is built in the new lesson template: Carpentries Workbench [@noauthor_carpentries_nodate]. +This makes the lesson material a complete self-study resource. +But it also serves as lesson material for the instructor teaching the lesson through live-coding, +in that case the lesson material is only shared with students after the workshop as a reference. +The lesson material can be toggled to the 'instructor view'. This allows to provide instructor notes on how to approach teaching the lesson, +and these can even be included at the level of the lesson content. +In addition, the Carpentries Workbench prioritises accessibility of the content, for example by having clearly visible figure captions +and promoting alt-texts for pictures. + +The lesson is split into a general introduction, and 3 episodes that cover 3 distinct increasingly more complex deep learning problems. 
+Each of the deep learning problems is approached using the same 10-step deep learning workflow (https://carpentries-incubator.github.io/deep-learning-intro/1-introduction.html#deep-learning-workflow). +By going through the deep learning cycle three times with different problems, learners become increasingly confident in applying this deep learning workflow to their own projects. + +# Feedback +This course was taught 12 times over the course of 3 years, both online and in person, by the Netherlands eScience Center +(Netherlands, https://www.esciencecenter.nl/) and Helmholtz-Zentrum Dresden-Rossendorf (Germany, https://www.hzdr.de/). +Apart from the core group of contributors, the workshop was also taught at 3 independent institutes, namely: +University of Wisconsin-Madison (US, https://www.wisc.edu/), University of Auckland (New Zealand, https://www.auckland.ac.nz/), +and EMBL Heidelberg (Germany, https://www.embl.org/sites/heidelberg/). +In general, adoption of the lesson material by the instructors not involved in the project went well. +The feedback gathered from our own and others' teachings was used to polish the lesson further. + +## Student responses +The feedback we gathered from students is in general very positive, +with some responses from students to the question 'What was your favourite or most useful part of the workshop. Why?' further confirming our statement of need: + +> _I enjoyed the live coding and playing with the models to see how it would effect the results. +> It felt hands on and made it easy for me to understand the concepts._ + +> _Well-defined steps to be followed in training a model is very useful. Examples we worked on are quite nice._ + +> _The doing part, that really helps to get the theory into practice._ + +Below are two tables summarizing results from our post-workshop survey. We use the students' feedback to continuously improve the lesson. 
+ +| | STRONGLY DISAGREE | DISAGREE | UNDECIDED | AGREE | STRONGLY AGREE | TOTAL | WEIGHTED AVERAGE | +|---|---|---|---|---|---|---|---| +| I can immediately apply what I learned at this workshop. | 0 | 5 | 6 | 19 | 8 | 38 | 3.8 | +| The setup and installation instructions for the lesson were complete and easy to follow. | 0 | 0 | 4 | 13 | 21 | 38 | 4.4 | +| Examples and tasks in the lesson were relevant and authentic | 0 | 0 | 5 | 19 | 14 | 38 | 4.2 | + +Table 1: Agreement on statements by students from 2 workshops taught at the Netherlands eScience Center. +The results from these 2 workshops are a good representation of the general feedback we get when teaching this workshop. + +| | POOR | FAIR | GOOD | VERY GOOD | EXCELLENT | N/A | TOTAL | WEIGHTED AVERAGE | +|---|---|---|---|---|---|---|---|---| +| Introduction into Deep Learning | 0 (0%) | 2 (5%) | 10 (27%) | 8 (22%) | 17 (46%) | 0 (0%) | 37 | 4.1 | +| Classification by a Neural Network using Keras (penguins dataset) | 0 (0%) | 1 (3%) | 5 (13%) | 16 (42%) | 16 (42%) | 0 (0%) | 38 | 4.2 | +| Monitoring and Troubleshooting the learning process (weather dataset) | 0 (0%) | 0 (0%) | 4 (11%) | 18 (47%) | 16 (42%) | 0 (0%) | 38 | 4.3 | +| Advanced layer types (CIFAR-10 dataset) | 0 (0%) | 2 (5%) | 5 (13%) | 7 (18%) | 16 (42%) | 8 (21%) | 38 | 4.2 | + +Table 2: Quality of the different episodes of the workshop as rated by students from 2 workshops taught at the Netherlands eScience Center. +The results from these 2 workshops are a good representation of the general feedback we get when teaching this workshop. 
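The weighted averages in Tables 1 and 2 can be reproduced from the response counts. The sketch below assumes (this is not stated in the text, but it is consistent with the reported values) that the answer categories are scored 1 to 5 from left to right and that N/A responses are excluded from the denominator. The function name `weighted_average` and the shortened row labels are illustrative.

```python
def weighted_average(counts, scores=(1, 2, 3, 4, 5)):
    """Mean score given the number of responses per answer category.

    Assumption: categories are scored 1..5 from left to right and
    N/A responses are left out of `counts` entirely.
    """
    if len(counts) != len(scores):
        raise ValueError("need exactly one count per score")
    n_responses = sum(counts)
    if n_responses == 0:
        raise ValueError("no responses to average")
    return sum(s * c for s, c in zip(scores, counts)) / n_responses


# Response counts copied from Table 1 and Table 2 (N/A column omitted).
rows = {
    "I can immediately apply what I learned": [0, 5, 6, 19, 8],
    "Setup and installation instructions": [0, 0, 4, 13, 21],
    "Examples and tasks were relevant": [0, 0, 5, 19, 14],
    "Introduction into Deep Learning": [0, 2, 10, 8, 17],
    "Classification (penguins dataset)": [0, 1, 5, 16, 16],
    "Monitoring and troubleshooting (weather dataset)": [0, 0, 4, 18, 16],
    "Advanced layer types (CIFAR-10 dataset)": [0, 2, 5, 7, 16],
}

for label, counts in rows.items():
    print(f"{label}: {weighted_average(counts):.1f}")
```

Under these assumptions the printed values are 3.8, 4.4, 4.2, 4.1, 4.2, 4.3 and 4.2, which agree with the last column of both tables. The final row is divided by 30 rather than 38 because its 8 N/A answers are excluded.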
+ +# Conclusion +This lesson can be taught as a stand-alone workshop to students already familiar with machine learning and Python. +It can also be taught in a broader curriculum after an introduction to Python programming (for example: @azalee_bostroem_software_2016) +and an introduction to machine learning (for example: @noauthor_scikit-learn_2023). +In conclusion, the described lesson material is a unique and essential resource aimed at researchers and designed specifically for a live-coding teaching style. +Hopefully, it will help many researchers to take their first steps in a successful application of deep learning to their own domain. + +# Acknowledgements +We would like to thank all instructors and helpers who taught the course, +and the community of people who left contributions to the project, no matter how big or small. +Also, we thank Chris Endemann (University of Wisconsin-Madison, US, https://www.wisc.edu/), +Nidhi Gowdra (University of Auckland, New Zealand, https://www.auckland.ac.nz/), +and Renato Alves and Lisanna Paladin (EMBL Heidelberg, Germany, https://www.embl.org/sites/heidelberg/), +who piloted this workshop at their institutes. +We thank the Carpentries for providing such a great framework for developing this lesson material. +We thank all students enrolled in the workshops that were taught using this lesson material for providing us with feedback. 
+ +# References diff --git a/reference.md b/reference.md new file mode 100644 index 00000000..0b1d60aa --- /dev/null +++ b/reference.md @@ -0,0 +1,48 @@ +--- +title: Reference +--- + +## Glossary + +* [artificial intelligence](https://glosario.carpentries.org/en/#artificial_intelligence) +* [machine learning](https://glosario.carpentries.org/en/#machine_learning) +* [deep learning](https://glosario.carpentries.org/en/#deep_learning) +* [neural network](https://glosario.carpentries.org/en/#neural_network) +* [convolutional neural network (CNN)](https://glosario.carpentries.org/en/#cnn) +* [recurrent neural network (RNN)](https://glosario.carpentries.org/en/#rnn) +* [accuracy](https://glosario.carpentries.org/en/#accuracy) +* [epoch](https://glosario.carpentries.org/en/#epoch_dl) +* [learning rate](https://glosario.carpentries.org/en/#learning_rate) +* [confusion matrix](https://glosario.carpentries.org/en/#confusion_matrix) +* [class imbalance](https://glosario.carpentries.org/en/#class_imbalance) +* [overfitting](https://glosario.carpentries.org/en/#overfitting) +* [hidden layer](https://glosario.carpentries.org/en/#hidden_layer) + +## External references +Here is a (non exhaustive) list of external resources for further study after this lesson: + +### Miscellaneous resources +- [the difference between validation data and test data](https://machinelearningmastery.com/difference-test-validation-datasets/) +- [underfitting and overfitting](https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/) +- [Unbalanced data](https://towardsdatascience.com/handling-imbalanced-datasets-in-deep-learning-f48407a0e758) +- [Unbalanced data in Keras](https://www.tensorflow.org/tutorials/structured_data/imbalanced_data) +- [Tensorflow Playground, for visualizing neural networks](http://playground.tensorflow.org/) +- [ChatGPT prompt engineering course](https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/1/lesson_1) + +### Some ML 
challenges or benchmarks +- https://mlcontests.com/ +- [Kaggle, machine learning competitions](https://www.kaggle.com/) +- [protein structure prediction](https://predictioncenter.org/) +- [prediction of protein-protein interactions](https://www.ebi.ac.uk/msd-srv/capri/) + + +### Some courses for deeper learning: +- [Fast AI course: making neural nets uncool again](https://www.fast.ai/) +- [Intro to Deep Learning with PyTorch](https://www.udacity.com/course/deep-learning-pytorch--ud188), the course is quite intuitive +- Coursera courses by Andrew Ng: + - [AI for everyone](https://www.coursera.org/learn/ai-for-everyone), for beginners who won't do ML projects but are curious about what AI really is and what AI can do + - [ML course](https://www.coursera.org/learn/machine-learning) and [DL course](https://www.coursera.org/specializations/deep-learning), quite intensive courses for beginner/intermediate-level researchers who will do ML/DL projects + - [Structuring Machine Learning Projects](https://www.coursera.org/learn/machine-learning-projects), how to conduct ML projects with useful ML engineering strategies + - [Book on Machine Learning](https://databricks.com/p/ebook/big-book-of-machine-learning-use-cases?utm_medium=paid+search&utm_source=google&utm_campaign=15631674924&utm_adgroup=130078635494&utm_content=ebook&utm_offer=big-book-of-machine-learning-use-cases&utm_ad=587637991591&utm_term=machine%20learning&gclid=CjwKCAjw9qiTBhBbEiwAp-GE0WaK3IrtfBeDWjb7L2ZDQg5_YgevbwoD288bq0sGgYNhcTlnjZfLaBoCC_EQAvD_BwE) +- [Book: Ian Goodfellow and Yoshua Bengio and Aaron Courville - Deep Learning](https://www.deeplearningbook.org/). 
A really thorough, detailed (though math-heavy) book on everything (for example Generative Adversarial Networks or Autoencoders) you want to know about deep learning + diff --git a/setup.md b/setup.md new file mode 100644 index 00000000..d3d9ec55 --- /dev/null +++ b/setup.md @@ -0,0 +1,211 @@ +--- +title: Setup +--- +## Software Setup + +::::::::::::::::::::::::::::::::::::::: discussion + +### Installing Python using Anaconda + +[Python][python] is a popular language for scientific computing, and a frequent choice +for machine learning as well. Installing all of its scientific packages +individually can be a bit difficult, however, so we recommend the installer [Anaconda][anaconda] +which includes most (but not all) of the software you will need. + +Regardless of how you choose to install it, please make sure you install Python +version 3.x (e.g., 3.4 is fine). Also, please set up your Python environment at +least a day in advance of the workshop. If you encounter problems with the +installation procedure, ask your workshop organizers via e-mail for assistance so +you are ready to go as soon as the workshop begins. + +::::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::: solution + +### Windows + +Check out the [video tutorial][video-windows] or: + +1. Open [https://www.anaconda.com/products/distribution][anaconda-distribution] +with your web browser. +2. Download the Python 3 installer for Windows. +3. Double-click the executable and install Python 3 using _MOST_ of the + default settings. The only exception is to check the + **Make Anaconda the default Python** option. + +::::::::::::::::::::::::: + +:::::::::::::::: solution + +### MacOS + +Check out the [video tutorial][video-mac] or: + +1. Open [https://www.anaconda.com/products/distribution][anaconda-distribution] + with your web browser. +2. Download the Python 3 installer for macOS. +3. Install Python 3 using all of the defaults for installation. 
+ +::::::::::::::::::::::::: + + +:::::::::::::::: solution + +### Linux + +Note that the following installation steps require you to work from the shell. +If you run into any difficulties, please request help before the workshop begins. + +1. Open [https://www.anaconda.com/products/distribution][anaconda-distribution] with your web browser. +2. Download the Python 3 installer for Linux. +3. Install Python 3 using all of the defaults for installation. + a. Open a terminal window. + b. Navigate to the folder where you downloaded the installer + c. Type + ```bash + bash Anaconda3- + ``` + and press tab. The name of the file you just downloaded should appear. + d. Press enter. + e. Follow the text-only prompts. When the license agreement appears (a colon + will be present at the bottom of the screen) hold the down arrow until the + bottom of the text. Type `yes` and press enter to approve the license. Press + enter again to approve the default location for the files. Type `yes` and + press enter to prepend Anaconda to your `PATH` (this makes the Anaconda + distribution the default Python). + +::::::::::::::::::::::::: + +## Installing the required packages + +[Conda](https://docs.conda.io/projects/conda/en/latest/) is the package management system associated with [Anaconda](https://anaconda.org) and runs on Windows, macOS and Linux. +Conda should already be available in your system once you installed Anaconda successfully. Conda thus works regardless of the operating system. +Make sure you have an up-to-date version of Conda running. +See [these instructions](https://docs.anaconda.com/anaconda/install/update-version/) for updating Conda if required. 
+{: .callout} + +To create a conda environment called `dl_workshop` with the required packages, open a terminal (Mac/Linux) or Anaconda prompt (Windows) and type the command: +```bash +conda create --name dl_workshop python jupyter seaborn scikit-learn pandas +``` + +Activate the newly created environment: +```bash +conda activate dl_workshop +``` + +Install tensorflow using pip (Python's package manager): +```bash +pip install tensorflow +``` + +Note that modern versions of Tensorflow make Keras available as a module. + +[pip](https://pip.pypa.io/en/stable/) is the package management system for Python software packages. +It is integrated into your local Python installation and runs regardless of your operating system too. + +::::::::::::::::::::::::::::::::::::::: discussion + +### Python package installation troubleshooting + + + +::::::::::::::::::::::::::::::::::::::::::::::::::: + +:::::::::::::::: solution + +### Troubleshooting for Windows +It is possible that Windows users will run into version conflicts. If you are on Windows and get +errors running the command, you can try installing the packages using pip within a conda environment: + +```bash +conda create -n dl_workshop python jupyter +conda activate dl_workshop +# Quote the version specifier so the shell does not treat ">" as output redirection +pip install "tensorflow>=2.5" seaborn scikit-learn pandas +``` + +::::::::::::::::::::::::: + +::::::::::::::::::: solution + +### Troubleshooting for Macs with Apple silicon chip +Newer Macs (from 2020 onwards) often have a different kind of chip, manufactured by Apple instead of Intel. +This can lead to problems installing Tensorflow. +If you get errors running the installation command or conda hangs endlessly, +you can try installing Tensorflow for Mac with pip: + +```bash +pip install tensorflow-macos +``` + +:::::::::::::::::::::::::::: + +## Starting Jupyter Lab + +We will teach using Python in [Jupyter lab][jupyter], a +programming environment that runs in a web browser. 
Jupyter requires a reasonably +up-to-date browser, preferably a current version of Chrome, Safari, or Firefox +(note that Internet Explorer version 9 and below are *not* supported). If you +installed Python using Anaconda, Jupyter should already be on your system. If +you did not use Anaconda, use the Python package manager pip to install it +(see the [Jupyter website][jupyter-install] for details). + +To start jupyter lab, open a terminal (Mac/Linux) or Anaconda prompt (Windows) and type the command: + +```bash +jupyter lab +``` + +To start the Python interpreter without jupyter lab, open a terminal (Mac/Linux) or Anaconda prompt (Windows) +or git bash and type the command: + +```bash +python +``` + +## Check your setup +To check whether all packages installed correctly, start a jupyter notebook in jupyter lab as +explained above. Run the following lines of code: +```python +import sklearn +print('sklearn version: ', sklearn.__version__) + +import seaborn +print('seaborn version: ', seaborn.__version__) + +import pandas +print('pandas version: ', pandas.__version__) + +from tensorflow import keras +print('Keras version: ', keras.__version__) + +import tensorflow +print('Tensorflow version: ', tensorflow.__version__) +``` + +This should output the versions of all required packages without giving errors. +Most versions will work fine with this lesson, but: +- For Keras and Tensorflow, the minimum version is 2.12.0 +- For sklearn, the minimum version is 1.2.2 + +## Fallback option: cloud environment +If a local installation does not work for you, it is also possible to run this lesson in [Binder Hub](https://mybinder.org/v2/gh/carpentries-incubator/deep-learning-intro/scaffolds). This should give you an environment with all the required software and data to run this lesson. Nothing you save there is stored permanently, so please copy any files you want to keep. 
Note that if you are the first person to launch this in the last few days, it can take several minutes to start up. 
The second person who loads it should find it loads in under a minute. Instructors who intend to use this option should start it themselves shortly before the workshop begins. + +Alternatively you can use [Google colab](https://colab.research.google.com/). If you open a jupyter notebook here, the required packages are already pre-installed. Note that google colab uses jupyter notebook instead of jupyter lab. + +## Downloading the required datasets + +Download the [weather dataset prediction csv][weatherdata] and [BBQ labels][weatherbbqdata]. + +[anaconda]: https://www.anaconda.com/products/individual +[anaconda-distribution]: https://www.anaconda.com/products/distribution +[jupyter]: http://jupyter.org/ +[jupyter-install]: http://jupyter.readthedocs.io/en/latest/install.html#optional-for-experienced-python-developers-installing-jupyter-with-pip +[python]: https://python.org +[video-mac]: https://www.youtube.com/watch?v=TcSAln46u9U +[video-windows]: https://www.youtube.com/watch?v=xxQ0mzZ8UvA +[penguindata]: https://zenodo.org/record/3960218/files/allisonhorst/palmerpenguins-v0.1.0.zip?download=1 +[weatherdata]: https://zenodo.org/record/5071376/files/weather_prediction_dataset_light.csv?download=1 +[weatherbbqdata]: https://zenodo.org/record/4980359/files/weather_prediction_bbq_labels.csv?download=1 + + diff --git a/survey-templates.md b/survey-templates.md new file mode 100644 index 00000000..89dbe688 --- /dev/null +++ b/survey-templates.md @@ -0,0 +1,77 @@ +--- +title: Workshop survey templates +--- + +This page lists possible questions for both pre-workshop and post-workshop surveys, that instructors are free to use. Note, the nature of questions might change throughout the course of development of this lesson module. + +In that sense, please provide feedback or experiences with these questions! 
+ +## pre-workshop survey + +The pre-workshop survey is meant to serve two goals: + +- provide the instructors with a way to estimate the knowledge background of learners +- provide the learners with another indication of what is expected from them + +What has worked well is asking each participant 2-3 questions for each of the following concepts: + +1. self-estimated proficiency in python coding +2. self-estimated proficiency in data science methods +3. self-estimated proficiency in ML methods + +Sticking to questions that relate to problem solving rather than focusing on libraries or advanced concepts can help to diversify your learners. But as a whole, the following is meant to help instructors find out ['what to teach'](https://cdh.carpentries.org/deciding-what-to-teach.html#target-audience). So you'd first describe your target audience and then select subsets of questions that serve this audience. + +Each of the following questions is meant to offer four answers learners can choose from: + +> - I know how to do this. +> - I'd consult code I've written. +> - I'd ask a colleague or search for this online. +> - I am not sure what this question is talking about. + +This way, the implications of right/wrong answers for the learners are avoided, i.e. no learner needs to feel bad prior to the workshop for not knowing something. Instead, these questions aim to probe the self-estimated proficiency of learners. + +### Questions about python coding + +**The following questions range from simple to more advanced.** + +- You are provided with a python list of integer values. The list has length 1024 and you would like to obtain all entries from index 50 to 101. How would you do this? + +- You need to open 102 data files and extract an object from them. For this, you compose a small function to open a single file which requires 3 input parameters.
The parameters are a file location, the name of the object to retrieve and a parameter that controls the verbosity of the function. The latter parameter has the default value “False”. + +- Consider a 4 dimensional numpy array of uint32 type numbers. The shape of the array is '(512,3,224,224)' and it is stored in row-major memory ordering. Your goal is to set every other row to 42, i.e. every other entry in dimension 2 to 42. + +- You discovered the numpy 'rot90' method. You observe yourself calling it over and over again with the same parameter for the axis option. You'd like to write a wrapper function which calls 'rot90' having axis set to '(0,1)'. + +- A new version of your favorite package `foo` has just been released on github with a bug fix that you are interested in. You'd like to try this new version out in an environment that is independent of your local user environment which can be populated with 'python -m pip install --user foo'. + +### Questions about data science + +**The following questions range from simple to more advanced.** + +- You are provided a list of 512 random float values. These values range between 0 and 100. You would like to remove any entry in this list that is larger than 90 or smaller than 10. + +- You are provided with a CSV file. The file has 35000 rows. The file has 45 columns. Each column is separated by the “,” symbol from the other. You would like to open the file in python, calculate the arithmetic mean, the minimum and maximum of column number 5, 12 and 39. + +- You are provided with a CSV file. The file has 35000 rows. The file has 45 columns. Each column is separated by the “,” symbol from the other. You would like to open the file in python, remove all entries where the value of column 21 is larger than 50. The values removed are to be replaced by 0. + +- You are provided with a CSV file. The file has 35000 rows. The file has 45 columns. Each column is separated by the “,” symbol from the other. 
When you load the file and plot the histogram of column 40, you are suspicious that the floating point values are not normally distributed. But the producer of the CSV file assures you that all columns are normally distributed. To make sure, you sit down to code a function which tests whether any given column is normally distributed. + +- A probability density function (PDF) in statistics is something very distinct from a probability mass function (PMF). What is the difference? + +- You are given samples of two sets of observations with type float32. Both arrays have the same shape: "(1024, 1)". You'd like to test whether both ensembles were produced by the same PDF. How do you do this? + +### Questions about machine learning + +**The following questions range from simple to more advanced.** + +- You are given a dataset from experiments that you want to use for machine learning (13 columns with 25000 rows). One column is particularly useful and is encoded as real numbers in a range from -15 to 12. You would like to normalize this data so that it fits into the range of real numbers between 0 and 1. How would you do this? + +- You are helping to organize a conference of more than 1000 attendees. All participants have already paid and are expecting to pick up their conference t-shirt on the first day. Your team is in shock as it discovers that t-shirt sizes have not been recorded during online registration. However, all participants were asked to provide their age, gender, body height and weight. To help out, you sit down to write a python script that predicts the t-shirt size for each participant using a clustering algorithm. + +- You are presented with a training data set of 10000 samples. You would like to train a classifier on this dataset. However, you observe that class 0 dominates the training set at 60%, while the other classes each account for 20% of the rows. How do you prepare your training procedure?
+ +- You are given a trained classification model by a colleague. Your peer trained it on a very large dataset. You need it for inference on your data. To check that the model you loaded from disk really reproduces the trained weights, you run the `predict` function on an unseen validation data set. The network doesn't make the same predictions as those your colleague documented for you. What could be the cause? + +- You are provided with a table of observations with 23 columns and 5000 rows. For half of the rows, there is data available for column 24 as float32 between '[-1, +1]'. You'd like to write a predictor for the rows where this value is missing. + +