
Project 2: Grace Gilbert #10

Open: wants to merge 41 commits into master

Commits (41)
6adaf04
completed cpu impl and setup naive gpu scan
gracelgilbert Sep 12, 2019
1f7ff94
completed naive scan
gracelgilbert Sep 13, 2019
d753adb
completed work efficient scan
gracelgilbert Sep 13, 2019
ab168c4
completed gpu Stream Compaction
gracelgilbert Sep 13, 2019
c137368
thrust scan, setup mlp project
gracelgilbert Sep 15, 2019
78e4ce9
in progress mlp
gracelgilbert Sep 16, 2019
3a80fbd
base implementation of training done
gracelgilbert Sep 17, 2019
119033e
Update README.md
gracelgilbert Sep 17, 2019
d0043f1
Update README.md
gracelgilbert Sep 17, 2019
124361b
changed random value range
gracelgilbert Sep 17, 2019
69a2c0a
Update README.md
gracelgilbert Sep 17, 2019
13f5846
Update README.md
gracelgilbert Sep 17, 2019
5688aea
Update README.md
gracelgilbert Sep 17, 2019
539864d
Update README.md
gracelgilbert Sep 17, 2019
179fe9b
Update README.md
gracelgilbert Sep 17, 2019
2965dc1
Update README.md
gracelgilbert Sep 17, 2019
2785b87
progress fixing error propogation
gracelgilbert Sep 17, 2019
8eb072e
Merge branch 'master' of https://github.com/gracelgilbert/Project2-Nu…
gracelgilbert Sep 17, 2019
405c99d
converging for xor
gracelgilbert Sep 17, 2019
7160348
added screenshots of algorithm diagrams
gracelgilbert Sep 17, 2019
66b0661
changed to only 2 character input
gracelgilbert Sep 17, 2019
9b08ec4
Update README.md
gracelgilbert Sep 17, 2019
0d942c6
Update README.md
gracelgilbert Sep 17, 2019
aed1d93
Update README.md
gracelgilbert Sep 17, 2019
3f27050
image of pseudocode for naive scan
gracelgilbert Sep 17, 2019
cf90d24
Update README.md
gracelgilbert Sep 17, 2019
20a4bb0
Update README.md
gracelgilbert Sep 17, 2019
ced7274
added performance charts
gracelgilbert Sep 18, 2019
e8ecc36
Merge branch 'master' of https://github.com/gracelgilbert/Project2-Nu…
gracelgilbert Sep 18, 2019
0dcd702
Update README.md
gracelgilbert Sep 18, 2019
2c10cdb
resetting array size in stream compaction to small value
gracelgilbert Sep 18, 2019
ff533ed
cleaned up code so that user can run xor or character, training or te…
gracelgilbert Sep 18, 2019
c242ad5
Update README.md
gracelgilbert Sep 18, 2019
3f1f1be
Update README.md
gracelgilbert Sep 18, 2019
9b9f9cb
added math for mlp
gracelgilbert Sep 18, 2019
fa8c20e
Merge branch 'master' of https://github.com/gracelgilbert/Project2-Nu…
gracelgilbert Sep 18, 2019
3544a21
Update README.md
gracelgilbert Sep 18, 2019
b22275c
Update README.md
gracelgilbert Sep 18, 2019
abff22d
adjusted prediction weights to be more converged weights
gracelgilbert Sep 18, 2019
51f44bf
Update README.md
gracelgilbert Sep 18, 2019
cdf83ff
training turned on
gracelgilbert Sep 18, 2019
62 changes: 56 additions & 6 deletions Project2-Character-Recognition/README.md
@@ -3,12 +3,62 @@ CUDA Character Recognition

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2**

- * (TODO) YOUR NAME HERE
- * (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
- * Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+ * Grace Gilbert
+ * gracelgilbert.com
+ * Tested on: Windows 10, i9-9900K @ 3.60GHz 64GB, GeForce RTX 2080 40860MB

## Overview
In this project I implemented a multilayer perceptron (MLP), a simple neural network used for machine learning. The project has two states: training and predicting. In the training state, we are given a set of data inputs and their desired output values. Here, the main data set is 52 English characters (upper and lower case), each with a .txt file containing grayscale values of an image of that character. We train the MLP on the .txt file data and the expected values, adjusting weights based on error calculations until they converge on weights that match the expected output with high consistency. Once we have these converged weights, we can run the MLP in the predicting state to predict the output value for a .txt file's data without knowing its desired output in advance.

### Project Run Options
At the top of main.cpp, there are several settings the user can adjust (a sketch of possible flag values follows this list).
- The first, training, indicates whether we are in the training state (1) or the predicting state (0).
- Next is the accepted error, which determines what error value is low enough for the weights that achieved it to count as good enough. The lower this value, the longer the weights take to converge, but the more accurate the predictions will be.
- Next is the number of random weight attempts. To find good weights, we start with randomly assigned weights and check whether they converge. If they do not converge after a certain number of training iterations over the input data, we reset and try new random weights. This setting determines how many times we reset and retry before giving up on convergence.
- Related is the number of convergence iterations. When testing a set of weights, we repeatedly run the data through the MLP and refine the weights until the error is low enough. This number determines how many refinement passes we make before giving up on the current set of weights and resetting.
- The final option is the use-xor flag. I had trouble training on the character recognition data, but successfully found working weights for the simpler XOR example, where the input is a pair of 0s and 1s and the output is their XOR. If this value is set to 1, training and prediction run on the XOR example rather than the character data. If set to 0, the character data is read from the text files and used for training and prediction.
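As a concrete sketch, the settings might look something like this at the top of main.cpp; the names and values here are illustrative, not the exact ones in the repository:

```
// Illustrative names and values; the actual flags in main.cpp may differ.
#define TRAINING 1                   // 1 = training state, 0 = predicting state
#define ACCEPTED_ERROR 0.01f         // total error below this counts as converged
#define RANDOM_WEIGHT_ATTEMPTS 50    // fresh random-weight restarts before giving up
#define CONVERGENCE_ITERATIONS 5000  // refinement passes per set of random weights
#define USE_XOR 1                    // 1 = XOR example, 0 = character data
```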

## Training
The overall training pipeline is to generate a set of random weights, run each piece of input data through the MLP, and iteratively adjust the weights based on the output and error returned by the network until the error is sufficiently low. Not every set of initial weight guesses will converge to good final values, so after some number of iterations of testing the input data and adjusting the weights, we reset and make new random guesses. If we still have no convergence after many resets of the guesses, we stop to avoid a program that runs forever. Once converged weights are found, we output them.
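Structurally, the outer loop might look like the following sketch, using the illustrative settings above; generateRandomWeights and runTrainingPass are assumed helper routines standing in for the steps just described:

```
// Assumed helpers: fill the weights with random values (next section)
// and run one refinement pass over all inputs, returning total error.
void generateRandomWeights(float *weights, int n);
float runTrainingPass(float *weights, int n);

bool train(float *weights, int weightCount) {
    for (int attempt = 0; attempt < RANDOM_WEIGHT_ATTEMPTS; ++attempt) {
        generateRandomWeights(weights, weightCount);  // fresh random guess
        for (int it = 0; it < CONVERGENCE_ITERATIONS; ++it) {
            if (runTrainingPass(weights, weightCount) < ACCEPTED_ERROR) {
                return true;  // converged: these are the weights we output
            }
        }
        // This guess is not converging; reset with new random weights.
    }
    return false;  // give up rather than run forever
}
```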
### Random Weight Generation
The random weights are generated on the GPU: the weights do not depend on each other, so they can be generated in parallel. For the random seed, I use the time elapsed between the start of training and the moment the weights are generated, so that each set of random weights gets a different seed.
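A minimal sketch of such a kernel, assuming one thread per weight and a Thrust per-thread random engine; the kernel name, signature, and weight range are assumptions:

```
#include <thrust/random.h>

// One thread per weight. The seed parameter is the host-side elapsed
// time since training started, so each restart draws different weights.
__global__ void kernGenerateRandomWeights(int n, float seed, float *weights) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index >= n) {
        return;
    }
    thrust::default_random_engine rng((unsigned int)(seed * 100000.0f) + index);
    thrust::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    weights[index] = dist(rng);
}
```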
### MLP Network
The MLP takes in one piece of input data at a time. This input data is made up of a certain number of values: the XOR data consists of two numbers, whereas each character input consists of 10,201 numbers (a 101 by 101 grid). There is then a second layer of values called the hidden layer; the number of hidden layer values is somewhere between 1 and the number of input values. For each pair of an input value and a hidden layer value, there is a corresponding, predetermined weight. Each hidden layer value is found by taking the dot product of the input values with their weights for that hidden layer index; the dot product is then passed through the following function:

```
hidden layer value = 1/(1 + exp(-dot product value))
```

This completes the first layer of the MLP. Each hidden layer value also has a weight. The final layer is just one value, which will be our output. To find it, we first sum the products of each hidden layer value and its weight, then apply the same function described above to that sum. This is our output. In the prediction state, this would conclude the MLP, as we would return this output as the prediction given the weights.

Below is a diagram illustrating the layers and weights in the mlp network:

![](img/MLP.png)
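To make the two layers concrete, here is a minimal CPU sketch of one forward pass through this network. The project itself is CUDA-based, but the math reads more directly on the CPU; the function names and the flattened weight layout are assumptions:

```
#include <cmath>

float activate(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// inputWeights is assumed flattened as [i * hiddenCount + j]:
// the weight between input value i and hidden value j.
float forward(const float *input, int inputCount,
              const float *inputWeights, const float *hiddenWeights,
              float *hidden, int hiddenCount) {
    // First layer: dot product per hidden value, then the activation.
    for (int j = 0; j < hiddenCount; ++j) {
        float dot = 0.0f;
        for (int i = 0; i < inputCount; ++i) {
            dot += input[i] * inputWeights[i * hiddenCount + j];
        }
        hidden[j] = activate(dot);
    }
    // Final layer: weighted sum of hidden values, activated once more.
    float sum = 0.0f;
    for (int j = 0; j < hiddenCount; ++j) {
        sum += hidden[j] * hiddenWeights[j];
    }
    return activate(sum);
}
```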

### Error propagation
In training mode, we want to use the MLP output to calculate error and adjust the weights to lower it, a process called error propagation. The error of an individual data input is:
```
error = (output - expected value)^2
```
The expected value is a number representing the input data's correct output: for the XOR example, 1 for true and 0 for false; for the characters, the characters are numbered 1 through 52.

Next, we calculate the partial derivative of the individual input's error with respect to each weight:

![](img/PartialDeriv.PNG)

We then accumulate the error values from every input's MLP run. Combining this total with the partial derivatives above yields a delta that gets added to each weight:

![](img/Delta.PNG)

In my implementation, I run the MLP on each piece of input data, calculating the output, the error, and the partial derivatives for each weight. During this iteration over the data, I accumulate the total error, which I then use to calculate the delta values added to the weights, modifying them for the next pass over the data. If the total accumulated error is low enough, no more error propagation is needed, as the weights have sufficiently converged.
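Below is a hedged CPU sketch of one such pass, reusing the forward function from the MLP Network section. The repository's exact delta formula is the one pictured above; here standard chain-rule partial derivatives stand in, scaled by the accumulated total error as described, and lambda (a step size) is an assumed parameter:

```
#include <vector>

float trainOnce(const std::vector<std::vector<float>> &data,
                const std::vector<float> &expected,
                std::vector<float> &w1,   // inputCount * hiddenCount input weights
                std::vector<float> &w2,   // hiddenCount hidden weights
                int hiddenCount, float lambda) {
    int inputCount = (int)data[0].size();
    std::vector<float> g1(w1.size(), 0.0f);  // accumulated dError/dWeight, layer 1
    std::vector<float> g2(w2.size(), 0.0f);  // accumulated dError/dWeight, layer 2
    std::vector<float> hidden(hiddenCount);
    float totalError = 0.0f;
    for (size_t d = 0; d < data.size(); ++d) {
        float out = forward(data[d].data(), inputCount,
                            w1.data(), w2.data(), hidden.data(), hiddenCount);
        float diff = out - expected[d];
        totalError += diff * diff;  // accumulate (output - expected)^2
        // Chain rule through the output activation.
        float dOut = 2.0f * diff * out * (1.0f - out);
        for (int j = 0; j < hiddenCount; ++j) {
            g2[j] += dOut * hidden[j];
            float dHid = dOut * w2[j] * hidden[j] * (1.0f - hidden[j]);
            for (int i = 0; i < inputCount; ++i) {
                g1[i * hiddenCount + j] += dHid * data[d][i];
            }
        }
    }
    // Form the deltas from the total error and the accumulated partials,
    // then apply them to the weights (the exact scaling is an assumption).
    for (size_t k = 0; k < w1.size(); ++k) w1[k] -= lambda * totalError * g1[k];
    for (size_t k = 0; k < w2.size(); ++k) w2[k] -= lambda * totalError * g2[k];
    return totalError;
}
```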

## Predicting
In the predicting phase, there are no iterative training loops. Instead, we assign predetermined weights, ideally ones found to be good predictors in training, run the MLP on the input data with those weights, and simply take the output, with no error propagation. This output becomes our prediction. In both the XOR and character data, the expected values are discrete integers, so I round the MLP result to the nearest integer when outputting the prediction. I was unable to find converged weights for the character data, so that predictor almost always outputs 0, nowhere near the expected values ranging from 1 to 52. For the XOR data, however, given the converged weights I found, the predictor is accurate on all inputs.
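As a sketch, prediction then reduces to a single forward pass plus rounding, again reusing the assumed forward helper from above:

```
#include <cmath>

// One forward pass with fixed, previously converged weights, rounded
// to the nearest integer label.
int predict(const float *input, int inputCount,
            const float *w1, const float *w2,
            float *hidden, int hiddenCount) {
    float out = forward(input, inputCount, w1, w2, hidden, hiddenCount);
    return (int)std::lround(out);
}
```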

## Challenges
I ran into multiple challenges on this project. Initially, when calculating the error propagation, I was computing an error value per input instead of using the total error accumulated over all inputs to find the delta for the weights. This threw off my weight adjustments, so the weights never converged. Once I caught this, I reorganized my code to output the partial derivatives and output value for each input, and only after accumulating all of the error calculate the delta.

Another challenge, which I was unable to fix, appeared once I switched to the character data: the network never converged, and every data point kept outputting the same value even though the inputs and expected values differed. There may have been an error in how I read in and passed along the larger data set. Another possibility is that I did not let the weights converge long enough: once each input grew to 10,201 values across 52 inputs, the MLP loops ran significantly slower, and I ran out of time to let them run long enough to potentially converge. I am somewhat skeptical that they would have converged, however, as it seemed wrong for every input to produce the same value on every iteration. Because of this challenge, I could not find good weights for the character data, so my prediction for that data set is almost meaningless: it uses random weights on a potentially buggy input data set.


@@ -7,5 +7,5 @@ set(SOURCE_FILES

cuda_add_library(character_recognition
${SOURCE_FILES}
-    OPTIONS -arch=sm_20
+    OPTIONS -arch=sm_75
)