
Project 2: Klayton Wittler #12

Open · wants to merge 21 commits into master
Binary file modified Project2-Character-Recognition/2x2_XOR_excel_example.xlsx
3 changes: 3 additions & 0 deletions Project2-Character-Recognition/CMakeLists.txt
@@ -22,6 +22,7 @@ if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
endif()

include_directories(.)
link_directories(${CUDA_TOOLKIT_ROOT_DIR}/lib/x64)
add_subdirectory(character_recognition)

cuda_add_executable(${CMAKE_PROJECT_NAME}
@@ -32,4 +33,6 @@ cuda_add_executable(${CMAKE_PROJECT_NAME}
target_link_libraries(${CMAKE_PROJECT_NAME}
character_recognition
${CORELIBS}
cublas
)

43 changes: 37 additions & 6 deletions Project2-Character-Recognition/README.md
@@ -3,12 +3,43 @@ CUDA Character Recognition

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2**

* Klayton Wittler
* [LinkedIn](https://www.linkedin.com/in/klayton-wittler/)
* Tested on: Windows 10 Pro, i7-7700K @ 4.20GHz 16.0GB, GTX 1070 8.192GB (my PC)

## Sections

* [Introduction](#introduction)
* [Implementation](#implementation)
* [Additions](#additions)

# Introduction
This project is an attempt at developing a multi-layer perceptron from scratch that utilizes some of the parallelization offered by the GPU.

![MLP](img/MLP.png)

The architecture for the network is:

* Input Layer: The number of nodes in this layer is equal to the number of features: two for XOR and 10201 for the images in the dataset

* Hidden Layer: This layer applies a linear transformation with a weight matrix, followed by a non-linear activation function (sigmoid)

* Output Layer: The number of nodes in this layer is equal to the number of labels: one for XOR and 52 for the images in the dataset

The loss function for the network is currently the sum of squared errors, though other losses would be better suited to the image classification task (for example, cross-entropy).
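As a rough CPU sketch of that loss (C++ with illustrative names, not the repository's actual code):

```cpp
#include <cstddef>
#include <vector>

// Sum of squared errors between predictions and targets.
// On the GPU the per-element terms could be computed in parallel and
// then reduced; here a plain loop illustrates the math.
double sumSquaredError(const std::vector<double>& pred,
                       const std::vector<double>& target) {
    double loss = 0.0;
    for (std::size_t i = 0; i < pred.size(); ++i) {
        const double diff = pred[i] - target[i];
        loss += diff * diff;
    }
    return loss;
}
```

For XOR with a single output node, the sum reduces to one squared difference per sample.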

# Implementation
My implementation contains a flag ```XOR``` which switches between the XOR test and the image dataset. I currently have the ability to load the images, store them in an array, and pass them to the training or testing functions; however, backpropagation for training does not yet work. Below is an example of forward propagation with loss calculation on the XOR dataset.

![XOR](img/XORoutput.png)

Inside ```mlp.cu``` the main functions are ```train``` and ```test```. Each of these calls ```forward``` to run the network, using cublas matrix multiplication in ```matrixMultiply``` for the weights and calling ```kernSigmoid``` to run the activations in parallel. This is repeated twice for the two layers, after which the prediction is returned. The loss is then calculated in ```MSE```, which is also set up to run in parallel, although for the XOR example this is trivial. The loss could then be fed into backpropagation to adjust the weights and perform gradient descent on the loss function. In ```main.cpp``` I have given the user the option to specify the number of iterations, the learning rate, and the number of hidden nodes.
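That flow can be sketched on the CPU as follows; the naive loops stand in for the cublas call and the ```kernSigmoid``` kernel, and all names and dimensions here are illustrative, not the repository's:

```cpp
#include <cmath>
#include <vector>

// Naive matrix-vector product: out = W * in, with W stored row-major
// as (rows x cols). Stands in for the cublas multiply on the GPU.
std::vector<double> matVec(const std::vector<double>& W,
                           const std::vector<double>& in,
                           int rows, int cols) {
    std::vector<double> out(rows, 0.0);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            out[r] += W[r * cols + c] * in[c];
    return out;
}

// Element-wise sigmoid, the CPU analogue of a parallel activation kernel.
void sigmoid(std::vector<double>& v) {
    for (double& x : v) x = 1.0 / (1.0 + std::exp(-x));
}

// Two-layer forward pass: input -> hidden -> output,
// with a sigmoid activation after each linear transformation.
std::vector<double> forward(const std::vector<double>& x,
                            const std::vector<double>& W1, int hidden,
                            const std::vector<double>& W2, int outputs) {
    std::vector<double> h = matVec(W1, x, hidden, (int)x.size());
    sigmoid(h);
    std::vector<double> y = matVec(W2, h, outputs, hidden);
    sigmoid(y);
    return y;
}
```

With randomly initialized weights, repeated forward passes plus backpropagation would drive this output toward the XOR labels.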

# Additions
## Matrix multiplication
```CMakeLists.txt``` has been edited to include ```link_directories(${CUDA_TOOLKIT_ROOT_DIR}/lib/x64)``` as well as ```cublas``` in ```target_link_libraries```.

Each layer of the network is handled with a matrix multiplication through cublas in the ```matrixMultiply``` function and then fed through the sigmoid activation in parallel with ```kernSigmoid```. Below are the outputs of each layer in the XOR example. There is a flag ```MULT``` for testing matrix multiplication, which triggers a console prompt for the size of matrix to test, fills each element with its index, and returns the result for verification.

![XOR-matrix](img/XORforwardProp.png)
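A CPU reference for that kind of check might look like the sketch below (illustrative names, not the repository's code): fill an n x n matrix with its flattened indices, multiply naively, and compare against the GPU result.

```cpp
#include <vector>

// Build an n x n matrix whose element (r, c) holds its flattened
// index r * n + c, mirroring the fill pattern of the MULT test.
std::vector<int> indexMatrix(int n) {
    std::vector<int> m(n * n);
    for (int i = 0; i < n * n; ++i) m[i] = i;
    return m;
}

// Naive n x n matrix product, usable as a reference when
// verifying a GPU matrix-multiply result.
std::vector<int> matMul(const std::vector<int>& a,
                        const std::vector<int>& b, int n) {
    std::vector<int> c(n * n, 0);
    for (int r = 0; r < n; ++r)
        for (int k = 0; k < n; ++k)
            for (int col = 0; col < n; ++col)
                c[r * n + col] += a[r * n + k] * b[k * n + col];
    return c;
}
```

For n = 2 the index matrix is [[0, 1], [2, 3]], and squaring it gives [[2, 3], [6, 11]], a small result that is easy to confirm by hand against the console output.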

@@ -7,5 +7,8 @@ set(SOURCE_FILES

cuda_add_library(character_recognition
${SOURCE_FILES}
OPTIONS -arch=sm_61
)

target_link_libraries(character_recognition ${CUDA_LIBRARIES})
target_link_libraries(character_recognition ${CUDA_CUBLAS_LIBRARIES})