Project 2: Caroline Lachanski #24
62 changes: 53 additions & 9 deletions Project2-Character-Recognition/README.md
# Project 2b: CUDA Character Recognition
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 2 - Character Recognition**

Caroline Lachanski: [LinkedIn](https://www.linkedin.com/in/caroline-lachanski/), [personal website](http://carolinelachanski.com/)

Tested on: Windows 10, i5-6500 @ 3.20GHz 16GB, GTX 1660 (personal computer)

## Project Description

The goal of this project was to implement a small neural network that would recognize characters from black-and-white image inputs. I specifically implemented a multi-layer perceptron, with one hidden layer. In this project, I let the size of the hidden layer be the average of the size of the input layer and the size of the output layer (which is always 1).

![](./img/MLP.png)

Training starts when a set of data is input into the perceptron, becoming the first layer. This data is multiplied by a series of weights (initially random), each corresponding to one input node and one hidden-layer node. These weights determine the importance, or "weight," each specific input will have in determining the output. All of the products belonging to the same hidden-layer node are summed, then passed through the activation function, which in this project is the sigmoid function, f(x) = 1/(1+e^-x). The results of this function then become the "input" to the next layer, which has its own set of weights and follows a similar weighting and summing process. Since our final layer is a single node, we are left with one output, which is also passed through the activation function.
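As a concrete illustration, here is a minimal CPU-side sketch of this forward pass in C++ (the project itself runs the equivalent computation in CUDA kernels; the function and variable names here are illustrative, not the project's):

```cpp
#include <cmath>
#include <vector>

// Sigmoid activation: f(x) = 1 / (1 + e^-x)
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One forward pass through a perceptron with a single hidden layer.
// weights1 is inputSize x hiddenSize (row-major); weights2 is hiddenSize x 1,
// since the output layer is a single node.
double forward(const std::vector<double>& input,
               const std::vector<double>& weights1,
               const std::vector<double>& weights2,
               int hiddenSize) {
    int inputSize = static_cast<int>(input.size());
    std::vector<double> hidden(hiddenSize, 0.0);
    // Weighted sum into each hidden node, then the activation function.
    for (int j = 0; j < hiddenSize; ++j) {
        double sum = 0.0;
        for (int i = 0; i < inputSize; ++i) {
            sum += input[i] * weights1[i * hiddenSize + j];
        }
        hidden[j] = sigmoid(sum);
    }
    // Same weighting-and-summing process into the single output node.
    double out = 0.0;
    for (int j = 0; j < hiddenSize; ++j) {
        out += hidden[j] * weights2[j];
    }
    return sigmoid(out);
}
```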

Each input to the network produces a single output value. The expected value for that input is subtracted from the actual output, the difference is squared, and the result is summed with the corresponding result from every other input. This sum is divided by 2 to give the total error for this round of training.

![](./img/error_equation.png)

We then calculate how much each specific weight contributes to the difference between the actual and expected output (the error's partial derivative with respect to each weight). Each of these values is multiplied by a negative lambda (here defined as total error / 5) to get the delta added to that specific weight. This is what is known as back-propagation, and it should ultimately reduce the overall error of the system.

![](./img/delta_weight.png)

We continue this reduction of error until it falls below a certain threshold or we reach our maximum number of iterations. Once training is complete, we record the final weights to a text file, which can then be used as the weights when we actually want to run the network on a single input.
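The stopping condition above amounts to a simple outer loop; a sketch, assuming a hypothetical `runEpoch` callback that does one forward/backward pass over the training set, updates the weights, and returns the resulting total error:

```cpp
#include <functional>
#include <limits>

// Train until the error drops below a threshold or we hit the iteration cap;
// returns the final error.
double train(const std::function<double()>& runEpoch,
             double errorThreshold, int maxIterations) {
    double error = std::numeric_limits<double>::max();
    for (int iter = 0; iter < maxIterations; ++iter) {
        error = runEpoch();
        if (error < errorThreshold) {
            break; // converged
        }
    }
    return error;
}
```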

I initially trained and tested on a small XOR example, then moved on to training and testing the much larger character recognition example (52 characters, each with an input array of 101x101 = 10,201 floats).

## Output

The weights produced by training for both the XOR and character recognition examples can be found in text files in this repository. After training, I tested the network on each possible input. Here is the result for XOR:

```
*********************************
*********** XOR TESTS ***********
*********************************
Expected output: 0.000000
Actual output: 0.001515

Expected output: 1.000000
Actual output: 0.977023

Expected output: 1.000000
Actual output: 0.976850

Expected output: 0.000000
Actual output: 0.019346

Total error: 0.000720
```

There seems to be an issue with my character recognition test, as the actual output for each tested character is 0.

## Challenges

This project was my first foray into machine learning, and only my second project using CUDA/GPU programming, so it was a bit of a challenge. I struggled with choosing how to represent and pass around data, particularly for error calculation and back-propagation. For example, to calculate the partial error derivatives for each weight and sum them into a delta-w per weight, I initially made one large float* buffer that held all the partial derivatives for all weights and all inputs, and worked with it using various indexing schemes. This worked fine for the very small XOR example, but the actual character recognition example has over 50 million weights, and I had trouble fitting everything in one buffer. I had to find a way to divide the data into pieces while still being able to run various calculations on it in parallel, and I still don't think what I did was the best choice.
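One common way to divide such work, shown here as a hypothetical sketch (not necessarily the scheme this project uses): instead of one giant buffer of partial derivatives for every (weight, input) pair, walk the weights in fixed-size chunks and reuse one chunk-sized buffer per pass.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Visit [start, start + count) ranges of the weight array in fixed-size
// chunks; processChunk would, e.g., launch a kernel over just that chunk.
void forEachChunk(std::size_t numWeights, std::size_t chunkSize,
                  const std::function<void(std::size_t, std::size_t)>& processChunk) {
    for (std::size_t start = 0; start < numWeights; start += chunkSize) {
        std::size_t count = std::min(chunkSize, numWeights - start);
        processChunk(start, count);
    }
}
```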

In the end, I don't believe my back-propagation/weight-update functionality works fully, as my network stops reducing error after only one iteration on the character recognition example. Nonetheless, I learned a lot about neural networks through this project.

cuda_add_library(character_recognition
${SOURCE_FILES}
OPTIONS -arch=sm_75
)
30 changes: 15 additions & 15 deletions Project2-Character-Recognition/character_recognition/common.cu
#include "common.h"

void checkCUDAErrorFn(const char *msg, const char *file, int line) {
cudaError_t err = cudaGetLastError();
if (cudaSuccess == err) {
return;
}

fprintf(stderr, "CUDA error");
if (file) {
fprintf(stderr, " (%s:%d)", file, line);
}
fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
254 changes: 128 additions & 126 deletions Project2-Character-Recognition/character_recognition/common.h
#pragma once

#include <cuda.h>
#include <cuda_runtime.h>

#include <cstdio>
#include <cstring>
#include <cmath>
#include <algorithm>
#include <chrono>
#include <stdexcept>

#define blockSize 256

#define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)

/**
* Check for CUDA errors; print and exit if there was a problem.
*/
void checkCUDAErrorFn(const char *msg, const char *file = NULL, int line = -1);

inline int ilog2(int x) {
int lg = 0;
while (x >>= 1) {
++lg;
}
return lg;
}

inline int ilog2ceil(int x) {
return x == 1 ? 0 : ilog2(x - 1) + 1;
}


namespace Common {
/**
* This class is used for timing the performance
* Uncopyable and unmovable
*
* Adapted from WindyDarian(https://github.com/WindyDarian)
*/
class PerformanceTimer
{
public:
PerformanceTimer()
{
cudaEventCreate(&event_start);
cudaEventCreate(&event_end);
}

~PerformanceTimer()
{
cudaEventDestroy(event_start);
cudaEventDestroy(event_end);
}

void startCpuTimer()
{
if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); }
cpu_timer_started = true;

time_start_cpu = std::chrono::high_resolution_clock::now();
}

void endCpuTimer()
{
time_end_cpu = std::chrono::high_resolution_clock::now();

if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); }

std::chrono::duration<double, std::milli> duro = time_end_cpu - time_start_cpu;
prev_elapsed_time_cpu_milliseconds =
static_cast<decltype(prev_elapsed_time_cpu_milliseconds)>(duro.count());

cpu_timer_started = false;
}

void startGpuTimer()
{
if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); }
gpu_timer_started = true;

cudaEventRecord(event_start);
}

void endGpuTimer()
{
cudaEventRecord(event_end);
cudaEventSynchronize(event_end);

if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); }

cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end);
gpu_timer_started = false;
}

float getCpuElapsedTimeForPreviousOperation() // noexcept (requires VS 2015+)
{
return prev_elapsed_time_cpu_milliseconds;
}

float getGpuElapsedTimeForPreviousOperation() //noexcept
{
return prev_elapsed_time_gpu_milliseconds;
}

// remove copy and move functions
PerformanceTimer(const PerformanceTimer&) = delete;
PerformanceTimer(PerformanceTimer&&) = delete;
PerformanceTimer& operator=(const PerformanceTimer&) = delete;
PerformanceTimer& operator=(PerformanceTimer&&) = delete;

private:
cudaEvent_t event_start = nullptr;
cudaEvent_t event_end = nullptr;

using time_point_t = std::chrono::high_resolution_clock::time_point;
time_point_t time_start_cpu;
time_point_t time_end_cpu;

bool cpu_timer_started = false;
bool gpu_timer_started = false;

float prev_elapsed_time_cpu_milliseconds = 0.f;
float prev_elapsed_time_gpu_milliseconds = 0.f;
};
}