Project2: Gangzheng Tong #26

Open
wants to merge 13 commits into
base: master
4 changes: 4 additions & 0 deletions Project2-Character-Recognition/CMakeLists.txt
@@ -22,6 +22,8 @@ if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
endif()

include_directories(.)
link_directories(${CUDA_TOOLKIT_ROOT_DIR}/lib/x64)

add_subdirectory(character_recognition)

cuda_add_executable(${CMAKE_PROJECT_NAME}
@@ -30,6 +32,8 @@ cuda_add_executable(${CMAKE_PROJECT_NAME}
)

target_link_libraries(${CMAKE_PROJECT_NAME}
curand
cublas
character_recognition
${CORELIBS}
)
23 changes: 17 additions & 6 deletions Project2-Character-Recognition/README.md
@@ -3,12 +3,23 @@ CUDA Character Recognition

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 2**

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Gangzheng Tong
* www.gtong.me
* Tested on: Windows 10, i7-8th Gen @ 2.2GHz 16GB, RTX 2070 8GB (Personal Laptop)

### (TODO: Your README)
![Screenshot](img/output.png)
![Screenshot](img/time_neurons.PNG)

Include analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
### Features Implemented
In this project I implemented the following features:
1. Loading data from files
2. Forward and backward propagation on the GPU
3. A Matrix struct that wraps cuBLAS, Thrust, and my custom kernels behind an easy-to-use interface; it could also be useful for future projects
4. C++ smart pointers to manage memory and avoid memory leaks
5. Timing runs with different numbers of hidden neurons
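
As an illustration of items 3 and 4, here is a minimal sketch of how a matrix wrapper can own its buffer through a smart pointer. It is not the code from this PR: the `Matrix` layout, `device_alloc`, and `DeviceFree` are hypothetical names, and host `malloc`/`free` stand in for device allocation so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for a device allocation. Host memory is used here
// so the sketch builds without CUDA; a GPU build would allocate on the device.
inline float* device_alloc(std::size_t count) {
    void* p = std::malloc(count * sizeof(float));
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<float*>(p);
}

// Custom deleter: releases the buffer when the owning pointer goes away.
struct DeviceFree {
    void operator()(float* p) const noexcept { std::free(p); }
};

// Hypothetical Matrix wrapper (assumed layout, not taken from the PR).
// The unique_ptr frees the buffer automatically, so no manual cleanup
// is needed on any exit path, and the struct is move-only by construction.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::unique_ptr<float, DeviceFree> data;

    Matrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(device_alloc(r * c)) {
        assert(r > 0 && c > 0);
    }

    std::size_t size() const { return rows * cols; }
};
```

In a CUDA build, `device_alloc` would presumably call `cudaMalloc` and `DeviceFree` would call `cudaFree`; the ownership logic stays the same. Moving a `Matrix` transfers the buffer and leaves the source with a null pointer, so a buffer is never freed twice.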

However, the network does not yet predict characters correctly after training. The cost fluctuates between 0.2 and 0.3 and does not appear to drop within 40 iterations. I unit-tested every kernel and found nothing wrong, so the cause may be the limited number of training samples or poorly chosen initial weights and biases.

### A Few Observations
1. The GPU handles large throughput well. As the number of hidden neurons grows, the data becomes huge (10212 * 2048 floats, roughly 21 million values, for a single weight matrix), but my RTX 2070 completed one iteration in under 2 seconds. Each iteration covers 52 samples and about a dozen large matrix operations.
2. C-style matrices are typically row-major, but cuBLAS expects column-major storage. I spent a lot of time debugging the matrix product before I realized this.
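
The layout mismatch in observation 2 can be sketched on the CPU. The example below is an illustration under assumptions, not code from this PR: `gemm_col_major` is a hypothetical reference routine standing in for a column-major BLAS multiply, and `matmul_row_major` shows the common trick of swapping the operands so that row-major inputs produce a row-major result without any explicit transpose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (r, c) in a matrix stored row-major (C style).
inline std::size_t row_major(std::size_t r, std::size_t c, std::size_t cols) {
    return r * cols + c;
}

// Offset of element (r, c) in a matrix stored column-major (BLAS style).
inline std::size_t col_major(std::size_t r, std::size_t c, std::size_t rows) {
    return c * rows + r;
}

// Reference column-major multiply: C (m x n) = A (m x k) * B (k x n).
// Hypothetical stand-in for a BLAS gemm with no transposes.
std::vector<float> gemm_col_major(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t i = 0; i < m; ++i)
                C[col_major(i, j, m)] +=
                    A[col_major(i, p, m)] * B[col_major(p, j, k)];
    return C;
}

// Multiply row-major A (m x k) by row-major B (k x n) using only the
// column-major routine. A row-major buffer read as column-major is the
// transpose, and C^T = B^T * A^T, so passing B first with the dimensions
// swapped writes C directly in row-major order.
std::vector<float> matmul_row_major(const std::vector<float>& A,
                                    const std::vector<float>& B,
                                    std::size_t m, std::size_t n, std::size_t k) {
    return gemm_col_major(B, A, n, m, k);
}
```

For example, with row-major `A = {1, 2, 3, 4, 5, 6}` (2 x 3) and `B = {7, 8, 9, 10, 11, 12}` (3 x 2), `matmul_row_major(A, B, 2, 2, 3)` yields the row-major product `{58, 64, 139, 154}`. The same operand swap is commonly used when calling `cublasSgemm` on row-major data, though whether this PR does so is not shown here.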
@@ -7,5 +7,5 @@ set(SOURCE_FILES

cuda_add_library(character_recognition
${SOURCE_FILES}
OPTIONS -arch=sm_20
OPTIONS -arch=sm_75
)