Commit 719b1fa

Initial commit
0 parents  commit 719b1fa

15 files changed, +289 -0 lines changed

.figures/mnist.png

26.1 KB

.github/setup.sh

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
#!/bin/bash
sudo apt install cmake
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install numpy
cd build_cpu
cmake -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` .
cmake --build . --config Release
./test-net
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
name: build-and-run-tests

on: [push]

jobs:
  build-and-run-tests:
    runs-on: ubuntu-latest

    steps:
    - name: get_packages
      run: sudo apt update && sudo apt install cmake

    - name: install_pytorch
      run: pip3 install torch --index-url https://download.pytorch.org/whl/cpu

    - name: install_numpy
      run: pip install numpy

    - name: get_repo
      uses: actions/checkout@v3
      with:
        path: main

    - name: configure_and_build
      shell: bash
      working-directory: ${{github.workspace}}/main/build_cpu
      run: |
        cmake -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` .
        cmake --build . --config Release

    - name: runtest
      shell: bash
      working-directory: ${{github.workspace}}/main/build_cpu
      run: ./test-net

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.vscode/
dependencies/
build/

CMakeLists.txt

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

project(train-net LANGUAGES CXX CUDA)
cmake_policy(SET CMP0004 OLD)
find_package(Torch REQUIRED)

# Enable CUDA language support
find_package(CUDAToolkit REQUIRED)
set(CUDA_SEPARABLE_COMPILATION ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS} ${CUDAToolkit_CXX_FLAGS} -pthread")

add_executable(train-net source/train_net.cpp)
target_link_libraries(train-net "${CUDAToolkit_libraries} ${TORCH_LIBRARIES}")
set_property(TARGET train-net PROPERTY CXX_STANDARD 17)

add_executable(test-net tests/test_net.cpp)
target_link_libraries(test-net "${CUDAToolkit_libraries} ${TORCH_LIBRARIES}")
set_property(TARGET test-net PROPERTY CXX_STANDARD 17)

README.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
## HPC exercise: training a neural network on the MNIST data set
- The exercise explores training a neural network using [the torch C++ API](https://pytorch.org/cppdocs/).

![where_is_waldo](.figures/mnist.png)

You will learn how to train a network to recognize handwritten digits. To do so, we will use the MNIST data set.
The image above shows example images. The exercise assumes you are working on the systems at the Juelich Supercomputing Centre.
To solve this exercise, look through the files in the `source` folder. `TODO`s mark parts of the code that require your attention.
Come back to this readme for additional hints.

- To get started on the JUWELS Booster, load the modules:
``` bash
module load Stages/2023 GCC/11.3.0 OpenMPI/4.1.4 CUDA/11.7 CMake PyTorch
```

- Use `mkdir build` to create your build directory. Change into the build folder and compile by running:
```bash
cmake -DCUDA_CUDA_LIB=/usr/lib64/libcuda.so -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` ..
cmake --build . --config Release
```

- Navigate to `source/net.h` and implement the constructor for the `Net` struct.
The `Net` should implement a fully connected network

$$
y = \ln(\sigma (W_3f_r(W_2 f_r(W_1 x + b_1) + b_2) + b_3))
$$

with $W_1 \in \mathbb{R}^{h_1, n}, W_2 \in \mathbb{R}^{h_2, h_1}, W_3 \in \mathbb{R}^{m, h_2}$
and $b_1 \in \mathbb{R}^{h_1}, b_2 \in \mathbb{R}^{h_2}, b_3 \in \mathbb{R}^{m}$, where
$n$ denotes the input dimension, $h_1$ the number of hidden neurons in the first layer, $h_2$ the number of neurons in the second layer, and $m$ the number of output neurons.
Finally, $\sigma$ denotes the [softmax function](https://en.wikipedia.org/wiki/Softmax_function), $f_r$ the ReLU activation, and $\ln$ the natural logarithm.
Use `register_module` to add `Linear` layers to the network. Linear layers that implement $Wx + b$ are provided by `torch::nn::Linear`.
Move on to implement the forward pass. Follow the equation above, using `torch::relu` and
`torch::log_softmax`. What happens if you choose `torch::sigmoid` instead of the ReLU?

- Before training your network, implement the `acc` function in `source/train_net.cpp`. It should find the ratio of
correctly identified digits by comparing the `argmax` of the network output with the annotations.

- Torch devices are defined, e.g., by `torch::Device device = torch::kCPU;`. Move to GPUs by choosing `torch::kCUDA` if CUDA GPUs are available.

- Finally, iterate over the test data set and compute the test accuracy.

- Train and test your network by executing:
```bash
./train-net
```

- When your network has converged, you should measure more than 90% accuracy.
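
To make the network equation from the `Net` item concrete, here is a minimal sketch of the same forward pass in plain standard C++ for a single input vector. It is not the libtorch solution to the exercise and it is not code from this repository: the names `Vec`, `Mat`, `Params`, `affine`, `relu`, `log_softmax`, and `forward` are hypothetical, and $f_r$ is assumed to be the ReLU. The log-softmax subtracts the maximum before exponentiating, a common way to avoid overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: Mat[i] is row i

// Affine map W x + b, with W of shape (rows, cols) and x of length cols.
Vec affine(const Mat& W, const Vec& x, const Vec& b) {
    assert(W.size() == b.size());
    Vec y(b);
    for (std::size_t i = 0; i < W.size(); ++i) {
        assert(W[i].size() == x.size());
        for (std::size_t j = 0; j < x.size(); ++j) y[i] += W[i][j] * x[j];
    }
    return y;
}

// Element-wise ReLU, f_r(z) = max(0, z).
Vec relu(Vec z) {
    for (double& v : z) v = std::max(0.0, v);
    return z;
}

// ln(softmax(z)), computed as z - (max(z) + ln(sum(exp(z - max(z))))).
Vec log_softmax(const Vec& z) {
    assert(!z.empty());
    const double zmax = *std::max_element(z.begin(), z.end());
    double sum = 0.0;
    for (double v : z) sum += std::exp(v - zmax);
    const double log_sum = zmax + std::log(sum);
    Vec out(z.size());
    for (std::size_t i = 0; i < z.size(); ++i) out[i] = z[i] - log_sum;
    return out;
}

// Hypothetical container for the six parameters of the equation.
struct Params {
    Mat W1, W2, W3;
    Vec b1, b2, b3;
};

// y = ln(softmax(W3 f_r(W2 f_r(W1 x + b1) + b2) + b3))
Vec forward(const Params& p, const Vec& x) {
    const Vec a1 = relu(affine(p.W1, x, p.b1));
    const Vec a2 = relu(affine(p.W2, a1, p.b2));
    return log_softmax(affine(p.W3, a2, p.b3));
}
```

Exponentiating the entries returned by `forward` gives class probabilities that sum to one, which is a quick sanity check for any implementation of the equation.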
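
The accuracy described in the `acc` item can be sketched the same way. The version below works on plain `std::vector` data rather than torch tensors, so it only illustrates the logic (row-wise `argmax`, comparison with the annotations, ratio of matches). The function names `argmax` and `accuracy` and the container types are assumptions for illustration, not the signatures used in `source/train_net.cpp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest entry in one row of network outputs.
std::size_t argmax(const std::vector<double>& row) {
    assert(!row.empty());
    return static_cast<std::size_t>(
        std::distance(row.begin(), std::max_element(row.begin(), row.end())));
}

// Fraction of rows whose argmax equals the annotated label.
double accuracy(const std::vector<std::vector<double>>& outputs,
                const std::vector<std::size_t>& labels) {
    assert(!outputs.empty() && outputs.size() == labels.size());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        if (argmax(outputs[i]) == labels[i]) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(outputs.size());
}
```

Because the logarithm and the softmax are both monotone, taking the `argmax` of the log-probabilities picks the same class as taking it over the probabilities themselves.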

build_cpu/CMakeLists.txt

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

project(train-net LANGUAGES CXX)
cmake_policy(SET CMP0004 OLD)
find_package(Torch REQUIRED)

add_executable(train-net ../source/train_net.cpp)
target_link_libraries(train-net "${TORCH_LIBRARIES}")
set_property(TARGET train-net PROPERTY CXX_STANDARD 17)

add_executable(test-net ../tests/test_net.cpp)
target_link_libraries(test-net "${TORCH_LIBRARIES}")
set_property(TARGET test-net PROPERTY CXX_STANDARD 17)

data/t10k-images-idx3-ubyte

7.48 MB
Binary file not shown.

data/t10k-labels-idx1-ubyte

9.77 KB
Binary file not shown.

data/train-images-idx3-ubyte

44.9 MB
Binary file not shown.
