- data: Contains some sample data
- dev_notebooks: Contains all Jupyter notebook files which are used for rapid prototyping
- doc: Contains diverse documentation files
- infra: Contains all files which are used to create the environment for training deep learning models on the LRZ AI System
- src: Contains all Python source files of the project
- tests: Contains all test files used to perform unit testing
The overall goal of this project is to develop a deep learning algorithm that is capable of predicting whether a 3D model is printable or not. In this context, printable refers to additive manufacturing.
- Goal Definition
- Goal Measurement
- How is the algorithm going to be consumed in the end?
- Technical requirements (Expected Results, Constraints)?
As a first step, publicly available datasets (i.e. open source data) are used. To be specific, the following two resources are considered as the main sources:
- Describe the ETL (Extract, Transform, Load)
- Describe the results of the Data Exploration
- What kind of data?
- Which range?
- Sample Data
- How is the data cleaned?
- Is all data used in the end?
- Does the Data match the requirements (from problem description)?
- Thingi10K
- ABC Dataset
- LRZ AI System
- Docker
- Enroot
For more detailed information about the infrastructure and how to use it, please consult the section about the LRZ AI System.
- PyTorch
- Numpy
- Matplotlib
- Request only the number of GPUs you are actually using (i.e. do not allocate more than one GPU if you are not performing parallel training using Horovod); see the example after this list
- If your training does not have mixed precision enabled, please select only those resources with P100 cards
- The time limit is restricted to a maximum of 24 hours, so if the job runs longer than the available time slot, save checkpoints frequently
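For instance, a minimal allocation that respects the single-GPU guideline could look as follows (a sketch only; the partition name dgx-1-p100 is taken from the salloc example further below and may need to be adapted to your target machine):
$ salloc -p dgx-1-p100 --ntasks=1 --gres=gpu:1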
First, make sure that you have established a VPN tunnel to the LRZ. This can be done using any VPN client (e.g. Cisco AnyConnect) by providing the domain https://asa-cluster.lrz.de/ once you are asked for it. Afterwards, use your TUM credentials to log in.
Once the VPN connection is established successfully, use SSH to log in to the AI System.
$ ssh [email protected]
An overview of the available GPU instances within the LRZ AI System is given below:
One of the most important pieces of information provided by the table above is the "SLURM Partition", as it is needed in order to submit a job to the desired machine.
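For example, the partition name is what you later pass to the SLURM commands via the -p flag; to check the state of one specific partition (shown here for the dgx-1-p100 partition that also appears in the examples below, an assumption about your target machine), you might run:
$ sinfo -p dgx-1-p100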
This section provides some useful commands for using SLURM on the AI System.
Show all available queues
$ sinfo
Show reservation of queues (if there is any)
$ squeue
Create allocation of resources
$ salloc
Example:
$ salloc -p dgx-1-p100 --ntasks=8 --gres=gpu:8
Cancel allocation of resources
$ scancel <JobID>
Run job
$ srun
Example:
$ srun --pty enroot start --mount ./data:/mnt/data ubuntu.sqsh bash
Submit batch job into SLURM pipeline
$ sbatch
Example:
$ sbatch script.sbatch
The LRZ AI System is a container-based solution. This means that, in order to make use of any GPU instance, we first have to deploy a container on the AI System from which a SLURM job can be submitted.
To this end, the enroot container runtime is used to deploy a container on the LRZ AI System, as users are not granted root access to the file system. In contrast to Docker, using enroot means we are able to start a rootless container, within which we have root permissions, but not outside of it.
The following steps show how to deploy a container on the LRZ AI System:
- Import a container from a repository (e.g. Docker Hub)
$ enroot import docker://ubuntu
The result of this step is an image which is called "ubuntu.sqsh".
- Start an enroot container using the recently created image
$ enroot start ubuntu.sqsh
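If the container needs access to data stored on the host, a directory can be mounted when starting it. The following is a sketch reusing the ./data mount from the srun example above; adapt the paths to your setup:
$ enroot start --mount ./data:/mnt/data ubuntu.sqsh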
The current data storage within the LRZ AI System is restricted to the following properties:
- Disk quota: 150 GB
- File quota: 200,000 files
If this needs to be enlarged, a service request has to be created at the LRZ service desk.
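To get a rough impression of how much of these quotas is currently in use, generic shell commands such as the following can be used (an assumption, not an LRZ-specific tool):
$ du -sh ~                  # approximate disk usage of the home directory
$ find ~ -type f | wc -l    # approximate number of files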
Generally, there are two different ways of submitting jobs to the SLURM pipeline:
- Interactive Jobs (i.e. interact with the terminal)
- Batch Jobs (i.e. no interaction with the terminal)
Interactive Jobs Using this option enables getting a console output of the job which is submitted to the SLURM pipeline. For example, if we submit a job which trains a neural network, we would see the training progress in the CLI. This is basically done by first allocating a resource and then submitting a containerized job (for help, see the SLURM commands section above; a sketch is shown below).
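A minimal sketch of this workflow, combining the salloc and srun examples from the SLURM commands section above (partition, image name and mount path are taken from those examples and may need to be adapted):
$ salloc -p dgx-1-p100 --ntasks=1 --gres=gpu:1
$ srun --pty enroot start --mount ./data:/mnt/data ubuntu.sqsh bash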
Batch Jobs Using this option does not provide a console output. However, within the .sbatch script it can be declared where to write the output and errors. A sample script is provided below:
#!/bin/bash
#SBATCH -N 1
#SBATCH -p dgx
#SBATCH --gres=gpu:8
#SBATCH --ntasks=8
#SBATCH -o enroot_test.out
#SBATCH -e enroot_test.err
srun --container-mounts=./data-test:/mnt/data-test --container-image='horovod/horovod+0.16.4-tf1.12.0-torch1.1.0-mxnet1.4.1-py3.5' \
python script.py --epochs 55 --batch-size 512
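Once the script above has been submitted (e.g. via sbatch script.sbatch as shown in the SLURM commands section), its console output and errors can be followed in the files declared via -o and -e, for example:
$ tail -f enroot_test.out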
Once the deep learning container has been started on the LRZ AI System, you will get an output similar to the one presented below (please be patient, this might take some seconds):
Cloning into 'TUM-DI-LAB'...
remote: Enumerating objects: 3151, done.
remote: Counting objects: 100% (233/233), done.
remote: Compressing objects: 100% (172/172), done.
remote: Total 3151 (delta 122), reused 135 (delta 61), pack-reused 2918
Receiving objects: 100% (3151/3151), 108.13 MiB | 6.50 MiB/s, done.
Resolving deltas: 100% (2100/2100), done.
Updating files: 100% (145/145), done.
Branch 'dev' set up to track remote branch 'dev' from 'origin'.
Switched to a new branch 'dev'
entrypoint.sh: line 9: cd: /workspace/mount_dir/: No such file or directory
root@9fdaa7f9b957:/workspace# [2021-06-23 09:06:13 +0000] [339] [INFO] Starting gunicorn 20.1.0
[2021-06-23 09:06:13 +0000] [339] [INFO] Listening at: http://0.0.0.0:5000 (339)
[2021-06-23 09:06:13 +0000] [339] [INFO] Using worker: sync
[2021-06-23 09:06:13 +0000] [341] [INFO] Booting worker with pid: 341
The things happening here are (a rough sketch of the underlying entrypoint logic follows this list):
- The latest version of the TUM-DI-LAB github repository gets cloned into the container
- The working branch is changed to dev
- The MLflow UI gets started as a background process (please do not kill this job, otherwise the UI is not accessible anymore)
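The following is a minimal sketch of what the container's entrypoint.sh roughly does, based on the steps listed above; the repository URL and the exact MLflow invocation are assumptions and may differ from the actual script:
#!/bin/bash
# Hypothetical sketch of entrypoint.sh (the actual script may differ)
git clone <TUM-DI-LAB repository URL>     # clone the latest version of the repository
cd TUM-DI-LAB
git checkout dev                          # switch the working branch to dev
mlflow ui --host 0.0.0.0 --port 5000 &    # serve the MLflow UI in the background on port 5000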
Once you see an output similar to the one shown above, please hit ENTER in order to get to the well-known bash CLI.
From there on, please follow these steps in order to open the MLflow UI in your local web browser:
- Get to know the IP address of the machine you are working on. To do so, execute the following command (a small helper snippet for filtering the output is shown after this list):
$ ifconfig
Please look for the IP address which starts with 10.XXX.XXX.XXX
- Start a web browser on your local machine and open the socket comprised of the IP address and port 5000. A respective example can be found below:
http://10.195.15.242:5000/
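If the relevant interface is hard to spot in the full ifconfig output, a quick (assumed, not LRZ-specific) way to filter for the 10.x address is:
$ ifconfig | grep "inet 10\."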
The most important terms and expressions are defined below:
| Term | Definition |
| --- | --- |
| NN | Neural Network |
| CLI | Command Line Interface |
| Name | E-Mail |
| --- | --- |
| Bouziane, Nouhayla | [email protected] |
| Ebid, Ahmed | [email protected] |
| Srinivas, Aditya Sai | [email protected] |
| Bok, Felix | [email protected] |
| Kiechle, Johannes | [email protected] |
| Lux, Kerstin (Supervisor TUM) | [email protected] |
| Cesur, Guelce (Supervisor VW) | [email protected] |
| Kleshchonok, Andrii (Supervisor VW) | [email protected] |
| Danielz, Marcus (Supervisor VW) | [email protected] |