Skip to content

Latest commit

 

History

History
268 lines (221 loc) · 13.2 KB

changelog.md

File metadata and controls

268 lines (221 loc) · 13.2 KB

Change Log

In this work, an approach to detect human body joints with only little computation complexity will be researched with a special focus on figure ice skating. The resulting application should be able to run on mobile phones and detect body joints from just the camera image.

Therefore, the different implementations from OpenPose in C++, TensorFlow and PyTorch will be explored and inspected. These already include some experiments with different neuronal network architectures, however their keypoint detection amount varies as does their current work status or rather commit frequency. Furthermore, a new paper suggests ‘spatio temporal affinity fields’, which will be analyzed in this project.

Different experiments and evaluations of the already existing research projects on keypoint detection will add to the final application. The training will be conducted on the provided DeepLearn server from the Stuttgart Media University. Therefore, the data from PoseTrack and the Coco dataset will be used.

Since this research focuses on the detection of keypoints in figure ice skating, further considerations must be taken in mind: people in the background should not be detected, but only the person in focus for later analysis. The question there is, how to get the corespondent training data.


2018-04-01 [Status Report]: Improve Custom Training Loop, Evaluation, Loss Function

Learned Mask:

Loss Calculation

Learned Classes

Loss Calculation

  • experiment with different net structures

    • try to train faster/ less memory
      • strided-down/-up convolution architecture: result converges worse
      • rescale in dataset generator: works far better (graphs will be included later)
    • u-net with mobilenetv2 as base (not able to learn body parts)
    • u-net as base for HPNet (converges worse than just HPNet and is slower)
    • improve HPNet with filter amount and layer amount according to evaluation analysis of layers
      • combine add and concat of layers
      • always add the initial Input layer to concat/ add
      • performs best HPNet Model Architecture
  • create new custom loss function for Class Imbalance

    • create weighed map according to a calculated graph with the relative body part distances
  • reduce classes to 9

  • add automatic dataset download and processing including saving in numpy compressed files

  • Loss Calculation

    • the new loss increased training efficiency a lot (exact results will be included later)
  • improve evaluation

    • add functionality to test different net architectures
    • add simple save functionality
    • show predicted images additionally to feature maps of layers
    • code refactoring & improvement
    • evaluation of best performing net HPNet with Adam optimizer after epoch 1595 v7_1595_adam_1595_mask
  • evaluate different optimizers, learning rates & learning rate adjustments

Accuracy of different used optimizers

red: adam; learning rate: 0.01
orange: nadam; learning rate: 0.01
blue: sgd learning rate: 0.1
green: sgd with custom decay (increase on plateau; decrease with variable decay)
light blue: sgd with custom decay of 0.001

Accuracy: accuracy_all Relation of correct body part pixel to all body part pixel accuracy_body_part_all Comparison of loss loss Learning Rates of SGD and SGD with custom loss lr Comparison of SGD and custom decay/ learning rates loss

2020-03-01 [Status Report]: Create Custom Training Loop (first step)

  • Solve problems with Tensorflow 2:
  • create Python Generator function for loading the image data and not running out of memory
  • cleanup code to use classes and type information
  • create custom training loop
  • integrate tensorboard logging information
  • create HRNetwork with 3 building blocks for the body segmentation task: Model Architecture

Analysis initial test training

The Prediction after the 64th epoch shows compared to the prediction, that the network learns to categorize the pixel data better to the different classes after some epochs. At the beginning the prediction shows a variegated picture with multiple class predictions, with preceding epochs this prediction becomes clearer and less scattered.

However, the prediction at the end seems to target the torso as main class, probably because it's pixels appears most often in the picture. Therefore, the hyperparameters and loss functions must be further refined for getting decent training results.

This behavior is reflected by the loss and accuracy graphs. The accuracy graph starts to flatten after the 35th episode and reassembles a logarithmic graph which converges to 0.94. It rises rather quickly however, the result is not the desired body segmentation. Here accuracy is probably the wrong metric. (Experiments will follow)

The loss graph starts to flatten already after the 30th episode and does not improve much until the last episode. The difference is only about 0.03. Here as well experiments with the hyperparameters and loss function must be conducted.

Training configuration:

batch_size = 3
prefetch_batch_buffer = 1
epoch_steps = 32
epoch_log_n = epoch_steps // 2
epochs = 64

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

Loss Plot

HPNet Model Architecture

Accuracy Plot

HPNet Model Architecture

Predition after 2nd epoch

HPNet Model Architecture

Predition after last epoch

HPNet Model Architecture

Custom HPNet Model

HPNet Model Architecture

2020-02-14 [Status Report]: Preprocess 3DHuman Dataset

  • request access to dataset
  • for initial testing purposes:
    • download first woman batch (including different female characters and 70 actions)
      • segmentation_body: as label for the body segmentation task
      • segmentation_clothes : label for avatar clothes
      • rgb : as input image data
  • preprocessing:
    • cleanup segmentation body include clear labels
    • store frames beloning to one action in one compressed numpy .npz file to save memory and speedup trainig
    • remove background from rgb data by using segmentation_clothes (idea: level 1 training)

Body segmentation class labels

class name pixel color
0 bg [153,153,153]
1 Head [128,64,0]
2 RUpArm [128, 0, 128]
3 RForeArm [128, 128, 255]
4 RHand [255, 128, 128]
5 LUpArm [0, 0, 255]
6 LForeArm [128, 128, 0]
7 LHand [0, 128, 0]
8 torso [128, 0, 0]
9 RThigh [128, 255, 128]
10 RLowLeg [255, 255, 128]
11 RFoot [255, 0, 255]
12 LThigh [0, 128, 128]
13 LLowLeg [0, 0, 128]
14 LFoot [255, 128, 0]

2020-02-07 [Status Report]: Refine Ressearch

  • discuss paper with Johannes Theodoridis
  • Result:
    • combine HPNet pose recognition instead of PAF from STAF with Keypoint Recognition
    • implement step-wise

2020-02-07 [Status Report]: XSens Motion Capture Record

  • plan new Motion Record in ice rink
  • Problems:
    • skater injured
    • substitute skater is in competition preparation and has not enough time for decent recording
  • Result:
    • Recording must be repeated
    • new scheduled recording on march 6th
  • find substitute dataset to work with:
    • 3DPEOPLE DATASET
    • similar approach with blender and Motion capturing
    • will be used for creating the initial training

2020-01-12 [Status Report]: XSens Motion Capture Dataset

  • record mocap data with XSens in the ice rink
  • research and experiment wit XSens data
    • create avatar and integrate Makehuman plugin into blender
    • create simple 3d test-scene and experiment with mocap data settings
    • write script to get 2d locations of keypoints equal to the Openpose dataset
    • analyze keypoints in rendered video

2019-12-03 [Status Report]: Analyze Figure Ice Skating Dataset

  • find figure ice skating videos for labeling:
    • 332 MB of Video Data
    • filmed from different mobile devices and a GOPRO 5 Hero Black
    • videos of different length, perspectives, moving camera, different ice rinks and lightning conditions
  • set-up Docker container on hal9k MI server: create Docker image
  • create bash script to traverse all video files and label by OpenPose
  • inspect labeled video quality
    • write script to detect on how many frames no person was detected and analyze these frames
      • create video data without these frames
        • analysis: on some videos on every frame a person was detected. No pattern for device type identifiable
        • videos still play fluently
      • detect the person in focus by taking the person with the highest score
    • check quality of videos by iterating over all videos, find subsequent frames with a minimum score
      • → This tests two conditions
        • whether from the subsequent frames, the keypoints with the maximum scores applied to these frames, would enhance the labels
        • if there can't be found subsequent frames with the specified score, the videos are marked as 'not usable labels'
      • manually analyze the images of the found subsequent frames
        • Result: sometimes very bad labeled frames, however they receive a relatively high score

2019-11-22 [Status Report]: Get Ready For Machine Learning

  • Work over the book: Deep Learning for Computer Vision By Rajalingappa Shanmugamani
    • book repeats several basics in a rough overview
    • many practical examples from different ML problems
  • Practical pre-experiments
    • implement simple UNet with a pre-trained VGG-16 encoder
    • include tensorboard statistics for later evaluations

2019-11-17 [Status Report]: Planning

  • create initial project plan
  • write project proposal
  • Difficulties in planning:
    • problems to setup OpenPose (mainly CMAKE)
      • only works with a reduced network on a usual laptop
      • not mobile ready
  • Change focus of investigation (included in project plan)
    • investigations of the tensorflow and Pytorch github repositories resulted in the conclusion, to put the main focus on keypoint recognition with focus on performance, than the analysis of the curves
    • the Tensorflow and Pytorch repositories rely on the old initial architecture of PAF (no active commits to the repositories)
      • make the training run, needed some tweaking in the code
      • the tensorflow implementation experiments with different pre-trained nets (e.g. mobile-net versions)
        • labeling works well on a usual laptop

2019-11-06 [Kickoff-Meeting]: Initial Research