Computer Vision Models and Further Methods

Tim Büchner edited this page Apr 18, 2024 · 2 revisions

As an initial note, JeFaPaTo builds only on existing computer vision models. We performed no additional fine-tuning or training to alter these models. Their performance is therefore only as good as the original authors achieved, provided the required preprocessing steps are replicated.

HaarCascade Face Detection

For the simple face detection in the GUI version of JeFaPaTo, we rely on Cascade Classifiers. This allows for the fastest possible detection of faces.

In the Facial Feature Extraction, the cascade classifier serves only as a default initialization for the bounding box. The user can update and change the bounding box freely afterward, so its final placement is the user's responsibility.

For the Eye Blink Extraction, we use the cascade classifier to find the face of the user who a) provided a video and b) clicked a matched blink in the table. JeFaPaTo then loads the corresponding frame in the background and searches for the face using this cascade classifier. As this is only a quality-of-life feature, no important decisions depend on it.

Mediapipe Landmarks and Blendshapes

For the landmark detection, we use the mediapipe framework by Google. The model pipeline combines its own lightweight face detector with the landmark and blendshape predictors.

Therefore, even if the bounding box in the initial GUI selection is chosen larger than the face, the additional face detector still helps to find the face inside the frame.

We initially tested several existing 2D landmarking methods, but only mediapipe performed reliably across our different test probands and patients.
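From the detected eye landmarks, a per-frame EAR (eye aspect ratio) score can be computed. The helper below is an illustrative sketch of the standard EAR formula, not necessarily JeFaPaTo's exact implementation; the six points are assumed to be ordered corner, upper lid (×2), corner, lower lid (×2):

```python
import numpy as np

def eye_aspect_ratio(pts: np.ndarray) -> float:
    """EAR from six 2D eye landmarks ordered as p1 (outer corner),
    p2, p3 (upper lid), p4 (inner corner), p5, p6 (lower lid)."""
    p1, p2, p3, p4, p5, p6 = pts
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

# Synthetic example: an open eye yields a clearly larger EAR than a closed one.
open_eye = np.array([[0, 0], [2, 1], [4, 1], [6, 0], [4, -1], [2, -1]], float)
closed_eye = np.array([[0, 0], [2, 0.1], [4, 0.1], [6, 0], [4, -0.1], [2, -0.1]], float)
```

Stacking this score over all frames yields the EAR time series analyzed in the next section.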

Find Peaks and Blink Intervals

To extract the blinks, we interpret the EAR score time series as an inverse peak-finding problem. This allows us to extract several features, such as the peak height, the left turning point, and the right turning point (please see the official documentation for more information). These features then form the basis for computing the internal width.
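A minimal sketch of this inverse peak-finding step, assuming scipy's `find_peaks`; the synthetic EAR signal and the prominence value are illustrative, not JeFaPaTo's defaults:

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic EAR time series: open-eye baseline of 0.30 with three
# triangular dips down to 0.10, mimicking three blinks.
ear = np.full(300, 0.30)
for center in (50, 150, 250):
    for d in range(-5, 6):
        ear[center + d] = 0.30 - 0.20 * (1 - abs(d) / 5)

# Invert the signal so blinks become peaks, then extract prominence
# and the left/right interpolated turning points ("ips" in the props).
peaks, props = find_peaks(0.30 - ear, prominence=0.1, width=1)
```

`props` then contains `prominences`, `widths`, `left_ips`, and `right_ips`, from which interval features such as the internal width can be derived.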

As the peak-finding algorithm cannot detect the onset and offset reliably, many computations are impossible. Therefore, we developed a parameter-free method, which is still under review for publication. This method reduces the number of wrongly predicted peaks and, at the same time, removes the need to fine-tune many extraction parameters by hand. We plan to include this blink extractor in an upcoming version of JeFaPaTo.

Otsu's Thresholding

To automatically label the blinks as partial or complete, we use Otsu's algorithm to estimate the thresholding parameter in a data-driven manner. We compute a histogram over all blinks using their prominence (i.e., how far the eye closes relative to its default open state). The algorithm then models two Gaussian-like distributions inside the prominence histogram, with the threshold as the boundary between them. Hence, one distribution models the partial blinks and the other the complete blinks.
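A self-contained sketch of this step: a textbook Otsu implementation that maximizes the between-class variance over the prominence histogram. The bin count and the synthetic prominence clusters are illustrative assumptions, not JeFaPaTo's actual values:

```python
import numpy as np

def otsu_threshold(values: np.ndarray, bins: int = 64) -> float:
    """Data-driven threshold that maximizes the between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    prob = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2.0
    best_thresh, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = prob[:k].sum(), prob[k:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue  # all mass on one side; no valid split here
        mu0 = (prob[:k] * centers[:k]).sum() / w0
        mu1 = (prob[k:] * centers[k:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_thresh = var_between, edges[k]
    return best_thresh

# Synthetic prominences: a partial-blink cluster and a complete-blink cluster.
rng = np.random.default_rng(0)
prominences = np.concatenate(
    [rng.normal(0.05, 0.01, 200), rng.normal(0.25, 0.02, 200)]
)
threshold = otsu_threshold(prominences)  # falls between the two clusters
```

Blinks with a prominence below the threshold are then labeled partial, and those above it complete.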