Computer Vision Models and Further Methods
As an initial note, JeFaPaTo is built solely on existing computer vision models; no additional fine-tuning or training has been performed to alter them. Consequently, their performance is as good as the original authors could achieve, provided the required preprocessing steps are replicated.
For the simple face detection in the GUI version of JeFaPaTo, we rely on cascade classifiers, which allow for the fastest possible detection of faces.
In the Facial Feature Extraction, the cascade classifier serves only as a default initialization for the bounding box. The bounding box can be updated and moved freely afterward, so its final placement is in the user's hands.
For the Eye Blink Extraction, we use the cascade classifier to find the face once the user has a) provided a video and b) clicked a matched blink in the table. JeFaPaTo then loads the corresponding frame in the background and searches for the face using this cascade classifier. As this is only a quality-of-life feature, no important decisions depend on it.
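
For illustration, the following is a minimal sketch of cascade-based face detection with OpenCV; the file name, cascade choice, and parameter values are assumptions for demonstration, not JeFaPaTo's actual configuration.

```python
import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

frame = cv2.imread("frame.png")  # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns a list of (x, y, w, h) boxes; the
# parameter values here are common defaults, not tuned settings.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
if len(faces) > 0:
    x, y, w, h = faces[0]  # use the first detected face as the bounding box
```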
For the landmark detection, we use the mediapipe framework by Google. The model consists of
- a face detector (BlazeFace),
- a face mesh predictor based on an MLP-Mixer,
- and a Blendshape model.
Therefore, even if the bounding box in the initial GUI selection is chosen larger than the face, the additional face detector helps to still find the face inside the frame.
We initially tested several existing 2D landmarking methods, but only mediapipe performed consistently across our different test probands and test patients.
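
As a minimal sketch, the face mesh can be queried through mediapipe's Python API roughly as follows; the video path and confidence value are placeholders, not JeFaPaTo's actual settings.

```python
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False,      # video mode: track landmarks across frames
    max_num_faces=1,
    min_detection_confidence=0.5,
)

cap = cv2.VideoCapture("patient_video.mp4")  # hypothetical input video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # mediapipe expects RGB images, while OpenCV delivers BGR.
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark  # 468 points
cap.release()
```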
To extract the blinks, we interpret the EAR score time series as an inverse peak-finding problem. This allows us to extract several features, such as the height, the left turning point, or the right turning point (please see the official documentation for more information). These features then form the basis for the extraction of the internal width.
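
For illustration, such an inverse peak search can be sketched with SciPy's `find_peaks` on the inverted EAR signal; the input file and threshold values below are assumptions for demonstration.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical 1-D array of EAR scores, one value per video frame.
ear = np.load("ear_scores.npy")

# Blinks appear as valleys in the EAR time series, so we invert the
# signal and run a standard peak search; the thresholds are illustrative.
peaks, props = find_peaks(-ear, prominence=0.1, width=3)

for i, frame in enumerate(peaks):
    print(
        f"blink at frame {frame}: "
        f"height (prominence)={props['prominences'][i]:.3f}, "
        f"left turning point={props['left_bases'][i]}, "
        f"right turning point={props['right_bases'][i]}"
    )
```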
As the peak-finding algorithm cannot detect the onset and offset reliably, many derived computations are impossible. Therefore, we developed a parameter-free method, which is still under review for publication. This method would reduce wrongly predicted peaks and, at the same time, the need to fine-tune many extraction parameters by hand.
We plan to include this blink extractor in an upcoming version of JeFaPaTo.
To automatically label the blinks as partial or closed, we use Otsu's algorithm to estimate the thresholding parameter in a data-driven manner. For this, we compute the histogram of the prominence (i.e., the closedness relative to the default open state) across all blinks. The algorithm then tries to model two Gaussian distributions inside the prominence histogram, with the threshold as the boundary between them. Hence, one distribution models the partial blinks and the other the complete blinks.
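
A minimal sketch of this labeling step, assuming the per-blink prominences from the peak search are available as an array; the file name is a placeholder, and the scikit-image helper stands in for any Otsu implementation.

```python
import numpy as np
from skimage.filters import threshold_otsu

# Hypothetical per-blink prominences (closedness relative to the
# default open state), gathered from the peak-finding step.
prominences = np.load("blink_prominences.npy")

# Otsu's method picks the threshold that best separates the histogram
# into two classes, here interpreted as partial vs. complete blinks.
threshold = threshold_otsu(prominences)

labels = np.where(prominences >= threshold, "complete", "partial")
```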