Facial Feature Description
Here, we will briefly list the facial features used, their computation, their ranges, and how to interpret them.
- Definition: EAR (eye aspect ratio) is a measure used in facial analysis to assess how open or closed the eyes are in an image or video.
- Calculation: It is calculated from specific landmarks on the eye (six in our case; see the image below): the vertical distances between the upper and lower eyelid landmarks are compared to the horizontal distance between the inner and outer eye corners, so the value shrinks as the eye closes (see the sketch after this list).
- Purpose: EAR is often employed in applications such as drowsiness detection, fatigue monitoring, alertness assessment, or medical applications concerning eye health.
- Range: A higher EAR value indicates open eyes, while a lower value suggests closed or partially closed eyes. Values typically lie in the interval [0, 1], and most of that range can occur in practice.
- Limitations: While EAR is a useful metric, especially due to its invariance to the distance between face and camera, factors such as head rotation can affect its measurement.
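
To make the calculation above concrete, the following minimal Python sketch computes an EAR value from six 2D eye landmarks. The six-point ordering (corner, two upper-lid points, corner, two lower-lid points) and the example coordinates are illustrative assumptions and do not necessarily match the exact implementation in JeFaPaTo.

```python
import numpy as np

def eye_aspect_ratio(pts: np.ndarray) -> float:
    """Compute the EAR from six eye landmarks.

    Assumed order (a common convention): p1 = outer corner, p2/p3 = upper eyelid,
    p4 = inner corner, p5/p6 = lower eyelid.
    """
    p1, p2, p3, p4, p5, p6 = pts
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)  # eyelid openings
    horizontal = np.linalg.norm(p1 - p4)                          # eye width
    return float(vertical / (2.0 * horizontal))

# Illustrative landmarks of a fairly open eye (normalized image coordinates).
open_eye = np.array([
    [0.30, 0.50], [0.38, 0.44], [0.46, 0.44],
    [0.54, 0.50], [0.46, 0.56], [0.38, 0.56],
])
print(eye_aspect_ratio(open_eye))  # ~0.5 -> eye is open; values near 0 mean closed
```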
- Overview: We use MediaPipe to predict 478 facial landmarks in videos.
- Landmarks: The facial landmarks consist of 478 defined points, of which we use six per eye to compute the EAR score. We therefore provide the EAR2D6 feature, which leverages only the 2D information, and EAR3D6, which also utilizes the monocular depth estimation capabilities.
- Accuracy and Fairness: The face model was trained on roughly 1,700 samples and achieves high accuracy in the geographical evaluation. Therefore, JeFaPaTo should also perform well in the same geographical regions. We would like to quote the following statement from the linked model card: "Observed discrepancy across different genders and skin tones is less than one defined in our fairness criteria. We therefore consider the model to be performing well across groups."
- Monocular Depth: MediaPipe also enables monocular facial depth estimation. We offer the EAR3D6 feature, which performs similarly to the EAR2D6 feature (see the sketch below).
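
To illustrate how EAR2D6- and EAR3D6-style values can be derived from the MediaPipe landmarks, the sketch below extracts the 478 landmarks with MediaPipe's FaceMesh solution and evaluates a 2D and a 3D EAR for one eye. The six landmark indices are a commonly used approximation of the six-point eye contour, and the file name face.jpg is a placeholder; neither is necessarily what JeFaPaTo uses internally.

```python
import cv2
import mediapipe as mp
import numpy as np

def ear(p):  # same six-point EAR as in the sketch above
    return (np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])) / (2 * np.linalg.norm(p[0] - p[3]))

# Commonly used six-point indices for the left eye of the MediaPipe face mesh
# (an assumption; not necessarily the exact landmark set JeFaPaTo relies on).
LEFT_EYE = [362, 385, 387, 263, 373, 380]

image = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)

with mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True,
    refine_landmarks=True,  # adds the iris landmarks -> 478 points in total
    max_num_faces=1,
) as face_mesh:
    result = face_mesh.process(image)

if result.multi_face_landmarks:
    lm = result.multi_face_landmarks[0].landmark
    pts2d = np.array([[lm[i].x, lm[i].y] for i in LEFT_EYE])
    pts3d = np.array([[lm[i].x, lm[i].y, lm[i].z] for i in LEFT_EYE])
    print("EAR2D6-like value:", ear(pts2d))
    print("EAR3D6-like value:", ear(pts3d))  # additionally uses the estimated depth
```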
- Overview: MediaPipe additionally offers blendshapes for facial expression analysis in images and videos.
- Blendshapes: Blendshapes decompose a full facial movement into subcomponents, each representing a small, individual facial movement. MediaPipe models them after the Facial Action Coding System (FACS) by Ekman, and the blendshapes correlate with specific Action Units; therefore, a connection to the underlying facial muscles might exist. A sketch of how to extract the blendshape scores follows this list.
- Accuracy and Fairness: We refer to the model card for detailed information. However, the model was trained on 511 individuals and scored well on the fairness scale. We would like to quote the following statement from the model card: "Observed discrepancy across different genders and skin tones is less than one defined in our fairness criteria. We therefore consider the model to be performing well across groups."
- Supported Blendshapes:
  1 - browDownLeft; 2 - browDownRight; 3 - browInnerUp; 4 - browOuterUpLeft; 5 - browOuterUpRight;
  6 - cheekPuff; 7 - cheekSquintLeft; 8 - cheekSquintRight;
  9 - eyeBlinkLeft; 10 - eyeBlinkRight; 11 - eyeLookDownLeft; 12 - eyeLookDownRight; 13 - eyeLookInLeft; 14 - eyeLookInRight; 15 - eyeLookOutLeft; 16 - eyeLookOutRight; 17 - eyeLookUpLeft; 18 - eyeLookUpRight; 19 - eyeSquintLeft; 20 - eyeSquintRight; 21 - eyeWideLeft; 22 - eyeWideRight;
  23 - jawForward; 24 - jawLeft; 25 - jawOpen; 26 - jawRight;
  27 - mouthClose; 28 - mouthDimpleLeft; 29 - mouthDimpleRight; 30 - mouthFrownLeft; 31 - mouthFrownRight; 32 - mouthFunnel; 33 - mouthLeft; 34 - mouthLowerDownLeft; 35 - mouthLowerDownRight; 36 - mouthPressLeft; 37 - mouthPressRight; 38 - mouthPucker; 39 - mouthRight; 40 - mouthRollLower; 41 - mouthRollUpper; 42 - mouthShrugLower; 43 - mouthShrugUpper; 44 - mouthSmileLeft; 45 - mouthSmileRight; 46 - mouthStretchLeft; 47 - mouthStretchRight; 48 - mouthUpperUpLeft; 49 - mouthUpperUpRight;
  50 - noseSneerLeft; 51 - noseSneerRight;
  52 - tongueOut
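
As a minimal sketch of how these blendshape scores can be obtained, the snippet below uses MediaPipe's FaceLandmarker task with blendshape output enabled. The file names face_landmarker.task and face.jpg are placeholders for the downloaded model bundle and an input image; the snippet only illustrates the general API, not the exact pipeline of JeFaPaTo.

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_tasks
from mediapipe.tasks.python import vision

# "face_landmarker.task" is a placeholder for the downloaded MediaPipe model bundle.
options = vision.FaceLandmarkerOptions(
    base_options=mp_tasks.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,  # request the 52 blendshape scores
    num_faces=1,
)

detector = vision.FaceLandmarker.create_from_options(options)
image = mp.Image.create_from_file("face.jpg")
result = detector.detect(image)

if result.face_blendshapes:
    for category in result.face_blendshapes[0]:
        # Each entry has a name (e.g. "eyeBlinkLeft") and a score in [0, 1].
        print(f"{category.category_name}: {category.score:.3f}")
```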