
feat: Add depth detection #2

Open · rjkilpatrick opened this issue Nov 13, 2021 · 1 comment

rjkilpatrick (Owner) commented Nov 13, 2021

I attempted to implement a naive depth detection scheme, where the estimated depth scales inversely with the square of the distance between the eyes.

// Squared distance between the detected eye positions.
const measuredEyeDistanceSquared = leftEye.distanceToSquared(rightEye);
// 0.2 is a scale constant; the clamp keeps the estimate within [0.1, 4].
const estimatedDepth = THREE.MathUtils.clamp(0.2 / measuredEyeDistanceSquared, 0.1, 4);
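For context, here is a minimal sketch of how those eye positions might be obtained, assuming ml5's PoseNet wrapper; the video element, the normalisation by frame width, and the variable names are illustrative, not the actual code in this repo:

// Sketch only: assumes ml5's PoseNet API; names are illustrative.
const poseNet = ml5.poseNet(video, () => console.log('PoseNet ready'));

poseNet.on('pose', (results) => {
  if (results.length === 0) return;
  const keypoints = results[0].pose.keypoints;

  // PoseNet labels its keypoints by part name, including both eyes.
  const left = keypoints.find((k) => k.part === 'leftEye').position;
  const right = keypoints.find((k) => k.part === 'rightEye').position;

  // Normalise by the frame width so the 0.2 constant is resolution-independent
  // (an assumption; the original constant may instead be tuned to raw pixels).
  const leftEye = new THREE.Vector2(left.x / video.width, left.y / video.width);
  const rightEye = new THREE.Vector2(right.x / video.width, right.y / video.width);

  const measuredEyeDistanceSquared = leftEye.distanceToSquared(rightEye);
  const estimatedDepth = THREE.MathUtils.clamp(0.2 / measuredEyeDistanceSquared, 0.1, 4);
});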

For small eye-to-webcam distances, this worked fine.
However, for larger distances, a small change in the measured inter-eye distance (even from just turning the head left-to-right) leads to a large change in estimated depth: since we are inverting an ever-smaller number, the estimate blows up towards infinity as the measured distance approaches the pixel resolution.
It is worth noting that this is with my 1920×1080 webcam downscaled to 320×240; increasing that would make PoseNet take much longer to run, so I think we should not include depth perception at the moment.
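To put a rough number on that sensitivity (illustrative figures only): with depth proportional to 1/s² for a measured inter-eye distance of s pixels, the relative jump caused by a single-pixel error is s²/(s−1)² − 1:

// Illustrative only: how much the 1/s² estimate moves when the measured
// inter-eye distance s (in pixels) is off by one pixel.
const relativeDepthChange = (s) => (s * s) / ((s - 1) * (s - 1)) - 1;

console.log(relativeDepthChange(40)); // ≈ 0.052 → ~5% error when the face is close
console.log(relativeDepthChange(5));  // ≈ 0.563 → ~56% error when the face is far away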

However, should ml5js support any eye-depth estimation, I will change my model.

rjkilpatrick (Owner, Author) commented Nov 13, 2021

Switching to FaceMesh for 3D pose-estimation

From some preliminary testing (i.e. moving my face around in the ml5js facemesh and PoseNet eye-tracking examples), facemesh appears to be the faster model at inference, despite its much larger output size, which is why I had initially discounted it.

Its API is slightly harder to use, as the mesh itself is just a raw array of vertices rather than a dictionary, but it'll be worth it for the depth estimation.
(The original comment embeds the FaceMesh vertex-numbering map here, showing which array index corresponds to each facial landmark.)

So you can in fact use a dictionary of points, and it even has the desired predictions[0].annotations.midwayBetweenEyes[0] to give an estimate of the 2D eye position and a depth on the z-axis. However, that z-coordinate (as described in the paper) is only relative to the centre of mass of all the points, i.e. it is not an estimate of the distance from the camera to the face.

It does, however, at least give us a better estimate of the parallax error for a highly rotated face.
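A minimal sketch of reading that annotation, following the structure of the ml5 facemesh examples (variable names are mine):

// Sketch only, based on the ml5 facemesh examples.
const facemesh = ml5.facemesh(video, () => console.log('FaceMesh ready'));

facemesh.on('predict', (predictions) => {
  if (predictions.length === 0) return;

  // annotations maps named face regions to arrays of [x, y, z] vertices.
  const [x, y, z] = predictions[0].annotations.midwayBetweenEyes[0];

  // z is relative to the mesh's centre of mass, not an absolute
  // camera-to-face distance, so it only helps with parallax correction.
  console.log(`between-eyes point: (${x}, ${y}), relative z: ${z}`);
});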
