Results in real-time videos #24
Hi @ahsan3803, thanks for your work! It clearly shows the difference between our network's output and others'. I'd like to explain more following your demonstration. There are several reasons:
And I have another question for you: after getting the positions from the VideoPose outputs, how do you convert them to a BVH file? Using some IK solver? Feel free to let me know your ideas. Thanks!
@Shimingyi Thanks for the explanation. For the 3rd and 4th videos I just set a better view when creating the videos, but I think you can judge them with the current view. After getting the skeleton information, I convert the pose data to joint angles, and I use Euler angles as the representation. Quaternions are the better practice, especially at the machine level, but I can still get good results using Euler angles; maybe in the future I will change to quaternions. I also have a question about foot contact: I can see that the code is using a foot contact loss, but how can we get predictions about foot contact, the same way as joint rotations, bones, etc.?
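For reference, this is roughly how I turn the positions into Euler angles (a simplified sketch of my approach, not the exact script I use; the parent list and rest-pose offsets are assumed to be known):

```python
# Sketch: per-frame joint positions -> per-bone Euler angles for BVH export.
# Assumes a known parent array and a rest-pose (T-pose) offset per bone.
import numpy as np
from scipy.spatial.transform import Rotation as R

def bone_euler_angles(positions, parents, rest_offsets, order='zxy'):
    """positions:    (J, 3) joint positions for one frame.
    parents:      length-J list, parents[j] is the parent joint index (-1 for root).
    rest_offsets: (J, 3) child-minus-parent offsets in the rest pose."""
    angles = np.zeros((len(parents), 3))
    for j, p in enumerate(parents):
        if p < 0:
            continue  # root translation is handled separately in the BVH
        observed = positions[j] - positions[p]          # bone direction in the capture
        rest = rest_offsets[j]
        # Rotation that takes the rest-pose bone onto the observed bone
        rot, _ = R.align_vectors(observed[None], rest[None])
        angles[j] = rot.as_euler(order, degrees=True)   # Euler angles, as I described above
    return angles
```

One limitation of doing it this way is that positions alone don't determine the twist around the bone axis, which is part of why an IK solver or a quaternion-based optimization can give cleaner results.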
@ahsan3803 Euler angles are OK if you don't use any learning-based optimization method, but I am also cautious about your transformation from positions to rotations. Do you use a ground-truth skeleton, something like an averaged skeleton, or rigging? And how do you convert it to rotations? Regarding foot contact, we have two losses here. First, we extract the ground truth of foot contact from the dataset and then predict it as part of the E_q output, so we can apply a reconstruction loss on foot contact. In addition, we constrain the velocity to be 0 when the foot is predicted as 'contact'. Related code: code.
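To make the two terms concrete, here is a rough PyTorch sketch (my paraphrase of the idea, not a copy of the repository code; the tensor shapes are assumptions):

```python
# Sketch of the two foot-contact terms: a reconstruction loss on the predicted
# contact labels, plus a velocity penalty applied only on frames labelled as contact.
import torch
import torch.nn.functional as F

def foot_contact_losses(pred_contact, gt_contact, foot_positions):
    """pred_contact:  (T, F) predicted contact probabilities for F foot joints.
    gt_contact:     (T, F) binary ground-truth contact labels (float 0/1).
    foot_positions: (T, F, 3) global foot joint positions."""
    # 1) reconstruction loss on the contact labels themselves
    recon_loss = F.binary_cross_entropy(pred_contact, gt_contact)
    # 2) foot velocity should be ~0 whenever the foot is in contact
    velocity = foot_positions[1:] - foot_positions[:-1]        # (T-1, F, 3)
    contact_mask = gt_contact[1:].unsqueeze(-1)                # (T-1, F, 1)
    velocity_loss = (velocity.abs() * contact_mask).mean()
    return recon_loss, velocity_loss
```

The first term teaches the network to predict contact at all; the second uses those labels to stop the foot from drifting while it is supposed to be planted.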
This might be only tangentially related, but I noticed in the future work section of the paper you said
I feel like that's similar to what you are doing in the paper?
Hi @JinchengWang, the idea is similar but it's different. You can imagine what causes foot sliding when you look at a foot motion. One important thing is the foot movement: we constrain the velocity of the foot to zero so it can be more stable. But another important thing is the height between the ground and the foot joint; only when we can guarantee that velocity = 0 and height = 0 can we say it's stable. In our system, we don't reconstruct the environment, so the network doesn't account for any physical constraints with the ground. At this SIGGRAPH Asia, there is a paper called PhysCap that does this well; it's worth a read.
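To make the condition concrete, here is a toy per-frame check (just an illustration, not from our code; the thresholds and the Y-up convention are assumptions):

```python
# Toy check of the stability condition: a foot is only truly planted when both
# its velocity and its height above the ground are (approximately) zero;
# zero velocity alone is not enough.
import numpy as np

def is_planted(foot_pos_prev, foot_pos_curr, ground_height=0.0,
               vel_eps=1e-3, height_eps=0.02):
    velocity = np.linalg.norm(foot_pos_curr - foot_pos_prev)
    height = foot_pos_curr[1] - ground_height   # assuming Y-up coordinates
    return velocity < vel_eps and height < height_eps
```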
@Shimingyi I looked at the paper "Interactive Character Animation by Learning Multi-Objective Control" by Lee et al. It seems they addressed this problem by using a foot contact loss, which is almost identical to yours. While it is true that there is no visible foot sliding in their end result, the way you wrote it seems to imply they are using a fundamentally different method (which, I think, is not the case, assuming I understand both papers correctly). BTW, a great fan of the PhysCap paper as well! In hindsight, there is a glaring lack of cooperation between the pose estimation and humanoid robot control communities...
Hi @Shimingyi For BVH conversion I use this tool for VideoPose. Some other members are also asking about foot contacts, and in your work foot contacts and the global root position are key factors for any motion capture system. Just take these 2D frames as examples. The image below shows the foot contact values for the example above. We can see that the foot contact values vary.

Furthermore, I want to ask: what is the standard for setting foot contacts? I mean, if the foot is less than 20mm above the ground (as mentioned in paper section 3.3), then it is considered to be in contact with the ground. What if the foot is more than 20mm above the ground (maybe 50mm, 100mm, even more than 200mm)? In that situation, are the foot contact values from MotioNet constant, or do they change according to the height above the ground? In other words, what are the highest and lowest values when the foot is or is not in contact? I'm asking because in the results above the values vary a lot, and it seems that MotioNet is not using binary flags for foot contacts.

My other question is about the global root position: in the paper you use it as a loss, and there is a method for it in the Animation file, but the BVH export here doesn't make use of foot contacts or the global position. Will this functionality be supported in a future commit, or is there no plan to release it? Looking forward to your response.
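For reference, this is how I imagine the 20mm rule from section 3.3 turning into binary labels (my own sketch of my understanding, not the repository's preprocessing code):

```python
# Sketch: derive binary foot-contact labels from foot height above the ground,
# using the 20mm threshold mentioned in section 3.3 of the paper. With labels
# like these, any height above the threshold maps to the same value (0).
import numpy as np

def foot_contact_labels(foot_heights, threshold_m=0.02):
    """foot_heights: (T, F) height of each foot joint above the ground, in meters.
    Returns (T, F) binary flags: 1 = contact, 0 = no contact."""
    return (foot_heights < threshold_m).astype(np.float32)
```

This is why the continuously varying values I see in the output confuse me, since with labels like these the height above the threshold shouldn't matter.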
Hi @ahsan3803, thanks so much for your useful feedback! These experiments can help others understand the system better. Let me explain more in this thread:
Best,
@Shimingyi Thanks so much for the explanation.
I will keep working in this field, so just feel free to contact me if you need help or have any ideas to share!
Hello @ahsan3803
Hi @ahsan3803, can you please guide me on how you were calculating joint angles from pose data? I am lost there. I would really appreciate it.
I tested some real-time videos from YouTube and noticed that the results are not as good as your paper shows. I compared MotioNet results with VideoPose after converting the VideoPose results to a BVH file.
You compared your approach's results with different approaches (here), and it clearly seems that MotioNet performs well compared to the others, but when I tested with different videos I got surprising results. Check the attached GIF files as samples of the results I got.
Ignore the viewing angles of the results; I only set them to get a better view, and you can judge with the current view of every GIF file.
MotioNet ensures a consistent skeleton geometry and rotational parameters, which are the key things needed to drive any 3D character; VideoPose doesn't use any of these parameters, but its results are still more stable and better than MotioNet's.
So did you compare after applying some filters?
If you want to test these videos yourself, I can send them to you.
Note: Both approaches use pre-trained weights, without any filtering or smoothing.