
Commit

integrate group feedback
muelea committed Dec 14, 2023
1 parent 55f1137 commit 235ea85
Showing 8 changed files with 1,261 additions and 15 deletions.
29 changes: 14 additions & 15 deletions index.html
@@ -106,7 +106,7 @@ <h4 class="col-md-12 text-center">
</li>

<li class="list-inline-item">
<a href="https://ps.is.mpg.de/person/black">Michael Black<sup>2</sup></a>
<a href="https://ps.is.mpg.de/person/black">Michael Black<sup>1</sup></a>
</li>

<li class="list-inline-item">
@@ -251,7 +251,7 @@ <h4 class="col-md-12 text-center">
<div class="col-md-10 mx-auto">
<p class="text-justify">
<b>
Our method estimates the poses of two people in close social interaction. We first train a generative model that learns the joint distribution of two interacting people.
Our method takes a single image and estimates the poses of two people in close social interaction. We first train a generative model that learns the joint distribution of two interacting people.
Then we use this model as a prior during optimization when fitting two SMPL-X body models to detected 2D joint locations.
</b>
</p>
@@ -300,7 +300,7 @@ <h4>Sampling meshes from BUDDI </h4>
></video>
</div>
<p class="text-justify">
BUDDI is a diffusion model that learned the joint distribution of two people in close proxeminty.
BUDDI is a diffusion model that is trained to model the joint distribution of two people in close proximity.
It directly generates SMPL-X body model parameters for two people, starting from random noise.
</p>
</div>
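The sampling process described above can be sketched as a standard reverse-diffusion loop. This is a toy illustration, not the released BUDDI code: `denoiser` is a stand-in for the trained network, and the parameter dimensionality and noise schedule are assumptions.

```python
import numpy as np

# Toy stand-in for the concatenated SMPL-X parameters of BOTH people
# (pose, shape, translation); the real dimensionality differs.
PARAM_DIM = 2 * 16

def denoiser(x_t, t):
    # Placeholder network that pulls the sample toward zero.
    # BUDDI would instead predict the clean two-person parameters.
    return 0.9 * x_t

def sample(n_steps=50, seed=None):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, n_steps)   # illustrative noise schedule
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(PARAM_DIM)         # start from pure noise
    for t in reversed(range(n_steps)):
        x0_hat = denoiser(x, t)                # predicted clean parameters
        if t > 0:
            # DDPM posterior mean given the x0 prediction
            coef_x0 = np.sqrt(alpha_bar[t - 1]) * betas[t] / (1 - alpha_bar[t])
            coef_xt = np.sqrt(alphas[t]) * (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t])
            var = betas[t] * (1 - alpha_bar[t - 1]) / (1 - alpha_bar[t])
            x = coef_x0 * x0_hat + coef_xt * x + np.sqrt(var) * rng.standard_normal(PARAM_DIM)
        else:
            x = x0_hat
    return x  # would be split into SMPL-X parameters for person A and person B

params = sample(seed=0)  # shape (32,)
```

Starting from Gaussian noise, each iteration denoises one step, so the final vector is a draw from the learned joint two-person distribution.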
@@ -309,7 +309,7 @@ <h4>Sampling meshes from BUDDI </h4>

<div class="row mb-3 pt-2">
<div class="col-md-10 mx-auto">
<h4>Comparison against BEV (right) vs. <i>Optimization with BUDDI</i> (left)</h4>
<h4>Comparison with BEV (left) vs. <i>Optimization with BUDDI</i> (right)</h4>

<div class="comp-container" id="comp-container" style="visibility: none">

@@ -368,7 +368,7 @@ <h4>Comparison against BEV (right) vs. <i>Optimization with BUDDI</i> (left)</h4
</div>
<p></p>
<p class="text-justify">
We compare BUDDI to <a href="http://www.yusun.work/BEV/BEV.html">BEV</a> [SUN et al., CVPR 2022], a recent human mesh regressor that estimated the 3D pose and shape of multiple people in the world coordinate system. The predicted spacial positioning is often correct, but BEV fails to reconstruct contact and subtle detail. BUDDI addresses this problem by a two-step approach and is able to reconstruct more relistic interaction and social proxemics, also for complex poses like hugs, piggy backs, or with arms interlinked.
We compare BUDDI to <a href="http://www.yusun.work/BEV/BEV.html">BEV</a> [Sun et al., CVPR 2022], a recent human mesh regressor that estimates the 3D pose and shape of multiple people in the world coordinate system. The predicted spatial positioning is often correct, but BEV fails to reconstruct contact and subtle detail. BUDDI addresses this problem with a two-step approach and reconstructs more realistic interaction and social proxemics, even for complex poses like hugs, piggyback rides, or interlinked arms.
<b>Click on renderings to zoom</b>.
</p>
</div>
@@ -386,8 +386,7 @@ <h4>How does BUDDI work?</h4><p></p>
src="media/method_twostep.m4v"
width="100%"
></video></div>
We take a two-step appraoch, where in the first stage we train a generative model that learns a 3D generative model of two people in close social
interaction. In the second step we use this model as prior during optimization.
We take a two-step approach: in the first step, we train a model to generate the 3D meshes of two people in close social interaction. In the second step, we use this model as a prior during optimization.

<p> </p>
<div class="col-md-10 mx-auto">
@@ -416,7 +415,7 @@ <h4>How does BUDDI work?</h4><p></p>

<p class="text-justify">
The conditional model can be used as a social prior in the
downstream optimization task of reconstructing 3D people in close proximity from images, without any extra annotation such as contact maps. We initialize the optimization routine with a sample BUDDI given the BEV estimate and use standard fitting losses to minimize the 2D re-projection error, a loss term to resolve intersections between the two people, and a term to stay close to the initial sample.
downstream optimization task of reconstructing 3D people in close proximity from images, without any extra annotation such as contact maps. We initialize the optimization routine with a sample from BUDDI given the BEV estimate and use standard fitting losses to minimize the 2D re-projection error, a loss term to resolve intersections between the two people, and a term to stay close to the initial sample.
</p>
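The fitting objective described above can be sketched as a sum of three terms. This is an illustrative toy version with made-up weights and a pinhole stand-in for the camera projection; the names (`project`, `fitting_loss`) and the intersection radius are assumptions, not the authors' implementation.

```python
import numpy as np

def project(joints3d):
    # Toy pinhole projection of 3D joints (J, 3) to 2D (J, 2).
    return joints3d[:, :2] / np.maximum(joints3d[:, 2:3], 1e-6)

def fitting_loss(params, init_params, joints3d_pair, keypoints2d_pair, w_init=1.0):
    # 2D re-projection error, summed over both people
    reproj = sum(np.sum((project(j3d) - k2d) ** 2)
                 for j3d, k2d in zip(joints3d_pair, keypoints2d_pair))
    # Crude intersection penalty: penalize cross-person joints closer
    # than an assumed radius of 0.1 (stand-in for a mesh-based term).
    d = np.linalg.norm(joints3d_pair[0][:, None] - joints3d_pair[1][None], axis=-1)
    intersection = np.sum(np.maximum(0.1 - d, 0.0) ** 2)
    # Stay close to the initial sample from BUDDI
    init_term = np.sum((params - init_params) ** 2)
    return reproj + intersection + w_init * init_term
```

With perfect keypoints, non-intersecting bodies, and parameters equal to the initialization, all three terms vanish; optimization trades them off otherwise.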

<!-- <div class="col-md-10 mx-auto">
@@ -437,9 +436,9 @@ <h4>How does BUDDI work?</h4><p></p>
></img></div>
<p class="text-justify">
To use BUDDI as a prior in optimization, we adopt the SDS loss presented in previous work like <a href="https://dreamfusion3d.github.io">DreamFusion</a> (L<sub>diffusion</sub>). When fitting SMPL-X to image keypoints,
in each optimization iteration, we diffuse the current estimate diffuse it and let BUDDI propose a refined estimate. The refined estimate is
closer to the true distribution of interacting people and serves as prior via an L2-Loss. This enables us to fit 3D meshes to images of closely
interacting people without replying on ground-truth contact annotations at test time.
in each optimization iteration, we take the current estimate, diffuse it, and let BUDDI propose a refined estimate. The refined estimate is
more likely under the true distribution of interacting people and serves as a prior via an L2 loss. This enables us to fit 3D meshes to images of closely
interacting people without relying on ground-truth contact annotations at test time.
</p>
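The diffuse-then-refine prior step above can be written as a few lines. This is a hedged sketch: `buddi_denoise` is a stand-in for the trained model, and the schedule is illustrative, not the released code.

```python
import numpy as np

def buddi_denoise(x_t, t):
    # Stand-in for BUDDI proposing a refined (denoised) two-person estimate.
    return 0.9 * x_t

def diffusion_prior_loss(params, t, alpha_bar, rng):
    # Forward-diffuse the current estimate to noise level t ...
    eps = rng.standard_normal(params.shape)
    x_t = np.sqrt(alpha_bar[t]) * params + np.sqrt(1.0 - alpha_bar[t]) * eps
    # ... let the model propose a refined estimate ...
    refined = buddi_denoise(x_t, t)
    # ... and pull the current estimate toward it with an L2 loss.
    return np.sum((params - refined) ** 2)
```

The refined estimate acts as a moving target: it is recomputed every iteration, so the optimizer is steered toward configurations the diffusion model considers likely.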
</div>
</div>
@@ -457,7 +456,7 @@ <h4>Training Data - Flickr Fits</h4>

<p class="text-justify">
To create training data for BUDDI, we fit SMPL-X meshes to <a href="https://ci3d.imar.ro">FlickrCI3D Signatures</a>, a dataset of images
collected from Flickr with ground-truth c3D contact annotations. We show sample images from this dataset with the contact map visulaized on the left and our Flickr fits on the right. See our <a href="https://github.com/muelea/buddi">GitHub</a> repo for more information about the training data.
collected from Flickr with ground-truth 3D annotations indicating binary pairwise contact between body regions of two people. We show sample images from this dataset with the contact map visualized on the left and our Flickr fits on the right. See our <a href="https://github.com/muelea/buddi">GitHub</a> repo for more information about the training data.
</p>
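A binary pairwise contact annotation of this kind can be turned into a fitting term that pulls annotated region pairs into contact. This is a toy sketch under stated assumptions: region centers stand in for SMPL-X vertex regions, and `contact_loss` is an illustrative name, not the authors' code.

```python
import numpy as np

def contact_loss(regions_a, regions_b, contact_map):
    # regions_a, regions_b: (R, 3) region centers for person A and person B.
    # contact_map: (R, R) binary matrix; entry (i, j) == 1 means region i of
    # person A is annotated as touching region j of person B.
    d = np.linalg.norm(regions_a[:, None] - regions_b[None], axis=-1)
    # Drive annotated pairs' distances toward zero; unannotated pairs are free.
    return np.sum(contact_map * d ** 2)
```

Minimizing this term during fitting encourages the annotated body regions to actually meet in 3D, which is how the contact maps supervise the Flickr fits.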
</div>
</div>
@@ -467,9 +466,9 @@
<h4>BibTeX</h4>
<textarea id="bibtex" class="form-control" readonly>
@article{mueller2023buddi,
title={Generative Proxemics: A Prior for 3D Social Interaction from Images}
author={M{\“u}ller, Lea and Ye, Vickie and Pavlakos, Georgios and Black, Michael and Kanazawa, Angjoo},
journal={arXiv preprint 2306.09337v1}
title={Generative Proxemics: A Prior for {3D} Social Interaction from Images},
author={M{\"u}ller, Lea and Ye, Vickie and Pavlakos, Georgios and Black, Michael J. and Kanazawa, Angjoo},
journal={arXiv preprint 2306.09337v1},
year={2023}}
</textarea>
</div>
