Update docs/source/en/model_doc/vitpose.md

Co-authored-by: NielsRogge <[email protected]>
NielsRogge · Sep 10, 2024 · 5197549
1 parent cb6d45f
Showing 1 changed file with 1 addition and 2 deletions.
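
Reviewer note: the top-down flow described by the added line in this change (detect instances first, then run the keypoint model on each crop) can be sketched as follows. This is a minimal illustration in plain Python, not the Transformers API: `crop`, `top_down_keypoints`, `toy_detector`, and `toy_estimator` are hypothetical names, and the two toy functions only stand in for an object detector such as RT-DETR and a keypoint model such as ViTPose. It also assumes keypoints come back in crop coordinates and must be shifted back into image coordinates.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max) in pixels
Image = List[List[int]]           # toy grayscale image: a list of pixel rows
Keypoint = Tuple[float, float]    # (x, y)


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`, clamped to the image bounds."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


def top_down_keypoints(
    image: Image,
    detect: Callable[[Image], List[Box]],
    estimate: Callable[[Image], List[Keypoint]],
) -> List[List[Keypoint]]:
    """Step 1: detect instances. Step 2: estimate keypoints on each crop."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        # Assumption: the estimator works in crop coordinates, so shift
        # each keypoint by the (clamped) top-left corner of the box.
        x_off, y_off = max(0, box[0]), max(0, box[1])
        results.append([(x + x_off, y + y_off) for x, y in estimate(patch)])
    return results


def toy_detector(image: Image) -> List[Box]:
    """Hypothetical stand-in for a person detector: one fixed box."""
    return [(1, 1, 5, 3)]


def toy_estimator(patch: Image) -> List[Keypoint]:
    """Hypothetical stand-in for a keypoint model: the crop centre."""
    return [(len(patch[0]) / 2, len(patch) / 2)]


if __name__ == "__main__":
    image = [[0] * 6 for _ in range(4)]  # 4 rows x 6 columns
    print(top_down_keypoints(image, toy_detector, toy_estimator))
```

Keeping the detector and the keypoint model as separate callables mirrors the point of the doc change: either stage can be swapped without touching the other.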
diff --git a/docs/source/en/model_doc/vitpose.md b/docs/source/en/model_doc/vitpose.md
@@ -40,8 +40,7 @@ The original code can be found [here](https://github.com/ViTAE-Transformer/ViTPo
 >>> outputs = model(pixel_values, dataset_index)
 ```
 
-- The current model utilizes a 2-step inference pipeline. The first step involves placing a bounding box around the region corresponding to the person.
-  After that, the second step uses VitPose to predict the keypoints.
+- ViTPose is a so-called top-down keypoint detection model. This means that one first uses an object detector, like [RT-DETR](rt-detr), to detect people (or other instances) in an image. Next, ViTPose takes the cropped images as input and predicts the keypoints.
 
 ```py
 >>> import torch