Added the KeyPoints TVTensor #8817
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8817
Note: links to docs will display an error until the docs builds have been completed. (Automated comment from Dr. CI, updated every 15 minutes.)
Thank you for your pull request and welcome to our community.
Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
I'd love to see this get included in a future release.
Given that the PR is about 4 months old now... me too?
Thank you for your patience @Alexandre-SCHOEPP, I don't have much bandwidth at the moment, but I will try to get this reviewed and merged for 0.23
No worries @NicolasHug, the project is very large relative to the size of the team behind it; I understand that prioritization needs to happen.
Hey! Thanks a lot for the cool PR. I do think it will be a very relevant addition to torchvision, and I feel the PR is well aligned with the structure of the codebase. However, this introduces a large number of changes, so I propose we make a couple of iterations to fully align with the main branch before merging. How does that sound?
I left some comments in the code, but here are my high level questions and remarks.
- Regarding parameter naming, let's make sure we are sticking to "keypoint" for user-facing functions and not use "kp" and "kpoint" there;
- I am not sure about the `_xyxy_to_points` output. Can we include some tests for both `_xyxy_to_points` and `convert_box_to_points` to make sure the input/output is as expected?
- We have a keypoint RCNN model that supports keypoints in torchvision. I'm not sure we should change its code because of the risk of breaking it, but can we use it as some advanced integration test?
- I think we are missing a certain number of tests for the transforms: `TestResize.test_kernel_key_points`, `TestResize.test_functional` for key_points, `TestResize.test_key_points_correctness`, `TestResize.test_noop`, `TestResize.test_no_regression_5405`, `TestHorizontalFlip.test_kernel_key_points`, `TestHorizontalFlip.test_bounding_boxes_correctness`... Would it eventually be possible to add those?
- More generally, the keypoints are defined here as a set of independent points. Is that the usual definition? Can those be used to define arbitrary polygons? In that case, will some other functionality be needed?
Thanks again for your time submitting this PR. Let me know what you think about my comments. On my side, I am excited to work on this with you and make sure we can merge the feature into torchvision.
Best,
Antoine
```python
def get_all_keypoints(flat_inputs: List[Any]) -> Iterable[tv_tensors.KeyPoints]:
    """Yields all KeyPoints in the input.

    Raises:
        ValueError: No KeyPoints can be found.
    """
    generator = (inpt for inpt in flat_inputs if isinstance(inpt, tv_tensors.KeyPoints))
    try:
        yield next(generator)
    except StopIteration:
        raise ValueError("No KeyPoints were found in the sample.")
    # NB: `return generator` here would discard the remaining items, since a
    # generator function's return value is never yielded; `yield from` yields them.
    yield from generator
```
It seems this `get_all_keypoints` function is not used elsewhere. Should we delete it if it is not needed in the current state of the PR?
This function is meant to parallel the one for bounding boxes, the schematic of which I followed here. I don't necessarily have an attachment to it, but I felt like it was better to have it if someone who worked on the BBoxes had feedback on it.
```python
from ._tv_tensor import TVTensor


class KeyPoints(TVTensor):
```
Overall, it seems to me there are a few discrepancies between the `KeyPoints` and `BoundingBoxes` classes (c.f. comments below). Maybe we can try to reduce the gap to align as much as possible with bbox?
As stated in the response below, I think that having more freedom in the shape makes sense in this context. Points are inherently less structured than boxes, and whether it's a representation of polygons/polylines (N_poly, N_sides, 2) or a skeleton (N_bones, 2, 2), I feel like keeping this freedom enables the development of future, more complex TVTensors without having to re-implement all of the kernels (which is frankly unfun, 0/10 would not recommend).
I also feel there are use cases where more than (N, 4)-shaped bboxes would make sense (for example, groups of up to N objects to detect and group together), but the limitation feels far less oppressive there than it would be for points.
```python
class KeyPoints(TVTensor):
    """:class:`torch.Tensor` subclass for tensors with shape ``[..., 2]`` that represent points in an image.
```
Is the expected shape `[N, 2]` here, where `N` is the number of keypoints?
No. I left free space for more complex shapes for use in more complex structures, like skeletons. A skeleton would be a `(NumBones, 2, 2)` tensor (a list of `NumBones` segments), and would be a specific type of `KeyPoints`, to reuse the kernels as much as possible.
```python
    dtype: Optional[torch.dtype] = None,
    device: Optional[Union[torch.device, str, int]] = None,
    requires_grad: Optional[bool] = None,
    canvas_size: Tuple[int, int],
```
Maybe align the order of the arguments with the `BoundingBoxes` class? In that case, `canvas_size` should be the 4th argument and not the last.
I agree for the sake of consistent code styling; however, given that this is a keyword-only argument, the difference isn't visible at runtime outside of inspecting the function's attributes.
```python
        # Since a Tensor is a sequence of Tensors, had it not been the case, we
        # may have had silent or complex errors
        output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
    elif isinstance(output, MutableSequence):
```
What is the purpose of this case here?
This case handles lists and list-like objects. Unlike in the case of tuples (which are immutable sequences), where I can't reuse the container object, here I can.
I agree that it needs to be changed, however: custom immutable sequences are not supported, which is an oversight on my part. The valid logic would be:

```python
if isinstance(output, torch.Tensor) and not isinstance(output, KeyPoints):
    output = KeyPoints(output, canvas_size=canvas_size)
elif isinstance(output, MutableSequence):
    # For lists and list-like objects we don't create a new container; we just set the values in place
    for i, part in enumerate(output):
        output[i] = KeyPoints(part, canvas_size=canvas_size)
elif isinstance(output, Sequence):  # Immutable sequences (like tuples) handled here
    # NB: output is checked against Sequence only after it has been checked against Tensor;
    # since a Tensor is a sequence of Tensors, we may otherwise have had silent or complex errors
    output = tuple(KeyPoints(part, canvas_size=canvas_size) for part in output)
return output
```
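The mutable/immutable distinction can be exercised with stand-in classes (a pure-Python sketch; `FakeKeyPoints` and `rewrap` are placeholders, not the real torchvision types):

```python
from collections.abc import MutableSequence, Sequence

class FakeKeyPoints(list):
    """Placeholder for tv_tensors.KeyPoints: point data plus a canvas_size."""
    def __init__(self, data, *, canvas_size):
        super().__init__(data)
        self.canvas_size = canvas_size

def rewrap(output, canvas_size):
    # Mutable sequences (lists) are updated in place, keeping the container;
    # immutable sequences (tuples) are rebuilt as new tuples.
    if isinstance(output, MutableSequence):
        for i, part in enumerate(output):
            output[i] = FakeKeyPoints(part, canvas_size=canvas_size)
        return output
    if isinstance(output, Sequence):
        return tuple(FakeKeyPoints(part, canvas_size=canvas_size) for part in output)
    return FakeKeyPoints(output, canvas_size=canvas_size)

parts = [[[0, 1]], [[2, 3]]]
same_list = rewrap(parts, canvas_size=(32, 32))
assert same_list is parts            # list container reused
new_tuple = rewrap(([[0, 1]],), canvas_size=(32, 32))
assert isinstance(new_tuple, tuple)  # tuple container rebuilt
```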
```python
def _resized_crop_keypoints_dispatch(
    inpt: tv_tensors.BoundingBoxes, top: int, left: int, height: int, width: int, size: List[int], **kwargs
):
    out, canvas_size = resized_crop_keypoints(
```
Maybe rename the variable `out` to `output` to align with bounding box function conventions (c.f. `_resized_crop_bounding_boxes_dispatch`)?
I agree
```python
def _xyxy_to_points(bounding_boxes: torch.Tensor) -> torch.Tensor:
```
Maybe rename as `_xyxy_to_keypoints` for consistency?
I agree
```python
def _xyxy_to_points(bounding_boxes: torch.Tensor) -> torch.Tensor:
    return bounding_boxes[:, [[0, 1], [2, 1], [2, 3], [0, 3]]]
```
This seems to return a tensor of shape `(N, 4)` instead of `(N, 2)`? We should add a test for this function to control the expected input/output format.
I think it's `(N, 4, 2)`, but I agree it needs a test, if such questions arise.
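The indexing `bounding_boxes[:, [[0, 1], [2, 1], [2, 3], [0, 3]]]` selects the four corners of each box, so an `(N, 4)` xyxy tensor does come out as `(N, 4, 2)`. A pure-Python sketch of the same corner extraction (a stand-in for the torch fancy indexing, not the actual kernel):

```python
def xyxy_to_corner_points(boxes):
    """(N, 4) xyxy boxes -> (N, 4, 2) corner points, mirroring
    bounding_boxes[:, [[0, 1], [2, 1], [2, 3], [0, 3]]]."""
    return [
        [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]  # TL, TR, BR, BL
        for x1, y1, x2, y2 in boxes
    ]

corners = xyxy_to_corner_points([[0, 0, 2, 3]])
print(corners)  # [[[0, 0], [2, 0], [2, 3], [0, 3]]]
```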
"""Make the KeyPoints for testing purposes""" | ||
if isinstance(num_points, int): | ||
num_points = [num_points] | ||
half_point: Tuple[int, ...] = tuple(num_points) + (1,) |
What is the `half_point` variable in this case?
`coordinate_shape` would probably be a better name here; I used `half_point` since it's half the shape of the points. I agree it's unclear.
```python
        new_format=BoundingBoxFormat.XYXY,
        inplace=False,
    )
    return tv_tensors.KeyPoints(_xyxy_to_points(bbox), canvas_size=bounding_boxes.canvas_size)
```
We should also make sure this function is covered in the test case.
I agree
Thanks for the review. I marked every comment where I agreed with the need to change things with the word "agree" in my response, just so I can ensure not to miss one whenever I get the time to fix it.
"""Make the KeyPoints for testing purposes""" | ||
if isinstance(num_points, int): | ||
num_points = [num_points] | ||
half_point: Tuple[int, ...] = tuple(num_points) + (1,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
coordinate_shape
would probably be a better name here, I used half point since it's half the shape of the points.
I agree it's unclear.
def get_all_keypoints(flat_inputs: List[Any]) -> Iterable[tv_tensors.KeyPoints]: | ||
"""Yields all KeyPoints in the input. | ||
|
||
Raises: | ||
ValueError: No KeyPoints can be found | ||
""" | ||
generator = (inpt for inpt in flat_inputs if isinstance(inpt, tv_tensors.KeyPoints)) | ||
try: | ||
yield next(generator) | ||
except StopIteration: | ||
raise ValueError("No Keypoints were found in the sample.") | ||
return generator | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is meant to parallel the one for bounding boxes, the schematic of which I followed here. I don't necessarily have an attachment to it, but I felt like it was better to have it if someone who worked on the BBoxes had feedback on it.
```python
    points.canvas_size = canvas_size
    return points


if TYPE_CHECKING:
```
I had no need for `type: ignore`. This solution works and, quite frankly, I think it should be the standard in the library (despite my marking it jokingly as `EVIL`).
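For readers unfamiliar with the trick, here is a minimal sketch of the general pattern (a simplified stand-in class, not the actual torchvision code): the `if TYPE_CHECKING:` branch never runs, so the re-declared signature exists only for the type checker.

```python
from typing import TYPE_CHECKING, Tuple

class KeyPoints:
    """Runtime implementation (stand-in for the real TVTensor subclass)."""
    def __new__(cls, data, **kwargs):
        obj = super().__new__(cls)
        obj.data = data
        obj.canvas_size = kwargs.get("canvas_size")
        return obj

if TYPE_CHECKING:
    # "EVIL" but harmless: this block is only seen by the linter
    # (TYPE_CHECKING is always False at runtime), so it can present a
    # cleaner constructor signature than the **kwargs one above.
    class KeyPoints:
        def __init__(self, data, *, canvas_size: Tuple[int, int]) -> None: ...

kp = KeyPoints([[0, 1]], canvas_size=(16, 16))
print(kp.canvas_size)  # (16, 16)
```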
`_geometry.py` was hidden by the change reviewer, and so I failed to answer a few comments. Here are the answers.
```python
    # 2) Now let's transform the points using the affine matrix
    keypoints = torch.matmul(keypoints, transposed_affine_matrix).to(original_dtype)

    return keypoints, canvas_size
```
I remember it being unneeded, but I don't remember why (it's been a few months). Since I didn't write a comment about it, I agree it should be added just in case.
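For context, the matmul above is a plain affine map; a scalar sketch of the per-point arithmetic behind the batched `torch.matmul` (pure Python, simplified from the kernel):

```python
def affine_point(point, matrix):
    """Apply a 2x3 affine matrix [[a, b, tx], [c, d, ty]] to one [x, y] point,
    i.e. the per-point arithmetic behind the batched matmul."""
    x, y = point
    return [row[0] * x + row[1] * y + row[2] for row in matrix]

translate = [[1, 0, 5], [0, 1, -2]]     # pure translation by (5, -2)
print(affine_point([3, 4], translate))  # [8, 2]
```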
```python
def _rotate_keypoints_dispatch(
    kp: tv_tensors.KeyPoints, angle: float, expand: bool = False, center: Optional[List[float]] = None, **kwargs
) -> tv_tensors.KeyPoints:
    out, canvas_size = rotate_keypoints(kp, angle, center=center, expand=expand, **kwargs)
```
I agree
```python
def crop_keypoints(
    kp: torch.Tensor,
```
I agree
```python
    width: int,
) -> Tuple[torch.Tensor, Tuple[int, int]]:

    kp.sub_(torch.tensor([left, top], dtype=kp.dtype, device=kp.device))
```
I agree
```python
    numer_points = torch.matmul(kp, theta1.T)
    denom_points = torch.matmul(kp, theta2.T)
    transformed_points = numer_points.div_(denom_points)
    return clamp_keypoints(transformed_points, canvas_size)
```
Unlike for bboxes, we never need to modify the shape during the process, so I don't think it's needed here.
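The `numer_points` / `denom_points` split above is the standard projective division; per point it computes the following (a scalar sketch with hypothetical coefficient naming, not the kernel's actual theta layout):

```python
def perspective_point(point, coeffs):
    """Apply a perspective transform with coefficients (a, b, c, d, e, f, g, h)
    to one [x, y] point: the per-point form of the numerator/denominator matmuls."""
    x, y = point
    a, b, c, d, e, f, g, h = coeffs
    denom = g * x + h * y + 1
    return [(a * x + b * y + c) / denom, (d * x + e * y + f) / denom]

identity = (1, 0, 0, 0, 1, 0, 0, 0)
print(perspective_point([3.0, 4.0], identity))  # [3.0, 4.0]
```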
```python
def _compute_perspective_thetas(
```
Thanks! DRY isn't always applicable (and in high-throughput contexts it does have an impact), but I like to apply it when I can.
This perspective code is also hard to understand (one might say, oblique), so I felt I had to do it to stop my brain (and that of anyone after me) from melting.
Description
Adds and integrates the KeyPoints TVTensor (requested in #8728), a representation of picture-attached points (or vertices).
Details
Inner workings
The KeyPoints represent a tensor of shape `[..., 2]`, which allows arbitrarily complex structures to be represented (polygons, skeletons, or even SAM-like point prompts). Whenever `__new__` is called, the shape of the source tensor is checked. Tensors of shape `[2]` are reshaped to `[1, 2]`, similarly to BoundingBoxes. KeyPoints, like BoundingBoxes, carry around a `canvas_size` attribute which represents the scale of a batch-typical picture.
Kernels
Kernels for all operations should be supported (if I missed one, I will fix it). They merely consist of adaptations of the BoundingBoxes code.
Particularities
Maintainers may notice that a `TYPE_CHECKING` section was added that differs significantly from the implementation. I marked this section as `EVIL` since it is a trick, but it cannot generate vulnerabilities: `TYPE_CHECKING` is always `False` at runtime, and only ever `True` for the linter.
For the last few months, I had issues in my weird PyLance + Mypy mix with `BoundingBoxes` initialization: no overload was ever detected to match it. By "re-defining" it, I got it solved on my machine.
Converters
Added a converter `convert_box_to_points` in `torchvision.transforms.v2.functional._meta`, exported in `torchvision.transforms.v2`, which (as its name states) converts an `[N, 4]` `BoundingBoxes` TVTensor into an `[N, 4, 2]` KeyPoints TVTensor.
Other changes
For the purposes of my custom type checking, I also changed `tv_tensors.wrap` to use 3.8-compatible generics. Since `wrap` only ever outputs a subclass of its `like` argument, I used a `TypeVar` bound to `TVTensor` to ensure that type checking passes no matter the checker used.
Methodology
Discussion
Since many converters of BoundingBoxes are based on changing the bboxes to polygons and then operating on the points, I believe there is a possibility to lower line count and increase reliability, for a negligible computational latency cost, by using KeyPoints kernels and converting with the method described in the details above.
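As a sketch of that idea (pure Python with hypothetical helper names, not torchvision code): a bbox transform can be expressed as box → corner points → point kernel → box.

```python
def box_to_corners(box):
    # xyxy box -> its four corner points
    x1, y1, x2, y2 = box
    return [[x1, y1], [x2, y1], [x2, y2], [x1, y2]]

def corners_to_box(corners):
    # Re-derive the axis-aligned xyxy box enclosing the transformed corners
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return [min(xs), min(ys), max(xs), max(ys)]

def hflip_points(points, width):
    # A point kernel: horizontal flip on a canvas of the given width
    return [[width - x, y] for x, y in points]

box = [10, 20, 30, 40]
flipped = corners_to_box(hflip_points(box_to_corners(box), width=100))
print(flipped)  # [70, 20, 90, 40]
```

Every box kernel written this way reuses the point kernel, which is the line-count and reliability argument above.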