Object detection implementation for single image - iOS and android #21

Open
Wants to merge 5 commits into base: main

andrewjb123

No description provided.

@lennartschoch

Hey @a7medev, I'd love to see this feature in the project :)
Is there any way we can move this PR forward? Is there any way I can help?
It would be highly appreciated🙏

@Ravim-addweb

Hi @andrewjb123, thanks for taking the time to write the object detection module. I implemented it in my RN project.

I went ahead and added a custom object detection method and passed in my own model.

However, in both the object detection and custom object detection cases, the coordinates the algorithm returns for detected objects look odd. Basically, I want to superimpose a sticker/video at the given coordinates when an object is detected. With my custom TensorFlow model, the object is detected correctly but the coordinates are way off. Could you give it a try (for default object detection, as I was getting odd coordinates there as well)? Thanks a ton.

@andrewjb123
Author

andrewjb123 commented Jun 7, 2024

Hi, a couple of things you can check:

  • whether x and y are swapped: for example, if you provide the image in portrait orientation but display it in landscape, you need to detect that and swap x and y.

  • if your image display component applies any scaling, you need to calculate the scale it uses and apply it to your x/y. Alternatively, first scale the image to the dimensions of your phone screen, then both display that scaled image and pass it to the object detector. This is my preferred option: scaling the physical image to the size of your display component usually makes it smaller, which costs little in performance, whereas passing a larger image to object detection means a larger processing footprint on the device.

  • make sure your image display component applies no centering or alignment; if it does, you need to account for it and adjust x/y accordingly. Alternatively, switch alignment off on the component completely while testing.

  • I think I included a test app in the commit, and it worked when I made this pull request. You could build that test app, install it on a phone and try it with the image you are using to see whether it gives the correct coordinates, then work out the differences between the test app code and yours. Here is the commit with the changes I made to get object detection working in the test app: 23f659e
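A minimal TypeScript sketch of the scaling and centering checks in the list above. It assumes the display component draws the image aspect-fit and centered (as React Native's `<Image resizeMode="contain">` does), that the detector's rect is in the pixel space of the same image you passed in, and that orientation already matches. `DetectionRect`, `containLayout` and `imageRectToView` are hypothetical names, not part of this PR.

```typescript
// Hypothetical shape of a detection box in the source image's pixel space.
interface DetectionRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Uniform scale and centering offsets an aspect-fit ("contain") layout uses
// to draw an imageW x imageH image inside a viewW x viewH component.
function containLayout(
  imageW: number,
  imageH: number,
  viewW: number,
  viewH: number,
): { scale: number; offsetX: number; offsetY: number } {
  const scale = Math.min(viewW / imageW, viewH / imageH);
  // Centering leaves equal empty bands on the axis that does not fill the view.
  const offsetX = (viewW - imageW * scale) / 2;
  const offsetY = (viewH - imageH * scale) / 2;
  return { scale, offsetX, offsetY };
}

// Maps a rect from image pixels to component coordinates:
// scale first, then shift by the centering offset.
function imageRectToView(
  rect: DetectionRect,
  imageW: number,
  imageH: number,
  viewW: number,
  viewH: number,
): DetectionRect {
  const { scale, offsetX, offsetY } = containLayout(imageW, imageH, viewW, viewH);
  return {
    x: rect.x * scale + offsetX,
    y: rect.y * scale + offsetY,
    width: rect.width * scale,
    height: rect.height * scale,
  };
}

// Example: a 1024x1024 image shown in a 400x800 portrait component.
const boxInView = imageRectToView({ x: 256, y: 512, width: 128, height: 64 }, 1024, 1024, 400, 800);
console.log(boxInView);
```

For `resizeMode="cover"`, replace `Math.min` with `Math.max`; the offsets then become negative on the cropped axis.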

@Ravim-addweb

@andrewjb123 thanks for the quick response, I appreciate it.

Is there a way to get the X and Y coordinates for the scanned image? Basically, I am looking to build something like this; please take a look.

The Android documentation reference says the return type will be a Rect.

In my local application I am able to detect the photo object using my custom TensorFlow model, but the CustomObjectDetection SDK returns a Rect. I would like the X/Y coordinates of all four corners; only then will I be able to superimpose the video. Right now my video looks like a sticker.

Thanks.

@andrewjb123
Author

andrewjb123 commented Jun 7, 2024

Hi,

The interface gives you a rect: framex, framey, frameWidth, frameHeight.
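The four corner points asked about earlier can be derived directly from a rect's origin and size. This is a TypeScript sketch; `DetectionRect`, `Point2D` and `rectCorners` are hypothetical names, and the module's real field names (framex, framey, frameWidth, frameHeight) may not map one-to-one onto these. Note that the four points always form an upright rectangle, so they cannot describe a photo that is tilted or seen in perspective.

```typescript
// Hypothetical rect shape; field names are illustrative only.
interface DetectionRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Point2D {
  x: number;
  y: number;
}

// Returns the corners of an axis-aligned rect, clockwise on screen
// (y grows downward): top-left, top-right, bottom-right, bottom-left.
function rectCorners(r: DetectionRect): [Point2D, Point2D, Point2D, Point2D] {
  const right = r.x + r.width;
  const bottom = r.y + r.height;
  return [
    { x: r.x, y: r.y }, // top-left
    { x: right, y: r.y }, // top-right
    { x: right, y: bottom }, // bottom-right
    { x: r.x, y: bottom }, // bottom-left
  ];
}

const corners = rectCorners({ x: 10, y: 20, width: 30, height: 40 });
console.log(corners);
```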

What I'm suggesting is that your image display component (e.g. `<Image/>` in your React Native app) is doing some form of scaling, centering or rotation that you can't see or haven't turned off. For example, say you give a 1024x1024 image to the object detection library. It returns the correct x, y, width and height of the rect for a 1024x1024 image, but your component has scaled the image to, say, 512x512. In that case the rect will be positioned incorrectly on the scaled image unless you divide x, y, width and height by 2 to compensate for the scale change in the image component.
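The correction in the 1024x1024 to 512x512 example can be written out as a worked equation, assuming a uniform scale with no cropping or centering offset (the 512x512 display size is the hypothetical from the comment):

```latex
s = \frac{W_{\text{display}}}{W_{\text{image}}} = \frac{512}{1024} = \frac{1}{2},
\qquad
(x',\, y',\, w',\, h') = (s\,x,\; s\,y,\; s\,w,\; s\,h)
  = \left(\tfrac{x}{2},\, \tfrac{y}{2},\, \tfrac{w}{2},\, \tfrac{h}{2}\right)
```

If the component also centers the image, the centering offset is added to x' and y' after scaling.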

If you can, give the test app a try. If it doesn't work there either, please upload the image and the TensorFlow model you are using and I can take a look.

@andrewjb123
Author

andrewjb123 commented Jun 7, 2024

I've taken a quick look at the video you posted. I think you're probably using the wrong technology to achieve a video overlay that tracks a placeholder being moved by hand.

These are the steps I think you are trying to achieve:

  • detect someone dancing (simple image detection: a yes/no decision on whether someone is dancing)
  • track a moving object and overlay a video on it

For this you would be better off using an augmented reality library like react-viro:

https://youtu.be/Waqb0zTMSDY?si=-FcGtWZNr9kXJHki

https://youtu.be/2pGCnipzl3c?si=mNue84X3asBFW3NM

If you used the imageMarker and video components from that library, I think you would get the effect you want.

https://viro-community.readme.io/docs/image-recognition

https://viro-community.readme.io/docs/video

I've provided links to the components you could use.

@Ravim-addweb

Hi @andrewjb123, sorry for the late reply.

I am actually able to detect the CustomObject, taking inspiration from the DetectObject module you wrote for Android. It gives me a Rect back; now I am using another TensorFlow model, trained on both images and annotations, to detect the coordinates of the detected object.

I will let you know once I am close to a solution. Thanks again for answering, I appreciate it!

@andrewjb123
Author

I appreciate that you may use what I've provided, but you'll never get the performance you need from it, because it goes through a bridge and takes an image as input. The interface only accepts single images, and on a mobile device it won't be fast enough for the real-time processing you're trying to do.
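To make "fast enough for real-time" concrete: at a target frame rate f, each frame's whole pipeline (sending the image across the bridge, decoding it, running detection and returning the result) has to fit within the per-frame budget below. The 30 fps figure is an illustrative assumption, not a measurement from this PR.

```latex
t_{\text{frame}} \le \frac{1000\ \text{ms}}{f},
\qquad
f = 30\ \text{fps} \;\Rightarrow\; t_{\text{frame}} \le \frac{1000}{30}\ \text{ms} \approx 33.3\ \text{ms}
```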

@Ravim-addweb

Hey, thanks for getting back to me. I have decided to go with a third-party service called Vuforia, which accurately detects cloud image targets and lets you nicely overlay an image, video or 3D augmentation on them. Since neither Google's SDK nor any other solution gave me accurate X/Y coordinates for the image, we had to go with the remote solution. Thanks.

@BoavistaLudwig

@a7medev Is there any way we can move this PR forward?

Labels: None yet
Projects: None yet

4 participants