[BUG/Enhancement] OCR Workflow Requires Excessive Clicks and Warnings in 2.0.0-rc1 #762

Open
1 task
powyncify opened this issue Feb 20, 2025 · 3 comments
Labels
type: bug Something isn't working

Comments

@powyncify

Environment

GitHub v2-dev branch

Description

The workflow for attaching and processing images with OCR in version 2.0.0-rc1 has become significantly more cumbersome, requiring multiple clicks and confirmations. This negatively impacts usability, especially for users who primarily rely on OCR for image attachments.

Steps to Reproduce:

  1. Attach an image file to a message.
  2. Click on the attached image thumbnail.
  3. Select the "Add Text (OCR)" option from the popup menu.
    • Warning 1: A warning appears: "Warning: May not be compatible with the current model. Please try another format." (This warning appears even if the model is compatible with OCR.)
  4. Wait for the OCR processing to complete.
  5. Click the "Chat" (or send) button.
    • Warning 2: A second warning appears: "Attachment Compatibility Notice. Some attached files may not be fully compatible with the current AI model. This could affect processing. Would you like to review or proceed?"

This results in a total of 4-5 clicks and two warnings per image before the OCR'd text is sent to the model.

Expected Behavior:

In previous versions (or as a desired enhancement), attaching an image should ideally:

  • Optionally, automatically perform OCR on the image without requiring manual selection (configurable via a setting).
  • Minimize or eliminate unnecessary warnings if the selected model supports OCR and the image format is compatible.
  • Streamline the process to require fewer clicks.

Enhancement Request:

Introduce a setting (e.g., "OCR all attached images by default") to allow users to bypass the manual selection of the "Add Text (OCR)" option. When enabled, any attached image would automatically be processed with OCR.

Additional Notes:

  • The two warnings ("May not be compatible..." and "Attachment Compatibility Notice") appear to be overly cautious and often unnecessary, especially when the user knows the model and image format are compatible. Consider suppressing these warnings, or making them less intrusive, when OCR is explicitly enabled or when compatibility can be confidently determined.
  • Our use case exclusively involves attaching images for OCR processing. We never upload images that are not intended for OCR. Therefore, a default OCR option would significantly improve our workflow.
  • The warnings are shown even if the models are compatible.
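One way to read the suggestion above is to decide on the warning from what will actually be sent, rather than from the original file type. The following is only a sketch under that assumption: `AttachmentOutput`, `ModelCapabilities`, and both function names are hypothetical and are not taken from the Big-AGI source.

```typescript
// Hypothetical types for illustration; not the Big-AGI data model.
type AttachmentOutput = 'image' | 'ocr-text';

interface ModelCapabilities {
  acceptsText: boolean;
  acceptsImages: boolean;
}

// An OCR'd image is sent as plain text, so only text support matters for it.
function needsCompatibilityWarning(output: AttachmentOutput, model: ModelCapabilities): boolean {
  if (output === 'ocr-text') return !model.acceptsText;
  return !model.acceptsImages;
}

// Show the send-time notice only if at least one output is actually unsupported.
function shouldWarnOnSend(outputs: AttachmentOutput[], model: ModelCapabilities): boolean {
  return outputs.some((output) => needsCompatibilityWarning(output, model));
}
```

With a check like this, a text-only model would still trigger a notice for raw images, but not for images that have already been converted to OCR text.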


Device and browser

Big-AGI deployed on Vercel, accessed via Edge on Windows 11

Screenshots and more

Image

Willingness to Contribute

  • 🙋‍♂️ Yes, I would like to contribute a fix.
@powyncify added the type: bug label Feb 20, 2025
@enricoros
Owner

@powyncify this makes sense. Default behavior for attachments should be a thing.
Do you want the image at all, or just its OCR text? I.e., should the option pre-check "Add Text" on images, or "Only Text (OCR)"?

@powyncify
Author

That's a great idea, @enricoros! Yes, "Only Text (OCR)" makes sense, but only if it is easy to implement. Otherwise, just pre-checking "Add Text" on the images will do fine.

Keep up the good work!

@enricoros
Owner

Thanks @powyncify, if you deploy from your own source this can be done easily, probably with just a single-line change.

I can't make it the default for everyone, since people uploading photos would get weird OCR symbols out of them.

Seems like having an option would be the way to go (note that I try to minimize options in favor of auto-detection or UX behaviors).
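
For anyone patching their own deployment, the option discussed in this thread could look roughly like the sketch below. It assumes a single user setting with three modes. The names `ImageAttachmentDefault`, `Converter`, and `defaultConverters` are made up for illustration and do not refer to existing Big-AGI code.

```typescript
// Hypothetical setting values: keep the image (current behavior),
// pre-check "Add Text", or send "Only Text (OCR)".
type ImageAttachmentDefault = 'image' | 'add-text' | 'only-text';

// Hypothetical converter identifiers for an attached image.
type Converter = 'image' | 'ocr-text';

// Pick which converters are pre-selected when a file is attached.
// The default stays 'image', so photo uploads are unaffected unless the user opts in.
function defaultConverters(mimeType: string, setting: ImageAttachmentDefault = 'image'): Converter[] {
  if (!mimeType.startsWith('image/')) return []; // this sketch leaves non-image files alone
  switch (setting) {
    case 'add-text':
      return ['image', 'ocr-text'];
    case 'only-text':
      return ['ocr-text'];
    default:
      return ['image'];
  }
}
```

Keeping `'image'` as the default preserves today's behavior for everyone else, which matches the concern about photos producing stray OCR symbols.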
