Suggestions to improve ClearScan #13

Open
raffaem opened this issue Jul 17, 2022 · 1 comment
Labels: enhancement, help wanted

Comments

@raffaem
Owner

raffaem commented Jul 17, 2022

Thanks for your hard work and for creating such a great tool! I've got some questions:

It seems that higher-quality ClearScan (with a higher --clearscan-upscaling-factor) increases the file size a lot. It also messes up the figures.

Is it possible to apply OCR first and only perform ClearScan on the areas that contain text?
Also, I'm not sure how to do this, but if I understand correctly, Acrobat's ClearScan creates a kind of font to reduce the size of the final file. Is there any way to implement this kind of function?

Thanks a lot!

Originally posted by @c0rychu in #12

raffaem self-assigned this Jul 17, 2022
raffaem added the enhancement and help wanted labels Jul 17, 2022
@raffaem
Owner Author

raffaem commented Jul 17, 2022

Hi,

thanks for the suggestions.

The best way to mimic Adobe and the new fonts it creates is probably to compress the resulting PDF with JBIG2.

There is an open source encoder here.

The problems are: (1) the encoder seems abandoned (the last commit is dated 2019); (2) you need to compile it from source; (3) ImageMagick doesn't seem to support JBIG2 compression for PDF files.

Regarding excluding images from ClearScan, I'm afraid it would be very difficult.

PDFsak operates differently from Adobe.

The steps it takes to mimic ClearScan are the following:

  1. The PDF is converted into an image
  2. The image is passed to potrace
  3. The traced output is converted back into PDF pages, which are merged

I currently don't have a clear idea of how we could exclude existing images from this process.
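
To make the three steps above concrete, here is a minimal Python sketch of such a pipeline. It is not PDFsak's actual code: the choice of `pdftoppm` for rasterizing and `pdfunite` for merging (both from Poppler) is an assumption made for illustration, as are the function names, the `page{n}` file naming, and the caller-supplied `num_pages`. Only `potrace` comes from the description above.

```python
import subprocess
from pathlib import Path


def build_commands(src_pdf, out_pdf, workdir, num_pages, dpi=300):
    """Return the external commands for the three steps, in execution order.

    Hypothetical helper: tool choices other than potrace are assumptions.
    """
    commands = []
    page_pdfs = []
    for n in range(1, num_pages + 1):
        stem = f"{workdir}/page{n}"
        # Step 1: rasterize page n to a 1-bit bitmap, written as stem + ".pbm".
        commands.append([
            "pdftoppm", "-mono", "-r", str(dpi),
            "-f", str(n), "-l", str(n), "-singlefile",
            src_pdf, stem,
        ])
        # Step 2: trace the bitmap into vector outlines, emitted as a PDF page.
        commands.append([
            "potrace", "-b", "pdf", "-r", str(dpi),
            "-o", f"{stem}.pdf", f"{stem}.pbm",
        ])
        page_pdfs.append(f"{stem}.pdf")
    # Step 3: merge the traced pages back into a single document.
    commands.append(["pdfunite", *page_pdfs, out_pdf])
    return commands


def run_pipeline(src_pdf, out_pdf, workdir, num_pages, dpi=300):
    """Create the working directory and run every command, stopping on failure."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(src_pdf, out_pdf, workdir, num_pages, dpi):
        subprocess.run(cmd, check=True)
```

In this sketch the whole page is reduced to a 1-bit bitmap before tracing, so any figure on the page goes through the same binarize-and-trace path as the text. That is one plausible reading of why figures come out damaged, and of why skipping them would require knowing where they are before step 1.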
