Suggestions to improve ClearScan #13

raffaem · 2022-07-17T05:02:22Z

Thanks for your hard work and for creating such a great tool! I've got some questions:

It seems that higher-quality ClearScan (with a higher --clearscan-upscaling-factor) increases the file size a lot. It also messes up the figures.

Is it possible to apply OCR first and then apply ClearScan only to the areas with text?
Also, I'm not sure how this would be done, but if I understand correctly, Acrobat's ClearScan creates a kind of font to reduce the size of the final file. Is there any way to implement something like that?

Thanks a lot!

Originally posted by @c0rychu in #12

raffaem · 2022-07-17T05:03:25Z

Hi,

thanks for the suggestions.

The best way to mimic Adobe and the new fonts it creates is probably to compress the resulting PDF with JBIG2.

There is an open source encoder here.

The problems are: (1) the encoder seems abandoned (last commit dated 2019); (2) you need to compile it from source; (3) ImageMagick doesn't seem to support JBIG2 compression for PDF files.

Regarding excluding images from ClearScan, I'm afraid it would be very very difficult.

PDFsak operates differently from Adobe.

The steps it takes to mimic ClearScan are the following:

The PDF is converted into an image
The image is passed to potrace
The image is converted back into PDFs and merged
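
The three steps above can be sketched roughly as follows. This is only an illustration, not PDFsak's actual code: the function name `clearscan_commands`, the working-directory layout and the choice of `pdftoppm` and `pdfunite` (both from Poppler) for rasterizing and merging are assumptions. The function only builds the command lines, so nothing is executed until each one is passed to something like `subprocess.run(cmd, check=True)`.

```python
from pathlib import Path


def clearscan_commands(pdf, workdir, page_count, dpi=300):
    """Build (but do not run) commands for a rasterize/trace/merge pipeline.

    Hypothetical helper for illustration; the tools and layout are assumptions.
    """
    workdir = Path(workdir)
    commands = []
    traced = []
    for page in range(1, page_count + 1):
        prefix = workdir / f"page-{page}"
        bitmap = prefix.with_suffix(".pbm")
        vector = prefix.with_suffix(".pdf")
        # Step 1: rasterize one page to a 1-bit PBM bitmap, a format potrace reads.
        # -singlefile makes pdftoppm write exactly "<prefix>.pbm".
        commands.append([
            "pdftoppm", "-r", str(dpi), "-mono",
            "-f", str(page), "-l", str(page), "-singlefile",
            str(pdf), str(prefix),
        ])
        # Step 2: trace the bitmap into vector outlines, written as a one-page PDF.
        commands.append(["potrace", "-b", "pdf", "-o", str(vector), str(bitmap)])
        traced.append(str(vector))
    # Step 3: merge the per-page PDFs back into a single document.
    commands.append(["pdfunite", *traced, str(workdir / "clearscan.pdf")])
    return commands
```

Because every page is rasterized as a whole in step 1, figures and text are traced together, which is why excluding existing images is hard with this approach.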

I currently don't have a clear idea how we can exclude existing images from this process.

raffaem self-assigned this Jul 17, 2022

raffaem added the enhancement and help wanted labels Jul 17, 2022
