dpsprep [options] src [dest]
This tool, initially made specifically for use with Sony's Digital Paper System (DPS), is now a general-purpose DjVu to PDF converter with a focus on small output size and the ability to preserve document outlines (e.g. TOC) and text layers (e.g. OCR).
-q,--quality: Quality of images in output. Used only for JPEG compression, i.e. RGB and Grayscale images. Passed directly to Pillow and to OCRmyPDF's optimizer.-v,--verbose: Display debug messages.-o,--overwrite: Overwrite destination file.-w,--preserve-working: Preserve the working directory after script termination.-d,--delete-working: Delete any existing files in the working directory prior to writing to it.-t,--no-text: Disable the generation of text layers. Implied by --ocr.-p,--pool-size: Size of MultiProcessing pool for handling page-by-page operations.-m,--mode<infer|bitonal|grayscale|rgb>: Override the image modes encoded in the DjVu file for individual pages. It sometimes makes sense to force bitonal images since they compress well.--dpi: Override DPI values encoded in the DjVu file for individual pages.--ocr: Perform OCR via OCRmyPDF rather than trying to convert the text layer. If this parameter has a value, it should be a JSON dictionary of options to be passed to OCRmyPDF.-O1: Use the lossless PDF image optimization from OCRmyPDF (without performing OCR).-O2: Use the PDF image optimization from OCRmyPDF.-O3: Use the aggressive lossy PDF image optimization from OCRmyPDF.--help: Show help message and exit.--version: Show the version and exit.
Produce file.pdf in the current directory:
dpsprep /wherever/file.djvuProduce output.pdf with reduced image quality and aggressive PDF image optimizations:
dpsprep ---quality=30 -O3 input.djvu output.pdfProduce an output file using a large pool of workers:
dpsprep --pool=16 input.djvuForce bitonal images:
dpsprep --mode bitonal input.djvuProduce an output file by disregarding the text layer and running OCRmyPDF instead:
dpsprep --ocr '{"language": ["rus", "eng"]}' input.djvuOr simply disregard the text layer without OCR:
dpsprep --no-text input.djvuWe perform compression in two stages:
-
The first one is the default compression provided by Pillow. For bitonal images, the PDF generation code says that, if
libtiffis available,group4compression is used. -
If OCRmyPDF is installed, its PDF optimization can be used via the flags
-O1to-O3(this involves no OCR). This allows us to use advanced techniques, including JBIG2 compression viajbig2enc.
If manually running OCRmyPDF, note that the optimization command suggested in the documentation (setting --tesseract-timeout to 0) may ruin existing text layers. To perform only PDF optimization you can use the following undocumented tool instead:
python -m ocrmypdf.optimize <input_file> <level> <output_file>