How do I load an existing PDF and replace strings in the document without changing the original layout? #247

Open
2622594863 opened this issue Jun 13, 2024 · 2 comments

Comments

@2622594863

How do I load an existing PDF and replace strings in the document without changing the original layout?

@hoehermann

You may be looking for something like https://github.com/JoshData/pdf-redactor.

@sl2c

sl2c commented Jul 20, 2024

Preserving the layout (a.k.a. reflow) is a non-trivial operation. Take a look at pdfrwx, whose classes provide full support for stream decompression (all PDF stream filters are supported) and for parsing, which transforms a PDF stream into an abstract syntax tree (AST; see examples). After that, you can focus on exactly what you want to do with the text by working directly with the AST. Once the AST has been edited, you can re-encode it as a stream and save the file.
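To make the decompress, parse, edit, re-encode cycle concrete, here is a minimal standard-library sketch. It does not use pdfrwx or its API; `parse`, `replace_text` and `serialize` are hypothetical helper names, and the "AST" is just a flat list of `(operands, operator)` pairs. It assumes a FlateDecode stream (simulated here with `zlib.compress`) whose text is stored as plain literal strings, and it ignores nested parentheses, inline dictionaries and images, and font encodings.

```python
import re
import zlib

# Toy tokenizer: literal strings, hex strings, array brackets, names,
# and everything else (numbers and operators).
TOKEN = re.compile(
    rb"\((?:\\.|[^\\()])*\)"      # (literal string), no nested parentheses
    rb"|<[0-9A-Fa-f\s]*>"         # <hex string>
    rb"|\[|\]"                    # array delimiters
    rb"|/[^\s\[\]()/<>]+"         # /Name
    rb"|[^\s\[\]()/<>]+"          # number or operator
)
NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def parse(stream: bytes):
    """Return a flat 'AST': a list of (operands, operator) pairs."""
    commands, operands, depth = [], [], 0
    for tok in TOKEN.findall(stream):
        if tok == b"[":
            depth += 1
        elif tok == b"]":
            depth -= 1
        is_operand = (
            depth > 0
            or tok == b"]"
            or tok[:1] in b"(/<"
            or NUMBER.fullmatch(tok) is not None
        )
        if is_operand:
            operands.append(tok)
        else:
            # Anything else is treated as an operator that ends the command.
            commands.append((operands, tok))
            operands = []
    return commands


def replace_text(commands, old: bytes, new: bytes):
    """Replace `old` with `new` inside literal strings of text-showing operators."""
    text_ops = {b"Tj", b"TJ", b"'", b'"'}
    edited = []
    for operands, op in commands:
        if op in text_ops:
            operands = [
                t.replace(old, new) if t.startswith(b"(") else t
                for t in operands
            ]
        edited.append((operands, op))
    return edited


def serialize(commands) -> bytes:
    """Re-encode the command list as a content stream, one command per line."""
    return b"\n".join(b" ".join(ops + [op]) for ops, op in commands)


raw = b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET"
compressed = zlib.compress(raw)            # stands in for a FlateDecode stream
ast = parse(zlib.decompress(compressed))   # decompress, then parse
edited = serialize(replace_text(ast, b"World", b"PDF"))
recompressed = zlib.compress(edited)       # ready to be written back
print(edited.decode("latin-1"))
```

In this example the `Tj` operand becomes `(Hello PDF)` and every other command is left untouched. Nothing is reflowed: if the replacement is wider than the original string, it will run into whatever follows it on the page. In real files a phrase is also often split across several `TJ` array elements or encoded with a custom font encoding, which this sketch does not handle.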
