Generating PDF/A conforming PDFs #630
I opened a ticket on PDF/X-3 compliance: #640. Perhaps, to start the discussion on what direction WeasyPrint should take, it may be worthwhile to collect the purpose of the different standards: PDF/A -> a standard used predominantly for document archiving. For detailed differences between the two standards, see page 17 of this document: https://www.impressed.de/DOWNLOADS/pdfToolbox_Server/callas_pdfEngine_Reference.pdf I attach great importance to PDF/X, as I believe achieving full print compliance is an absolute necessity for a mature PDF creation/conversion tool. |
I've tried to give Acrobat various PDF files generated by WeasyPrint… It's awful, there are many, many, many things to fix before reaching PDF/A or PDF/X conformance.
I agree, but there's a long way ahead of us. |
Hi - opening this can of worms - can we list the things needed to conform to PDF/A? |
🐛🐛🐛🐛🐛🐛🐛🐛
That would be really useful.
I don’t really remember, but I think that there’s a PDF validator in Acrobat (not in Reader, it’s not free 😢). Does anyone know an open source (or at least free) tool to check PDF/A and PDF/X conformance?
As far as I can remember, there were lots of errors, and most of them were just impossible to fix with Cairo. I think that we need a dedicated PDF generator for that (see #841). |
I seem to recall Apache PDFBox having some features for this; I'll have to look into it more closely, though.
Maybe this is another use for a post-processor that would parse the PDF and do what is needed. That seems like a massive undertaking, though, if it is supposed to make everything PDF/A compliant. It might be smart to start by converting simple PDFs that don't include edge cases like embedded video. Edit: I was looking through #841 and must say I somewhat disagree about getting rid of external dependencies unless they prove to be severely limiting factors (maybe Cairo is?). They're literally what makes big open source projects viable and not just a massive liability to the developers. |
The current post-processor only knows how to parse PDF files generated by Cairo. It removes a lot of edge cases.
Of course, removing all external dependencies is not a goal per se. But there are some reasons why it would be interesting to consider getting rid of some of them:
So. Here’s what I think.
|
Ok, I understand and agree with your points.
I agree to steer away from any freemium solutions as they tend to become a liability down the road when they refuse to push features to their "community" versions. Do you see this new PDF generator as a separate project or would it be part of WeasyPrint? |
👍
It can be a separate project, with a quite low-level API. The hard part is probably to handle fonts, by creating a PangoCairo equivalent. (If anyone knows how to convert PDF to PNG in pure Python, that would be useful too 😒.) |
I found an open source PDF/A conformance checker that is pretty cool: https://verapdf.org/ |
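A minimal sketch of how the veraPDF checker mentioned above could be driven from Python. It assumes the veraPDF command-line launcher is installed as `verapdf` on the PATH and accepts a `--flavour` option naming the PDF/A part and level (for example `1b`); the helper names `build_verapdf_command` and `run_verapdf` are made up for illustration, so check `verapdf --help` against your installation before relying on it.

```python
import shutil
import subprocess


def build_verapdf_command(pdf_path, flavour="1b", executable="verapdf"):
    """Build the argument list for a veraPDF run (assumed CLI layout)."""
    # Assumption: "--flavour" selects the PDF/A flavour to validate against.
    return [executable, "--flavour", flavour, pdf_path]


def run_verapdf(pdf_path, flavour="1b"):
    """Run veraPDF and return (exit code, raw report text)."""
    if shutil.which("verapdf") is None:
        raise RuntimeError("verapdf was not found on PATH")
    result = subprocess.run(
        build_verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
    )
    # The report is returned unparsed; its format depends on the veraPDF
    # version and options, so it is left for the caller to inspect.
    return result.returncode, result.stdout
```

Calling `run_verapdf("document.pdf", "1b")` on a WeasyPrint-generated file would then give a report to read through for failed rules.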
That’s really cool, thanks!
That’s really impressive. Having PDF/A conformance is probably one of the best features we can get once we have a new PDF generator. I’m currently working on that 😉. (That = the generator, not the PDF/A conformance yet) |
Cool, do you have an open repo for it yet? I had been pondering the same. |
@liZe is teasing a lot about this new generator. If you need help let me know 😄 |
How is it going? |
Pretty well! The new PDF generator (called pydyf) is now used in An online PDF validator thinks that many PDF files we generate are already PDF/A compliant, but I suppose that we still have a lot of work (for tags at least, I think). We have to check with veraPDF too. As explained in #1232, the next step is to have a |
If I get the latest version from Conda, is this already included? I've been trying to produce quite simple (no images or weird components) PDF/A compliant files, and from the file info I can see that the PDF version is only 1.5 and they're not PDF/A compliant. :( So maybe the version that I'm using (52.4) still does not include pydyf support? |
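The two properties mentioned above can be inspected without a full validator. This is a rough sketch, not a conformance check: it reads the version from the `%PDF-x.y` header and looks for the `pdfaid:part` / `pdfaid:conformance` entries that a PDF/A file declares in its XMP metadata. The function names are invented for illustration. A file that declares PDF/A can still fail validation, and the header version can be overridden elsewhere in the file.

```python
import re


def pdf_header_version(path):
    """Return the version from the '%PDF-x.y' header, or None if absent."""
    with open(path, "rb") as f:
        head = f.read(1024)
    match = re.search(rb"%PDF-(\d\.\d)", head)
    return match.group(1).decode("ascii") if match else None


def pdfa_claim(path):
    """Return the declared PDF/A flavour (e.g. '1b'), or None.

    Only looks at what the XMP metadata claims; it validates nothing.
    Handles both element form (<pdfaid:part>1</pdfaid:part>) and
    attribute form (pdfaid:part="1").
    """
    with open(path, "rb") as f:
        data = f.read()
    part = re.search(rb"pdfaid:part(?:>|=[\"'])\s*(\d)", data)
    if part is None:
        return None
    level = re.search(rb"pdfaid:conformance(?:>|=[\"'])\s*([A-Za-z])", data)
    suffix = level.group(1).decode("ascii").lower() if level else ""
    return part.group(1).decode("ascii") + suffix
```

If `pdfa_claim` returns `None`, the file does not even declare itself as PDF/A, which matches what was observed with the Cairo-based releases.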
Hello @guidocioni! The latest version on Conda (52.5) doesn’t include pydyf. All 52.x versions are using (and will use) Cairo. Currently there is no release working with pydyf, but the current |
Would be good, the problem is that where I'm deploying this I can only use conda to install anything :D Is there a way to install the master with conda? As you can imagine also converting a PDF to PDF/A using solely conda/python installation is kind of a nightmare :D |
I don’t think there is an easy way to install the master branch directly with Conda, but you can use pip in a Conda environment and so install the master branch with pip. |
eh eh I wish it would be so easy. Unfortunately I can only give a list of dependencies to install through conda forge and access a Python environment running with Spark. No access to pip or the underlying unix system. Thanks for the help anyway! I hope someday this will make its way in the stable release |
@grewn0uille I managed to install the latest 53.0b1 version (which uses
Any idea where those are coming from? |
They come from a bug that’s just been fixed by f804d59. Thanks a lot for the report! |
And there’s a long way ahead… But at least now we can generate the PDF we want. |
Hello! (The survey is now closed. Thanks for all your answers! We’ll share the results soon 😉) If you’re interested in PDF/A compliance, we created a short survey where you can give a boost to this feature and help us to improve WeasyPrint 😉 Vote for it! |
So is there no way to force |
It can’t be controlled right now, at least without code being added to WeasyPrint. |
OK, thanks. For the moment I'm using Ghostscript to convert WeasyPrint's output directly to PDF/A: WeasyPrint writes to a temporary file that serves as Ghostscript's input, and Ghostscript's output goes to the filesystem. Of course, it would be amazing to have such a feature built into the tool. Anyway, keep up the good work! |
Hi, can you share this code showing how you convert an existing PDF to PDF/A using Ghostscript? I am trying, but it is not working for me.
This is something I used in the past, but I'm not sure it still works:

    import os
    import subprocess

    def convert_to_pdfa(sourceFile, targetFile):
        ghostScriptExec = ['gs', '-dPDFA', '-dBATCH', '-dNOPAUSE',
                           '-sColorConversionStrategy=UseDeviceIndependentColor',
                           '-sDEVICE=pdfwrite', '-dPDFACompatibilityPolicy=2']
        # Because of a Ghostscript bug that does not allow parameters longer
        # than 255 characters, change into the target directory and pass only
        # the file name; the original directory is restored before returning.
        sourceFile = os.path.abspath(sourceFile)  # stays valid after chdir
        cwd = os.getcwd()
        os.chdir(os.path.dirname(os.path.abspath(targetFile)))
        try:
            subprocess.check_output(ghostScriptExec +
                                    ['-sOutputFile=' + os.path.basename(targetFile), sourceFile])
        except subprocess.CalledProcessError as e:
            raise RuntimeError("command '{}' returned with error (code {}): {}".format(
                e.cmd, e.returncode, e.output))
        finally:
            os.chdir(cwd)  # restore the working directory even on failure
|
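The temporary-file workflow described earlier in the thread could be wired together roughly as follows. This is a sketch under assumptions, not a tested recipe: the Ghostscript flags are copied from the snippet above, `HTML(string=...).write_pdf(...)` is WeasyPrint's documented entry point, and the helper names `build_gs_command` and `html_to_pdfa` are invented for illustration. Whether the result actually validates as PDF/A depends on the Ghostscript version and on the input.

```python
import os
import subprocess
import tempfile


def build_gs_command(source, target):
    """Ghostscript arguments for a PDF/A conversion (flags from the thread)."""
    return [
        "gs", "-dPDFA", "-dBATCH", "-dNOPAUSE",
        "-sColorConversionStrategy=UseDeviceIndependentColor",
        "-sDEVICE=pdfwrite", "-dPDFACompatibilityPolicy=2",
        "-sOutputFile=" + target,
        source,
    ]


def html_to_pdfa(html_string, target):
    """Render HTML with WeasyPrint, then pass the PDF through Ghostscript."""
    # Imported lazily so the command builder works without WeasyPrint.
    from weasyprint import HTML

    fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)
    try:
        HTML(string=html_string).write_pdf(tmp_path)
        subprocess.run(
            build_gs_command(tmp_path, os.path.abspath(target)),
            check=True,
        )
    finally:
        os.remove(tmp_path)  # always clean up the intermediate file
```

Keeping the command construction in its own function makes it easy to print and adjust the flags, for example to try a different compatibility policy.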
Hi, thanks for this solution. I tried different policies and made multiple changes to make the file PDF/A-3B compliant, and veraPDF validated it. I am now looking for a way to embed an XML file in it to comply with the Factur-X standard. Any suggestion or help is highly appreciated. Thanks |
@winklemint WeasyPrint does not use GitHub discussions, but maybe you can open an issue about Factur-X support. My idea is to gather snippets and advice on how to generate Factur-X PDFs using WeasyPrint. |
Is it possible to generate PDFs that conform to PDF/A using WeasyPrint?
From wikipedia:
Many Thanks