Investigate font issues with AntennaAttributionDocumentReporter #2755

Closed
sschuberth opened this issue Jun 22, 2020 · 7 comments
Labels: bug (Issues that are considered to be bugs), reporter (About the reporter tool)


@sschuberth
Member

It looks like some license texts contain characters that are missing from the font(s) we use to generate the attribution document(s):

07:10:15.815 [main] WARN  org.apache.pdfbox.pdmodel.font.FileSystemFontProvider - New fonts found, font cache will be re-built
07:10:15.815 [main] WARN  org.apache.pdfbox.pdmodel.font.FileSystemFontProvider - Building on-disk font cache, this may take a while
07:10:15.866 [main] WARN  org.apache.pdfbox.pdmodel.font.FileSystemFontProvider - Finished building on-disk font cache, found 6 fonts
07:10:15.867 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-Roman
07:10:15.869 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-Bold
07:10:15.869 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-Italic
07:10:15.869 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Times-BoldItalic
07:10:15.870 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica
07:10:15.870 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica-Bold
07:10:15.871 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica-Oblique
07:10:15.871 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Helvetica-BoldOblique
07:10:15.871 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier
07:10:15.872 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier-Bold
07:10:15.872 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier-Oblique
07:10:15.872 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Courier-BoldOblique
07:10:15.873 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font Symbol
07:10:15.873 [main] WARN  org.apache.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for base font ZapfDingbats
07:10:19.295 [main] ERROR org.ossreviewtoolkit.commands.ReporterCommand - Could not create 'AntennaAttributionDocument' report: IllegalArgumentException: U+2028 ('.notdef') is not available in this font Times-Roman encoding: WinAnsiEncoding

We should find out what the root cause is, and whether it also occurs with the same license text(s) in upstream / stand-alone Antenna.

PDFs typically embed only the subset of a font's characters that is actually used, to save space. As we don't know beforehand which characters will be present in the license text(s), some characters might be missing. So maybe the solution is simply to enforce embedding all characters of all fonts in the PDF templates.

@sschuberth
Member Author

sschuberth commented Jun 25, 2020

@sschuberth
Member Author

... or we simply need to set the encoding to UTF-8 instead of WinAnsi.
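To see which characters of a license text fall outside WinAnsi before rendering, a small pre-flight check can help. The sketch below uses the JDK's `windows-1252` charset as an approximation of PDF's WinAnsiEncoding (an assumption: the two are close but not byte-for-byte identical), and the class name `WinAnsiCheck` is hypothetical. As a side note, simple PDF fonts are limited to single-byte encodings, so covering arbitrary Unicode generally means embedding a full TrueType/OpenType font as a composite (Type 0) font rather than literally switching the encoding to UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

public class WinAnsiCheck {
    /**
     * Returns the distinct code points in the text that windows-1252 cannot encode,
     * formatted as "U+XXXX". windows-1252 only approximates PDF's WinAnsiEncoding.
     */
    public static List<String> unencodable(String text) {
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().distinct()
            .filter(cp -> !encoder.canEncode(new String(Character.toChars(cp))))
            .forEach(cp -> missing.add(String.format("U+%04X", cp)));
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical license snippet containing a LINE SEPARATOR (U+2028).
        String licenseText = "Permission is hereby granted\u2028free of charge";
        System.out.println(unencodable(licenseText)); // prints [U+2028]
    }
}
```

Running such a check over all license texts would list every character that a WinAnsi-only font cannot represent, rather than failing on the first one.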

sschuberth added a commit that referenced this issue Jun 25, 2020
This extends the existing work-around until we have a proper fix for
issue #2755.

Signed-off-by: Sebastian Schuberth <[email protected]>
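The commits' actual work-around is not shown in this thread. Purely as an illustration of the general idea, a work-around could replace characters that the fallback fonts cannot render before the text reaches the PDF generator. The class name `LicenseTextSanitizer` and the replacement table below are hypothetical and are not taken from the commits.

```java
import java.util.Map;

public class LicenseTextSanitizer {
    // Hypothetical replacement table: which characters to map, and to what, is an assumption.
    // Plain spaces are used because they are available in WinAnsi-encoded fonts.
    private static final Map<Integer, String> REPLACEMENTS = Map.of(
        0x2028, " ", // LINE SEPARATOR
        0x2029, " "  // PARAGRAPH SEPARATOR
    );

    /** Returns a copy of the text with every code point in REPLACEMENTS substituted. */
    public static String sanitize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            String replacement = REPLACEMENTS.get(cp);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }
}
```

Such a table only hides known offenders; any character not listed would still trigger the error, which is why it can only be a stop-gap until the font issue itself is fixed.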
@bs-ondem
Member

This issue also occurs with Antenna when running the AttributionDocumentGenerator on a license text that contains the Unicode character U+2028. To understand the cause, it helps to know how Antenna loads its fonts.

The Templates class is responsible for loading and storing the four template PDF files and the font types sans, sans-bold, bold-italic and sans-italic, which are provided by an implementation of the TemplateBundle interface. If no font is provided, however, the Templates class loads the "basic" Times Roman font provided by the PDFBox dependency. This "basic" font is incomplete, which is why the error message "U+2028 ('.notdef') is not available in this font Times-Roman" occurs.

I downloaded a complete Times New Roman font and added it to the BasicPDFTemplateBundle implementation, which solved the issue with U+2028. However, I'm not sure whether all Unicode characters are covered, and I also don't know whether we are allowed to add fonts to the Antenna project.
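Whether a given font file covers all characters of the license texts can be checked up front. The sketch below uses `java.awt.Font.createFont` and `Font.canDisplay(int)` from the JDK to report uncovered code points. This is an assumption-laden stand-in: PDFBox parses fonts with its own code, so coverage as seen by AWT should match in principle but is not guaranteed to. The class name `FontCoverage` and the command-line arguments are hypothetical.

```java
import java.awt.Font;
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class FontCoverage {
    /** Returns the distinct code points of the text for which canDisplay is false, as "U+XXXX". */
    public static List<String> uncovered(String text, IntPredicate canDisplay) {
        List<String> missing = new ArrayList<>();
        text.codePoints().distinct()
            .filter(cp -> !canDisplay.test(cp))
            .forEach(cp -> missing.add(String.format("U+%04X", cp)));
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a TrueType font file, args[1]: path to a UTF-8 license text (hypothetical inputs).
        Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0]));
        String text = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("Not covered by font: " + uncovered(text, cp -> font.canDisplay(cp)));
    }
}
```

Taking the coverage test as an `IntPredicate` keeps the core logic independent of AWT, so the same helper could be fed a predicate backed by PDFBox's font classes instead.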

@sschuberth
Member Author

As discussed offline, judging from the "Using fallback font LiberationSans for base font ..." warnings, the root cause seems to be that the mentioned fonts are not provided by the operating system (here: a Docker container based on Ubuntu), so installing them should solve this.
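A quick way to confirm inside the container which font families the JVM can see is sketched below, using the JDK's `GraphicsEnvironment.getAvailableFontFamilyNames()`, which also works in headless mode. The list of wanted families is an assumption (metric-compatible substitutes for the Times, Helvetica and Courier base fonts; on Ubuntu the `fonts-liberation` package is one candidate source), and PDFBox scans font directories itself, so its view may differ from what AWT reports. The class name `InstalledFontCheck` is hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InstalledFontCheck {
    // Assumption: substitutes a PDF library might look for; adjust to the fonts actually required.
    static final List<String> WANTED = List.of("Liberation Serif", "Liberation Sans", "Liberation Mono");

    /** Returns the wanted families that are not in the installed set, preserving order. */
    public static List<String> missing(Set<String> installed, List<String> wanted) {
        List<String> result = new ArrayList<>();
        for (String family : wanted) {
            if (!installed.contains(family)) {
                result.add(family);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> installed = new HashSet<>(Arrays.asList(
            GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames()));
        System.out.println("Missing font families: " + missing(installed, WANTED));
    }
}
```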

sschuberth added a commit that referenced this issue Jun 26, 2020
@bs-ondem
Member

bs-ondem commented Jun 26, 2020

The Unicode characters \u0009, \u0092, \u009d, \u00a0, \u00ad and \037e are available in the LiberationSans font; the other ones from commit b9db19 are not.

@sschuberth
Member Author

Relates to eclipse-archived/antenna#548.

@sschuberth
Member Author

Tracking this only via the above-mentioned upstream issue in Antenna.
