Ghostscript crashes when writing XMP metadata with Asian characters #5

Open
2 tasks
matteosecli opened this issue Mar 12, 2018 · 0 comments
Labels: bug, ghostscript (the bug is caused by Ghostscript, not directly by pdf2archive itself)

It appears that Ghostscript (GS) crashes with both versions I tested, but 9.14 produces a valid PDF/A-1B result while 9.22 does not. In particular, version 9.14 copies the correct character into the XMP metadata while 9.22 does not, which causes the validation failure.

TODO:

  • Report to the GS guys.
  • Check for possible solutions from my side.
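
One possible workaround to try on the pdf2archive side, sketched in Python: PDF text strings may be stored as UTF-16BE with a leading byte order mark (FE FF), and PostScript accepts hex strings in angle brackets, so non-ASCII metadata could be written into a DOCINFO pdfmark as hex instead of as a literal `(...)` string. The helper names below are hypothetical, the actual layout of pdf2archive's definition file is an assumption, and whether GS 9.22 then writes correct XMP has not been tested here.

```python
def pdfmark_hex_string(text: str) -> str:
    """Return text as a PostScript hex string holding BOM + UTF-16BE bytes."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def docinfo_pdfmark(fields: dict) -> str:
    """Build a DOCINFO pdfmark line (hypothetical helper, illustrative only)."""
    parts = [f"/{key} {pdfmark_hex_string(value)}" for key, value in fields.items()]
    return "[ " + " ".join(parts) + " /DOCINFO pdfmark"


print(docinfo_pdfmark({"Creator": "用"}))
# [ /Creator <FEFF7528> /DOCINFO pdfmark
```

A hex string contains no parentheses or backslashes, so nothing in the metadata value needs escaping.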

Original file:
1309.4626.pdf

Converted file with metadata preservation (gs 9.22):
1309.4626-PDFA.pdf
GS crashes and the file is not PDF/A-1B compliant but it's a valid PDF. The XMP metadata is not correct.

Converted file with metadata preservation (gs 9.14):
1309.4626-PDFA_914.pdf
GS crashes but the file is a valid PDF/A-1B. The XMP metadata is correct.

Converted file with metadata reset (--cleanmetadata, gs 9.22):
1309.4626-PDFA_clean.pdf
GS does not crash and the file is a valid PDF/A-1B.

Conversion output:

$ ./pdf2archive --debug --validate 1309.4626.pdf 
=== Welcome to PDF2ARCHIVE ===
  DEBUG: running PDF2ARCHIVE, version 0.3
  DEBUG: using Ghostscript binary at /usr/local/bin/gs, version 9.22
  DEBUG: the input file is '1309.4626.pdf'
  DEBUG: the output file is '1309.4626-PDFA.pdf'
  DEBUG: the intermediate processing file is /var/folders/r2/21fdm8ds1rlc552vl1mcnqx40000gn/T/tmp.8nfVxnqp
  DEBUG: the temporary directory is /var/folders/r2/21fdm8ds1rlc552vl1mcnqx40000gn/T/tmp.hmUdqwki
  DEBUG: the current quality options are ''
  DEBUG: PDF title ''
  DEBUG: PDF author 'Md. Mohi Uddin'
  DEBUG: PDF subject ''
  DEBUG: PDF keywords ''
  DEBUG: PDF creator 'Word u( Acrobat PDFMaker 8.1'
  DEBUG: PDF producer 'Acrobat Distiller 8.1.0 (Windows)'
  DEBUG: PDF creation date 'D:20130917195031+09'00''
  DEBUG: PDF modification date 'D:20130917195112+09'00''
  DEBUG: PDF trapping ''
  Creating the definition file...
  Compressing PDF & embedding fonts...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Processing pages 1 through 12.
Page 1
Querying operating system for font files...
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
Loading NimbusRoman-Bold font from /usr/local/Cellar/ghostscript/9.22/share/ghostscript/9.22/Resource/Font/NimbusRoman-Bold... 5088616 3528885 2680024 1311425 2 done.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 2
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 3
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 4
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 5
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 6
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 7
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 8
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 9
Substituting font Times-Bold for TimesNewRomanPS-BoldMT.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 10
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 11
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 12
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

  Converting to PDF/A-1B...
GPL Ghostscript 9.22 (2017-10-04)
Copyright (C) 2017 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Processing pages 1 through 12.
Page 1
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 2
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 3
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 4
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 5
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 6
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 7
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 8
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 9
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 10
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 11
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Page 12
GPL Ghostscript 9.22: 

Use of -dUseCIEColor detected!
Since the release of version 9.11 of Ghostscript we recommend you do not set
-dUseCIEColor with the pdfwrite/ps2write device family.

Error: /syntaxerror in -file-
Operand stack:
   --nostringval--   Title   ()   Author   (Md. Mohi Uddin)   Subject   ()   Keywords   ()   Creator
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1999   1   3   %oparray_pop   1998   1   3   %oparray_pop   1982   1   3   %oparray_pop   1868   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push
Dictionary stack:
   --dict:987/1684(ro)(G)--   --dict:0/20(G)--   --dict:79/200(L)--
Current allocation mode is local
Current file position is 869
GPL Ghostscript 9.22: Unrecoverable error, exit code 1
  Removing temporary files...
  Done, now ESSE3 is happy! ;)
  Validating resulting file...
  FAIL /Users/matteo/Dropbox/UNI/MScThesis/Elaborato/LaTeX/pdf2archive/1309.4626-PDFA.pdf
  FAIL 6.2.3-2
  FAIL 6.2.3-4
  FAIL 6.7.3-1
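
A possible reading of the `/syntaxerror` above, not confirmed by the Ghostscript developers: 用 is U+7528, so its UTF-16BE bytes are 0x75 0x28, which are the ASCII characters `u(`. That matches the `Word u( Acrobat PDFMaker 8.1` creator in the debug output, and the operand stack stops right at `Creator`. An unescaped `(` inside a PostScript literal string leaves the parentheses unbalanced. The checker below is a hypothetical helper that only illustrates the idea; it is not part of pdf2archive.

```python
def has_unbalanced_parens(raw: bytes) -> bool:
    """Return True if raw would break a PostScript literal string `(...)`."""
    depth = 0
    i = 0
    while i < len(raw):
        c = raw[i]
        if c == 0x5C:      # backslash: the next byte is escaped, skip it
            i += 2
            continue
        if c == 0x28:      # "("
            depth += 1
        elif c == 0x29:    # ")"
            depth -= 1
            if depth < 0:
                return True
        i += 1
    return depth != 0


print("用".encode("utf-16-be"))                                  # b'u('
print(has_unbalanced_parens(b"Word u( Acrobat PDFMaker 8.1"))   # True
print(has_unbalanced_parens(b"Md. Mohi Uddin"))                 # False
```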


Original file metadata:

$ exiftool -a -G1 1309.4626.pdf 
[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : 1309.4626.pdf
[System]        Directory                       : .
[System]        File Size                       : 519 kB
[System]        File Modification Date/Time     : 2017:11:02 17:02:04+01:00
[System]        File Access Date/Time           : 2018:03:12 15:23:07+01:00
[System]        File Inode Change Date/Time     : 2018:03:12 15:23:01+01:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.4
[PDF]           Linearized                      : Yes
[PDF]           Tagged PDF                      : Yes
[PDF]           Page Count                      : 12
[PDF]           Page Layout                     : OneColumn
[PDF]           Create Date                     : 2013:09:17 19:50:31+09:00
[PDF]           Author                          : Md. Mohi Uddin
[PDF]           Creator                         : Word 用 Acrobat PDFMaker 8.1
[PDF]           Producer                        : Acrobat Distiller 8.1.0 (Windows)
[PDF]           Modify Date                     : 2013:09:17 19:51:12+09:00
[PDF]           Source Modified                 : D:20130917104643
[PDF]           Title                           : 
[XMP-x]         XMP Toolkit                     : Adobe XMP Core 4.0-c316 44.253921, Sun Oct 01 2006 17:14:39
[XMP-pdf]       Producer                        : Acrobat Distiller 8.1.0 (Windows)
[XMP-pdfx]      Source Modified                 : D:20130917104643
[XMP-xmp]       Create Date                     : 2013:09:17 19:50:31+09:00
[XMP-xmp]       Creator Tool                    : Word 用 Acrobat PDFMaker 8.1
[XMP-xmp]       Modify Date                     : 2013:09:17 19:51:12+09:00
[XMP-xmp]       Metadata Date                   : 2013:09:17 19:51:12+09:00
[XMP-xmpMM]     Document ID                     : uuid:c1155296-9b86-4283-b549-b3f53693a7dc
[XMP-xmpMM]     Instance ID                     : uuid:acf3e0f9-3b0e-40d0-8ae6-c0c5d27a48c6
[XMP-xmpMM]     Subject                         : 21
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Creator                         : Md. Mohi Uddin
[XMP-dc]        Title                           : 

Converted file metadata (notice the unknown character in the XMP metadata; the Info dictionary, instead, is preserved correctly):

$ exiftool -a -G1 1309.4626-PDFA.pdf 
[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : 1309.4626-PDFA.pdf
[System]        Directory                       : .
[System]        File Size                       : 441 kB
[System]        File Modification Date/Time     : 2018:03:12 15:23:45+01:00
[System]        File Access Date/Time           : 2018:03:12 15:23:48+01:00
[System]        File Inode Change Date/Time     : 2018:03:12 15:23:45+01:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.4
[PDF]           Linearized                      : No
[PDF]           Page Count                      : 12
[PDF]           Producer                        : GPL Ghostscript 9.22
[PDF]           Create Date                     : 2018:03:12 15:23:43+01:00
[PDF]           Modify Date                     : 2018:03:12 15:23:43+01:00
[PDF]           Author                          : Md. Mohi Uddin
[PDF]           Creator                         : Word 用 Acrobat PDFMaker 8.1
[PDF]           Title                           : 
[XMP-x]         XMP Toolkit                     : XMP toolkit 2.9.1-13, framework 1.6
[XMP-pdf]       Producer                        : GPL Ghostscript 9.22
[XMP-xmp]       Modify Date                     : 2018:03:12 15:23:43+01:00
[XMP-xmp]       Create Date                     : 2018:03:12 15:23:43+01:00
[XMP-xmp]       Creator Tool                    : Word � Acrobat PDFMaker 8.1
[XMP-xmpMM]     Document ID                     : uuid:844b5246-5e1d-11f3-0000-eb0c71bba29b
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Title                           : 
[XMP-dc]        Creator                         : Md. Mohi Uddin
[XMP-pdfaid]    Part                            : 1
[XMP-pdfaid]    Conformance                     : B
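
To spot this kind of damage without reading the dump by eye, here is a small Python sketch that parses `exiftool -a -G1` style lines and lists the fields whose value contains the Unicode replacement character (U+FFFD). The function names are hypothetical, and the parsing assumes the `[Group] Tag : Value` layout shown above.

```python
def parse_exiftool(text: str) -> list:
    """Parse '[Group] Tag : Value' lines into (group, tag, value) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.startswith("["):
            continue
        group, _, rest = line[1:].partition("]")
        # Split on the first colon only, since values may contain colons (dates).
        tag, _, value = rest.partition(":")
        entries.append((group, tag.strip(), value.strip()))
    return entries


def damaged_fields(entries: list) -> list:
    """Return (group, tag) pairs whose value contains U+FFFD."""
    return [(group, tag) for group, tag, value in entries if "\ufffd" in value]


sample = (
    "[PDF]           Creator                         : Word 用 Acrobat PDFMaker 8.1\n"
    "[XMP-xmp]       Creator Tool                    : Word \ufffd Acrobat PDFMaker 8.1\n"
)
print(damaged_fields(parse_exiftool(sample)))
# [('XMP-xmp', 'Creator Tool')]
```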

By using

./pdf2archive --cleanmetadata --title="用" --debug --validate 1309.4626.pdf 

GS does not crash, but the resulting file is still not a valid PDF/A-1B file. I suspect this is because I cannot really type Asian characters in the terminal (my locale is wrong), or maybe they are just not parsed correctly. In fact, the resulting metadata is:

[ExifTool]      ExifTool Version Number         : 10.80
[System]        File Name                       : 1309.4626-PDFA.pdf
[System]        Directory                       : .
[System]        File Size                       : 442 kB
[System]        File Modification Date/Time     : 2018:03:12 15:35:49+01:00
[System]        File Access Date/Time           : 2018:03:12 15:35:52+01:00
[System]        File Inode Change Date/Time     : 2018:03:12 15:35:49+01:00
[System]        File Permissions                : rw-r--r--
[File]          File Type                       : PDF
[File]          File Type Extension             : pdf
[File]          MIME Type                       : application/pdf
[PDF]           PDF Version                     : 1.4
[PDF]           Linearized                      : No
[PDF]           Page Count                      : 12
[PDF]           Producer                        : GPL Ghostscript 9.22
[PDF]           Create Date                     : 2018:03:12 15:35:47+01:00
[PDF]           Modify Date                     : 2018:03:12 15:35:47+01:00
[PDF]           Author                          : 
[PDF]           Creator                         : 
[PDF]           Title                           : çfl¨
[PDF]           Subject                         : 
[PDF]           Trapped                         : 
[XMP-x]         XMP Toolkit                     : XMP toolkit 2.9.1-13, framework 1.6
[XMP-pdf]       Producer                        : GPL Ghostscript 9.22
[XMP-pdf]       Keywords                        : 
[XMP-xmp]       Modify Date                     : 2018:03:12 15:35:47+01:00
[XMP-xmp]       Create Date                     : 2018:03:12 15:35:47+01:00
[XMP-xmp]       Creator Tool                    : 
[XMP-xmpMM]     Document ID                     : uuid:33d4f44d-5e1f-11f3-0000-eb0c71bba29b
[XMP-dc]        Format                          : application/pdf
[XMP-dc]        Title                           : �
[XMP-dc]        Creator                         : 
[XMP-dc]        Description                     : 
[XMP-pdfaid]    Part                            : 1
[XMP-pdfaid]    Conformance                     : B
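
The `çfl¨` in the Info dictionary title is consistent with the three UTF-8 bytes of 用 (E7 94 A8) being read one byte per character. Here is a Python sketch of that reading. It assumes the bytes were interpreted as PDFDocEncoding, which as far as I know matches Latin-1 for 0xE7 and 0xA8 and maps 0x94 to the "fl" ligature. The override table is deliberately partial and only illustrative.

```python
# Partial, illustrative map: only the byte needed here differs from Latin-1.
PDFDOC_OVERRIDES = {0x94: "\ufb02"}  # assumed PDFDocEncoding slot for the fl ligature


def decode_as_pdfdoc(raw: bytes) -> str:
    """Decode bytes one at a time, Latin-1 plus the overrides above."""
    return "".join(
        PDFDOC_OVERRIDES.get(b, bytes([b]).decode("latin-1")) for b in raw
    )


utf8 = "用".encode("utf-8")
print(utf8.hex())              # e794a8
print(decode_as_pdfdoc(utf8))  # ç, the fl ligature, then ¨
```

If this reading is right, the terminal did pass the character through as UTF-8, and the damage happens when the bytes are written into the PDF string without conversion.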
matteosecli added the bug and ghostscript labels on Mar 12, 2018
matteosecli added this to the PDF2ARCHIVE v0.4 milestone on Mar 12, 2018
matteosecli self-assigned this on Mar 12, 2018