fields with periods are truncated #324

Open
terrafrost opened this issue Dec 28, 2023 · 1 comment

terrafrost commented Dec 28, 2023

So I have a PDF with just one field on it, a field named "xxx.yyy". When I run pdf2json 3.0.5 on the PDF, it reports that the only field is named "yyy".

test.pdf demonstrates the problem.

Here's what Adobe Acrobat Pro 2020 shows:

image

pdftk 2.02 also finds "xxx.yyy" when I run pdftk test.pdf dump_data_fields:

FieldType: Text
FieldName: xxx.yyy
FieldFlags: 0
FieldJustification: Left

Unfortunately, pdftk doesn't return the coordinates whereas pdf2json does.

According to qpdf test.pdf --json, the field's alternativename, fullname, and mappingname are all "xxx.yyy", whereas its partialname is "yyy", so maybe pdf2json is reporting the partial name instead of the full name?

terrafrost commented Jan 3, 2024

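So I used qpdf's QDF mode (qpdf test.pdf --qdf test.qdf) to dig into this further, and I guess the issue is that a dotted name is stored as a hierarchy: each dot-separated part lives in its own object, and the earlier parts are parents of the later ones. Here's the field's widget object: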

%% Object stream: object 7, index 2; original object ID: 24
<<
  /DA (/Helv 12 Tf 0 g)
  /F 4
  /FT /Tx
  /MK <<
  >>
  /P 21 0 R
  /Parent 17 0 R
  /Rect [
    190.784
    658.903
    340.784
    680.903
  ]
  /Subtype /Widget
  /T (yyy)
  /Type /Annot
>>

So if you look at the /T entry in isolation you get yyy. The xxx comes from the object that /Parent 17 0 R points to:

%% Object stream: object 17, index 0; original object ID: 10
<<
  /Kids [
    7 0 R
  ]
  /T (xxx)
>>

So I guess what pdf2json needs to do is follow /Parent from the field up through each ancestor until it reaches an object with no /Parent, prepending each ancestor's /T value to the name, with a dot separating each part.
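
To make that concrete, here's a rough sketch of the parent walk in Python. It is not pdf2json code: the objects dict is a hypothetical in-memory stand-in for the two objects dumped above (keyed by object number), and full_field_name is a name I made up for illustration. It loops rather than recursing, and it skips any object that has no /T, on the assumption that such objects shouldn't add a part to the name.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for the two PDF objects shown above,
# keyed by object number. Only the entries relevant to naming are kept.
objects: Dict[int, dict] = {
    7: {"T": "yyy", "Parent": 17},   # the widget: /T (yyy), /Parent 17 0 R
    17: {"T": "xxx", "Kids": [7]},   # the parent: /T (xxx), /Kids [7 0 R]
}


def full_field_name(obj_id: int, objs: Dict[int, dict]) -> str:
    """Join the /T values from the topmost ancestor down to obj_id with dots."""
    parts: List[str] = []
    seen: Set[int] = set()
    current: Optional[int] = obj_id
    while current is not None:
        if current in seen:
            # Guard against a malformed file whose /Parent chain loops.
            raise ValueError("cycle in /Parent chain")
        seen.add(current)
        node = objs[current]
        name = node.get("T")
        if name is not None:
            # Collected child-first; reversed before joining.
            parts.append(name)
        current = node.get("Parent")
    return ".".join(reversed(parts))


print(full_field_name(7, objects))  # xxx.yyy
```

With the two objects from test.pdf this gives xxx.yyy for object 7, which matches what Acrobat, pdftk, and qpdf report.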
