Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal images support #91

Closed
wants to merge 7 commits into from
Closed

Minimal images support #91

wants to merge 7 commits into from

Conversation

webcarrot
Copy link

Minimal/primitive images support:

  • images put into formImage.Pages[].Images[]
  • does not analyze, verify or convert image data
  • omits position, scaling etc
  • skip groups, masks and inline images
  • omits duplicates

Typical output:

{
  "formImage": {
    "Transcoder": "[email protected] [https://github.com/modesty/pdf2json]",
    "Agency": "",
    "Id": {
      "AgencyId": "",
      "Name": "",
      "MC": false,
      "Max": 1,
      "Parent": ""
    },
    "Pages": [
      {
        "Height": 52.625,
        "HLines": [],
        "VLines": [],
        "Fills": [
          {
            "x": 0,
            "y": 0,
            "w": 0,
            "h": 0,
            "clr": 1
          }
        ],
        "Texts": [],
        "Fields": [],
        "Boxsets": [],
        "Images": [
          {
            "src": ""
          },
          {
            "width": 1,
            "height": 1,
            "data": {
              "0": 255,
              "1": 255,
              "2": 255,
              "3": 255
            }
          }
        ]
      }
    ],
    "Width": 37.188
  }
}
`

@modesty
Copy link
Owner

modesty commented Apr 22, 2017

base64 encoded image output, nice add-on, thanks. It'd be less impact on current clients if it's optional, controlled by a new command line arg, for example.

@jonaskello
Copy link

What is the status of this PR? Seems there is still not any support for parsing information about images?

@jonaskello
Copy link

@modesty If we fix the merge conflicts of this PR would it be possible to merge it?

@fancydev18
Copy link

Please, this would be so useful! Would it be possible to merge it? @modesty

@modesty
Copy link
Owner

modesty commented Nov 25, 2021

please, fix the conflicts first

@jonaskello
Copy link

@modesty I opened a new PR in #253 that has conflicts resolved. Could you please review it?

@webcarrot
Copy link
Author

-> #253

@webcarrot webcarrot closed this Nov 25, 2021
@modesty
Copy link
Owner

modesty commented Nov 25, 2021

a few notes:

  1. Minimal images support (updated) #253 still has conflicts. An easier way would be to create a new branch based off master then reapply change and submit new PR
  2. More importantly, JSON is data exchange text format, generally not for binary data, including images. A preferred way is to create separate image files, like how other parsed files work
  3. Another reason for supporting image separately is performance, it should be an opted-in process for requested caller, needs to be off by default for common use cases to avoid unnecessary CPU cycles and memory usage. A new command line switch is recommended
  4. Coding wise, lib/imagedata.js is a new file with single function exported while referencing this internally. Better to encapsulate all image related data and ops in lib/pdfimage.js file, extending it with functionality of your lib/imagedata.js and function addImage(image, images)
  5. needs sample testing PDF. you can add it to test/pdf/misc and add more test script

thanks.

@jonaskello
Copy link

@modesty Thanks for the input, I'll try to find some time to go through all of your feedback :-). Just a quick note about #253, when I view the PR github says "This branch has no conflicts with the base branch". I'm not sure where to look for the conflicts you mentioned?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants