Need to update to python3 #84

Open
oscaroboto opened this issue Nov 19, 2018 · 5 comments

Comments

@oscaroboto

A number of places need to be updated for this to work with Python 3, in particular the print statements. Currently all prints are of the form print 'stuff', which is a syntax error in Python 3. All of them need to be converted to print('stuff').
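As a minimal sketch of the conversion described above (the strings and the `buffer` stream are made up for illustration, not taken from peepdf), the statement forms map onto the print function roughly like this:

```python
import io

# Python 2 statement forms (both are syntax errors in Python 3):
#     print 'stuff'
#     print >>stream, 'warning'

buffer = io.StringIO()                 # illustrative stand-in for any text stream

print('stuff')                         # plain statement becomes a function call
print('warning', file=buffer)          # `print >>stream, ...` becomes file=stream
print('a', 'b', sep=', ', file=buffer) # separator and line ending are keyword arguments
```

The 2to3 tool performs this rewrite mechanically, so it is rarely worth doing by hand.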

@aBaechtold

Hi there,
I gave it a try but stopped after a few hours. I'm fairly new to Python (this is about my second project, after a simple script for decompressing Deflate streams in PDFs), and I hope that what I learned might help others. So far it seems easier to keep a Python 2.x environment around as well.

It's not just that the print statement was removed in favor of a print function (that one can be handled easily with the Tools/2to3.py script); far more impactful is that in Python 3 binary data (bytes, bytearray) cannot be treated interchangeably with strings.
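To illustrate the point, here is a small sketch of how Python 3 keeps the two types apart. The sample data is invented; only built-in behavior is shown:

```python
data = b"%PDF-1.4"                       # made-up sample of binary file content

# bytes and str are distinct types: they never compare equal...
equal = (data == "%PDF-1.4")
print(equal)                             # False

# ...and operations only work when both operands are bytes.
print(data.startswith(b"%PDF"))          # True

try:
    data + " trailer"                    # mixing bytes and str
    mixing_error = None
except TypeError as exc:
    mixing_error = exc
    print("TypeError:", exc)
```

In Python 2 all three operations would have succeeded, which is why code written for it can mix the two freely.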

Most of the file content handling in the scripts is based on strings, while the file is read in binary mode ('rb'); Python 2.x did not enforce a strict separation between the two. A naïve approach for the conversion to Python 3 might be to replace all string literals with bytes literals (e.g. '...' -> b'...'), but you would then still need to fix up the places that use string functions such as ord() or unescapeString().
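A hedged sketch of why swapping the literals alone is not enough: content read in binary mode is bytes, and the Python 2 idiom of calling ord() on a single element no longer applies. The `io.BytesIO` stream stands in for a file opened with 'rb', and its content is invented:

```python
import io

# Stand-in for open(path, 'rb'); the bytes below are made up for illustration.
stream = io.BytesIO(b"1 0 obj\n<< /Type /Catalog >>\nendobj\n")
content = stream.read()            # binary mode always yields bytes in Python 3

print(type(content).__name__)      # bytes
print(b"/Catalog" in content)      # True: searching with a bytes literal works

first = content[0]                 # indexing bytes yields an int, not a 1-char string
print(first)                       # 49, the code of "1"

try:
    ord(first)                     # Python 2 idiom; ord() rejects an int in Python 3
    ord_failed = False
except TypeError:
    ord_failed = True
print(ord_failed)                  # True
```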

What I did:
Converted all the scripts with 2to3.py. Fixed an additional incorrect indentation level error when launching peepdf.py. Then debugged through PDFCore.py and changed string literals to bytes literals (e.g. for tokens). BTW: this also applies to RegEx patterns, which work fine with binary data as long as the pattern is of the same type. Comparisons on a character level needed to be modified when using bytes instead of strings, because someByteVar[0] returns an integer and not a single-byte bytes object; most such expressions had to change from someByteVar[i] to someByteVar[i:i+1]. Some functions expected strings and tested for that type, so I had to remove those checks; uses of string functions such as ord() were removed as well.
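The indexing and RegEx points can be sketched as follows. The `token` value and the pattern are invented for illustration and are not taken from PDFCore.py:

```python
import re

token = b"<</Length 42>>"                # made-up token, not from PDFCore.py

# Indexing returns an int; slicing returns a bytes object of length 1.
print(token[0])                          # 60
print(token[0:1])                        # b'<'
print(token[0] == b"<")                  # False: int compared with bytes
print(token[0:1] == b"<")                # True

# A bytes pattern works on bytes data.
length_pattern = re.compile(rb"/Length\s+(\d+)")
match = length_pattern.search(token)
length = int(match.group(1))             # int() accepts ASCII digits given as bytes
print(length)                            # 42

# A str pattern on bytes data raises TypeError.
try:
    re.search(r"/Length", token)
    mixed_pattern_failed = False
except TypeError:
    mixed_pattern_failed = True
print(mixed_pattern_failed)              # True
```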
These changes got me as far as partially parsing the dictionary of the first PDF object (obj). However, running into unescapeString() in the PDFString class and the jump into JSAnalysis.py made me wonder whether this approach is fruitful. I think that some attributes of the classes should remain strings (as higher layers may expect) and be decoded from the raw binary data, e.g. keep self.rawValue as binary data and self.value as the decoded string. Jose Miguel might know better what strategy to use.
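One way the rawValue/value split might look is sketched below. `PDFStringSketch` is a hypothetical class, not peepdf's actual PDFString, and the choice of latin-1 is an assumption made here because it maps every byte to exactly one character, so the decoded string can always be turned back into the original bytes:

```python
class PDFStringSketch:
    """Hypothetical illustration only; not the real peepdf PDFString class."""

    def __init__(self, raw_value, encoding="latin-1"):
        # Raw bytes exactly as read from the file, kept for low-level parsing.
        self.rawValue = raw_value
        # Decoded text for higher layers that expect str.
        # Assumption: latin-1, since it can decode any byte sequence losslessly.
        self.value = raw_value.decode(encoding)


sample = PDFStringSketch(b"(Hello \xff World)")   # made-up input with a non-ASCII byte
print(type(sample.rawValue).__name__)             # bytes
print(type(sample.value).__name__)                # str
print(sample.value.encode("latin-1") == sample.rawValue)  # True
```

With this split, byte-level code would compare against bytes literals on rawValue, while string helpers keep operating on value.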

I attached the files in case someone wants to have a look at them, e.g. to compare/diff.
peepdf-master_26Dec2018.zip

@skierpage

skierpage commented Nov 13, 2019

@jbremer's fork of peepdf and others work on Python 3 thanks to commits from @Evert0x and others. I got "ValueError: jpeg is required unless explicitly disabled using --disable-jpeg, aborting" when trying to install it with easy_install --user . because that version of peepdf depends on Pillow, which has its own installation difficulties. I had to install python-devel and libjpeg-devel on Fedora, and I still have trouble with lxml if I want XML output.

Somewhere in the many branches of the 123 forks of this project is one that will work, but it's definitely not this one and it's hard to find 🤕 .

@jbremer

jbremer commented Nov 14, 2019

That's correct. What about it though? Perhaps it would be good to include this information in the README. Personally I only use peepdf in the context of Cuckoo Sandbox, and in that case those setup steps are documented in the Cuckoo documentation hehe. Feel free to PR on our fork.

@ferpalma21

I had to use Python 2 because it was impossible to make it work with Python 3.

@agyss

agyss commented Sep 29, 2021

The branch from enzok/peepdf is working just fine with Python 3.9 (tested on 29.9.2021).
