Need to update to python3 #84

oscaroboto · 2018-11-19T18:02:34Z

There are a number of places that need to be updated for this to work with Python 3, in particular the print statements. All prints need to be updated to conform to Python 3 standards. Currently all prints are of the form print 'stuff', which does not work in Python 3. Convert all of them to print('stuff').
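One quick way to see the difference is to ask the Python 3 compiler itself. The sketch below uses the built-in compile() to check whether a snippet is valid Python 3 source. The helper name compiles_under_python3 is made up for illustration and is not part of peepdf. During a transition, adding `from __future__ import print_function` at the top of a Python 2 module (2.6+) also makes the function form work on Python 2.

```python
def compiles_under_python3(source):
    """Return True if `source` compiles with the running Python 3 interpreter."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


old_style = "print 'stuff'"   # Python 2 print statement
new_style = "print('stuff')"  # print function, valid in Python 3

print(compiles_under_python3(old_style))  # False
print(compiles_under_python3(new_style))  # True
```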


aBaechtold · 2018-12-29T21:06:36Z

Hi there,
I gave it a try but stopped after a few hours. I'm fairly new to Python (this is roughly my second project, after a simple script for decompressing Deflate streams in PDFs), and I hope that what I learned might help others. So far it seems easier to keep a Python 2.x environment around as well.

It's not just the print statement, which was removed in favor of a print function (that one is handled easily by the Tools/2to3.py script). Far more significant is that in Python 3, binary data (bytes, bytearray) can no longer be used interchangeably with strings.
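A minimal illustration of that stricter separation, using a made-up PDF object header rather than anything from peepdf itself: Python 3 refuses to concatenate bytes with str, never treats them as equal, and only matches once the bytes are explicitly decoded. The choice of latin-1 for decoding here is an assumption made for the example.

```python
raw = b"1 0 obj"  # data read from a file opened in 'rb' mode is bytes in Python 3

try:
    raw + " endobj"          # bytes + str: Python 2 allowed this, Python 3 does not
    concat_ok = True
except TypeError:
    concat_ok = False

same_as_text = raw == "1 0 obj"                         # bytes never compare equal to str
same_after_decode = raw.decode("latin-1") == "1 0 obj"  # equal once explicitly decoded

print(concat_ok, same_as_text, same_after_decode)  # False False True
```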

Most of the file-content handling in the scripts is string-based, even though the file is read in binary mode ('rb'), because Python 2.x did not enforce this distinction. A naïve approach to the conversion would be to replace all string literals with bytes literals (e.g. '...' -> b'...'), but you then still need to fix the places that use string functions such as ord() or unescapeString(), etc.
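The ord() case shows why swapping literals alone is not enough. The sketch below (sample data is made up) contrasts the Python 2 idiom ord(data[i]) with what Python 3 actually returns when indexing bytes.

```python
data = b"<< /Type /Catalog >>"

# Python 2 idiom: data[0] was the one-character string '<', so ord(data[0]) worked.
# In Python 3, indexing bytes already yields an int, and ord() rejects ints.
try:
    ord(data[0])
    ord_on_index_ok = True
except TypeError:
    ord_on_index_ok = False

code_by_index = data[0]          # 60, already the byte value of '<'
code_by_slice = ord(data[0:1])   # ord() still accepts a length-1 bytes object

print(ord_on_index_ok, code_by_index, code_by_slice)  # False 60 60
```

In most converted code the simplest fix is to drop the ord() call and use the integer from indexing directly.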

What I did:
Converted all the scripts with 2to3.py. Fixed an additional incorrect indentation level error when launching peepdf.py. Then debugged through PDFCore.py and changed string literals to bytes literals (e.g. for tokens). BTW: this also applies to regex patterns, which work fine with binary data as long as the pattern is of the same type (bytes). Character-level comparisons also needed to change when using bytes instead of strings, because someByteVar[0] returns an integer rather than a one-byte bytes object; I had to change most such expressions from someByteVar[i] to someByteVar[i:i+1]. Some functions expected strings and checked for that type, so I had to remove those checks, and I also removed calls to string functions such as ord().
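The regex and character-comparison changes described above can be sketched like this. The buffer and pattern are made-up examples, not the actual tokens or patterns from PDFCore.py.

```python
import re

buf = b"1 0 obj\n<< /Length 12 >>"

# A bytes pattern (rb"...") works on bytes input.
obj_header = re.compile(rb"(\d+)\s+(\d+)\s+obj")
m = obj_header.match(buf)
print(m.groups())  # (b'1', b'0')

# A str pattern applied to bytes input raises TypeError in Python 3.
try:
    re.match(r"\d+", buf)
    str_pattern_ok = True
except TypeError:
    str_pattern_ok = False
print(str_pattern_ok)  # False

# Character-level comparison: indexing gives an int, slicing gives bytes.
i = buf.index(b"<")
print(buf[i] == b"<")        # False: 60 is not equal to b'<'
print(buf[i:i + 1] == b"<")  # True
```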
This got me as far as partially parsing the dictionary of the first PDF object (obj). However, running into unescapeString() in the PDFString class, and having the changes spill over into JSAnalysis.py, made me wonder whether this approach is worthwhile. I think some class attributes should probably remain strings (as higher layers may expect) and be decoded from the raw binary data, e.g. keep self.rawValue as bytes and self.value as the decoded string. Jose Miguel probably knows better which strategy to use.
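A rough sketch of the rawValue/value split suggested above. PDFStringSketch is a hypothetical class for illustration only, not peepdf's actual PDFString. Decoding with latin-1 is an assumption chosen because it maps every byte 0 to 255 to one code point and round-trips losslessly; real PDF text strings may use PDFDocEncoding or UTF-16BE with a byte-order mark, which this sketch ignores.

```python
class PDFStringSketch:
    """Hypothetical illustration; not peepdf's real PDFString class."""

    def __init__(self, raw_value):
        if not isinstance(raw_value, (bytes, bytearray)):
            raise TypeError("raw_value must be bytes")
        # Exact bytes as read from the file, kept for byte-level parsing.
        self.rawValue = bytes(raw_value)
        # Decoded text for higher layers; latin-1 never fails to decode and
        # encoding back with latin-1 restores the original bytes.
        self.value = self.rawValue.decode("latin-1")


s = PDFStringSketch(b"Hello \xe9")
print(s.value)                                  # Hello é
print(s.value.encode("latin-1") == s.rawValue)  # True
```

Keeping both attributes lets the parser work on bytes while code such as JS analysis could keep consuming str.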

I attached the files in case someone wants to look at them, e.g. to compare/diff.
peepdf-master_26Dec2018.zip

skierpage · 2019-11-13T21:18:08Z

@jbremer's fork of peepdf, and some others, work on Python 3 thanks to commits from @Evert0x and others. I got "ValueError: jpeg is required unless explicitly disabled using --disable-jpeg, aborting" when trying to install it with easy_install --user . because that version of peepdf depends on Pillow, which has its own installation difficulties. I had to install python-devel and libjpeg-devel on Fedora, and I still have trouble with lxml if I want XML output.

Somewhere in the many branches of the 123 forks of this project is one that will work, but it's definitely not this one, and it's hard to find 🤕.

jbremer · 2019-11-14T10:32:37Z

That's correct. What about it, though? Perhaps it would be good to include this information in the README. Personally, I only use peepdf in the context of Cuckoo Sandbox, where those setup steps are documented in the Cuckoo documentation, hehe. Feel free to open a PR on our fork.

ferpalma21 · 2021-05-11T21:33:16Z

I had to use Python 2 because it was impossible to make it work with Python 3.

agyss · 2021-09-29T14:17:14Z

The branch from enzok/peepdf works just fine with Python 3.9 (tested on 29.9.2021).
