Fix when parsing NOP #507

Merged
merged 2 commits into from
Dec 1, 2024

Conversation

c10udlnk
Contributor

I found that uncompyle6 raises an error when it parses the NOP opcode, like this:

# uncompyle6 version 3.9.2
# Python bytecode version base 2.7 (62211)
# Embedded file name: <frozen XXX>

-- Stacks of completed symbols:
START ::= |- stmts . 
Instruction context:
-> 
 L.   1         0  NOP              
                   1  NOP              
                   2  NOP              
                   3  NOP              
                   4  NOP              
                   5  NOP              
                   6  NOP              
                   7  NOP              
                   8  NOP              
                   9  NOP              
                  10  NOP              
                  11  NOP              


# file XXX_res.pyc
# Parse error at or near 'NOP' instruction at offset 0

NOP is a very useful opcode when unpacking some .pyc shells, so I added a little code in case someone needs it. If nop_stmt shouldn't be here, please tell me or modify it directly. Thank you!

Fix error when parsing NOP opcode
@rocky
Owner

rocky commented Nov 29, 2024

I have a slightly uneasy feeling about this. We can try it; should there be a problem it can be removed or changed.
Would you write a test case for this?

@c10udlnk
Contributor Author

test case 1

Here is a short test case, a function from the file I am unpacking (I added a Python 2 header so that pydisasm can read it): test.pyc.zip

That is a shell which can encrypt the function code. Its bytecode structure is as follows:

  3:           0 LOAD_GLOBAL          (shell_begin)
               3 CALL_FUNCTION        (0 positional, 0 named)
               6 POP_TOP
               7 NOP
               8 NOP
               9 NOP
              10 NOP
              11 NOP
              12 NOP
              13 SETUP_FINALLY        (to 121)
              # ... encrypted function code
         >>  121 LOAD_GLOBAL          (shell_end)
             124 CALL_FUNCTION        (0 positional, 0 named)
             127 POP_TOP
             128 END_FINALLY
             129 LOAD_CONST           (None)
             132 RETURN_VALUE
             133 NOP
             134 NOP
             135 NOP
             136 NOP

The .pyc file above is the code object after I decrypted the encrypted code in the middle. Now I need to NOP out its wrapper bytecode (from offset 0 to 16, and from offset 121 to 137) so that it can run normally.

If the wrapper bytecode at the beginning were simply deleted, every later instruction would shift, which would break the opcodes that use absolute offsets into co_code (such as JUMP_ABSOLUTE and POP_JUMP_IF_FALSE). Overwriting the wrapper with NOP keeps every offset unchanged, so it is a very useful and convenient fix.
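
The patching step described here can be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper name `nop_range` is hypothetical, and the value `NOP = 9` is assumed from the CPython 2.7 and 3.6/3.7 opcode tables. It works on the raw `co_code` bytes and leaves rebuilding the code object to the caller.

```python
# Assumption: NOP has opcode value 9 in the targeted CPython versions
# (2.7 and 3.6/3.7). Check the opcode table for any other version.
NOP = 9


def nop_range(code_bytes, start, end):
    """Return a copy of code_bytes with the half-open range [start, end)
    overwritten by NOP bytes.

    The length never changes, so absolute jump targets elsewhere in the
    bytecode still point at the same instructions.
    """
    if not 0 <= start <= end <= len(code_bytes):
        raise ValueError("range is outside the bytecode")
    patched = bytearray(code_bytes)
    patched[start:end] = bytes(bytearray([NOP]) * (end - start))
    return bytes(patched)


# Example with dummy bytes standing in for real co_code:
dummy_code = bytes(bytearray(range(20)))
patched_code = nop_range(dummy_code, 0, 16)
```

Because the result has the same length as the input, nothing that refers to an absolute offset needs to be recalculated.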

After adding my PR's changes into my local uncompyle6, it works perfectly when processing such bytecodes.

test case 2

Another case: some tools used to obfuscate .pyc files (like pyc_obscure and my fixed version of it) insert bytecode like the following into parts of a function to prevent tools such as uncompyle6 from decompiling it:

              98 POP_JUMP_IF_FALSE    (to 118)
# obfuscation start
             100 JUMP_ABSOLUTE        (to 104)
             102 <223>                61
# obfuscation end
         >>  104 LOAD_NAME            (print)

The obfuscator inserts random bytes that confuse decompilation tools, and uses JUMP_ABSOLUTE to jump over them so that the code still runs properly. When dealing with this kind of obfuscation, it is also useful to overwrite these bytes with NOP directly, which avoids having to adjust the arguments of the opcodes that use absolute offsets. The bytecode above is from a test case using Python 3.7, and the same error occurs at NOP.
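
A rough sketch of that clean-up, for illustration only. It assumes Python 3.6/3.7 wordcode (two bytes per instruction) and takes `NOP = 9` and `JUMP_ABSOLUTE = 113` from the CPython 3.7 opcode table. The function name `nop_jump_over_junk` is hypothetical. The heuristic only matches a JUMP_ABSOLUTE that skips exactly one junk instruction, as in the listing above, and ignores EXTENDED_ARG.

```python
# Assumptions: CPython 3.6/3.7 wordcode, where every instruction is two
# bytes (opcode, argument) and these opcode values apply.
NOP = 9
JUMP_ABSOLUTE = 113


def nop_jump_over_junk(code_bytes):
    """Replace each 'JUMP_ABSOLUTE over one junk instruction' pair with NOPs.

    Only handles the simple pattern where the jump target is the
    instruction right after the junk (jump offset + 4).
    """
    patched = bytearray(code_bytes)
    for offset in range(0, len(patched) - 3, 2):
        opcode, target = patched[offset], patched[offset + 1]
        if opcode == JUMP_ABSOLUTE and target == offset + 4:
            # Overwrite the jump and the junk instruction; length is unchanged.
            patched[offset:offset + 4] = bytearray([NOP, 0, NOP, 0])
    return bytes(patched)
```

A real tool would probably also want to confirm that nothing else jumps into the junk bytes before overwriting them.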

Test files, containing the obfuscated .pyc and the .pyc after fixing: test2.pyc.zip


These cases do not occur in normal .pyc files, but they can appear where anti-reverse-engineering measures are used. I am not quite sure whether the changes in the PR will affect the decompilation of other statements, although there are currently no existing Python statements (maybe?) that start with NOP opcode.

@rocky
Owner

rocky commented Nov 30, 2024

Please create a PR that adds the bytecode in test.pyc.zip and/or test2.pyc.zip and put them in the appropriate test/bytecode-2x directory.

> I am not quite sure whether the changes in the PR will affect the decompilation of other statements, although there are currently no existing Python statements (maybe?) that start with NOP opcode.

That is why we have tests in the first place, and that is why I said I had some trepidation about this.

Right now, CI says that we are good. If it turns out in the future there is some problem with this, we can remove this. By having the test and the change be in the same merge, I can make sure to get everything should this need to be removed.

Add test cases for check NOP opcode
@c10udlnk
Contributor Author

c10udlnk commented Dec 1, 2024

Okay, I have narrowed down the test cases (using Python 2.7 and Python 3.6) and added them to this PR. Thanks for your patience and responsibility!

@rocky
Owner

rocky commented Dec 1, 2024

Thanks - onward and upward.

(Future decompilers will have a lot more flexibility in specifying which portions of code to decompile and what constructs we are looking for.)

@rocky rocky merged commit 5e6fad2 into rocky:master Dec 1, 2024
3 checks passed