
Python 3.11 decompilation fixes (for/while/try/with/comprehensions/comparisons)#608

Open
goromachine wants to merge 23 commits into zrax:master from goromachine:py311-pr

@goromachine


Summary

This PR adds comprehensive Python 3.11 decompilation support. Python 3.11
introduced several bytecode changes that break the existing decompiler logic.
All changes are backward-compatible (guarded by version checks where needed).

Validated against 239 real-world Python 3.11 .pyc files from a production
codebase (all 239 produce output that compiles as valid Python 3.11 via
py_compile) and 95 Python 3.11 stdlib modules (45/95 fully clean output).
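
A check of this kind can be sketched with the standard-library `py_compile` module. This is only an illustration of the idea, not the harness used for this PR: the function name `count_compilable` and the scratch-directory layout are assumptions. Note that `py_compile` validates against whichever interpreter runs it, so it would need to run under Python 3.11 to confirm 3.11 validity.

```python
import py_compile
import tempfile
from pathlib import Path


def count_compilable(sources):
    """Return (ok, failed): how many files compile, and which ones do not."""
    ok, failed = 0, []
    with tempfile.TemporaryDirectory() as scratch:
        for index, src in enumerate(sources):
            target = Path(scratch) / f"{index}.pyc"
            try:
                # doraise=True turns a compile error into PyCompileError
                py_compile.compile(str(src), cfile=str(target), doraise=True)
                ok += 1
            except py_compile.PyCompileError:
                failed.append(src)
    return ok, failed
```

A file that compiles is only known to be syntactically valid; it says nothing about whether the decompiled logic matches the original.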

Note on process: these fixes were developed with significant assistance
from Claude (Anthropic), reflected in the Co-Authored-By lines. The fixes
are individually tested and verified against a regression harness before each
commit.


New opcodes handled

  • MAKE_CELL, COPY_FREE_VARS, LOAD_ASSERTION_ERROR
  • DICT_MERGE/DICT_UPDATE, MAP_ADD, LIST_TO_TUPLE
  • RETURN_GENERATOR (generators/coroutines)
  • POP_JUMP_FORWARD/BACKWARD_IF_TRUE/FALSE/NONE/NOT_NONE

Control flow reconstruction

for loops — 3.11 removes POP_BLOCK; the loop now closes via
JUMP_BACKWARD, whose operand is relative: target = pos − 2 × operand. Previously
the loop never closed because the code compared the raw offset against the
absolute start position; code after the loop was silently absorbed into the body.
JUMP_BACKWARD before the loop end is emitted as continue.
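
The target arithmetic can be sketched in Python as follows. It is a simplified model, not the pycdc code: the helper names and the `close`/`continue`/`other` labels are hypothetical, every instruction is assumed to be one 2-byte code unit, and `EXTENDED_ARG` is ignored.

```python
CODE_UNIT = 2  # bytes per instruction word


def jump_backward_target(next_offset, oparg):
    """Absolute byte offset targeted by a 3.11 JUMP_BACKWARD.

    next_offset is the byte offset of the instruction after the jump.
    """
    return next_offset - oparg * CODE_UNIT


def classify_back_jump(jump_offset, oparg, loop_start, loop_end):
    """Classify a backward jump relative to an open for loop.

    loop_end is the byte offset just past the loop's final back-jump.
    """
    next_offset = jump_offset + CODE_UNIT
    if jump_backward_target(next_offset, oparg) != loop_start:
        return "other"
    # The jump that ends the body closes the loop; an earlier one is a continue.
    return "close" if next_offset == loop_end else "continue"
```

For a loop starting at byte 10 whose closing jump sits at byte 30, the operand is 11. Comparing the raw 11 against the start position 10 fails, which is the mismatch described above, while the computed target is 10.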

while loops — 3.11 bottom-test optimization: POP_JUMP_FORWARD_IF_FALSE end; body; POP_JUMP_BACKWARD_IF_TRUE start. New pre-pass ScanWhileLoops detects the guard+back-edge pair and reconstructs the while block correctly (was emitting two unrelated if statements).
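
The pairing rule can be illustrated with a small Python model. The `Instr` tuple, the function name `scan_while_loops`, and the decision to accept both jump polarities are assumptions made for this sketch; the actual pre-pass in the PR is C++ and works on pycdc's own bytecode reader.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    offset: int
    opname: str
    target: Optional[int] = None  # absolute byte offset, jumps only


FORWARD_GUARDS = {"POP_JUMP_FORWARD_IF_FALSE", "POP_JUMP_FORWARD_IF_TRUE"}
BACK_EDGES = {"POP_JUMP_BACKWARD_IF_TRUE", "POP_JUMP_BACKWARD_IF_FALSE"}


def scan_while_loops(instrs):
    """Map guard offset -> back-edge offset for each detected while loop."""
    loops = {}
    index_of = {ins.offset: i for i, ins in enumerate(instrs)}
    for i, edge in enumerate(instrs):
        if edge.opname not in BACK_EDGES or edge.target not in index_of:
            continue
        body_start = index_of[edge.target]
        if body_start == 0 or i + 1 >= len(instrs):
            continue
        guard = instrs[body_start - 1]
        # The guard must skip to the instruction right after the back-edge.
        if guard.opname in FORWARD_GUARDS and guard.target == instrs[i + 1].offset:
            loops[guard.offset] = edge.offset
    return loops
```

A forward conditional jump with no matching back-edge is left out of the result, so a plain `if` would still be treated as an `if`.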

with statement — 3.11 mid-block shape with BEFORE_WITH + exception-table cleanup (no SETUP_WITH or POP_BLOCK).

try/finally — 3.11 exception-table based; no SETUP_FINALLY.

try/except with nested try — the exception table splits a try region into multiple entries sharing one handler. Previously a new spurious try was opened for each entry producing cascading nesting. Now detects continuation of an already-open try.
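
The continuation idea can be approximated by grouping exception-table entries on their handler offset. The sketch below is a deliberate simplification: `merge_try_entries` and the `(start, end, handler)` tuples are hypothetical, and the PR itself tracks already-open try blocks during decompilation rather than merging entries up front.

```python
def merge_try_entries(entries):
    """Merge (start, end, handler) entries that share a handler offset.

    Returns one region per handler, in the order handlers were first seen.
    """
    merged = {}  # handler offset -> [start, end]
    for start, end, handler in entries:
        if handler in merged:
            # Same handler as an earlier entry: extend that try region.
            region = merged[handler]
            region[0] = min(region[0], start)
            region[1] = max(region[1], end)
        else:
            merged[handler] = [start, end]
    return [(start, end, handler) for handler, (start, end) in merged.items()]
```

With entries `(0, 10, 60)`, `(10, 20, 40)`, `(30, 50, 60)`, the outer try is reported once as `(0, 50, 60)` instead of being opened twice around the inner try.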

Chained comparisons (a == b == c) — compiled with COPY/SWAP plus a POP_TOP; JUMP_FORWARD trampoline. New pre-pass ScanChainedCompare resolves the trampoline and emits a == b and b == c.
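
One caveat worth keeping in mind when reading the rendered form: a chained comparison evaluates its middle operand once, while the expanded `and` form evaluates it twice. The two agree in value when the operand has no side effects. The snippet below demonstrates the difference; `traced` is a helper invented for the illustration.

```python
calls = []


def traced(name, value):
    """Record that an operand was evaluated, then return its value."""
    calls.append(name)
    return value


chained = traced("a", 1) == traced("b", 1) == traced("c", 1)
chained_calls = list(calls)

calls.clear()
expanded = traced("a", 1) == traced("b", 1) and traced("b", 1) == traced("c", 1)
expanded_calls = list(calls)
```

Both expressions are `True` here, but `chained_calls` records `b` once and `expanded_calls` records it twice.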

Exception handling

except ... as e — 3.11 emits PUSH_EXC_INFO; CHECK_EXC_MATCH; STORE_FAST e plus cleanup (LOAD_CONST None; STORE_FAST e; DELETE_FAST e). The cleanup epilogue is now suppressed in output.
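
The cleanup sequence exists because Python unbinds the `as` target when the handler finishes, as if `e = None; del e` ran in a `finally`. It never appears in source, which is why the decompiler has to drop it. A small demonstration (the function name `probe` is arbitrary):

```python
def probe():
    try:
        raise ValueError("boom")
    except ValueError as e:
        message = str(e)
    # The compiler-generated cleanup has already unbound e at this point.
    return message, "e" in locals()
```

Calling `probe()` returns the message together with `False`, showing that `e` is no longer bound after the handler.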

raise X from Y — rendered correctly for Python 3 (was Python 2 raise X, Y).
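
The rendering rule amounts to a version switch on the two-operand case. A minimal sketch, assuming the operands are already rendered as strings; `render_raise` and its parameters are hypothetical names, not pycdc functions:

```python
def render_raise(params, py_major):
    """Render a raise statement from already-rendered operand strings."""
    if not params:
        return "raise"
    if py_major >= 3 and len(params) == 2:
        # Python 3: the second operand is the exception cause.
        return f"raise {params[0]} from {params[1]}"
    # Python 2 style: operands joined with commas.
    return "raise " + ", ".join(params)
```

For example, `render_raise(["X", "None"], 3)` gives `raise X from None`, while the same operands under major version 2 give `raise X, None`.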

Expressions and comprehensions

SWAP_A — treated as a real stack swap (needed for chained comparisons and starred unpack; was a no-op).

List/dict/set/generator comprehensions — inline the implicit code object call, substitute the .0 argument, support multiple nested filters.
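
The implicit `.0` argument and the filter merging can both be observed from plain Python. The example uses a generator expression, which is compiled to its own code object; the variable names are arbitrary.

```python
data = [1, 2, 3, 4, 5, 6]
gen = (x for x in data if x % 2 == 0 if not x == 4)

# The generator's code object receives the iterator as an implicit
# first argument named ".0"; the decompiler substitutes the real iterable.
implicit_arg = gen.gi_code.co_varnames[0]
code_name = gen.gi_code.co_name

# Stacked "if" clauses select the same items as one "and" condition.
stacked = list(gen)
merged = [x for x in data if x % 2 == 0 and not x == 4]
```

Here `implicit_arg` is `".0"`, `code_name` is `"<genexpr>"`, and `stacked` equals `merged`.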

Class kwargs (metaclass= and other keyword arguments in class definitions) — reconstructed from 3.11 KW_NAMES/CALL bytecode.
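
For reference, this is the source shape being reconstructed: a class statement that passes `metaclass=` together with an extra keyword. The class names and the `options` attribute are made up for the example.

```python
class Meta(type):
    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.options = kwargs  # keep the extra keywords for inspection
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        super().__init__(name, bases, namespace)


class Base:
    pass


class Widget(Base, metaclass=Meta, flag=True):
    pass
```

The `metaclass` keyword selects `Meta`, and the remaining keyword `flag=True` is forwarded to it, so a decompiler that drops the keywords would change the class's behavior.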

Correctness fixes

  • Return-in-if: only skip the instruction after return inside an if/else block when it is a real unconditional jump — not arbitrary code.
  • continue in else: emit continue for a backward jump out of an else into the enclosing loop.
  • Function signature: *args placed before keyword-only parameters.
  • Decorator reconstruction: 3.11 no-PUSH_NULL form.
  • Class artifacts: strip __classcell__ / return __class__ epilogue.
  • Module-level returns: strip spurious return None at module level (all nested blocks).
  • EOF guard: don't read past end of bytecode buffer on empty bodies.
  • except boundary: handle an except clause that ends in raise, and an except clause at the end of a function.
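
The function-signature item can be illustrated with a real code object. Local names are stored as positional, keyword-only, `*args`, `**kwargs`, whereas source order puts `*args` before the keyword-only names. The sketch below reorders them; `render_params` is a hypothetical helper, and it omits defaults, annotations, and the positional-only `/` marker.

```python
import inspect


def render_params(code):
    """Render parameter names of a code object in source order."""
    names = code.co_varnames
    npos = code.co_argcount
    nkw = code.co_kwonlyargcount
    index = npos + nkw  # *args and **kwargs are stored after keyword-only names
    parts = list(names[:npos])
    if code.co_flags & inspect.CO_VARARGS:
        parts.append("*" + names[index])
        index += 1
    elif nkw:
        parts.append("*")  # bare star introduces keyword-only parameters
    parts.extend(names[npos:npos + nkw])
    if code.co_flags & inspect.CO_VARKEYWORDS:
        parts.append("**" + names[index])
    return ", ".join(parts)
```

For `def f(a, b, *args, kw, **kwargs)`, `render_params(f.__code__)` reproduces `a, b, *args, kw, **kwargs` even though `kw` precedes `args` in `co_varnames`.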

goromachine and others added 23 commits June 4, 2026 23:32
- MAKE_FUNCTION 3.6+ flag bitmask (defaults/kwdefaults/annotations/closure)
- decorator reconstruction for functions and classes (3.11 no-PUSH_NULL form)
- consume NULL before LOAD_BUILD_CLASS so decorated classes reconstruct
- strip 3.11 class artifacts (__classcell__ / return __class__)
- try/except inside loops (exception-table stack_depth > 0)
- inline list comprehensions called as code objects (substitute .0)
- guard extra RETURN_VALUE bytecode read against EOF (fixes empty bodies)
- strip implicit module-level 'return None'
- add opcodes: MAKE_CELL, COPY_FREE_VARS, LOAD_ASSERTION_ERROR, DICT_MERGE/UPDATE,
  MAP_ADD, LIST_TO_TUPLE, RETURN_GENERATOR, POP_JUMP_FORWARD/BACKWARD_IF_NONE/etc.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
SWAP_A was modelled only as tuple-unpack construction, corrupting the stack
for the 3.11 chained-comparison idiom (SWAP n; COPY n). Implement it as a
genuine stack swap via FastStack::swap.

Harness: +17 files (decompilation target: 212→224, stdlib corpus 13->18), 0 regressions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
NODE_RAISE always joined params with commas (Python 2 syntax). For Python 3,
two params is 'raise X from Y'. Harness: +1, 0 regressions; also fixes the
common 'raise X from None' idiom across many files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n inlining

- PUSH_EXC_INFO pushes an exception sentinel; CHECK_EXC_MATCH keeps it so the
  'as <var>' STORE can bind it; POP_TOP discards it for bare handlers.
- Emit 'except <type> as <var>:' and suppress the compiler cleanup
  (<var> = None; del <var>).
- WITH_EXCEPT_START no longer aliases SETUP_WITH; it consumes the sentinel so
  the with-cleanup never leaks it.
- Detect <setcomp>/<dictcomp> (not just <listcomp>) as comprehensions so
  SET_ADD/MAP_ADD reconstruct them for inlining.

Harness: +1, 0 regressions (foundation for multi-except/finally/with).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
3.11 'with' compiles to: body -> implicit __exit__(None,None,None) -> JUMP over
an exception-cleanup handler -> resume. A pre-pass (ScanWithBlocks) recognizes
this shape from the exception table: it records the body end and the resume
offset, verifying the handler begins with PUSH_EXC_INFO; WITH_EXCEPT_START and
that the normal-exit jump skips over it. BEFORE_WITH then opens an ASTWithBlock
(the context manager stays on the stack for the STORE/POP_TOP -> expr + 'as'),
and the [bodyEnd, resume) cleanup region is skipped during decompilation.
With-statements without this clean shape are left unhandled (no regression).

Harness: +2, 0 regressions (corpus 19->20, decompilation target: 225→226).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A finally compiles to: try body -> finally body (normal copy) -> JUMP over an
exception handler that duplicates the finally body and re-raises. A pre-pass
(ScanTryFinally) recognizes this from the exception table, distinguishing
finally from bare/typed except by the handler shape after PUSH_EXC_INFO (no
POP_TOP, no CHECK_EXC_MATCH). The try-body entry opens a CONTAINER carrying the
finally end + BLK_TRY ending at the real body end; the try close opens a
BLK_FINALLY for the normal copy; the duplicate exception handler region is
skipped.

Harness: +2, 0 regressions (corpus 20->22, decompilation target 226).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The def/lambda signature printed positional, keyword-only(*), *args, **kwargs,
producing invalid 'def f(*, kw, *args)'. Python order is positional, *args,
keyword-only, **kwargs. Index locals explicitly (they are stored positional,
keyword-only, *args, **kwargs) and emit in source order.

Harness: 0 gate change (affected files have other errors) but fixes incorrect
signatures across many files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A generator expression compiles to a <genexpr> code object that yields instead
of building a comprehension node. SynthGenexpr reconstructs it from the
decompiled for-loop: the FOR block becomes the generator, the yielded value the
result, and a wrapping 'if' the filter. Rendered as an equivalent comprehension
with the real iterable substituted for the implicit '.0'.

Harness: +9, 0 regressions (corpus 22->31).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
__build_class__ keyword arguments (e.g. metaclass=) arrive via a 3.11 KW_NAMES
map at TOS, which broke the build-class detection (consumed as a base / caused
fall-through to a bare __build_class__ call printed as <NODE:27>). Capture the
KW_NAMES map before scanning bases, store it as the class call's kwparams, and
emit 'class X(bases, kw=val):'.

Harness: +3, 0 regressions (corpus 31->33, decompilation target: 226→227).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A list comprehension with an 'if' filter performs LIST_APPEND inside the filter
block, so the normal comprehension build path (which expects the FOR block) is
missed and produced a '[][x]' hack. In a <listcomp> code object, emit the
appended value as a yield-style marker instead, so SynthGenexpr reconstructs the
comprehension together with its filter condition.

Harness: +3, 0 regressions (corpus 33->36).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A comprehension with several 'if' clauses nests the filters as IF blocks inside
the for-loop. findCompYield now descends through nested IF blocks, combining
their conditions with 'and' and honoring negated filters (POP_JUMP_*_IF_TRUE),
so '(x for x in y if a if not b)' reconstructs as 'x for x in y if a and not b'.

Harness: 0 gate change (affected stdlib files have other errors) but fixes
multi-filter comprehensions generally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a with-statement (or try/finally) is nested inside an enclosing try, the
implicit __exit__/finally cleanup region is re-protected by the outer try's
exception-table entry. Processing that entry reopened a spurious try over the
cleanup, leaking 'None(None, None)' / 'if not None:' into the body. Move the
with/finally block close and the cleanup-skip to the top of the loop, before
exception-entry processing, so the block closes and the region is skipped first.

Harness: +3, 0 regressions (corpus 36->37, decompilation target: 227→229).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… end)

When refining the initial type-less except handler, the clause end was taken
from the whole handler region (curblock->end()) instead of the dispatch jump
target. A clause that does not fall through (e.g. ends in 'raise') then never
closed, nesting the following 'except' inside it. Use the dispatch jump target
(offs) as the clause end whenever it is a valid forward offset.

Harness: +2, 0 regressions (decompilation target: 229→231).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 'return' is invalid at module scope, but the implicit 'return None' (and,
with nested ifs, copies of it) can land inside module-level blocks. Recurse
into every nested block (not only the last) and strip each block's trailing
bare return. Only applied to the <module> code object, so it never removes a
real return from a function/class body.

Harness: +4, 0 regressions (corpus 37->40, decompilation target: 231→232).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reconstruction artifacts at module scope can be 'return <expr>' too (invalid
Python). Drop any trailing plain return (rettype RETURN) from module blocks,
not only None returns, since none are legitimate at module scope.

Harness: +1, 0 regressions (decompilation target: 232→233).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In 3.11 a return inside an if/else may fall straight into a sibling
branch. The old code unconditionally consumed the next instruction to
skip a redundant jump; when that instruction was the LOAD_CONST of a
code object feeding a MAKE_FUNCTION (e.g. a list comprehension after
'if not x: return []'), dropping it left MAKE_FUNCTION without its
operand and crashed with std::bad_cast.

Now peek the next instruction and keep it only when it is a LOAD_CONST
of a code object; otherwise preserve the original skip behavior.
Added PycBuffer::pos()/setPos() for safe peeking.

Harness: decompilation target 234/239 files, corpus 40/95 (+1, 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o body)

In 3.11 there is no POP_BLOCK, so a for-loop only closed via the
JUMP_BACKWARD that ends its body. The is_jump_to_start test compared
the RELATIVE jump operand against the loop's ABSOLUTE start position,
so it almost never matched and the loop never closed: any statement
after the loop (and even except handlers / the function's return) was
absorbed into the loop body, producing wrong (often still-compiling)
output and breaking nested try/except indentation.

- Compute the real jump target (pos - offs) in 3.10+ and compare to
  the loop start.
- Distinguish the implicit loop-iteration back-jump (pos == block end,
  closes the loop) from an explicit continue (earlier, emits continue).
- Guard the BLK_ELSE branch's stack_hist.top() against an empty stack
  (a for nested in a while/else could otherwise crash, e.g. csv).

Fixes the core 'code after for loop' defect across all files.

Harness: decompilation target 235/239 files, corpus 41/95 (+2: realization, gettext; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two combined fixes for the 'return inside if inside try / except as e'
pattern:

1. The return-in-if skip logic could consume the PUSH_EXC_INFO that
   immediately follows a return inside an if within a try. Dropping it
   left the handler without its exception sentinel, so the 'as e'
   binding captured a garbage stack value and the handler was mis-nested
   as a statement in the try body. Never skip PUSH_EXC_INFO.

2. Suppress the compiler cleanup 'e = None' when the store value is an
   explicit None constant (LOAD_CONST None; STORE), not only the NULL
   placeholder form. With the binding now correct, this removes the
   spurious 'e = None; del e' tail.

Harness: decompilation target 236/239 files, corpus 41/95 (+1: utilities; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Python 3.11 compiles 'while cond:' with a bottom test: an entry guard
(eval cond; POP_JUMP_FORWARD_IF_FALSE end) before the body, and a
back-edge (eval cond; POP_JUMP_BACKWARD_IF_TRUE loop_start) at the end.
pycdc rendered both halves as separate 'if' blocks, so EVERY while loop
came out as 'if cond: ... if not cond: pass' and never looped.

- New ScanWhileLoops pre-pass pairs each conditional backward jump with
  the forward guard immediately preceding its target (guard skipping to
  the instruction after the back-edge) to identify genuine while loops.
- At the guard, open a BLK_WHILE with the condition instead of an if.
- At the back-edge, discard the duplicated condition and close the loop,
  but only when a BLK_WHILE is actually open (guard against a
  misidentified back-edge collapsing the block stack and crashing).

Fixes while loops across all files (including nested loops and
continue).

Harness: decompilation target 237/239 files, corpus 41/95 (+1: signature; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When a while/for body's else block ends in a 'continue' (a backward
jump to the enclosing loop, before the loop end) the BLK_ELSE branch
treated it as the natural loop-back that closes the else, dropping the
continue and mis-nesting following statements. Detect the enclosing
loop and, when the jump lands before its end, emit a continue inside
the else instead of closing it (3.11+).

Partial progress on deeply-nested loops (e.g. generator/tasks).

Harness: decompilation target 237/239 files, corpus 41/95 (0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Python 3.11 zero-cost exceptions split a single try into multiple
exception-table entries (all targeting the same handler) when the try
body contains a nested try/except: the protected range is broken around
the inner handler. pycdc opened a fresh try/container for each entry,
producing a cascade of spurious nested 'try:' blocks (often with empty
bodies and returns leaking outward).

When an entry's handler target matches an already-open BLK_TRY, treat
it as a continuation of that same try and skip opening a new one.

Fixes nested try/except (try containing a try).

Harness: decompilation target 238/239 files, corpus 43/95 (+3: external_forms, trace, pstats; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Chained comparisons compile to two COMPARE/POP_JUMP_FORWARD_IF_FALSE
halves sharing the middle operand (duplicated via SWAP/COPY) plus a
trailer 'JUMP_FORWARD body; POP_TOP; JUMP_FORWARD exit'. The first
half's jump targets the trampoline POP_TOP, the second targets the
exit, so the merge logic joined them with 'or' and the body leaked out
of the if (and closed enclosing loops early).

New ScanChainedCompare pre-pass records the trailer's leading
JUMP_FORWARD (skipped over the trampoline) and maps the trampoline
POP_TOP to its real exit; a conditional jump landing on the trampoline
is resolved to that exit so both halves merge with 'and'.

Reconstructs a == b == c as 'a == b and b == c'.

Harness: decompilation target 239/239 files, corpus 44/95 (+2: tasks, quopri; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The post-return skip was eating whatever instruction followed a return
inside an if, on the assumption it was a redundant else-skipping jump.
In 3.11 the following instruction is usually real code (commonly the
LOAD_CONST of a trailing 'return <const>'), so the skip silently dropped
it: 'if cond: return 1\n return 0' lost the 'return 0'. Peek and skip
only when the next instruction is an actual unconditional jump.

Fixes silent loss of code after if/return (a very common pattern).

Harness: decompilation target 239/239 files, corpus 45/95 (+1: tracemalloc; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>