The following code snippet presents the first step of this challenge.
It’s a highly nested Python lambda function. We need to find a URL
pointing to a file in order to move on to the second step.
The first thing I noticed is that the function takes three arguments, g,
c and d, and has a default argument $ that is initialized to None.
To understand what the function does, I adopted a bottom-up
approach and dissected it from the most nested lambda out to the first one.
In total, there are 7 lambda functions. I present the final
pseudocode first; the details of each function come after. The pseudocode
is the following:
list0 adds a list of integers to the dictionary under the key ’l0’.
list1 adds another list of integers under the key ’l1’.
ziplist applies the Python built-in function zip to ’l0’ and ’l1’.
setIto0 adds to the dictionary an element with value 0 (obtained by XORing
the argument d with itself) under the key ’i’. ’i’ is used later as the
iteration counter.
s=listofZip creates a list ’s’ from the list of tuples created by
’ziplist’.
s=listXorC XORs each element of list ’s’ with
the argument c, provided that g % 4 != 0.
dollar=reversedS sets ’$’ to the string that contains
each element of list ’s’ in reversed order.
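The seven steps above can be sketched in plain Python. This is only a model under stated assumptions: the real integer lists are not reproduced here, so `l0` and `l1` are passed in as parameters, `listofZip` is assumed to flatten the tuples in order, and `make_demo_lists` is a hypothetical helper that builds placeholder data for the demonstration.

```python
from itertools import chain


def decrypt(g, c, d, l0, l1):
    """Model of the nested lambdas; l0 and l1 stand in for the hard-coded lists."""
    env = {}
    env["l0"] = list(l0)                          # list0
    env["l1"] = list(l1)                          # list1
    pairs = zip(env["l0"], env["l1"])             # ziplist
    env["i"] = d ^ d                              # setIto0: always 0
    s = list(chain.from_iterable(pairs))          # listofZip (assumed to flatten)
    if g % 4 != 0:                                # listXorC
        s = [x ^ c for x in s]
    return "".join(chr(x) for x in reversed(s))   # reversedS


def make_demo_lists(text, key):
    """Hypothetical helper: build l0/l1 so that decrypt() recovers `text`."""
    values = [ord(ch) ^ key for ch in reversed(text)]
    return values[0::2], values[1::2]


# Placeholder plaintext and key, not the challenge data.
l0, l1 = make_demo_lists("http://x", 0x2A)
print(decrypt(2, 0x2A, 0, l0, l1))  # prints http://x
```

With g a multiple of 4 the XOR step is skipped, so the output stays encrypted.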
Therefore, I can trigger the decryption routine with g=2 (1 or 3 work
as well), c=key, d=0:
I found the right key (0x570) by brute-forcing about 100 numbers,
which gave the URL to the next step.
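A brute-force search of this kind might look like the sketch below. It is an illustration, not the script actually used: `xor_decode` and `brute_force` are hypothetical names, the ciphertext is generated from a placeholder string rather than taken from the challenge, and the "starts with http" test is an assumed way of recognizing a plausible URL.

```python
def xor_decode(values, key):
    """XOR every value with the key and read the result in reversed order."""
    return "".join(chr(v ^ key) for v in reversed(values))


def brute_force(values, candidates, marker="http"):
    """Return (key, text) for the first candidate whose output starts with marker."""
    for key in candidates:
        text = xor_decode(values, key)
        if text.startswith(marker):
            return key, text
    return None


# Placeholder ciphertext built from a made-up URL and the key from the text.
secret = "http://example.invalid/next"
values = [ord(ch) ^ 0x570 for ch in reversed(secret)]

# Try about 100 candidate keys around the expected range.
print(brute_force(values, range(0x540, 0x5A4)))
```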
The commands file and strings revealed the following information.
For brevity, only some of the interesting strings are presented.
Reading the other strings showed that this file is a Python
interpreter.
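For readers without the strings command at hand, its basic behaviour can be approximated in a few lines of Python. This is a rough sketch, assuming printable ASCII runs of at least four bytes are what matters; `printable_strings` is a hypothetical helper and the sample bytes are made up.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up sample; for a real binary, pass open(path, "rb").read() instead.
sample = b"\x00\x01run_me\x00ab\x00Python\xff"
print(printable_strings(sample))  # prints ['run_me', 'Python']
```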
With IDA Pro, I quickly found the references to these strings and discovered that a new built-in module had been added.
Built-in function: run_me
After testing the function run_me, I found that it takes one
argument and that calling it usually causes a segmentation fault. So I
started to reverse this function. The assembly code of this function
can be found in the Appendix. The following pseudocode
presents the reversed run_me function. It loads three functions and
executes them all. One of these functions is read from
run_me’s argument and the other two are loaded from memory.
Obviously, I needed to reverse the two functions loaded from memory.
Analysis of functions called inside run_me
First, I dumped the relevant memory regions with GDB. Then I tried the
Python package dis, but it couldn’t disassemble them because some
attributes had been removed.
On Quarkslab’s blog, there is another article, Building an obfuscated Python interpreter: we need more opcodes, that explains how to add new opcodes to a custom Python interpreter. In the same article, the author cites Looking inside the (Drop) box, which explained that Dropbox was using a custom Python interpreter with permuted opcodes. I’m not sure whether it was intended as a hint, but it explains the challenge itself.
Permuted opcode       Original opcode
LOAD_CLOSURE          STORE_FAST
LOAD_FAST             LOAD_GLOBAL
STORE_SUBSCR          BINARY_ADD
BINARY_TRUE_DIVIDE    RETURN_VALUE
CONTINUE_LOOP         MAKE_FUNCTION
RETURN_VALUE          GET_ITER
MAKE_CLOSURE          CALL_FUNCTION
IMPORT_STAR           POP_TOP
SETUP_WITH            LOAD_FAST
BUILD_CLASS           YIELD_VALUE
I checked that this Python interpreter does have permuted opcodes and a few new opcodes. The table above lists the permuted opcodes and their original opcodes. To reverse the permuted opcodes, I compared their assembly code with the source code.
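One way to apply such a permutation on top of the stock dis tables is sketched below. It rests on several assumptions: each table row is read as "permuted name, then original name", only two rows whose names still exist in current CPython releases are used, opcode numbers come from whichever interpreter runs the script (not from the challenge binary), and opcodes absent from the table fall back to their stock name. `original_name` and `translate` are hypothetical helpers.

```python
import dis

# Two rows of the table, read as: name shown by a stock disassembler
# -> operation the custom interpreter actually performs (assumed reading).
PERMUTED_TO_ORIGINAL = {
    "LOAD_FAST": "LOAD_GLOBAL",
    "RETURN_VALUE": "GET_ITER",
}


def original_name(opcode, table=PERMUTED_TO_ORIGINAL):
    """Translate one opcode number into the name of the real operation."""
    shown = dis.opname[opcode]        # what stock dis would print
    return table.get(shown, shown)    # fall back to the stock name


def translate(opcodes, table=PERMUTED_TO_ORIGINAL):
    """Translate a sequence of opcode numbers."""
    return [original_name(op, table) for op in opcodes]


stream = [dis.opmap["LOAD_FAST"], dis.opmap["RETURN_VALUE"], dis.opmap["POP_TOP"]]
print(translate(stream))  # prints ['LOAD_GLOBAL', 'GET_ITER', 'POP_TOP']
```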
I wrote a Python script that uses the dis package to map each opcode to its name. The source code can be found in the Appendix. As for the new opcodes, there are actually only two:
LOAD_CONST and setCustomOpcodeOffset
LOAD_CONST and unsetCustomOpcodeOffset
The first one is always executed before the second in order to set up a jump condition (setCustomOpcodeOffset) so that the following opcode jumps to the expected address.
The second opcode loads the value ’setCustomOpcodeOffset’ and uses it as
an index into the jump table. Since the first opcode always sets this
value to 1, the following opcode jumps to the same address and performs a
LOAD_CONST operation and unsetCustomOpcodeOffset, no matter what its
opcode is.
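The interplay of the two new opcodes can be modelled with a toy evaluation loop. This is a simplified sketch, not the interpreter's actual dispatch code: the opcode name `LOAD_CONST_AND_SET` is made up for the illustration, and the flag `custom_offset` stands in for setCustomOpcodeOffset.

```python
def run(program, consts):
    """Execute (opcode_name, arg) pairs and return the final value stack."""
    stack = []
    custom_offset = 0                          # models setCustomOpcodeOffset
    for name, arg in program:
        if custom_offset == 1:
            # Armed by the previous opcode: whatever this opcode is,
            # it behaves as "LOAD_CONST and unset".
            stack.append(consts[arg])
            custom_offset = 0
        elif name == "LOAD_CONST_AND_SET":     # hypothetical name
            stack.append(consts[arg])
            custom_offset = 1
        elif name == "LOAD_CONST":
            stack.append(consts[arg])
        elif name == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError("unsupported opcode: " + name)
    return stack


# The first BINARY_ADD is swallowed and acts as a LOAD_CONST of consts[1].
program = [("LOAD_CONST_AND_SET", 0), ("BINARY_ADD", 1), ("BINARY_ADD", None)]
print(run(program, (40, 2)))  # prints [42]
```

A stock disassembler would show two additions here, while the model performs only one.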
In the following sections, I analyze the two code objects incr.9351
and decr.9357 thanks to the previously created disassembler.
incr.9351: code object at 0x56c940
As its name indicates, it adds one to the global variable ’True’ and
stores it.
decr.9357: code object at 0x56c720
This function first loads the global variable ’True’ and checks whether it’s
equal to 3. Then, if the global variable ’quarkslab’ exists, it loads
two Python built-in functions, append and join, one empty string and
another code object in memory. There is actually no need to reverse this
code object: I found the right song title without studying it. Anyway,
I’ve put a few explanations of this code object in the Appendix.
Then it creates a new function from this code object.
After that, it builds a list of integers and calls the new function,
passing this list as its argument. Finally, it creates a string by
concatenating all the elements of the list generated by the previous function,
appends this string to the global variable ’quarkslab’ and returns.
Solution
In conclusion, the run_me function first reads a code object from its
argument. Then it calls incr.9351, which increments the global variable
’True’ to 2. After this, the submitted code object is called and
finally the function decr.9357. In order to reach the in-memory code
object genexpr, two conditions must be satisfied:
the global variable ’True’ = 3.
the global variable ’quarkslab’ is a non-empty list.
My solution is simple: pass the same code object as incr.9351 and set
the global variable ’quarkslab’ before calling run_me.
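The control flow described above can be modelled in ordinary Python to see why the trick works. Everything here is a stand-in: the globals are a plain dict `g` in which ’True’ starts at 1, the hidden genexpr code object is replaced by a placeholder that turns integers into characters, and the integer list `data` is invented for the demonstration.

```python
def incr(g):
    """Model of incr.9351: add one to the global 'True' and store it."""
    g["True"] = g["True"] + 1


def decr(g, hidden=None, data=(111, 107)):
    """Model of decr.9357; `hidden` and `data` are placeholders."""
    if g["True"] != 3 or not g.get("quarkslab"):
        return                                  # conditions not met
    hidden = hidden or (lambda ints: [chr(i) for i in ints])
    g["quarkslab"].append("".join(hidden(list(data))))


def run_me(user_code, g):
    incr(g)          # 'True' goes from 1 to 2
    user_code(g)     # the code object passed as argument
    decr(g)          # only acts when 'True' == 3


# Solution: pass incr itself and set 'quarkslab' beforehand.
g = {"True": 1, "quarkslab": ["placeholder"]}
run_me(incr, g)
print(g["True"], g["quarkslab"])  # prints 3 ['placeholder', 'ok']
```

Passing a code object that leaves ’True’ untouched keeps the value at 2, so the model's decr returns without appending anything.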