PlasmASM is a framework that can do asm and binary manipulation in a unified way.
There are many other tools that are better suited for reverse-engineering. The goal of PlasmASM is to enable automatic modification. One application is software obfuscation.
The main internal representation is a collection of basic blocs. Each basic bloc is labelled by a symbol, and may contain a list of lines.
A symbol has a name and attributes. Standard attributes are:
- the section
- its address (if read from a binary)
- the type (e.g. function, object)
- the binding (e.g. global, local)
A line may be a CPU instruction, or data.
In addition to this list of symbols, this representation optionally includes:
- information on sections (attributes)
- other metadata (container type, compiler type and options, ...)
Either assembly or binary can be input and either assembly or binary can be output.
When assembly is input and assembly is output, a use case is the insertion of PlasmASM as an additional step during a compilation, after assembly generation and before object file generation. Another use case is conversion of x86 assembly from AT&T syntax to Intel syntax, or the reverse.
When binary is input, a use case is the modification of a binary of unknown source. Note that object files are usually easier to parse than executable binaries.
Note that PlasmASM can output a binary, but it is often more robust to have PlasmASM generate assembly and then to have as generate the binary.
The file disass.py
gives access to most functionalities of
PlasmASM: read asm or binary, output to asm that can be assembled
by GNU as, output to objdump-like syntax which can be used to
check the results of the binary parser, output of the internal
representation of symbols and basic blocs
The reliability of PlasmASM depends on the version of the compiler and of the operating system. Automatic github action usage.yml shows a list of supported compilers.
Example, on a small program:
# Minimal C source
echo 'int main(int a, char **v){while(a>1){if(a%2)a/=2;else a=3*a+1;};return a;}'>a.c
# Create asm, object and executable binary
gcc -m32 -S a.c
gcc -m32 -c a.c
gcc -m32 a.c
# Look at the assembly
cat a.s
# Disass.py can generate a valid assembly code
python tools/disass.py -a a.s # input asm, output asm
python tools/disass.py -aI a.s # input asm, output asm Intel syntax
python tools/disass.py -a a.o # input object, output asm
python tools/disass.py -a a.out # input executable, output asm
python tools/disass.py -c /MIASM -a a.o # with miasmX backend (default)
python tools/disass.py -c /AMOCO -a a.o # with amoco backend
# Same for 64-bit
gcc -m64 -S -o a64.s a.c
gcc -m64 -c -o a64.o a.c
gcc -m64 -o a64.out a.c
python tools/disass.py -a a64.s # input asm, output asm (note that only amoco backend is available)
python tools/disass.py -aI a64.s # input asm, output asm Intel syntax
python tools/disass.py -a a64.o # input object, output asm
python tools/disass.py -a a64.out # input executable, output asm
PlasmASM does not work as well on a larger program, because automatic generation of a valid assembly from a binary needs to take into account many side effects:
# Default /bin/sh on Ubuntu 12.04
python tools/disass.py -a /bin/sh > sh.s
# It compiles
gcc -o sh sh.s
# Seems to be fully functional
./sh -c 'echo toto' # OK
./sh -c 'for i in a b; do echo $i; done' # OK
# Default /bin/ls on Ubuntu 12.04
python tools/disass.py -a /bin/ls > ls.s
# It compiles (some packages are needed)
apt-get install libacl1-dev
apt-get install libselinux1-dev
gcc -o ls ls.s -lrt -lacl -lselinux
# But is not functional at all
./ls # Segmentation fault
# NB: seems to work when run in gdb
Once PlasmASM has generated an internal representation of its input, this representation can be modified before generating some output.
This can be done interactively with python. The following example, where a very simple modification of the shell from Ubuntu 12.04 is made, results in a valid modified binary that can be used on Ubuntu 21.10.
>>> from plasmasm.analyze_file import File
>>> from tools.step2_change import change_ret
>>> f = './non_regression/sh_x86_linux_ubuntu1204'
>>> pool = File().from_filename(f, rw=True, dead=True)
>>> pool.arch.set_asm_format('att_syntax')
>>> change_ret(pool)
>>> pool.to_asm('/tmp/a.s')
gcc -m32 -o /tmp/sh /tmp/a.s
/tmp/sh -c 'for i in a b c; do echo x$i; done'
compile.py
aims at making it easier to use PlasmASM in a compilation
chain, where the intermediate result (assembly or object) of the
compilation of each file can be automatically modified.
With the syntax below, basic functions -parse_asm
or -parse_obj
can
be used to check that plasmasm will work well for a given os + compiler.
make test CC='compile.py -parse_asm gcc'
make test CC='compile.py -parse_obj gcc'
This can also be used to make automatic changes. A simple example is
included in tools.step2_change
, which can be used in a compilation chain:
make test CC='compile.py -change gcc'
Non-regression tests in usage.yml show how to use this approach to obfuscate a full software. Note what is produced in this example is not a solid obfuscation: the obfuscation primitives are very simple, and one should always strip the symbols after the executable is generated.
PlasmASM can work with python >= 2.3, including python 3. But some dependencies of PlasmASM need recent python. It has been tested with multiple versions of CPython, PyPy or GraalPy.
Depending on which dependencies are installed, the capability of PlasmASM can be limited.
For example, if only amoco is present, then only assembly manipulation for the CPUs supported by amoco is possible.
-
elfesteem
Needed when binary files are input or output; not needed for asm input and output. The version of https://github.com/LRGH/elfesteem should be used. It works for python >= 2.3.
-
amoco
Used when working with IA32, X86_64 or SPARC. Available at https://github.com/LRGH/amoco ; It works for python >= 2.7, if https://github.com/LRGH/crysp is installed.
Note that amoco needs pyparsing, and that there are changes of interface between old and recent versions of pyparsing.
-
miasmX
Used when working with IA32 or PowerPC. It is a heavy modification of miasm v1. Available on https://github.com/LRGH/miasmX It works for python >= 2.3.
Note that PowerPC support is very incomplete.
For the record, the most recent official repository of miasm v1 is https://github.com/cea-sec/miasm/tree/cecdf0da2ed0221c203af9157add30e2bff7dd8c from June 11, 2013
Also includes a patched version of ply.
Dependencies can be installed using pip
or manually,
as done for example in
portability.yml
Manual installation with manual modifications of amoco is recommended if one wants to minimize the number of dependencies.
-
analyze_file.py
Recognizes the file type (asm or binary -- ELF, PE, Mach-O).Detects which CPU it is, and parse the file, creating the internal representation.
-
parse_asm.py
Parser for asm files. -
write_asm.py
Outputs the internal representation in asm, that can be input to GNU as. Outputs the internal representation in an objdump-lile format. -
parse_bin.py
Parser for binary files. -
write_bin.py
Creates an ELF object corresponding to the internal representation. TODO: other types of binary outputs. -
compilers.py
Extension for parse_bin.py, with compiler-dependent stuff. -
get_symbols.py
Extension for parse_bin.py, to extract symbols from various types of binaries. -
symbols.py
Main data structures (symbol table, symbols, line, ...) -
constants.py
Internal representation for lines that are constants (numeric, strings, labels, ...) -
python/compatibility.py
Needed for plasmasm to be compatible with python2.3 to python3.4. -
utils.py
Various additional functions, e.g. graph generation. -
arch
Directory with cpu-dependent and backend-dependent definitions.Each CPU implementation should have the class Instruction which inherits from Line and implements basic functions (parse from text, from binary, some access to opname or operands), the class InstructionCFG which adds some computation of the CFG (by computing the possible destinations for branch instructions), the class InstructionRW which lists which registers are read or written (and is used for the computation of dead registers).
__init__.py
contains functions for importing architectures by name and to list all available backends. It is made to be compatible with python2.3 to python 3.x.I386.py
,X64.py
,PPC.py
,SPARC.py
contain cpu-dependent information: the name of the cpu as known by each container; the list of existing mnemonics, to help auto detection of assembly.I386_MIASM.py
,I386_AMOCO.py
,X64_AMOCO.py
,PPC_MIASM.py
,SPARC_AMOCO.py
contain the CPU implementations; the filename isCPU_BACKEND.py
for an implementation of CPU based on BACKEND.
Static analysis to infer some local properties. Using these functions increases the running time, but is necessary in most cases where an automated modification shall not change the semantics of the software.
-
pic_tracking.py
When Position Independ Code is generated, there is a register that is used to memorize where the code has been loaded. The way it is done depends on the compiler, this module tracks this register, so we can use it when modifying the code. -
stack_tracking.py
For X64, there is a "red zone" that we shall not overwrite when doing push/pop. This module tracks the red zone, so we can use the stack when modifying the code. -
dead_registers.py
Dead registers are registers that are not used, and therefore can be used when modifying the code.
In addition to disass.py
and compile.py
mentioned above,
the tool testing_plasmasm.py
is very useful for understanding
why plasmasm sometimes fails to generate the right assembly when
parsing a binary.
If takes .s or .o files as arguments, this tool uses the parser of PlasmASM
and creates an assembly file with the write_asm
module of PlasmASM.
Then it uses the native compiler to create object files, and compares
these object files with objdump -drt
or otool -tvj
Note that sometimes the assembler has bugs (e.g. GNU as 2.14 or 2.15
changes the order of the arguments of test %reg, %reg
, while the same
version of objdump is OK) and therefore there are special cases to make
testing_plasmasm.py
succeed in case of assembler bugs.
If there exist a file having the same name as the input file, plus .plasmasm
,
this file is used to describe how to complete the parsing of the input file.
Ideally, these helper file should not be useful, because everything should be
automatically deduced. However, there are cases where automatic deduction is
not possible : for example, when compiling C to an ELF object file, if a
global variable is unitialised, then it is in the COM section, and if this
global variable is initialised to zero, then it is in the .bss section. But
after linking with ld, both are in the .bss section: there is no way to
know what was in the C source.
The .plasmasm helper file should contain a function named "helper" with an argument, the symbol pool. This function can make any modification to the symbol pool.