
IR Fuzzer

Quick start

Compile

Running ./build.sh should prepare and compile everything for you. If it fails for any reason, please open an issue in this repo.

The script sets the following environment variables. You may want to keep them in your .bashrc for later fuzzing sessions:

# Path to this directory
export FUZZING_HOME=$(pwd)
# The LLVM you want to fuzz
export LLVM=<Your LLVM>
export AFL=AFLplusplus
export PATH=$PATH:$HOME/clang+llvm/bin
# Tell AFL++ to only use our mutator
export AFL_CUSTOM_MUTATOR_ONLY=1
# Tell AFL++ where our mutator is
export AFL_CUSTOM_MUTATOR_LIBRARY=$FUZZING_HOME/mutator/build/libAFLCustomIRMutator.so
# AFL instrumentation method
export AFL_LLVM_INSTRUMENT=CLASSIC

If you prefer a Dockerized environment, you can instead run:

docker build . -t irfuzzer

Seed selection

Seeds are the initial inputs we give the fuzzer, and they directly impact fuzzing performance. The seeds directory provides a default seed to start fuzzing: an empty module with some function signatures. For better fuzzing performance, you are welcome to move the modules in $LLVM/llvm/test/CodeGen/<Arch> into seeds. Note that seeds only accepts bitcode, not textual LLVM IR, so .ll files must be converted with llvm-as first.
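As a minimal sketch, assuming llvm-as from your LLVM build is on PATH, the bash function below assembles the textual IR tests of one backend into bitcode seeds and skips any file llvm-as rejects. ll_tests_to_seeds and the LLVM_AS override are hypothetical names used only for illustration; they are not part of this repo.

```shell
# Hypothetical helper: assemble every textual IR test of one backend into a seed directory.
# llvm-as is assumed to be on PATH (override with LLVM_AS); files it rejects are skipped.
ll_tests_to_seeds() {
  local src_dir=$1 dst_dir=$2 as_bin=${LLVM_AS:-llvm-as} ll out converted=0
  mkdir -p "$dst_dir"
  for ll in "$src_dir"/*.ll; do
    [ -e "$ll" ] || continue               # the glob matched nothing
    out="$dst_dir/$(basename "$ll" .ll).bc"
    if "$as_bin" "$ll" -o "$out" 2>/dev/null; then
      converted=$((converted + 1))
    else
      rm -f "$out"                         # drop any partial output
    fi
  done
  echo "$converted"                        # number of seeds written
}
# Usage: ll_tests_to_seeds "$LLVM/llvm/test/CodeGen/X86" seeds
```

Skipping failures keeps the loop going over large test directories, at the cost of silently dropping inputs llvm-as cannot parse.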

Run

Env vars

You pass arguments to the driver through environment variables.

Required

export TRIPLE=<Your triple>
export CPU=
export ATTR=

You can specify triples such as x86_64, aarch64, or aie. If you don't know which targets your build supports, run llc --version; it lists all registered targets. CPU and ATTR may be left empty, but they must be exported. They are equivalent to the -mcpu and -mattr options you would normally pass to llc.

export MATCHER_TABLE_SIZE=13780

Matcher table size refers to the size of the matcher table generated by TableGen. The table is generated as a static variable in SelectCode(SDNode *N) in <Target>GenDAGISel.inc (for SelectionDAG) and in <Target>InstructionSelector::getMatchTable() in <Target>GenGlobalISel.inc (for GlobalISel). There are three ways to find its length:

  1. Every time AFL's compiler compiles the project, it counts the table size and prints a line such as [+] MatcherTable size: 22660. Look out for that.
  2. If you missed it, you can delete the object file (ISelDAGToDAG.cpp.o or InstructionSelector.cpp.o) and force a recompilation:
$ cd build-afl
$ rm lib/Target/AIE/CMakeFiles/LLVMAIECodeGen.dir/AIEISelDAGToDAG.cpp.o
$ ninja

[6/27] Building CXX object lib/Target/AIE/CMakeFiles/LLVMAIECodeGen.dir/AIEISelDAGToDAG.cpp.o
[+] MatcherTable size: 22660
  3. You can also find this data in scripts/common.py. It may not be 100% accurate as the code gets updated.
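If you captured the build output, a small helper can pull the number out of it. The function below is a hypothetical sketch: it assumes the log contains the [+] MatcherTable size: N line shown above and prints the last value it finds. If several targets are rebuilt in the same log, filter the log by target first.

```shell
# Hypothetical helper: print the last "[+] MatcherTable size: N" value from a build log on stdin.
matcher_table_size() {
  grep -o 'MatcherTable size: [0-9][0-9]*' | tail -n 1 | awk '{print $3}'
}
# Usage (after deleting the object file as in step 2):
#   ninja 2>&1 | tee build.log
#   export MATCHER_TABLE_SIZE=$(matcher_table_size < build.log)
```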

Optional

export GLOBAL_ISEL=1

By default, we fuzz SelectionDAG. To fuzz GlobalISel instead, export this variable, and make sure MATCHER_TABLE_SIZE matches GlobalISel's table size.
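Before launching the fuzzer, a quick pre-flight check can catch the most common setup mistakes. The bash function below is a hypothetical sketch (check_isel_env is not part of this repo): it requires TRIPLE to be non-empty and MATCHER_TABLE_SIZE to be a positive integer, and only requires CPU and ATTR to be exported, since they may be empty.

```shell
# Hypothetical pre-flight check for the driver's environment variables.
check_isel_env() {
  local status=0 var
  # CPU and ATTR must be exported, but may be empty (they map to llc's -mcpu and -mattr).
  for var in CPU ATTR; do
    if ! printenv "$var" >/dev/null; then
      echo "error: $var must be exported (an empty value is fine)" >&2
      status=1
    fi
  done
  # TRIPLE must be exported and non-empty.
  if [ -z "$(printenv TRIPLE)" ]; then
    echo "error: TRIPLE is empty or not exported (see llc --version)" >&2
    status=1
  fi
  # MATCHER_TABLE_SIZE must be a positive integer.
  case "$(printenv MATCHER_TABLE_SIZE)" in
    ''|*[!0-9]*|0)
      echo "error: MATCHER_TABLE_SIZE must be a positive integer" >&2
      status=1 ;;
  esac
  if [ -n "$(printenv GLOBAL_ISEL)" ]; then
    echo "note: GLOBAL_ISEL is set; use the GlobalISel table size" >&2
  fi
  return "$status"
}
```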

Command line

Once the environment variables are set, the easiest way to start fuzzing is:

./AFLplusplus/afl-fuzz -i <seed-dir> -o fuzzing llvm-isel-afl/build/isel-fuzzing

This starts a fuzzing instance that fuzzes SelectionDAG. Some useful arguments you might give afl-fuzz include:

  • -E <n>: run about n executions and quit
  • -V <t>: run the fuzzer for t seconds and quit

Fuzzing can take days, if not weeks. I recommend using screen to run the fuzzer in the background.
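For example, the hypothetical wrapper below starts one afl-fuzz instance in a detached screen session and lets AFL++ stop it on its own via -V. It reuses the paths from the command above and assumes the environment variables are already exported in the calling shell, since screen inherits them. fuzz_in_screen and the SCREEN override are illustrative names only.

```shell
# Hypothetical wrapper: run afl-fuzz in a detached screen session for a fixed number of seconds.
fuzz_in_screen() {
  local session=$1 seeds=$2 out=$3 seconds=$4
  "${SCREEN:-screen}" -dmS "$session" \
    ./AFLplusplus/afl-fuzz -i "$seeds" -o "$out" -V "$seconds" \
    llvm-isel-afl/build/isel-fuzzing
}
# Usage: one week is 7 * 24 * 3600 = 604800 seconds.
#   fuzz_in_screen irfuzz seeds fuzzing 604800
#   screen -r irfuzz    # attach to watch the AFL++ UI; detach again with Ctrl-a d
```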

AFL++ will give you a fancy UI to describe what's happening. You may check this page to help you understand the stats.

Archs and table size

Check ./scripts/common.py.

Scripts

Dependencies

We prepared several scripts to automate the fuzzing process. They require Python 3.10+, whose type-hint support keeps them less messy. Call python3.10 explicitly to avoid picking up an older default interpreter such as python3.6, in case you are still on Ubuntu 18.04 or older. To install the dependencies:

# If your ubuntu is so old you don't have python3.10 in your apt I can't help you...
# `apt install -y python3.10 python3-pip wget`
wget https://bootstrap.pypa.io/get-pip.py
python3.10 get-pip.py

# You can install all the dependencies of the scripts with:
pip3.10 install -r scripts/requirements.txt

Description and usage

  • common.py: not intended to be called directly, but it contains a lot of metadata; you are welcome to take a look.
  • fuzz.py: fuzzes many triples using Docker or screen.
  • batch_classify.py: runs all crashing inputs and clusters identical ones together by stack trace. You may want to run it after a fuzzing campaign.
  • combine-fuzzing-results.py: combines multiple fuzzing directories into one. Unless you are writing a paper and need massive amounts of data, you probably don't need it.
  • process_data.py: summarizes the fuzzing results.

fuzz.py does not require you to set any environment variables; the script takes care of them. A typical invocation looks like this:

python3.10 scripts/fuzz.py -i seeds -o fuzzing -r 5 --set="  aie" --type=screen --isel=dagisel --fuzzer=irfuzzer --time=1w -j 80 --on_exist=force

It means: start fuzzing using input from seeds (-i seeds), put the results in fuzzing (-o fuzzing), repeat the experiment five times (-r 5), test aie without attribute and CPU settings (--set="  aie"), use screen to monitor the fuzzing (--type=screen), test SelectionDAG (--isel=dagisel), use our fuzzer (--fuzzer=irfuzzer), fuzz for one week (--time=1w), run at most 80 jobs in parallel (-j 80), and, if the output directory already exists, force-remove it (--on_exist=force).

How do we fuzz

See the details in our paper

Trophies & Findings

(I think I will attach more links to keep track of these later)

AI Engine

  • AIE1 GlobalISel lacks floating point support.
  • AIE1 GlobalISel lacks vector support.
  • AIE1 SelectionDAG has bugs in the memory store.
  • AIE1 SelectionDAG has truncation errors. Fixed.
  • AIE1 vst.spil generates two stores to the same address. PoC. Fixed.

Open sourced architecture

See our trophies repo.

FAQ

Why build two versions of LLVM?

One version is built by AFL's compiler; the other is built with LLVM 14 and contains the new mutator we designed. AFL needs to inject code into the AIE compiler to track runtime information (edge coverage, MatcherTable coverage, etc.), and the driver also depends on this instrumented build. The other version is a dependency of the mutator. You could use an AFL-instrumented mutator, but it would slow down mutation and is therefore not recommended.

Why fuzz a fork of AIE that is not up-to-date?

Mainly because the mutator also needs to understand the architecture we are fuzzing, even though it only generates mid-end IR. Therefore, until we merge the mutator's code into AIE, all you can do is keep merging the code you want to test into the mutator branch and recompile everything.

Are we fuzzing AIE2?

Currently we only fuzz AIE1, since it is more complete than AIE2, but you can fuzz AIE2 if you want to. In principle, fuzzing AIE2 is no different from fuzzing AIE1: set TRIPLE=aie2 and set MATCHER_TABLE_SIZE correctly.

AIE compilation hangs

It is a known issue that Target/AIE/MCTargetDesc/AIEMCFormats.cpp takes a long time (~10 minutes) to compile: the function __cxx_global_var_init() in it makes the optimizer run for a very long time. It is an interesting bug, but we haven't had time to fix it.

What is a seed and what to use

A seed is the initial file you give the fuzzer to work on. Unfortunately, AFL requires one (libFuzzer can cold-start without a seed). In this repo, we include a minimal seed in seeds/ so you can start fuzzing without worrying about it.

However, academic research and industry practice have shown that better seeds can lead to better results: with different seeds, you may reach the same results faster or uncover previously unseen behavior. So it is worth manually crafting seeds that cover the code you want to test. For example, if you want to focus on floating point, create seeds that contain floating-point calculations.

To create a seed, you can write LLVM IR manually and convert it to bitcode using llvm-as. Alternatively, disassemble existing bitcode to IR with llvm-dis, change some of the instructions, and assemble it back with llvm-as.
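As an illustration of the floating-point case, the hypothetical helper below writes a tiny IR function by hand and assembles it into a seed directory with llvm-as (assumed to be on PATH; LLVM_AS overrides it). The function name fp_seed and the chosen instructions are arbitrary. You can inspect the result with llvm-dis seeds/fp_seed.bc -o -.

```shell
# Hypothetical helper: write a small floating-point function as textual LLVM IR
# and assemble it into a bitcode seed.
make_fp_seed() {
  local seed_dir=$1 ll rc
  mkdir -p "$seed_dir"
  ll=$(mktemp) || return 1
  cat > "$ll" <<'EOF'
define float @fp_seed(float %a, float %b) {
entry:
  %sum = fadd float %a, %b
  %prod = fmul float %sum, %b
  %cmp = fcmp olt float %prod, %a
  %res = select i1 %cmp, float %prod, float %a
  ret float %res
}
EOF
  # The seed directory only accepts bitcode, so only the assembled .bc file goes there.
  "${LLVM_AS:-llvm-as}" "$ll" -o "$seed_dir/fp_seed.bc"
  rc=$?
  rm -f "$ll"
  return "$rc"
}
# Usage: make_fp_seed seeds
```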

Matcher table coverage is 0.0%

Table coverage may be low, but it should never be 0.0%. If it is, make sure the matcher table is correctly instrumented:

  1. Make sure your binary is linked against the library compiled by AFL.
  2. Make sure AFL instrumented it. During compilation, there should be a line telling you [+] Instrumenting matcher table.

What do the stats in AFL's UI mean?

You may check this page to help you understand the stats.

We introduced a new coverage metric, so map density shows two numbers. The first is edge coverage, which should reach 70-80% in a day or two, meaning that (almost) all control flow has been tested. The second is matcher table coverage, which shows how much of the table has been referenced. The higher, the better.

My fuzzer is running slow

There are two common reasons. First, AFL interacts heavily with the file system, so make sure your working directory is not on NFS or any other remotely mounted drive. For even more speed, you can mount a tmpfs and fuzz in memory.

Second, your seeds may take a long time to execute. Either choose smaller initial seeds or use a shorter timeout by adding -t <timeout> (in milliseconds) to AFL's arguments.
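To check execution speed without watching the UI, you can read the fuzzer_stats file that AFL++ writes into the output directory (<output-dir>/default/fuzzer_stats for the default instance). The awk-based reader below is a sketch; afl_stat is a hypothetical name, while execs_per_sec and bitmap_cvg are standard AFL++ fields.

```shell
# Hypothetical reader for AFL++'s fuzzer_stats file ("key   : value" per line).
# Prints the value of the requested key, or nothing if the key is absent.
afl_stat() {
  awk -v key="$1" '
    {
      k = $0; sub(/[ \t]*:.*/, "", k)         # text before the first colon, trailing blanks removed
      if (k == key) {
        v = $0; sub(/^[^:]*:[ \t]*/, "", v)   # everything after the first colon
        print v
      }
    }' "$2"
}
# Usage: afl_stat execs_per_sec fuzzing/default/fuzzer_stats
```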

Where are the crashes located?

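$FUZZING_HOME/fuzzing_output/default/crashes

In general, crashes are stored in <output-dir>/default/crashes, where <output-dir> is the directory passed to afl-fuzz with -o.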

How to reproduce errors?

One upside of fuzzing is that it always gives you a reproducible PoC. You can run build-release/bin/llc <args> <crashing-input>.
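To replay a whole crash directory at once, a loop like the hypothetical one below can help. It assumes TRIPLE, CPU, ATTR (and optionally GLOBAL_ISEL) are exported as for fuzzing, that crash files are named id:..., and that llc lives at build-release/bin/llc (override with LLC). Any non-zero llc exit is reported as reproduced, so check the actual output of interesting cases by hand.

```shell
# Hypothetical helper: replay every AFL++ crash file through llc with the fuzzing settings.
replay_crashes() {
  local crash_dir=$1 llc=${LLC:-build-release/bin/llc} f
  local args=(-mtriple="$TRIPLE")
  if [ -n "${CPU:-}" ]; then args+=(-mcpu="$CPU"); fi
  if [ -n "${ATTR:-}" ]; then args+=(-mattr="$ATTR"); fi
  if [ -n "${GLOBAL_ISEL:-}" ]; then args+=(-global-isel); fi
  for f in "$crash_dir"/id:*; do
    [ -e "$f" ] || continue              # no crash files yet (README.txt is never matched)
    if "$llc" "${args[@]}" -o /dev/null "$f" >/dev/null 2>&1; then
      echo "not reproduced: $f"
    else
      echo "reproduced: $f"
    fi
  done
}
# Usage: replay_crashes "$FUZZING_HOME/fuzzing_output/default/crashes"
```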

We have also found cases where llc does not reproduce the crash. In that case, try:

export CPU=<YourCPU>
export ATTR=<YourAttr>
export TRIPLE=<YourTriple>
export MATCHER_TABLE_SIZE=<YourSize>
./llvm-isel-fuzzing/build/isel-fuzzing < <input>

We have noticed some setting differences between llc and our driver, isel-fuzzing, but haven't had time to deal with them yet. We will update this later.

If an input cannot be reproduced even with isel-fuzzing, there are two possibilities:

  • Your matcher table size is set incorrectly.
  • It may be a bug; please open an issue.

What if MatcherTable is not set or set incorrectly?

To pass compilation and AFL's self-test, MATCHER_TABLE_SIZE defaults to a small value. If you see the message "Shadow table size: 32 too small. Did you set it properly?", the variable is not set. If MATCHER_TABLE_SIZE is set incorrectly, you will get false positives: the seed itself is stored in crashes (indicating that the fuzzer found the seed crashing), but you cannot reproduce the crash with llc. That means the runtime code we injected is crashing, not LLVM itself. Most likely, MATCHER_TABLE_SIZE is set too small and an out-of-bounds write happened.

My mutator aborted during fuzzing?

This is a common issue, and it is not a bug in the mutator. Most likely, you didn't set the types correctly: if the mutator cannot find a value of the required type to complete an instruction, it aborts. Therefore, it is important to list all types when creating the mutator.

The mutator is non-deterministic, so debugging is hard. But here's a trick: the mutator is deterministic if its random seed is the same. If your fuzzer crashed, find the .cur_input file in your fuzzing output directory; it is the last input the mutator worked on before it crashed. Run ./mutator/scripts/validate.sh .cur_input to verify the mutator with this input. The script will (hopefully) give you the seed that crashed the mutator. You can then debug the mutator by providing it with the deterministic seed the validator just printed: ./mutator/build/MutatorDriver .cur_input <seed>. If the stack trace shows the crash coming from SourcePred.generate, that's it: you didn't provide all the required types. If the mutator crashes for any other reason, contact me.
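Because .cur_input is overwritten as soon as the fuzzer runs again, it may be worth copying it aside before debugging. The hypothetical helper below does that, assuming the default AFL++ layout where .cur_input lives in <output-dir>/default (setting AFL_TMPDIR moves it elsewhere); the commented usage then follows the validation steps above.

```shell
# Hypothetical helper: save AFL++'s .cur_input under a timestamped name and print its path.
save_cur_input() {
  local out_dir=$1 dest_dir=$2 dest
  mkdir -p "$dest_dir"
  dest="$dest_dir/mutator-crash-$(date +%Y%m%d-%H%M%S)"
  cp "$out_dir/default/.cur_input" "$dest" || return 1
  echo "$dest"
}
# Usage:
#   saved=$(save_cur_input fuzzing mutator-crashes)
#   ./mutator/scripts/validate.sh "$saved"          # prints the seed that crashed the mutator
#   ./mutator/build/MutatorDriver "$saved" <seed>   # replay deterministically
```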

Also, when the mutator dies, the fuzzer becomes a zombie process; don't forget to clean it up :)