ZOFI is a Zero Overhead Fault Injector. It is a tool for very fast fault coverage analysis of transient faults on binaries.
ZOFI has been tested on relatively modern x86_64 Linux 4.4 or later systems with glibc 2.23 or later. The main external build dependencies are:
- A C++14 compiler (http://gcc.gnu.org/)
- CMake 3.2 or later (https://cmake.org/)
- Capstone 4.0.1 or later (https://www.capstone-engine.org/)
Skip this section if capstone 4.0.1 is already installed on your system:
$ wget https://github.com/aquynh/capstone/archive/4.0.1.tar.gz
$ tar -xvf 4.0.1.tar.gz
$ cd capstone-4.0.1/ && ./make.sh && sudo make install
$ sudo ldconfig
Download, build and install ZOFI:
$ git clone https://github.com/vporpo/zofi
$ cd zofi
$ mkdir build && cd build
$ cmake ../src/ && make -j && sudo make install
This will install zofi into /usr/local/bin/
.
Please note that ZOFI can run straight from your build directory, so you could skip && sudo make install
.
Note: For a debug build you need to pass the -DCMAKE_BUILD_TYPE=Debug
parameter to cmake:
$ cmake -DCMAKE_BUILD_TYPE=Debug ../src/
If all went well, you can now run ZOFI:
$ zofi -help
Or if you did not install it on your system, you can run it straight from the build directory:
$ ./zofi -help
The main option that you need to specify is the path to the binary. This will run the binary once to check its original state, then it will run it once more injecting a fault in the process and will report the results.
$ zofi -bin <path/to/binary> [Other options]
Here is an example:
$ zofi -bin /usr/bin/seq -test-runs 50 -args -f '%g' 1 500000
This runs the /usr/bin/seq
binary with arguments: -f '%g' 1 500000
for 50
runs.
Here is what you will see on the terminal:
$ zofi -bin /usr/bin/seq -test-runs 50 -args -f '%g' 1 500000
------ Options ------
-args = -f %g 1 500000
-bin = /usr/bin/seq
-test-runs = 50
---------------------
-- Original (Timing) Run --
4.56r/s 1/1 : 0%=========25%=========50%=========75%========100%
Original Duration: 0.219s
-- Test Runs --
2.79r/s 50/50 : 0%=========25%=========50%=========75%========100%
-------------------------------
Outcome : Cnt, %
-------------------------------
Masked : 10, 20.000%
Exception : 15, 30.000%
InfExec : 2, 4.000%
Corrupted : 23, 46.000%
To speed-up the execution on a multi-core system you can use -j N
where N
is the number of parallel jobs.
For example, on a quad-core system we would run 4 jobs like this:
$ zofi -bin /usr/bin/seq -test-runs 50 -j 4 -args -f '%g' 1 500000
Please note that if you need to pass a -args
option to ZOFI, it will have to be placed last in command line, after all other options. The reason is that everything that follows it is considered an argument of the binary being tested, not ZOFI's.
Also the environmental variables are transparent to ZOFI and get passed through to the binary.
You can list all options with zofi -help
.
ZOFI supports various types of fault injections with the -inject-to <TYPE STRING>
flag. Each character in the type string enables a specific class of injection. The default type string enables faults to explicitly (e) read(r) and written(w) registers: "rwe"
. The types that are currently supported are:
-
Inject into registers read(
r
), or written(w
) by the instruction. The default option enables both. -
Inject to explicitly(
e
) or implicitly(i
) accessed registers. A large number of x86 instructions will clobber registers that are not explicitly stated in the assembly code, such as the zero flag. The default option allows only the explicitly(e
) accessed. -
Inject errors to the instruction pointer (
c
). ZOFI will not inject faults to the instruction pointer by default.
Option | Inject to |
---|---|
r |
Registers read by instruction |
w |
Registers written by instruction |
e |
Explicitly accessed registers |
i |
Implicitly accessed registers |
c |
Instruction pointer if a control instruction |
o |
Instruction pointer if not a control instruction |
For example, using rwco
will inject faults to the instruction pointer, regardless of whether the target instruction is a control-flow instruction (like a jump) or not.
Using rwc
will only inject faults to the instruction pointer if the target is a control-flow instruction (it will keep retrying until it lands on a control-flow instruction).
Unlike -inject-to
which will inject to registers being accessed by the currently executed instructions, -force-inject-to-reg
will inject a fault to the specified register. For example:
$ zofi -force-inject-to-reg rax ...
For a complete list of registers please use -force-inject-to-reg help
.
Note: This feature is available since version 0.9.7.
-force-inject-to-bit
forces the fault to be injected to a specific bit. If not forced, the bit is selected at random. You can get the valid bit range for each register with -force-inject-to-bit help
.
This option can be used in combination with -force-inject-to-reg
. It may also be used without it, but in that case the forced bit may be out of range because the register (picked at random) may be narrower. If this happens, the injector will keep retrying to inject a fault to some other instruction, until the bit is in range. There is a limit to the number of retries which can be overridden using -max-injection-attempts
.
Example:
$ zofi -force-inject-to-reg rax -force-inject-to-bit 42 ...
Note: This feature is available since version 0.9.7.
ZOFI supports running multiple test runs in parallel to speed up the fault injection process.
The number of parallel jobs is controlled with the -j
switch which defaults to 1.
Using multiple parallel jobs can speed up the evaluation significantly, as the test run throughput increases almost linearly with the number of jobs executed.
Please note that it is not advised to use higher jobs count than the number of threads supported by the target CPU, as this will mess with the timings of the injections.
ZOFI supports injecting faults to multi-threaded applications since version 0.9.4. The process is very similar to single-threaded fault injection. The main difference is that we now have to select a random running thread to inject the fault to. Please note that the fault injection happens only to the selected thread, the rest of them are not even stopped.
When ZOFI checks the test run's output (stdout and stderr) against that from the original run, it uses a simple one to one comparison between them. If any discrepancy is found, it will be reported as "Corrupted" output.
This simplistic diff function, can sometimes falsely report "Corrupted" outputs even though no corruption took place. This can happen when the output contains unique strings to each run, for example the current time, or the process id. In such cases a more suitable diff function would skip these patterns and would compare only the remaining output.
ZOFI provides the capability to do so, using a custom command with -diff-cmd
.
This spawns a shell (/bin/bash by default but can be modified with -diff-shell
) and executes the provided command within it.
If the executed commands exit normally with an exit code of 0, then the output is considered identical, otherwise it is considered "Corrupted".
For example, if we need to skip any pattern matching the regular expression ^Time:
, we would use the following:
-diff-cmd '/usr/bin/diff <(/bin/grep -v "^Time" %ORIG_STDOUT) <(/bin/grep -v "^Time" %TEST_STDOUT)'
For more accurate checking, we should check both stdout and stderr, by connecting the two diffs with &&.
-diff-cmd '/usr/bin/diff <(/bin/grep -v "^Time" %ORIG_STDOUT) <(/bin/grep -v "^Time" %TEST_STDOUT) && /usr/bin/diff %ORIG_STDOUT %TEST_STDERR'
Another example is the NAS Benchmarks, which need the following command to skip the "Time in seconds", "CPU Time", "Initialization time", "Mop/s total" , and "Compile date" lines.
-diff-cmd '/usr/bin/diff <(/bin/grep -v "\([Tt]ime\)\|\(Mop/s total\)\|\(Compile date\)" %ORIG_STDOUT) <(/bin/grep -v "\([Tt]ime\)\|\(Mop/s total\)\|\(Compile date\)" %TEST_STDOUT)'
By default ZOFI will inject faults to any instruction.
This can be problematic for fault-tolerance studies where the user's code has been protected by some fault-tolerance scheme, while the system's libraries have not.
ZOFI supports disabling fault injection to .so libraries that are dynamically linked to the executable, with the -no-inject-to-libs
flag.
Most modern machines implement Simultaneous Multi-Threading (SMT), which support more logical cores than the actual physical cores (usually twice as many). It is advised to set the number of jobs no higher than the number of physical cores. Otherwise, the timings may be affected considerably and that could affect the randomness of the experiments.
On single socket machines:
$ cat /proc/cpuinfo | grep "cpu cores" | uniq
The number of sockets:
$ cat /proc/cpuinfo | grep "physical id" | sort | uniq
The total number of physical cores is the number of cpu cores times the number of sockets.
Workloads that occupy a lot of shared resources, e.g., memory intensive workloads, need special consideration. Running many of them in parallel can lead to unpredictable timings. Therefore it is advised to run such workloads using fewer jobs.
Workloads with execution time that varies a from run to run.
The original timing run will most probably have a wrong value, therefore the user should probably override it with the -bin-exec-time
flag, or use a very large overshoot value -bin-exec-time-overshoot
.
Most modern systems will throttle the processor frequency once the temperature of the processor becomes too high. This can affect the timings too. So please make sure to either use fewer jobs, or improve the cooling of your processor.
Some compilers, like GCC, implement stack protection techniques to help protect from attacks.
Such techniques will report the error and the process will abort early, converting a potential exception to a corrupted output outcome.
If you are interested in measuring the application's raw fault coverage with no protections whatsoever, you may consider building your workload with such techniques disabled (for example -fno-stack-protector
).
ZOFI has some well-known limitations:
Injecting faults on applications with either:
- a very small run-time (e.g., less than 50ms), or
- variable execution time, is problematic. ZOFI may end up trying to inject a fault when the application has already stopped.
ZOFI will try to tolerate such failures by retrying at some other random injection time, but if the failures build up to a large number, greater than -max-injection-attempts
, then it will exit with an error.
Please note that the synchronization between the processes takes some time and the earliest a fault can be injected on a generic system is about half a millisecond.
If a workload uses signals as part of its normal operation, it may not expect to be signaled externally, and may stop working properly if so. On such workloads ZOFI may report false results.
An example of this is /bin/sleep from GNU coreutils
, which will return right after it receives a signal from ZOFI.
Even though the design of ZOFI is not tied to a specific architecture, it does have some minor glue code that is architecture-specific. The current implementation only supports x86_64, but supporting other ISAs should require little effort. For the most part ZOFI relies on the capstone-engine library (https://www.capstone-engine.org/) for the instruction semantics, which already supports several ISAs.
ZOFI includes a test-suite in the test/
directory.
You can run the whole suite by running $ make check
from within the build/
directory.
Tests are C/C++ source files with testing instructions prefixed by // RUN:
.
All // RUN:
lines are parsed by ZOFI's testing tool, ZIT (ZOFI Integrated Testing).
For details on how to use Zit, please see Zit's documentation in the sections that follow.
Zit is a testing script built to test source files with // RUN:
test commands.
A typical test command includes three parts, usually connected with &&
and pipes |
.
- The compilation CC/CXX part,
- The ZOFI part,
- The GREP part.
For example, the following test command first compiles the source file, then calls zofi, and finally uses grep to check for a specific pattern.
// RUN: %CC %THIS_FILE -o %UNIQUE_FILE && PraiseBob=1 %ZOFI -bin %UNIQUE_FILE -test-runs 1 -v 1 -injection-time 1 -injections-per-run 0 -no-test-redirect 2>&1 | %GREP 'PraiseBob=1' 2>&1 > /dev/null
If the command returns 0, it passes. If not, it is reported as a failure.
Since we want to avoid hard-coded tool paths and filenames, Zit supports several automatically substituted variables, all prefixed by %
.
Macros | Substituted by |
---|---|
%CC |
The C compiler specified in Zit and can be overriden with an argument. |
%THIS_FILE |
The file name of the zit test file (i.e., the source file). |
%UNIQUE_FILE |
A unique file path, usually in the form of /tmp/tmp.xxx . |
%ZOFI |
The zofi tool binary. |
%GREP |
The grep tool binary. |
Zit also supports a range of builtin functions that can help with checking the output of ZOFI.
Builtin | Functionality |
---|---|
%GREATER_THAN |
Takes one argument and checks whether the stdin is > the argument. |
%LESS_THAN |
Takes one argument and checks whether the stdin is < the argument. |
%EQUALS |
Takes one argument and checks whether the stdin is == to the argument. |
%GET_OUTCOME <Masked|Exception|InfExec|Corrupted> <N|%> |
Returns the fault coverage outcome. N returns the absolute count, while % returns the percentage. |
This is the command line syntax of the Zit tool:
Usage: zit <test.zit> [-zofi /path/to/zofi] [-cc /path/to/c/compiler] [-cxx /path/to/c++/compiler] [-help]
test.zit
Is one or more zit test files.-s
Silent mode. Print only the absolute minimum.-zofi /path/to/zofi
Override the zofi binary path.-cc /path/to/cc
Override the C compiler path.-cxx /path/to/cxx
Override the C++ compiler path.-help
Prints a brief help message.
For a high-level description of the design, a comparison against other techniques and a case study on NPB2.3 please refer to the paper:
"ZOFI: Zero-Overhead Fault Injection Tool for Fast Transient Fault Coverage Analysis", Vasileios Porpodas, arXiv:1906.09390, 2019 bib
ZOFI is under development, so please expect it to fail. You can help improve it by filing a bug report, or by contributing a fix.
ZOFI is free software and is distributed under the GPL License version 2. http://www.gnu.org/licenses/
Contact: Vasileios Porpodas (v dot last name at gmail.com)