Typically, slow emulation is due to:
- Instrumenting every instruction executed.
- Instrumenting every memory access.
Optimize your program by using less instrumentation, e.g. by using `UC_HOOK_BLOCK` instead of `UC_HOOK_CODE`.
Updating the PC adds a very large overhead to emulation (10x slower in the worst case, see the FAQ above), so the PC is not kept in sync at all times. The PC sync guarantee is as follows:
- A `UC_HOOK_CODE` hook is installed. In this case, the PC is sync-ed everywhere within the effective range of the hook. However, on some architectures, the PC might be sync-ed all the time if the hook is installed.
- A `UC_HOOK_MEM_READ` or `UC_HOOK_MEM_WRITE` hook is installed. In this case, the PC is sync-ed exactly before any read/write events within the effective range of the hook.
- Emulation (`uc_emu_start`) terminates without any exception. In this case, the PC will point to the next instruction.
- No hook mentioned above is installed and emulation terminates with an exception. In this case, the PC is sync-ed at the basic block boundary, in other words, at the first instruction of the basic block where the exception happens.
Below is an example:
mov x0, #1 <--- the PC will be here
mov x1, #2
ldr x0, [x1] <--- exception here
If `ldr x0, [x1]` fails with a memory exception, the PC will be left at the beginning of the basic block, in this case `mov x0, #1`.
However, if a `UC_HOOK_MEM_READ` hook is installed, the PC will be sync-ed:
mov x0, #1
mov x1, #2
ldr x0, [x1] <--- exception here and PC sync-ed here
Unicorn is a pure CPU emulator, and usually this happens because no handler is registered for instructions like `syscall` and `SVC`. If you expect system emulation, you probably want the Qiling framework.
Currently, only a small subset of instructions can be instrumented.
On x86, the instructions that can be instrumented are `in`, `out`, `syscall`, `sysenter` and `cpuid`.
- Some instructions are not enabled by default on some architectures. For example, you have to set up CSR on RISC-V or VFP on ARM before emulating floating-point instructions. Refer to the corresponding manual to check whether you have left out any required switches in special registers.
- If you are on ARM, please check whether you are emulating a THUMB instruction. If so, please use `UC_MODE_THUMB` and make sure the starting address is odd.
- If neither is the case, it might be some newer instruction set that qemu5 doesn’t support.
- Note some instruction sets are not implemented by QEMU.
If you are still using Unicorn1, please upgrade to Unicorn2 for better support.
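For the THUMB case above, the odd starting address can be produced by setting bit 0 of the code address. The sketch below is only an illustration: `thumb_entry`, `is_thumb_entry` and `CODE_BASE` are hypothetical names, not part of Unicorn.

```python
def thumb_entry(address: int) -> int:
    """Return `address` with bit 0 set, marking it as a Thumb entry point."""
    return address | 1


def is_thumb_entry(address: int) -> bool:
    """True if `address` is odd, i.e. it would be treated as a Thumb address."""
    return address & 1 == 1


CODE_BASE = 0x10000  # hypothetical address where the Thumb code was written
start = thumb_entry(CODE_BASE)
print(hex(start))
```

With the Python bindings, this would typically be combined with `Uc(UC_ARCH_ARM, UC_MODE_THUMB)` and `uc.emu_start(start, CODE_BASE + len(code))`.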
There are several possibilities, e.g.:
- The instruction might access memory multiple times, like `rep stos` on x86.
- The address being accessed is badly aligned, so the MMU emulation splits the access into several aligned memory accesses. In the worst case on some architectures, this leads to byte-by-byte access.
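To get a feeling for the second case, here is a simplified model of how one unaligned access could break down into several aligned ones, each of which would trigger its own hook callback. This is an assumption made for illustration: `split_aligned` is a hypothetical helper, and the exact splitting done by the MMU emulation may differ per architecture.

```python
def split_aligned(address: int, size: int, max_width: int = 8):
    """Split [address, address + size) into naturally aligned power-of-two chunks.

    `max_width` must be a power of two. Returns a list of (address, width) pairs.
    """
    chunks = []
    while size > 0:
        width = max_width
        # Shrink until the chunk is aligned at `address` and fits in what is left.
        while address % width != 0 or width > size:
            width //= 2
        chunks.append((address, width))
        address += width
        size -= width
    return chunks


# An aligned 4-byte access stays a single access in this model.
print(split_aligned(0x1000, 4))
# The same access at an odd address becomes three smaller accesses.
print(split_aligned(0x1001, 4))
```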
This is a minor change in memory hook behavior between Unicorn1 and Unicorn2. To gracefully recover from a memory read/write error, you have to map the invalid memory before you return `true`.
This is because, if users return `true` without setting up the memory mapping correctly, we don't know what to do next. In Unicorn1, the behavior is undefined in this case, but in Unicorn2 we would like to force users to set up the memory mapping in the hook to continue execution.
See the sample for details.
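As a rough sketch of the idea (not the official sample), a Python hook for unmapped accesses could map the page or pages covering the faulting address and then return `True`. The name `hook_unmapped` and the 4 KiB `PAGE_SIZE` rounding are choices made for this illustration; it also assumes none of the covered pages is already mapped.

```python
PAGE_SIZE = 0x1000  # mappings are made in 4 KiB-aligned chunks in this sketch


def hook_unmapped(uc, access, address, size, value, user_data):
    """Map the page(s) covering a faulting access, then ask Unicorn to retry."""
    base = address & ~(PAGE_SIZE - 1)
    # Round the end of the access up to the next page boundary.
    end = (address + max(size, 1) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    uc.mem_map(base, end - base)
    # Returning True is only valid because the memory is mapped now.
    return True
```

The hook would be registered with something like `uc.hook_add(UC_HOOK_MEM_UNMAPPED, hook_unmapped)`, with the constant imported from the `unicorn` package.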
For MIPS, you might have an address that falls in the MIPS `kseg` segments. In that case, the MMU is bypassed and you have to make sure the corresponding physical memory is mapped. See #217, #1371, #1550.
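For reference, the sketch below computes the physical address behind a `kseg0`/`kseg1` virtual address, assuming the standard MIPS32 layout where both segments are windows onto the low 512 MiB of physical memory. `kseg_to_physical` is a hypothetical helper, not a Unicorn API.

```python
KSEG0_BASE = 0x80000000  # unmapped, cached
KSEG1_BASE = 0xA0000000  # unmapped, uncached
KSEG2_BASE = 0xC0000000  # mapped through the TLB, not handled here


def kseg_to_physical(vaddr: int):
    """Return the physical address for a kseg0/kseg1 address, else None."""
    if KSEG0_BASE <= vaddr < KSEG2_BASE:
        # Both segments drop the top three address bits.
        return vaddr & 0x1FFFFFFF
    return None


print(hex(kseg_to_physical(0x80001000)))
print(hex(kseg_to_physical(0xBFC00000)))
```

The returned address is the one that would need to be covered by a mapping for such an access to succeed.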
For ARM, you might have an address that falls in a non-executable segment. For example, on M-class ARM CPUs, some memory areas are not executable according to the ARM documentation.
This is intended, as the Python `signal` module documentation states:
A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.
A workaround is to start emulation in another thread.
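One possible shape for this workaround is sketched below. `run_interruptible` is a hypothetical helper: it runs a blocking `start` callable in a worker thread while the main thread stays in Python and can receive `KeyboardInterrupt`. Calling `uc.emu_stop` from the main thread is an assumption of this sketch.

```python
import threading


def run_interruptible(start, stop, poll_seconds=0.1):
    """Run blocking `start()` in a worker thread; call `stop()` on Ctrl-C."""
    worker = threading.Thread(target=start, daemon=True)
    worker.start()
    try:
        # Join in short slices so the main thread can handle signals.
        while worker.is_alive():
            worker.join(poll_seconds)
    except KeyboardInterrupt:
        stop()
        worker.join()
        raise
```

With Unicorn this might be used as `run_interruptible(lambda: uc.emu_start(begin, until), uc.emu_stop)`. Exceptions raised inside the worker, such as `UcError`, are not propagated by this sketch.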
Unicorn is a fork of QEMU and inherits most QEMU internal mechanisms, one of which is called TB chaining. In short, every block (in most cases, a basic block) is translated, executed and cached. Therefore, any operation on cached addresses won't immediately take effect without a call to `uc_ctl_remove_cache`. For a more detailed discussion, see #1561.
Note, this doesn't mean you have to care about Self Modifying Code because the read/write happens within emulation (TB execution) and QEMU would handle such special cases. For technical details, refer to the QEMU paper.
TLDR: To ensure any modification to an address takes effect:
- Call `uc_ctl_remove_cache` on the target address.
- Call `uc_reg_write` to write the current PC to the PC register, if the modification happens during emulation. This restarts emulation (but doesn't quit `uc_emu_start`) at the current address to re-translate the block.
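The two steps above can be sketched with the Python bindings roughly as follows. `patch_code` is a hypothetical helper; `uc` is expected to be a `Uc` instance and `pc_reg` the architecture's PC register constant (for example `UC_X86_REG_RIP`). Thumb-bit handling on ARM is not covered.

```python
def patch_code(uc, address, new_bytes, pc_reg=None):
    """Overwrite code at `address` and make sure the change takes effect.

    Pass `pc_reg` only when calling this during emulation (e.g. from a hook).
    """
    uc.mem_write(address, new_bytes)
    # Drop any cached translation covering the patched range.
    uc.ctl_remove_cache(address, address + len(new_bytes))
    if pc_reg is not None:
        # Writing the current PC back makes Unicorn re-translate the block.
        uc.reg_write(pc_reg, uc.reg_read(pc_reg))
```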
As stated, Unicorn is a pure CPU emulator. For such emulation, you have two choices:
- Use the `timeout` parameter of `uc_emu_start`.
- Use the `count` parameter of `uc_emu_start`.
After emulation stops, you may check anything you are interested in and resume emulation accordingly.
Note that for Cortex-M `exec_return`, Unicorn has a magic software exception with interrupt number 8. You may register a hook to handle that.
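A minimal sketch of the `count` approach with the Python bindings is shown below. `run_in_slices` and `on_pause` are hypothetical names; `uc` is expected to be a `Uc` instance and `pc_reg` the architecture's PC register constant. The sketch assumes the PC read back after each slice is a valid address to resume from (on ARM Thumb the low bit would need extra care).

```python
def run_in_slices(uc, pc_reg, begin, until, on_pause, count=1000):
    """Emulate from `begin` to `until`, pausing every `count` instructions.

    `on_pause(uc, pc)` runs after each slice (including the final one) and
    returns False to stop early. Returns the PC at which emulation was left.
    """
    pc = begin
    while pc != until:
        uc.emu_start(pc, until, count=count)
        pc = uc.reg_read(pc_reg)
        # This is the place to inject a tick, raise an interrupt, etc.
        if not on_pause(uc, pc):
            break
    return pc
```

The `timeout` variant looks the same, with `timeout=...` (in microseconds) passed to `emu_start` instead of `count`.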
To provide end users with a simple API, Unicorn does lots of dirty hacks within the qemu code, which prevents it from syncing with upstream painlessly.
Yes, it’s possible but that is not Unicorn’s goal and there is no simple switch in qemu to disable softmmu.
See milestones and coding convention.
Be sure to send pull requests for our dev branch only.
Prior to 2.0.0, Unicorn was based on qemu 2.2.1. Since 2.0.0, Unicorn has been based on qemu 5.0.1.