This repo extends the RISC-V ISA with two instructions, named sm4.key.rf and sm4.enc.rf. The instruction function unit is embedded in the execution pipeline stage of a 32-bit RISC-V processor named scr1.
This project proposes two RISC-V instruction set extensions for the SM4 block cipher. Each instruction implements one iteration of the round function: sm4.key.rf for the key expansion algorithm and sm4.enc.rf for the encryption algorithm. Cycle-accurate simulation shows that, compared with a software implementation without the extended instructions, the latency of the SM4 block cipher is reduced by 85.3% and the throughput increases by a factor of 6.79. On a Xilinx Spartan-7 FPGA, the function unit occupies only 247 LUTs. Synthesis in a 180 nm process shows that the resource overhead of the instruction function unit is only 1790 gates.
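For orientation, here is a minimal software sketch of what "one iteration of the round function" means, assuming the standard SM4 definition. The function names are hypothetical and are not taken from this repo. The 256-entry S-box is passed in by the caller rather than reproduced here. This sketch says nothing about how the hardware unit is organized internally.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32u - n));
}

/* Nonlinear step tau: apply the 8-bit S-box to each byte of the word.
 * The table is supplied by the caller (assumption: standard SM4 S-box). */
static uint32_t sm4_tau(uint32_t a, const uint8_t sbox[256]) {
    return ((uint32_t)sbox[(a >> 24) & 0xff] << 24) |
           ((uint32_t)sbox[(a >> 16) & 0xff] << 16) |
           ((uint32_t)sbox[(a >> 8) & 0xff] << 8) |
           (uint32_t)sbox[a & 0xff];
}

/* Linear transform L used in the encryption round. */
static uint32_t sm4_l_enc(uint32_t b) {
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24);
}

/* Linear transform L' used in the key expansion round. */
static uint32_t sm4_l_key(uint32_t b) {
    return b ^ rotl32(b, 13) ^ rotl32(b, 23);
}

/* One encryption round:
 * X[i+4] = X[i] ^ L(tau(X[i+1] ^ X[i+2] ^ X[i+3] ^ rk)). */
static uint32_t sm4_enc_round(const uint32_t x[4], uint32_t rk,
                              const uint8_t sbox[256]) {
    return x[0] ^ sm4_l_enc(sm4_tau(x[1] ^ x[2] ^ x[3] ^ rk, sbox));
}

/* One key expansion round:
 * K[i+4] = K[i] ^ L'(tau(K[i+1] ^ K[i+2] ^ K[i+3] ^ ck)). */
static uint32_t sm4_key_round(const uint32_t k[4], uint32_t ck,
                              const uint8_t sbox[256]) {
    return k[0] ^ sm4_l_key(sm4_tau(k[1] ^ k[2] ^ k[3] ^ ck, sbox));
}
```

The two rounds differ only in the linear transform, which suggests why a single function unit can plausibly serve both instructions.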
#### 1.1.2 key expansion algorithm

- scr1\src\pipeline\scr1_pipe_idu.sv
- scr1\src\pipeline\scr1_pipe_ialu.sv
- scr1\src\pipeline\scr1_pipe_exu.sv
- scr1\src\pipeline\scr1_pipe_mprf.sv
- scr1\src\pipeline\sbox.sv
- scr1\src\pipeline\sm4lt.sv
- Download the RISC-V GNU toolchain from here.
- Modify riscv-opc.h and riscv-opc.c in riscv-binutils:
  - riscv-opc.h is in riscv-binutils/include/opcode
  - riscv-opc.c is in riscv-binutils/opcodes
  - You can find our riscv-opc.c in modify-the-gnu-toolchains\riscv-opc.c
  - You can find our riscv-opc.h in modify-the-gnu-toolchains\riscv-opc.h
- Compile the toolchain.
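As background for the riscv-opc.h change: that header conventionally describes each instruction with a MATCH_*/MASK_* pair, and an instruction word is recognized when its masked bits equal the match value. The sketch below illustrates that idea only. The macro names and the encoding (R-type layout on the custom-0 opcode) are hypothetical. The real values are the ones in modify-the-gnu-toolchains\riscv-opc.h.

```c
#include <stdint.h>

/* HYPOTHETICAL encoding, for illustration only: R-type layout on the
 * custom-0 major opcode (0x0b) with funct3 = 0 and funct7 = 0.
 * The actual encodings are defined in the repo's riscv-opc.h. */
#define MATCH_SM4_DEMO 0x0000000bu
#define MASK_SM4_DEMO  0xfe00707fu /* funct7 | funct3 | opcode bits */

/* Assemble an R-type word: MATCH supplies the fixed bits, the operands
 * fill the rd (bits 11:7), rs1 (19:15) and rs2 (24:20) fields. */
static uint32_t encode_rtype(uint32_t match, unsigned rd,
                             unsigned rs1, unsigned rs2) {
    return match |
           ((uint32_t)(rd & 31u) << 7) |
           ((uint32_t)(rs1 & 31u) << 15) |
           ((uint32_t)(rs2 & 31u) << 20);
}

/* An instruction word matches when its fixed bits equal MATCH. */
static int insn_matches(uint32_t insn, uint32_t match, uint32_t mask) {
    return (insn & mask) == match;
}
```

For example, `encode_rtype(MATCH_SM4_DEMO, 28, 10, 0)` places t3 (x28) in rd and a0 (x10) in rs1. The resulting word still satisfies `insn_matches` because the operand fields lie outside the mask.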
Example files can be found in scr1\sim\tests\sm4-isa-ext.

The extended instructions must be invoked through inline assembly. The following is an example.

    /* Load the four 32-bit words of the input block into t0..t3. */
    __asm__ __volatile__(
        "mv t0, %[src0]\n\t"
        "mv t1, %[src1]\n\t"
        "mv t2, %[src2]\n\t"
        "mv t3, %[src3]"
        :
        : [src0]"r"(ulbuf[0]), [src1]"r"(ulbuf[1]), [src2]"r"(ulbuf[2]), [src3]"r"(ulbuf[3])
        : "t0", "t1", "t2", "t3"
    );

    /* Run the 32 rounds; each sm4_enc_rf takes one entry sk[i].
     * i is assumed to start at 0. */
    while (i < 32)
    {
        __asm__ __volatile__("sm4_enc_rf t3, %[src]" : : [src]"r"(sk[i]) : "t0", "t1", "t2", "t3");
        i++;
    }

    /* Copy the result from t0..t3 back to the buffer. */
    __asm__ __volatile__(
        "mv %[dest0], t0\n\t"
        "mv %[dest1], t1\n\t"
        "mv %[dest2], t2\n\t"
        "mv %[dest3], t3"
        : [dest0]"=r"(ulbuf[0]), [dest1]"=r"(ulbuf[1]), [dest2]"=r"(ulbuf[2]), [dest3]"=r"(ulbuf[3])
        :
        : "t0", "t1", "t2", "t3"
    );

- 2670 ns = 267 cycles
- 2310 ns = 231 cycles

More simulation results can be found in images.