LevelST is an HBM-FPGA-based stream accelerator for sparse triangular solvers. It is designed and tested on the Xilinx Alveo U280 FPGA board.
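For readers new to the problem, the sketch below is a plain CPU reference for what a sparse lower-triangular solve computes: forward substitution over a matrix stored in CSR form. It is not LevelST's streaming algorithm, and the CsrMatrix type and forward_substitution function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CSR container for a square lower-triangular matrix L
// (illustration only; not the LevelST host-code data structure).
struct CsrMatrix {
    int n = 0;                   // number of rows (= columns)
    std::vector<int> row_ptr;    // size n + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<int> col_idx;    // column index of each stored nonzero
    std::vector<double> val;     // value of each stored nonzero
};

// Solve L * x = b by forward substitution. Row i can only be finished after
// every x[j] with a nonzero L(i, j), j < i, is known; these cross-row
// dependencies are what make sparse triangular solves hard to parallelize.
std::vector<double> forward_substitution(const CsrMatrix& L,
                                         const std::vector<double>& b) {
    assert(static_cast<int>(b.size()) == L.n);
    std::vector<double> x(L.n, 0.0);
    for (int i = 0; i < L.n; ++i) {
        double sum = b[i];
        double diag = 0.0;
        for (int k = L.row_ptr[i]; k < L.row_ptr[i + 1]; ++k) {
            const int j = L.col_idx[k];
            if (j < i) {
                sum -= L.val[k] * x[j];  // uses an already-solved x[j]
            } else if (j == i) {
                diag = L.val[k];         // diagonal entry of row i
            }
            // entries with j > i are ignored (not part of L)
        }
        assert(diag != 0.0 && "L must have a nonzero diagonal");
        x[i] = sum / diag;
    }
    return x;
}
```

With L = [[2,0,0],[1,1,0],[0,3,4]] and b = (2, 3, 14), forward substitution gives x = (1, 2, 2).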
Dependencies:
- TAPA & Autobridge (follow the TAPA installation instructions to install TAPA and its other dependencies)
- Vitis 2021.2, Vivado 2021.2
- Xilinx xilinx_u280_xdma_201920_3 platform shell (more recent platforms require modifications to the Autobridge source code)
Input matrix format:
The host code takes input matrices in Matrix Market format. We test on triangular matrices decomposed from sparse matrices in the SuiteSparse collection.
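A minimal sketch of reading the Matrix Market coordinate format, assuming lowercase banner keywords and a real, integer, or pattern field (complex and dense array files are not handled). The read_matrix_market function and the Triplet/CooMatrix types are hypothetical and are not taken from the LevelST host code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Triplet { int row; int col; double val; };  // 0-based indices

struct CooMatrix {
    int rows = 0;
    int cols = 0;
    std::vector<Triplet> entries;
};

// Minimal Matrix Market "coordinate" reader (hypothetical helper).
// Supports real, integer, and pattern fields; pattern entries get value 1.0.
// Entries are returned as stored, with 1-based indices converted to 0-based.
CooMatrix read_matrix_market(std::istream& in) {
    std::string line;
    if (!std::getline(in, line) || line.rfind("%%MatrixMarket", 0) != 0)
        throw std::runtime_error("missing %%MatrixMarket banner");
    if (line.find("coordinate") == std::string::npos)
        throw std::runtime_error("only the coordinate format is supported");
    const bool pattern = line.find("pattern") != std::string::npos;

    // Skip comment lines ('%') and blank lines until the size line.
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] != '%') break;
    }

    CooMatrix m;
    long long nnz = 0;
    std::istringstream size_line(line);
    if (!(size_line >> m.rows >> m.cols >> nnz) || nnz < 0)
        throw std::runtime_error("malformed size line");

    m.entries.reserve(static_cast<std::size_t>(nnz));
    for (long long k = 0; k < nnz; ++k) {
        int i = 0, j = 0;
        double v = 1.0;
        if (!(in >> i >> j) || (!pattern && !(in >> v)))
            throw std::runtime_error("truncated or malformed entry list");
        assert(i >= 1 && i <= m.rows && j >= 1 && j <= m.cols);
        m.entries.push_back({i - 1, j - 1, v});
    }
    return m;
}
```

For symmetric files, Matrix Market stores only the entries on or below the diagonal, so such a reader already returns a lower triangle.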
Dataset
All test matrices are located here. There are three types of matrix files:
- *_trig: These are generated by decomposing the matrices in the SuiteSparse collection.
- *_alt: These are generated by reordering and decomposing the matrices in the SuiteSparse collection to boost performance.
- * (no suffix): These are the original matrices in the SuiteSparse collection. For testing, we use only their lower triangular portion.
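Using only the lower triangular portion amounts to keeping the entries with row >= col. The sketch below assumes 0-based (row, col, value) triplets such as a coordinate-format reader produces; lower_triangular_part and has_nonzero_diagonal are hypothetical helpers, not functions from this repository. The diagonal check is included because forward substitution divides by every diagonal entry.

```cpp
#include <cassert>
#include <vector>

struct Triplet { int row; int col; double val; };  // 0-based indices

// Keep only the entries on or below the main diagonal (row >= col).
std::vector<Triplet> lower_triangular_part(const std::vector<Triplet>& entries) {
    std::vector<Triplet> lower;
    for (const Triplet& t : entries) {
        if (t.row >= t.col) lower.push_back(t);
    }
    return lower;
}

// Forward substitution divides by L(i, i), so every row 0..n-1 needs a
// stored, nonzero diagonal entry.
bool has_nonzero_diagonal(int n, const std::vector<Triplet>& lower) {
    assert(n >= 0);
    std::vector<bool> seen(n, false);
    for (const Triplet& t : lower) {
        if (t.row == t.col && t.row < n && t.val != 0.0) seen[t.row] = true;
    }
    for (bool ok : seen) {
        if (!ok) return false;
    }
    return true;
}
```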
In the dataset, some matrix files have a corresponding JSON file (same name but with a .json extension). When testing on these matrices, pass the value of row from the JSON file to the host executable to crop them (details in the software simulation section). We will later modify the host code to detect JSON files automatically for cropping.
Compile the host code
make
This runs g++ to compile the host code for you.
Note: The Makefile is written for a server where a package manager such as spack provides the include paths and library binaries. Adjust the -I and -L flags to match your system setup. Also remember to set the LD_LIBRARY_PATH and CPATH environment variables in your .bashrc.
Finally, run the software simulation:
./trig-solver
The default matrix is lp1.mtx, provided in this repository. To test another matrix, pass it with the --file argument:
./trig-solver --file <matrix_file.mtx>
This runs LevelST over the whole matrix. To crop the matrix, pass an additional integer giving the number of rows to keep. For example, to restrict the matrix to 200000 rows, run:
./trig-solver --file <matrix_file.mtx> 200000
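Assuming that cropping to N rows means keeping the leading N x N block of the lower-triangular matrix (an assumption; the host code may implement it differently), the operation is a simple filter over the stored entries. crop_leading_rows is a hypothetical name used only for this sketch.

```cpp
#include <cassert>
#include <vector>

struct Triplet { int row; int col; double val; };  // 0-based indices

// Hypothetical cropping helper: keep the leading n_rows x n_rows block.
// For a lower-triangular matrix col <= row, so "row < n_rows" alone would
// suffice; checking col as well keeps the result square for any input.
std::vector<Triplet> crop_leading_rows(const std::vector<Triplet>& entries,
                                       int n_rows) {
    assert(n_rows >= 0);
    std::vector<Triplet> kept;
    for (const Triplet& t : entries) {
        if (t.row < n_rows && t.col < n_rows) kept.push_back(t);
    }
    return kept;
}
```

Because the first N unknowns of a lower-triangular system depend only on the first N rows, the cropped matrix still defines a valid, smaller triangular solve.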
All arguments available in software simulation are also available for cosim and hardware execution.
To compile the design with TAPA and Autobridge, run:
bash run_tapa.sh
This will generate a folder containing multiple subfolders, where each contains:
- A TCL file with the floorplanning constraints
- A bash script to run bitstream generation
- Autobridge log
- HLS code of each module compiled by TAPA
- RTL code
- HLS logs and reports
The Autobridge log contains a rough estimate of area usage. Each subfolder represents one solution generated by Autobridge.
For hardware emulation, modify the bash script solver.xilinx_u280_xdma_201920_3.hw.xo.tapa/run-n/solver.xilinx_u280_xdma_201920_3.hw_generate_bitstream.sh by uncommenting the second TARGET variable and the DEBUG variable, so that the relevant lines read:
#!/bin/bash
# TARGET=hw
TARGET=hw_emu
DEBUG=-g
Run the bitstream generation for hardware emulation:
bash solver.xilinx_u280_xdma_201920_3.hw_generate_bitstream.sh
This produces an xclbin file in the vitis_run_hw_emu folder. Run the emulation with:
./trig-solver --bitstream path/to/the/xclbin/file
To run on the FPGA, use the same bash script without these modifications to generate the bitstream for the FPGA fabric. This produces an xclbin file in the vitis_run_hw folder. Run it on hardware with:
./trig-solver --bitstream path/to/the/xclbin/file
A pre-generated bitstream is provided in the bitstream folder, so you can also simply run:
./trig-solver --bitstream bitstream/TrigSolver_xilinx_u280_xdma_201920_3_fwd.xclbin
Other useful reports include:
- TrigSolver_xilinx_u280_xdma_201920_3_fwd.xclbin.info: information about clock speed and HBM/DDR usage
- solver_final.tcl: the floorplanning constraints we used
Power consumption and on-chip resource utilization reports are in the vitis_run_hw folder.