To install hdev, please proceed by following these steps:
- Downloading the installer
- Running the installer
- Installing prerequisite software
- System and Vivado configuration
- Generating device configuration files
- Generating device information files
- Enabling hdev on a cluster
git clone https://github.com/fpgasystems/sgrt_install.git
Before running the installer, please ensure the following prerequisites are met:
- The user executing the installer has sudo capabilities.
- The /tmp folder exists on the targeted server, and both $USER and root have write permissions on it.
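The two prerequisites can be checked by hand before launching the installer. The following is a minimal sketch, not part of hdev: the helper name check_writable_dir is hypothetical, and the sudo test only detects rights that work without a password prompt.

```shell
#!/bin/sh
# Hypothetical helper (not part of hdev): succeeds when the given
# directory exists and is writable by the user running the script.
check_writable_dir() {
    [ -d "$1" ] && [ -w "$1" ]
}

if check_writable_dir /tmp; then
    echo "/tmp exists and is writable by $(id -un)"
else
    echo "/tmp is missing or not writable by $(id -un)" >&2
fi

# sudo -n never prompts, so this only detects sudo rights that are usable
# without typing a password; a failure here is a hint, not a verdict.
if command -v sudo >/dev/null 2>&1 && sudo -n true 2>/dev/null; then
    echo "sudo is usable without a password prompt"
else
    echo "could not confirm sudo non-interactively; try 'sudo -v' by hand" >&2
fi
```

Running the same script once as $USER and once as root covers both accounts named in the prerequisite.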
Once these prerequisites are confirmed, proceed with the installation:
sudo ./sgrt_install/run.sh
The first thing you need to do after running the installer is to provide a non-existing path where you want to install hdev. For example, designating /opt will install hdev in /opt/hdev.
After that, the installer will continue by asking server-related and tool path questions. The following information is intended to help you select the correct paths:
- LOCAL_PATH: A directory where the user ($USER) must have read, write, and execute privileges. By default, this path is /local/home/$USER.
- MY_DRIVERS_PATH: A directory where the user ($USER) should have the permissions needed to use the rmmod and insmod commands. By default, this path is /tmp/devices_acap_fpga_drivers (where inserted driver files are removed after a server reboot).
- MY_PROJECTS_PATH: A directory where the user ($USER) must have read, write, and execute privileges. The default is /home/$USER/my_projects, where /home/$USER typically corresponds to an NFS hard drive accessible from all servers within a cluster.
- GITHUB_CLI_PATH: The path to a valid GitHub CLI installation. The default is /usr/bin.
- ROCM_PATH: The path to a valid ROCm installation. The default is /opt/rocm.
- UPDATES_PATH: A directory where sudo users have read, write, and execute privileges. By default, this path is /tmp.
- XILINX_PLATFORMS_PATH: The path to the Xilinx platforms installed on the server. The default is /opt/xilinx/platforms.
- XILINX_TOOLS_PATH: The path to the Xilinx tools (Vivado, Vitis, Vitis_HLS) installed on the server. The default is /tools/Xilinx/.
- XILINXD_LICENSE_FILE: A list of verified license servers for Xilinx tools.
- XRT_PATH: The path to a valid Xilinx RunTime installation. The default is /opt/xilinx/xrt.
Please note that you can use any environment variable other than $USER to define your paths.
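Before answering the installer's questions, it can help to see which of the default locations already exist on the server. The sketch below is an illustration only: path_status is a hypothetical helper, the paths are the defaults listed above, and a "missing" result is not necessarily an error (for example, MY_DRIVERS_PATH is left out because its default lives under /tmp and may not exist yet).

```shell
#!/bin/sh
# Hypothetical helper (not part of hdev): prints "ok" when the path is an
# existing directory and "missing" otherwise.
path_status() {
    if [ -d "$1" ]; then echo "ok"; else echo "missing"; fi
}

# Fall back to the effective user name if USER is not set.
user=${USER:-$(id -un)}

# Installer defaults as listed above; edit them to match your server.
for entry in \
    "LOCAL_PATH=/local/home/$user" \
    "MY_PROJECTS_PATH=/home/$user/my_projects" \
    "GITHUB_CLI_PATH=/usr/bin" \
    "ROCM_PATH=/opt/rocm" \
    "UPDATES_PATH=/tmp" \
    "XILINX_PLATFORMS_PATH=/opt/xilinx/platforms" \
    "XILINX_TOOLS_PATH=/tools/Xilinx" \
    "XRT_PATH=/opt/xilinx/xrt"
do
    name=${entry%%=*}   # text before the first '='
    path=${entry#*=}    # text after the first '='
    printf '%-22s %-28s %s\n' "$name" "$path" "$(path_status "$path")"
done
```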
To ensure proper functionality, the following tools must be present on the server for hdev to run:
For those servers with reconfigurable devices, the following criteria apply:
- XRT (Xilinx RunTime): A valid XRT version must be present in the designated XRT_PATH.
- Vivado and Vitis_HLS: Valid versions of Vivado and Vitis_HLS must be installed within the specified XILINX_TOOLS_PATH.
- Vitis: The Vitis Development Core is optional but can be beneficial. If you choose to install it, please ensure that it is also placed within the XILINX_TOOLS_PATH directory.
Finally, as a vital requirement, all the Xilinx accelerator cards mounted on the deployment server must have their deployment target platform toolkit available within the designated XILINX_PLATFORMS_PATH directory.
For servers equipped with GPUs, a valid HIP/ROCm release must be present in the designated ROCM_PATH directory.
Besides the tools listed above, the following are also required to make hdev fully operative:
- curl
- GitHub CLI
- jq
- python3
- uncrustify
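A quick way to see which of these tools are still missing is to look each one up on PATH. This is a minimal sketch with a hypothetical helper name (missing_tools); it assumes the GitHub CLI is installed under its usual executable name, gh, and that GITHUB_CLI_PATH is on PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of hdev): prints every command name from its
# arguments that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# "gh" is the executable shipped by the GitHub CLI.
missing=$(missing_tools curl gh jq python3 uncrustify)
if [ -z "$missing" ]; then
    echo "all required tools found"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing tools:" $missing
fi
```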
The user groups all_users and vivado_developers should be added to /etc/sudoers.d:
ALL ALL=NOPASSWD:$CLI_PATH/common/get_booking_system_servers_list,$CLI_PATH/program/vitis,$CLI_PATH/program/revert
all_users group contents.
vivado_developers ALL=(ALL) NOPASSWD:/sbin/reboot,/sbin/insmod,/sbin/rmmod,/sbin/iptables,$CLI_PATH/program/fpga_chmod,$CLI_PATH/program/pci_hot_plug,$CLI_PATH/program/vivado,$CLI_PATH/program/rescan,/usr/sbin/modprobe,$CLI_PATH/set/write
vivado_developers group contents.
where $CLI_PATH represents the hdev CLI path (for example, /opt/hdev/cli) and must be declared as an environment variable.
To use the Vivado workflow, hdev requires cable drivers for Xilinx boards to be installed and udev rules to be configured.
- Install cable drivers:
cd $XILINX_VIVADO/data/xicom/cable_drivers/lin64/install_script/install_drivers/
./install_drivers
where $XILINX_VIVADO is an environment variable related to XILINX_TOOLS_PATH.
- Configure udev rules:
sudo sed -i '/^ACTION=="add", ATTR{idVendor}=="0403", ATTR{manufacturer}=="Xilinx"/c ACTION=="add", ATTR{idVendor}=="0403", ATTR{manufacturer}=="Xilinx", MODE:="666", GROUP:="vivado_developers"' /etc/udev/rules.d/52-xilinx-ftdi-usb.rules
where the vivado_developers group relates to the section above.
Installing cable drivers and configuring udev rules.
The device configuration files are an essential hdev component. Each server running hdev requires three files: one for networking devices, one for adaptable devices (ACAPs, ASoCs and FPGAs), and one for GPUs. hdev assumes these files are correct, so the following guidance is intended to help you generate them accurately.
A devices_network configuration file is located in $CLI_PATH/devices_network and looks like this:
There is one row per networking device, and the columns represent the following information:
- Device Index: An autogenerated integer value, starting from 1.
- BDF: Identify the Bus Device Function (BDF) of your networking device using the lspci | grep Ethernet command. For NICs with multiple ports, capture only function zero (e.g., 23:00.0).
- Device Type: Must be set to "nic".
- Device Name: A representative string that identifies the vendor or model of your NIC.
- IP Addresses: Assign an IP address to each NIC port based on your networking configuration plan.
- MAC Addresses: Assign a MAC address to each NIC port based on your networking configuration plan.
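The BDF step above can be sketched as a small filter that keeps only function zero of each Ethernet device. The helper name function_zero_bdfs is hypothetical, and the sample lines are illustrative placeholders rather than output from a real server.

```shell
#!/bin/sh
# Hypothetical helper (not part of hdev): reads lspci-style lines on stdin and
# prints the BDF of every Ethernet entry whose function number is zero.
function_zero_bdfs() {
    awk '/Ethernet/ && $1 ~ /\.0$/ { print $1 }'
}

# Illustrative input only; on a real server use: lspci | function_zero_bdfs
function_zero_bdfs <<'EOF'
23:00.0 Ethernet controller: Example Vendor dual-port NIC
23:00.1 Ethernet controller: Example Vendor dual-port NIC
41:00.0 VGA compatible controller: Example Vendor display adapter
EOF
# prints: 23:00.0
```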
A devices_acap_fpga configuration file is located in $CLI_PATH/devices_acap_fpga and looks like this:
There is one row per reconfigurable device, and the columns represent the following information:
- Device Index: An autogenerated integer value, starting from 1.
- Upstream Port: Identify Xilinx reconfigurable devices' upstream ports using the command: lspci | grep Xilinx | grep '\.0 ' (e.g., a1:00.0).
- Root Port: Find the root port related to the upstream port using: sudo lspci -t (e.g., a1:00.0 corresponds to a0:03.1).
- LnkCtl: Discover LnkCtl capabilities related to the root port using: sudo lspci -vvv -s (e.g., sudo lspci -vvv -s a0:03.1 corresponds to 58).
- Device Type: Select between "acap", "asoc", or "fpga" based on the Platform's XSA Name revealed by the xbutil examine command.
- Device Name: The device name can be found in the Vivado GUI when you open the hardware target of interest, e.g., xcu280_u55c_0.
- Serial Number: Obtain the serial number of the device of interest using the sudo xbmgmt examine command.
- IP Addresses: Assign two IP addresses (one for each QSFP interface, separated by a slash) according to your IP configuration plan.
- MAC Addresses: Retrieve the MAC addresses corresponding to the IP addresses above for the device of interest using the xbutil examine command.
- Platform: Determine the platform of the device of interest using the xbutil examine command.
Getting root port (step 3) and device name (step 6).
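Because the IP Addresses column holds two values separated by a slash, a small sanity check can catch a malformed entry before it reaches the configuration file. This is only a sketch: split_pair is a hypothetical helper, it checks the shape of the field rather than the validity of the addresses, and the addresses shown come from a documentation-only range.

```shell
#!/bin/sh
# Hypothetical helper (not part of hdev): splits "first/second" and prints the
# two parts on separate lines; fails unless there is exactly one slash with
# non-empty text on both sides.
split_pair() {
    case "$1" in
        */*/*|/*|*/|"") return 1 ;;   # too many slashes or an empty side
        */*) ;;                       # exactly one slash: accepted
        *) return 1 ;;                # no slash at all
    esac
    printf '%s\n%s\n' "${1%%/*}" "${1#*/}"
}

# Illustrative addresses from the 192.0.2.0/24 documentation range.
if ! split_pair "192.0.2.10/192.0.2.11"; then
    echo "expected two addresses separated by a slash" >&2
fi
```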
A devices_gpu configuration file is located in $CLI_PATH/devices_gpu and looks like this:
As before, there is one row per GPU, and the columns represent the following information:
- Device Index: An autogenerated integer value, starting from 1.
- Bus: Determine the bus value using the rocm-smi --showbus command.
- Device Type: Set to "gpu".
- GPU ID: Obtain the GPU ID using the rocm-smi -i command.
- Serial Number: Find the serial number using the rocm-smi --showserial command.
- Unique ID: Retrieve the unique ID using the rocm-smi --showuniqueid command.
Alongside the device configuration files, each server running hdev requires the $CLI_PATH/platforminfo file, which contains details about clock speed, available resources, and memory. The Xilinx tool platforminfo can help you obtain the appropriate values for this file.
platforminfo for three different servers: one mounting an Alveo U250 board, one mounting a U280 board, and one mounting one U55C and one Versal VCK5000.
Under the following assumptions, hdev can program bitstreams on remote servers’ ACAPs and FPGAs:
- hdev is successfully installed on all the servers you wish to include in your managed cluster.
- The remote servers are on the same IP network.
- You have the necessary SSH access permissions to interact with the remote servers.
- All target servers have replicated copies of the five SERVERS_LIST files, which are located in the $CLI_PATH/constants directory.
To illustrate, here's an example of such files in a real hdev cluster:
The files ACAP_SERVERS_LIST, BUILD_SERVERS_LIST, FPGA_SERVERS_LIST, GPU_SERVERS_LIST, and VIRTUALIZED_SERVERS_LIST are replicated on all servers in the cluster.
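To confirm that a server holds all five files, a check along these lines could be run on each machine. It is a sketch under stated assumptions: missing_server_lists is a hypothetical helper, and the fallback /opt/hdev/cli is simply the example CLI path used earlier in this document.

```shell
#!/bin/sh
# Hypothetical helper (not part of hdev): prints the name of each expected
# SERVERS_LIST file that is absent from the given directory.
missing_server_lists() {
    for name in ACAP_SERVERS_LIST BUILD_SERVERS_LIST FPGA_SERVERS_LIST \
                GPU_SERVERS_LIST VIRTUALIZED_SERVERS_LIST; do
        [ -f "$1/$name" ] || echo "$name"
    done
}

# Use CLI_PATH when it is set; otherwise fall back to the example path.
constants_dir="${CLI_PATH:-/opt/hdev/cli}/constants"
missing=$(missing_server_lists "$constants_dir")
if [ -z "$missing" ]; then
    echo "all five SERVERS_LIST files found in $constants_dir"
else
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing in $constants_dir:" $missing
fi
```

The check only confirms that the files exist; it does not verify that their contents are identical across servers.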
For larger clusters with a significant number of servers, consider using infrastructure automation platforms for System and Vivado configuration and Generating device configuration files. As mentioned in Operating the cluster, ETHZ-HACC is using Ansible.