This project is based on the CUDA Scheduling Examiner framework, which made it convenient to configure and examine the behavior of multiple GPU-sharing tasks on NVIDIA GPUs. The HIP plugin framework is architecturally similar, but has been overhauled to use more intuitive terminology and to remove any NVIDIA-specific code. (Unfortunately, this means that this project is unable to plot which compute units blocks are assigned to.)
To cite this work, either link the repository or cite the following paper:
@inproceedings{otterness2021amd,
title={Exploring {AMD} {GPU} Scheduling Details by Experimenting With ``Worst Practices''},
author={Otterness, Nathan and Anderson, James H.},
booktitle={International Conference on Real-Time Networks and Systems (RTNS)},
year={2021}
}
Some of this project may require modified versions of AMD's HIP framework or other ROCm components (as it is intended to support my research). If any such modifications are needed, they should be included in this repository. As of ROCm 3.7, however, this repository should not require any non-standard HIP functionality for most basic usage.
Components related to setting GPU deadlines will not work unless the Linux
kernel module and patches located in this repository are present and loaded.
However, this is not required if you aren't using locking; the framework will
run without it as long as configs don't include the job_deadline setting.
Compiling this project requires HIP, and hipcc must be on your PATH. Only
Linux is supported for now, and only AMD GPUs. (Compilation should also work
on NVIDIA, but is not a priority and may require modifying the makefile.)
To build:
git clone https://github.com/yalue/hip_plugin_framework
cd hip_plugin_framework
make
To test it, run:
./bin/runner configs/simple.json
The configuration files specify parameters passed to each plugin along with
some global settings for the entire framework. See configs/simple.json for a
minimal working example, using the Mandelbrot-set plugin.
The layout of each configuration file is as follows:
{
"name": <String. Required. The name of this scenario.>,
"max_iterations": <Number. Required. Default cap on the number of iterations
for each plugin. 0 = unlimited.>,
"max_time": <Number. Required. Default cap on the number of seconds
to run each plugin. 0 = unlimited.>,
"use_processes": <Boolean, defaulting to false. If this is true, each
plugin is run in a separate process. Normally, they run as threads.>,
"gpu_device_id": <Number. Required. The HIP device to use.>,
"base_result_directory": <String, defaulting to "./results". This is the
directory into which individual JSON files from each plugin will be
written. It must already exist.>,
"pin_cpus": <Boolean. Optional, defaults to false. If true, attempt to pin
plugins to CPU cores, evenly distributed across cores. If true, individual
plugins' cpu_core settings are ignored.>,
"do_warmup": <Boolean. Optional, defaults to false. If true, the framework
will run a warmup iteration of each plugin immediately after
initialization. The times from the warmup iteration will not be included in
result logs, so this option can be used to make sure code and data are
brought into the relevant caches, if possible, prior to the first
iteration.>,
"sync_every_iteration": <Boolean. Optional, defaults to false. If true,
iterations of each plugin start only when all plugins have completed their
previous iteration. By default, each plugin only waits for its own previous
iteration to complete.>,
"omit_block_times": <Boolean. Optional, defaults to false. If true, block
times will not be listed in results JSON files. Instead, the block_times
field will always be set to an empty array. This can save time and space
when running experiments where you don't care about block times. They'll
still be recorded by plugins--just not written to the output file.>,
"plugins": [
{
"filename": <String. Required. The path to the plugin shared library,
relative to the current working directory.>,
"log_name": <String. Optional. The filename of the JSON log for this
particular plugin. If not provided, this plugin's log will be given a
default name based on its filename, process and thread ID. If this
doesn't start with '/', it will be relative to base_result_directory.
This can be /dev/null to throw away logs.>,
"label": <String. Optional. A label or name for this specific plugin, to
be copied to its output file.>,
"thread_count": <Number or array. Required, but may be ignored. The
number of HIP threads that each block of this plugin should use. May
also be an array with up to 3 integers, specifying a multi-dimensional
block size.>,
"block_count": <Number or array. Required, but may be ignored. The number
of HIP blocks this plugin's kernels should use. May also be an array
with up to 3 integers, specifying a multi-dimensional grid size.>,
"additional_info": <A JSON object of any format. Optional. This can be
used to pass additional plugin-specific configuration parameters.>,
"max_iterations": <Number. Optional. If specified, overrides the default
max_iterations for this plugin alone. 0 = unlimited. If this is
provided for any plugin, then sync_every_iteration must be false.>,
"max_time": <Number. Optional. If specified, overrides the default
max_time for this plugin alone. 0 = unlimited.>,
"release_time": <Number. Optional. If set, this plugin will sleep for the
given number of seconds (between initialization and the start of the
first iteration) before beginning execution.>,
"initialization_delay": <Number. Optional. If set, the framework will
sleep for the given number of seconds before calling the plugin's
initialization function. Intended to be used as a crude mechanism for
enforcing initialization order.>,
"job_deadline": <Number. Optional. If set, must be a floating-point
number of seconds specifying a relative deadline within which each
of the plugin's iterations must complete. (In other words, the relative
deadline is updated at the start of every iteration.) It is an error
to set this if the GPU locking module isn't available (see the
Prerequisites section of this README). If not present, no deadlines
will be set. Must be positive if set.>,
"cpu_core": <Number. Optional. If specified, and pin_cpus is false, the
system will attempt to pin this plugin onto the given CPU core.>,
"compute_unit_mask": <Optional. Can be an array of booleans, or a string
of binary digits, or a hexadecimal string starting with "0x". This will
be used by the framework to attempt to configure the CUs on which the
plugin's kernels are allowed to run. If the value is a string, it is
interpreted as a list of bits, where if bit i is 1, then execution is
allowed on CU i. If the value is an array of booleans, execution is
allowed on CU i if entry i is true. If this is not provided or too
short, then unspecified values default to being enabled.>
},
{
<more plugin instances can be listed here>
}
]
}
Additionally, configurations support comments: any "comment" key may be inserted and will be ignored at runtime.
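As a rough illustration of the layout above, the following Python sketch builds a single-plugin configuration and writes it to disk, creating the result directory first since the framework expects it to exist already. The plugin path ./bin/mandelbrot.so, the log name, and the thread and block counts are assumptions chosen for illustration; configs/simple.json remains the authoritative working example.

```python
import json
import os


def make_config(plugin_path, result_dir):
    """Builds a one-plugin scenario using keys from the layout above."""
    return {
        "name": "Example scenario",
        # "comment" keys are ignored by the framework at runtime.
        "comment": "Generated by a helper script.",
        "max_iterations": 10,
        "max_time": 0,  # 0 = unlimited
        "gpu_device_id": 0,
        "base_result_directory": result_dir,
        "plugins": [
            {
                # Path is relative to the current working directory.
                "filename": plugin_path,
                # Relative to base_result_directory (no leading '/').
                "log_name": "example.json",
                "thread_count": 256,  # illustrative value
                "block_count": 64,  # illustrative value
            }
        ],
    }


def write_config(config, config_path):
    """Writes the config, creating the result directory if needed."""
    # The framework requires base_result_directory to already exist.
    os.makedirs(config["base_result_directory"], exist_ok=True)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)


if __name__ == "__main__":
    # Assumed plugin location; adjust to wherever the build places it.
    cfg = make_config("./bin/mandelbrot.so", "./results")
    write_config(cfg, "example_config.json")
```

The resulting file can then be passed to ./bin/runner in the same way as configs/simple.json.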
Each plugin, when run, will generate a JSON log file at the location specified in the configuration. If the plugin did not complete successfully, the JSON file may be in an invalid state. Times will be recorded as floating-point numbers of seconds. The format of the log file is:
{
"scenario_name": "<Scenario name>",
"plugin_name": "<Plugin name>",
"label": "<This plugin's label, if given in the config>",
"max_resident_threads": <The maximum number of threads that can be assigned
to the GPU at a time (from all plugins in the scenario)>,
"data_size": <Data size>,
"release_time": <Requested release time in seconds>,
"PID": <pid>,
"TID": <The thread ID, if plugins were run as threads>,
"times": [
{},
{
"cpu_times": [
<The CPU time before the copy_in function was called>,
<The CPU time after the copy_out function returned>
],
"copy_in_times": [
<The CPU time before the copy_in function was called>,
<The CPU time after the copy_in function returned>
],
"execute_times": [
<The CPU time when the execute function was called>,
<The CPU time after the execute function returned>
],
"copy_out_times": [
<The CPU time when the copy_out function was called>,
<The CPU time after the copy_out function returned>
],
"cpu_core": <The current CPU core being used>
},
{
"kernel_name": <The name of this particular kernel. May be omitted.>,
"block_count": <The number of blocks in this kernel invocation.>,
"thread_count": <The number of threads per block in this invocation.>,
"shared_memory": <The amount of shared memory used by this kernel.>,
"kernel_launch_times": [<CPU time immediately before the kernel launch.>,
<CPU time immediately after kernel launch returned.>,
<CPU time immediately after hipStreamSynchronize returned. This will
be set to 0 if hipStreamSynchronize isn't called for this kernel.>],
"block_times": [<Start time>, <End time>, ..., <This may be empty if the
plugin doesn't record block times. This contains block start and end
times in *millions of GPU cycles*, not seconds.>]
},
...
]
}
Notice that the first entry in the "times" array will be blank and should be
ignored. The times array will contain two types of objects: one containing CPU
times and one containing kernel times. An object containing CPU times will
contain a "cpu_times" key. A single CPU times object will encompass all kernel
times following it, up until another CPU times object.
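The grouping rule above can be sketched in Python as follows. This is a minimal illustration rather than a script shipped with the project: the function names are hypothetical, and pairing block_times as alternating start and end values is a reading of the format description above.

```python
import json


def group_iterations(times):
    """Groups kernel entries under the CPU-times entry preceding them."""
    iterations = []
    for entry in times:
        if not entry:
            # The first entry is an empty object and carries no data.
            continue
        if "cpu_times" in entry:
            # A CPU-times object starts a new group.
            iterations.append({"cpu": entry, "kernels": []})
        elif iterations:
            # Kernel-times objects belong to the most recent CPU group.
            iterations[-1]["kernels"].append(entry)
    return iterations


def block_intervals(kernel):
    """Pairs the flat block_times list into (start, end) tuples.

    Values are in millions of GPU cycles, not seconds.
    """
    t = kernel.get("block_times", [])
    return list(zip(t[0::2], t[1::2]))


def load_iterations(log_path):
    """Reads one plugin's JSON log and groups its "times" array."""
    with open(log_path) as f:
        return group_iterations(json.load(f)["times"])
```

Each returned group holds one iteration's CPU times under "cpu" and the kernel entries launched during that iteration under "kernels".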
Each plugin must be contained in a shared library and abide by the interface
specified in src/plugin_interface.h. In particular, the library must export
a RegisterPlugin function, which provides the addresses of further functions
to the framework. Plugins should preferably never use global state and instead
use the user_data pointer returned by the initialize function to track all
state. The reason for this is that we want to be able to run multiple instances
of a single plugin at a time--global variables prevent instances of a single
plugin from being independent. For a similar reason, plugins should use a
user-created HIP stream in order to avoid unnecessarily blocking each other
with hipDeviceSynchronize (or similar) calls.
The most important piece of information that each plugin provides is the
TimingInformation struct, which it must fill in during its copy_out function.
This struct will contain a list of KernelTimes structs, one for each kernel
invocation called during execute. Each KernelTimes struct will contain the
kernel start and end times and, if possible, individual block start and end
times (we recognize this may be quite obnoxious to add to some plugins, so
block_times are treated as quite desirable, but still optional). The plugin is
responsible for ensuring that the buffers provided in the TimingInformation
struct remain valid at least until another plugin function is called. They
will not be freed by the caller.
In general, the comments in plugin_interface.h provide an explanation for
the actions that every plugin-provided function is expected to carry out. The
src/mandelbrot.cpp file contains a well-commented example of a working plugin
implementation.
In addition to plugin_interface.h, plugin_utilities.h and
plugin_hip_utilities.h define a library of utility functions that may be
used by plugins.
Plugins are invoked by the framework as follows:
- The shared library file is loaded using the dlopen() function, and the RegisterPlugin function is located using dlsym().
- Depending on the configuration, either a new process or new thread will be created for each plugin.
- In its own thread or process, the plugin's initialize function will be called, in which the plugin should allocate and initialize all of the local state necessary for one instance of itself.
- When the plugin begins running, a single "iteration" will consist of the plugin's copy_in, execute, and copy_out functions being called, in that order.
- When enough time has elapsed or the maximum number of iterations has been reached, the plugin's cleanup function will be called, to allow for the plugin to clean up and free its local state.
- If any of the plugin's functions, apart from initialize, return an error, the framework will still call the plugin's cleanup function, and then cease calling further functions from the plugin.
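The call order described above can be modeled with a short Python sketch. This is only an illustration of the sequence, not the framework's C implementation: the plugin object and its boolean return values are hypothetical stand-ins for the real function pointers and error codes, the max_time limit is omitted, and skipping cleanup when initialize fails is inferred from the wording of the last step.

```python
def run_plugin(plugin, max_iterations):
    """Calls a plugin's functions in the order described above.

    `plugin` is a hypothetical object whose methods return True on
    success. Returns the number of iterations that completed.
    """
    if not plugin.initialize():
        # Inferred from the text: no cleanup call if initialize fails.
        return 0
    completed = 0
    for _ in range(max_iterations):
        # One iteration: copy_in, execute, copy_out, in that order.
        # Short-circuiting stops at the first function that fails.
        ok = plugin.copy_in() and plugin.execute() and plugin.copy_out()
        if not ok:
            break
        completed += 1
    # cleanup runs both after normal completion and after an error.
    plugin.cleanup()
    return completed
```

In this model, a failure in execute skips copy_out for that iteration and goes directly to cleanup, after which no further plugin functions are called.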
Even though HIP supports C++, contributions to this project should use the C programming language when possible. C or HIP source code should adhere to the parts of the Google C++ Style Guide that apply to the C language.
Scripts should remain in the scripts/ directory and should be written in
Python when possible. For now, there is no explicit style guide for Python
scripts apart from trying to maintain a consistent style within each file.
This project contains several plugins that have been ported from the Rodinia
benchmark suite. The code for these plugins is located in the
src/third_party/rodinia_plugins directory. The four plugins, backprop, dwt2d,
gaussian, and particlefilter, have been modified to clean up code, remove
global variables, and adhere to the plugin interface required by the project.
Compile these plugins by running make rodinia_plugins in the main project
directory. Some sample configs exist for testing these plugins in configs/.
For example, running ./bin/runner configs/particlefilter.json launches a
single instance of the particlefilter plugin.