Running Simulation with Chakra

Joongun Park edited this page Nov 11, 2024 · 22 revisions

Chakra is a framework that supports multiple simulators through standardized execution traces, with the aim of advancing performance benchmarking and co-design. It facilitates compatibility and performance evaluation across machine learning models, software, and hardware, strengthening the co-design ecosystem for AI systems.

Currently, ASTRA-sim accepts Chakra traces as its input.

Using Synthetically Generated Trace (et_generator)

et_generator can define and generate arbitrary execution traces, which makes it useful as a test case generator. You can generate execution traces with the following commands (they invoke pip3 and python3, so a Python 3 installation is required):

$ cd ${ASTRA_SIM}/extern/graph_frontend/chakra
$ pip3 install .
$ python3 -m chakra.et_generator.et_generator --num_npus 64 --num_dims 1

To run one of the example traces (one_comm_coll_node_allreduce), execute the following command.

# For the analytical network backend
$ cd -
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json

# For the ns3 network backend (Python 2 required).
# First edit the configuration files referenced in the following script, then run it:
$ ./build/astra_ns3/build.sh -r

# Or, alternatively:
$ cd ./extern/network_backend/ns3/simulation
$ ./waf --run "scratch/AstraSimNetwork \
  --workload-configuration=../../../../extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
  --system-configuration=../../../../inputs/system/Switch.json \
  --network-configuration=mix/config.txt \
  --remote-memory-configuration=../../../../inputs/remote_memory/analytical/no_memory_expansion.json \
  --logical-topology-configuration=../../../../inputs/network/ns3/sample_64nodes_1D.json \
  --comm-group-configuration=\"empty\""
$ cd -
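
When sweeping over several configurations, the analytical-backend invocation shown above can be assembled programmatically. The following is a minimal sketch: the function name `analytical_command` is hypothetical, and the only flags it emits are the four that appear in the command on this page.

```python
from typing import List


def analytical_command(
    binary: str,
    workload: str,
    system: str,
    network: str,
    remote_memory: str,
) -> List[str]:
    """Build the argument list for the analytical backend.

    Only the flags shown on this page are used; `workload` is the trace
    path prefix, not a single file.
    """
    return [
        binary,
        f"--workload-configuration={workload}",
        f"--system-configuration={system}",
        f"--network-configuration={network}",
        f"--remote-memory-configuration={remote_memory}",
    ]


if __name__ == "__main__":
    cmd = analytical_command(
        "./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware",
        "./extern/graph_frontend/chakra/one_comm_coll_node_allreduce",
        "./inputs/system/Switch.json",
        "./inputs/network/analytical/Switch.yml",
        "./inputs/remote_memory/analytical/no_memory_expansion.json",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Returning a list rather than a single string avoids shell quoting issues when the result is passed to `subprocess.run`.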

Using Chakra Execution Trace and Kineto Traces Generated By PyTorch

ASTRA-sim names execution traces using the format {path prefix/trace name}.{npu_id}.et. By adding a few lines to any PyTorch workload, you can generate a PyTorch Execution Trace (PyTorch-ET) and a Kineto trace for each GPU (and its corresponding CPU thread). Details on how to modify the PyTorch files to collect both traces can be found here. For each GPU, the PyTorch-ET and the Kineto trace are then merged into a single enhanced ET, which is finally passed to a converter that produces the Chakra format.
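
Because the simulator is given only a path prefix, it can help to confirm that every per-NPU file exists before launching a run. The sketch below applies the {path prefix/trace name}.{npu_id}.et rule; the helper names are hypothetical, and it assumes NPU ids run from 0 to num_npus - 1, which this page does not state explicitly.

```python
from pathlib import Path
from typing import List


def expected_trace_paths(prefix: str, num_npus: int) -> List[Path]:
    """Return the per-NPU trace paths implied by the {prefix}.{npu_id}.et rule.

    Assumption: NPU ids are 0, 1, ..., num_npus - 1.
    """
    if num_npus <= 0:
        raise ValueError("num_npus must be positive")
    return [Path(f"{prefix}.{npu_id}.et") for npu_id in range(num_npus)]


def missing_traces(prefix: str, num_npus: int) -> List[Path]:
    """List the expected trace files that are not present on disk."""
    return [p for p in expected_trace_paths(prefix, num_npus) if not p.is_file()]


if __name__ == "__main__":
    # Prefix and NPU count taken from the et_generator example on this page.
    prefix = "./extern/graph_frontend/chakra/one_comm_coll_node_allreduce"
    missing = missing_traces(prefix, 64)
    if missing:
        print(f"{len(missing)} trace file(s) missing, e.g. {missing[0]}")
    else:
        print("all trace files present")
```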

To simulate the resulting Chakra traces, run the following command.

# Sample command that runs ASTRA-sim on the Chakra trace files at {path prefix/trace name}.
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./{path prefix/trace name} \
  --system-configuration=./inputs/system/FullyConnected.json \
  --network-configuration=./inputs/network/analytical/FullyConnected.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json

Upon completion, ASTRA-sim prints the number of cycles each system (sys[i]) took to finish the simulation.

ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
...
sys[0] finished, 13271344 cycles
sys[1] finished, 14249000 cycles
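
To compare runs, the per-system cycle counts can be pulled out of the captured output. This is a minimal sketch that assumes the "sys[N] finished, C cycles" line format shown above; the function name `parse_cycles` is hypothetical, and other ASTRA-sim versions may format their output differently.

```python
import re
from typing import Dict

# Matches lines such as "sys[0] finished, 13271344 cycles".
_FINISHED = re.compile(r"sys\[(\d+)\] finished, (\d+) cycles")


def parse_cycles(output: str) -> Dict[int, int]:
    """Map each system id to its reported cycle count."""
    return {int(m.group(1)): int(m.group(2)) for m in _FINISHED.finditer(output)}


if __name__ == "__main__":
    sample = (
        "sys[0] finished, 13271344 cycles\n"
        "sys[1] finished, 14249000 cycles\n"
    )
    cycles = parse_cycles(sample)
    print(cycles)                # {0: 13271344, 1: 14249000}
    print(max(cycles.values()))  # 14249000, the slowest system
```

Lines that do not match the pattern, such as the "ring of node" messages, are ignored.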