Running Simulation with Chakra
Chakra is a framework created to support multiple simulators with the aim of advancing performance benchmarking and co-design through standardized execution traces. It facilitates compatibility and performance evaluation across various machine learning models, software, and hardware, enhancing the co-design ecosystem for AI systems.
Currently, ASTRA-sim supports Chakra traces as its input format.
et_generator can be used to define and generate arbitrary execution traces, functioning as a test case generator. You can generate execution traces with the following commands (Python 3 is required):
$ cd ${ASTRA_SIM}/extern/graph_frontend/chakra
$ pip3 install .
$ python3 -m chakra.et_generator.et_generator --num_npus 64 --num_dims 1
To run one of the example traces (one_comm_coll_node_allreduce), execute the commands below for your network backend.
# For the analytical network backend
$ cd -
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
--workload-configuration=./extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
--system-configuration=./inputs/system/Switch.json \
--network-configuration=./inputs/network/analytical/Switch.yml \
--remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
# For the ns3 network backend (Python 2 required).
# First edit the configuration files referenced in the following script, then run:
$ ./build/astra_ns3/build.sh -r
# Or, alternatively:
$ cd ./extern/network_backend/ns3/simulation
$ ./waf --run "scratch/AstraSimNetwork \
--workload-configuration=../../../../extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
--system-configuration=../../../../inputs/system/Switch.json \
--network-configuration=mix/config.txt \
--remote-memory-configuration=../../../../inputs/remote_memory/analytical/no_memory_expansion.json \
--logical-topology-configuration=../../../../inputs/network/ns3/sample_64nodes_1D.json \
--comm-group-configuration=\"empty\""
$ cd -
Note that ASTRA-sim's naming rule for execution traces follows the format {path prefix/trace name}.{npu_id}.et. By adding a few lines to any PyTorch workload, you can generate the PyTorch Execution Trace (ET) and Kineto traces for each GPU (and its corresponding CPU thread). Details on how to tweak the PyTorch files to obtain PyTorch-ET and Kineto traces can be found here. With these traces for each GPU, the PyTorch-ET and Kineto trace are merged into a single enhanced ET. Finally, this enhanced ET is fed into a converter that translates it into the Chakra format.
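To illustrate the naming rule, here is a minimal Python sketch; the helper function is hypothetical and not part of the ASTRA-sim or Chakra codebase:

```python
# Sketch of ASTRA-sim's trace naming rule: {path prefix/trace name}.{npu_id}.et
# The helper name is illustrative only, not an ASTRA-sim API.

def chakra_trace_files(prefix: str, num_npus: int) -> list[str]:
    """Return the per-NPU trace filenames ASTRA-sim expects for a given prefix."""
    return [f"{prefix}.{npu_id}.et" for npu_id in range(num_npus)]

# e.g. the et_generator run above with --num_npus 64 yields
# one_comm_coll_node_allreduce.0.et ... one_comm_coll_node_allreduce.63.et
files = chakra_trace_files("one_comm_coll_node_allreduce", 64)
print(files[0], files[-1])
```

Passing only the shared prefix via --workload-configuration lets the simulator resolve each NPU's trace file by appending its own id.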
Run the following command.
# Sample command that runs ASTRA-sim with Chakra trace files matching {path prefix/trace name}.
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
--workload-configuration=./{path prefix/trace name} \
--system-configuration=./inputs/system/FullyConnected.json \
--network-configuration=./inputs/network/analytical/FullyConnected.yml \
--remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
Upon completion, ASTRA-sim reports the simulated cycle count for each node (sys).
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
...
sys[0] finished, 13271344 cycles
sys[1] finished, 14249000 cycles
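The per-node finish lines above can be collected programmatically; the following sketch parses the "sys[N] finished, C cycles" format shown in the sample output (the function name is illustrative):

```python
import re

# Parse ASTRA-sim "sys[N] finished, C cycles" lines into {node_id: cycles}.
FINISH_RE = re.compile(r"sys\[(\d+)\] finished, (\d+) cycles")

def parse_cycle_counts(log_text: str) -> dict[int, int]:
    """Extract per-node simulated cycle counts from ASTRA-sim output."""
    return {int(node): int(cycles) for node, cycles in FINISH_RE.findall(log_text)}

sample = """\
sys[0] finished, 13271344 cycles
sys[1] finished, 14249000 cycles
"""
print(parse_cycle_counts(sample))  # {0: 13271344, 1: 14249000}
```

This is convenient when sweeping configurations and comparing cycle counts across runs.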