CoLExT is a testbed built for machine learning researchers to realistically execute and profile Federated Learning (FL) algorithms on real edge devices and smartphones. This repo contains the software library developed to seamlessly deploy and monitor FL experiments compatible with the Flower Framework. Consider checking the CoLExT website for an overview of the project.
CoLExT contains 20 smartphones and 28 Single Board Computers (SBC) and is hosted at the King Abdullah University of Science and Technology (KAUST). If you’d like to experiment with CoLExT, please show your interest by filling out this form.
Warning
CoLExT supports both SBC and Android deployments; however, Android deployment is not yet available in this branch.
Note: Requires prior approval. Please fill out the show of interest form if you're interested.
-
Access the CoLExT server. How to access CoLExT.
-
Install the CoLExT package in a local Python environment, e.g. with conda.
conda create -n colext_env python=3.10 && conda actiate colext_env # The plotting extras automatically generatess metric plots (colext_env)$ pip install colext[plotting] git+https://[email protected]/sands-lab/colext.git
-
In the FL code, import the
colext
decorators and wrap Flower's client and strategy classes. Note: If used outside of the testbed, these decorators do not modify the program behavior and thus can safely be included in the code in general.from colext import MonitorFlwrClient, MonitorFlwrStrategy @MonitorFlwrClient class FlowerClient(fl.client.NumPyClient): [...] @MonitorFlwrStrategy class FlowerStrategy(flwr.server.strategy.Strategy): [...]
-
Create a CoLExT configuration file
colext_config.yaml
. CoLExT exposes bash environment variables with information about the experiment. Please remember to pass the FL server address to the clients. Here's an example configuration file using SBC devices:# colext_config.yaml project: colext_example code: client: # Assumes relative paths from the config file entrypoint: "./src/client.py" args: "--client_id=${COLEXT_CLIENT_ID} --server_addr=${COLEXT_SERVER_ADDRESS}" server: entrypoint: "./src/server.py" args: "--n_clients=${COLEXT_N_CLIENTS} --n_rounds=3" devices: - { device_type: LattePandaDelta3, count: 4 } - { device_type: OrangePi5B, count: 2 } - { device_type: JetsonOrinNano, count: 4 } monitoring: scraping_interval: 0.3 # in seconds push_to_db_interval: 10 # in seconds
-
Specify your Python dependencies using a
requirements.txt
file on the same directory as the CoLExT configuration file. -
Deploy, monitor the experiment in real time, and download the collected performance metrics as CSV files.
# Execute in the directory with the colext_config.yaml $ colext_launch_job # Prints a job-id and a dashboard link # After the job finishes, retrieve metrics for job-id as CSV files $ colext_get_metrics --job_id <job-id>
Dashboard example:
-
To confirm the deployment is working, try launching an example:
$ cd colext/examples/flwr_tutorial $ colext_launch_job
Continue reading for more information on the above steps and check the tips section for the deployment type you're interested in:
- SBC deployment
- Android deployment (Not yet in this repo)
This section describes the possible configuration options for the CoLExT configuration file.
# SBCs
- { device_type: JetsonAGXOrin, count: 2 }
- { device_type: JetsonOrinNano, count: 4 }
- { device_type: JetsonXavierNX, count: 2 }
- { device_type: JetsonNano, count: 6 }
- { device_type: LattePandaDelta3, count: 6 }
- { device_type: OrangePi5B, count: 8 }
# !!! Currently, this config file will not work with smartphones !!!
# Smartphones
- { device_type: SamsungXCover6Pro, count: 3 }
- { device_type: SamsungGalaxyM54, count: 2 }
- { device_type: Xiaomi12, count: 2 }
- { device_type: XiaomiPocoX5Pro, count: 2 }
- { device_type: GooglePixel7, count: 5 }
- { device_type: AsusRogPhone6, count: 2 }
- { device_type: OnePlusNord2T5G, count: 2 }
CoLExT exposes several bash environment variables that are passed to the execution environment. These can be used as arguments in the args
section of the client and server code by expanding the variables as ${ENV_VAR}
. See the usage example for an example.
Name | Description |
---|---|
COLEXT_CLIENT_ID | ID of the client (0..n_clients) |
COLEXT_N_CLIENTS | Number of clients in experiment |
COLEXT_DEVICE_TYPE | Hardware type of the client |
COLEXT_SERVER_ADDRESS | Server address (host:port) |
COLEXT_PYTORCH_DATASETS | Pytorch datasets caching path |
COLEXT_JOB_ID | Experiment job ID |
monitoring:
live_metrics: True # True/False: True if metrics are pushed in real-time
push_interval: 10 # in seconds: Metric buffer time before pushing metrics to the DB
scraping_interval: 0.3 # in seconds: Interval between metric scraping
Deployers:
- sbc (default) - Deployer for SBC experiments. It's the default deployer.
- local_py - Deployer that launches a local experiment. Clients and the server are launched in the CoLExT server.
- android (pending merge) - This deployer has been developed but needs to be merged here.
Python versions: 3.10 (default) | 3.8.
deployer: local_py
code:
python_version: "3.8"
After calling colext_get_metrics --job_id <job-id>
, csv files prepended with colext_<job_id>_
are downloaded to the current directory.
Here are the contents for each CSV:
- round_number: Number of the FL round
- stage: Stage of the round: FIT or EVAL
- start_time: Start of the round as measured by the FL server
- end_time: End of the round as measured by the FL server
- srv_accuracy: Flwr strategy evaluate result with dict key "accuracy"
- dist_accuracy: Flwr strategy aggregate_evaluate result with dict key "accuracy"
- client_id: ID of the client
- device_type: Device type as requested in the config file
- device_name: Name of the device with the associated device type
- client_id: ID of the client
- round_number: Number of the FL round
- stage: Stage of the round: FIT or EVAL
- start_time: Start of the round as measured by the client
- end_time: End of the round as measured by the client
- client_id: ID of the client
- time: Local timestamp when the metrics were collected
- cpu_util: CPU Utilization (%) - Percentage over 100%, indicates multiple cores being used
- gpu_util: GPU Utilization (%)
- mem_util: Memory Utilization (Bytes) - The memory reported is the RSS memory.
- n_bytes_sent: Number of bytes sent (Bytes)
- n_bytes_rcvd: Number of bytes received (Bytes)
- net_usage_out: Upload bandwidth usage (Bytes/s)
- net_usage_in: Download bandwidth usage (Bytes/s)
- power_consumption: Power consumption (Watts) - Reported power differs between devices
- Nvidia Jetsons: Entire board power consumption
- LattePandas: CPU power consumption
- OrangePis: Can be measured using High Voltage Power Meter
Coming soon...
The SBC deployment containerizes user code and deploys it using Kubernetes.
Before starting the deployment process, it's recommended to make sure that a local deployment is working as expected.
Local deployment can be used by changing the deployer to local_py
. More information on setting the deployer here.
Once this is verified, check the containarization is working as expected by running the colext_launch_job
command in the "prepare" mode.
This mode will perform checks and containerize the application making sure all the dependencies can be installed in the container.
# Prepares app for deployment
$ colext_launch_job -p
Files can be excluded from the automatic containerization by using a .dockerignore
file on the same directory as the colext_config.yaml
. For details on .dockerignore
, check the docker documentation.
Once the code is successfully deployed to CoLExT, it can be useful to debug issues using the Kubernetes CLI. In our deployment, we're using Microk8s with an alias to kubectl we called mk
. Here are the most useful commands:
# See current pods deployed in the cluster
$ mk get pods -o wide
# See and (f)ollow the logs of a specific pod
$ mk logs <pod_name> -f
# Read logs of a failed pod
$ mk logs <pod_name> -p
Currently, CoLExT does not work with dependencies specified by Poetry. These can easily be converted to the supported requirements.txt format with these commands:
# Prevents Poetry resolver from being stuck waiting for a keyring that we don't need
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
poetry export --without-hashes -f requirements.txt --output requirements.txt
Coming soon...
Currently, the CoLExT server is not reachable through a public IP. To enable access to the server, we're using ZeroTier to create a virtual private network. This allows external users to interact with the server as if they were directly on the same private network.
- Install ZeroTier.
- Get your ZeroTier device ID. The ID is displayed at installation and can be retrieved later with:
sudo zerotier-cli info | awk '{print $3}'
- Share your ZeroTier device ID and a public SSH key with your CoLExT contact point
- Add an SSH config. Note that you need to replace the username:
# Add to ~/.ssh/config Host colext User <your_username> Hostname 10.244.96.246 ForwardAgent yes # Required to use local SSH keys in CoLExT
- Add the CoLExT server to your hosts file.
# Add to /etc/hosts 10.244.96.246 colext
- Wait for your device to be added to the ZeroTier network
- Test connection to CoLExT server:
# Confirm connectivity $ ping colext # Confirm ssh access $ ssh colext
At this point, if you're just getting started, continue reading the next step in the using colext section. If you find issues connecting to CoLExT, reach out to your CoLExT concact point.
Install the CoLExT package locally with the --editable flag.
$ python3 -m pip install -e <root_dir>
Useful:
- Experiment with launching an example using the local deployer
# colext_config.yaml deployer: local_py
- Prepare the deployment only by launching the job with the prepare flag:
$ colext_launch -p
.
├── src/colext/ # Python package used to deploy user code and interact with results
│ ├── scripts/ # Folder with CoLExT CLI commands: launch_job + get_metrics
├── examples/ # Example of Flower code integrations with CoLExT
├── plotting/ # Ploting related code
├── colext_setup/ # CoLExT setup automation
│ ├── ansible/ # Ansible playbooks that perform the initial configuration of SBC devices
│ ├── db_setup/ # DB schema and initial DB populate file
│ ├── base_docker_imgs/ # Base docker images used when containerizing the code
- Currently, CoLExT only directly supports FL code using the Flower framework 1.5 + 1.6.
tensorflow
package does not work with LattePandas. Tensorflow builds from PiPy for x86 arch expect them to have support for AVX instructions, but the LattePandas do not have them.- Currently, Nvidia Jetsons defaults to supporting Pytorch 2.2.0. Except for Jetson Nanos, which only support up to Pytorch 1.13. Additional Pytorch versions can be supported upon request.
- Currently, only Python version 3.8 and 3.10 are supported.