This repository contains source code for the demos and attacks presented in our paper "Security of AI Agents" (arXiv:2406.08689).
Requirements: Python 3.8 or above.
env.sh lets Python find our modules. Source it from the repository root directory:
source ./env.sh
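The contents of env.sh are not reproduced here. Assuming it works the usual way, by prepending the repository root to PYTHONPATH, the in-process Python equivalent is roughly the following. The helper name add_repo_root_to_path is hypothetical and not part of the repository.

```python
import os
import sys


def add_repo_root_to_path(repo_root: str) -> str:
    """Make modules under repo_root importable (hypothetical stand-in for env.sh).

    Assumes env.sh does something like prepending the repo root to PYTHONPATH;
    here the same effect is achieved by editing sys.path at runtime.
    """
    root = os.path.abspath(repo_root)
    if root not in sys.path:
        sys.path.insert(0, root)  # searched before site-packages
    return root


if __name__ == "__main__":
    # Run from the repository root, just as env.sh is meant to be sourced there.
    print(add_repo_root_to_path("."))
```

Sourcing env.sh remains the documented approach; the sketch is only useful when launching code from an environment where you cannot source a shell script, such as a notebook.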
Install dependencies
pip install -r requirements.txt
Generate homomorphic encryption data
- Run python HE_data.py -h (inside the HE_data directory) to see how to modify the generated ciphertexts.
cd HE_data && python HE_data.py && cd ../
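This README does not say which homomorphic encryption scheme HE_data.py uses. Purely to illustrate the idea that computation can happen on ciphertexts, here is a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is a swapped-in technique, not the repository's implementation. The primes are tiny and insecure, and Paillier does not support multiplying two encrypted values, so it does not cover the "product" queries described later.

```python
import math
import random

# Toy Paillier parameters: tiny primes, for illustration only (NOT secure).
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1  # standard simplified choice g = n + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lambda = lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # modular inverse; pow(x, -1, m) needs Python 3.8+


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N as g^m * r^N mod N^2 with a random r coprime to N."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover m = L(c^lambda mod N^2) * mu mod N, where L(x) = (x - 1) // N."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ


if __name__ == "__main__":
    a, b = encrypt(120), encrypt(75)
    print(decrypt(add_encrypted(a, b)))  # 195
```

Because encryption uses a fresh random r, encrypting the same number twice gives different ciphertexts, yet decryption of the combined ciphertext still returns the plain sum.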
To run the agents with OpenAI LLMs for reasoning, first set this environment variable:
export OPENAI_API_KEY="<key>"
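A small pre-flight check can fail early with a clear message when the variable is missing. The helper require_openai_key below is hypothetical and not part of the repository; it only reads the same OPENAI_API_KEY variable set above.

```python
import os
import sys


def require_openai_key() -> str:
    """Return OPENAI_API_KEY, or exit with a hint if it is unset or empty."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        # sys.exit with a string prints it to stderr and raises SystemExit.
        sys.exit('OPENAI_API_KEY is not set; run: export OPENAI_API_KEY="<key>"')
    return key
```

Calling it at the top of a script, before any OpenAI client is created, turns a confusing authentication error into an immediate, actionable one.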
To run the SSN agent:
python agents/ssn_agent.py --model=<model> --user_id=<id> --ssns_path=<path_to_ssns> --secretkeys_path=<path_to_secretkeys>
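The real argument handling lives in agents/ssn_agent.py. As a hedged sketch only, the parser below mirrors the four flags in the command above to make the expected inputs explicit; the help strings, the choice to make every flag required, and treating all values as strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the flags shown above; not the repository's parser.
    parser = argparse.ArgumentParser(description="SSN agent (illustrative CLI sketch)")
    parser.add_argument("--model", required=True, help="LLM used for reasoning")
    parser.add_argument("--user_id", required=True, help="ID of the user to query")
    parser.add_argument("--ssns_path", required=True, help="path to the stored numbers")
    parser.add_argument("--secretkeys_path", required=True, help="path to the secret keys")
    return parser


if __name__ == "__main__":
    # Parse an explicit argv (the README's placeholders) instead of sys.argv.
    args = build_parser().parse_args(
        ["--model=<model>", "--user_id=<id>",
         "--ssns_path=<path_to_ssns>", "--secretkeys_path=<path_to_secretkeys>"]
    )
    print(vars(args))
```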
When prompting, write "number" instead of "SSN" or "social security number" to avoid triggering the model's safety alignment. You can ask for groups of digits, such as the first three or the last four.
Example prompt: What are the first three digits of my number?
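To check the agent's answer against the ground truth, it helps to split a number into the usual SSN groups (AAA-GG-SSSS: three, two, then four digits). The helper digit_groups is hypothetical, and the sample value is made up.

```python
import re


def digit_groups(number: str) -> dict:
    """Split a 9-digit number in SSN layout (AAA-GG-SSSS) into its groups."""
    digits = re.sub(r"\D", "", number)  # drop dashes and spaces
    if len(digits) != 9:
        raise ValueError("expected exactly 9 digits")
    return {
        "first_three": digits[:3],
        "middle_two": digits[3:5],
        "last_four": digits[5:],
    }


print(digit_groups("123-45-6789"))
# {'first_three': '123', 'middle_two': '45', 'last_four': '6789'}
```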
To run the homomorphic encryption (HE) agent:
python agents/HE_agent.py --model=<model>
When prompting, specify "sum" or "product"; postprocessing requires one of these words. The default encryptor cannot handle numbers greater than 400 (this limit can be changed in HE_data/HE_data.py), so keep calculation results between 0 and 400 inclusive.
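Before prompting, you can check whether a sum or product of your generated plaintext values would stay within the default limit. This is a hypothetical pre-check: MAX_RESULT is a local constant copying the documented default of 400, not a value read from HE_data/HE_data.py.

```python
import math
from typing import Sequence

MAX_RESULT = 400  # documented default; the real limit is configured in HE_data/HE_data.py


def result_in_range(op: str, values: Sequence[int]) -> bool:
    """Return True if the sum or product of `values` lies in [0, MAX_RESULT]."""
    if op == "sum":
        result = sum(values)
    elif op == "product":
        result = math.prod(values)  # math.prod needs Python 3.8+
    else:
        raise ValueError('op must be "sum" or "product"')
    return 0 <= result <= MAX_RESULT


print(result_in_range("product", [20, 20]))  # True (400 is allowed)
print(result_in_range("sum", [250, 151]))    # False (401 exceeds the limit)
```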
Example prompt: What is the product of indices 0 and 1?
- Known bug: the LLM retrieves the wrong values if index 0 is not included in the prompt. Make sure the first index in your prompt is 0.
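To work around the known bug consistently, prompts can be generated so that index 0 always comes first. The helper build_he_prompt is hypothetical; it reuses the wording of the example prompt above. Because sum and product are commutative, moving 0 to the front does not change the result, but adding 0 when it was never requested would, so the helper refuses in that case.

```python
from typing import Sequence


def build_he_prompt(op: str, indices: Sequence[int]) -> str:
    """Build an HE-agent prompt that names the operation and lists index 0 first."""
    if op not in ("sum", "product"):
        raise ValueError('op must be "sum" or "product"')
    if 0 not in indices:
        # Adding index 0 would change the answer, so refuse instead of guessing.
        raise ValueError("index 0 must be part of the query (known bug)")
    # Sum and product are commutative, so moving 0 to the front keeps the result.
    ordered = list(indices)
    ordered.remove(0)  # removes only the first occurrence
    ordered.insert(0, 0)
    if len(ordered) == 1:
        listed = "index 0"
    else:
        listed = "indices " + ", ".join(map(str, ordered[:-1])) + f" and {ordered[-1]}"
    return f"What is the {op} of {listed}?"


print(build_he_prompt("sum", [2, 0, 3]))  # What is the sum of indices 0, 2 and 3?
```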
To run the tests:
# Create ciphertext files if you haven't already
cd HE_data && python HE_data.py && cd ../
# Run tests
pytest tests/*
@article{he2024security,
title={Security of AI Agents},
author={He, Yifeng and Wang, Ethan and Rong, Yuyang and Cheng, Zifei and Chen, Hao},
journal={arXiv preprint arXiv:2406.08689},
year={2024}
}