grenade
is a tool for causing fires in production. I created this tool at Wattpad to answer questions like:
- What happens when a developer deploys a change that chews up a ton of memory?
- How quickly do I get an alert when log volume in production spikes dramatically?
- Can a misbehaving service interfere with its neighbors in my production environment?
This is a useful activity to do particularly when operating at a scale where the blast radius of a failure in production can be quite large. For example, if you run a system where you have services with 300 replicas, it's good to know what's going to happen when you toss 300 grenades into production before someone inevitably does it by accident. This can help you:
- Test out your observability tools.
- Figure out what checks to put in place before a service is allowed to completely roll out in production.
- Carry out fire drills to train on-call people.
- Tune your monitors for pager alerts.
- Highlight gaps in monitoring and resource limits.
- See how resilient your infrastructure really is to unexpected failures.
grenade
does not simulate problems by faking monitoring data. It actually causes problems. Only deploy this to testing and pre-production environments, or if you want to upset your on-call engineer.
make build
will produce a binary called grenade
at the top level.
grenade
is a simple command line utility, see grenade --help
usage: grenade [<flags>]
Flags:
--help Show context-sensitive help (also try --help-long and --help-man).
--port=8080 Webserver port
--delay=DELAY trigger the grenade after specified delay e.g 5m30s
--crash Process crash
--exit Exit process with non-zero exit code
--mem-spike Cause a memory spike
--mem-leak Cause a memory leak
--cpu-spike Cause a CPU spike
--goroutine-leak Cause goroutines to be leaked
--conn-spike=CONN-SPIKE Cause a connect spike to an address
--conn-leak=CONN-LEAK Cause a connect leak to an address
--dns-spike Cause a spike in DNS lookups
--log-spike Cause a spike in log volume
--fd-spike Cause a spike in open file descriptors
--disk-spike Cause a spike in disk space usage
--disk-leak Cause a leak in disk space usage