Merge pull request #38 from AlejandroRivera/feature/cpu-monitoring
Added a new Profiler for CPU/JVM CPU monitoring using (Sun/Oracle) Java7+ JMX Bean
ajsquared committed Feb 19, 2016
2 parents 09135df + feafbc6 commit 72401f3
Showing 15 changed files with 250 additions and 51 deletions.
60 changes: 48 additions & 12 deletions README.md
@@ -56,7 +56,7 @@ port | The port number for the server to which the reporter should s
prefix | The prefix for metrics (optional, defaults to statsd-jvm-profiler)
packageWhitelist | Colon-delimited whitelist for packages to include (optional, defaults to include everything)
packageBlacklist | Colon-delimited blacklist for packages to exclude (optional, defaults to exclude nothing)
profilers | Colon-delimited list of profiler class names (optional, defaults to CPUProfiler and MemoryProfiler)
profilers | Colon-delimited list of profiler class names (optional, defaults to `CPUTracingProfiler` and `MemoryProfiler`)
reporter | Class name of the reporter to use (optional, defaults to StatsDReporter)
httpServerEnabled| Determines if the embedded HTTP server should be started. (optional, defaults to `true`)
httpPort | The port on which to bind the embedded HTTP server (optional, defaults to 5005). If this port is already in use, the next free port will be taken.
@@ -100,28 +100,64 @@ If the `tagMapping` argument is not defined, only the `prefix` tag will be added

If you do not want to include a component of `prefix` as a tag, use the special name `SKIP` in `tagMapping` for that position.

## Metrics
## Profilers

`statsd-jvm-profiler` will profile the following:
`statsd-jvm-profiler` offers 3 profilers: `MemoryProfiler`, `CPUTracingProfiler` and `CPULoadProfiler`.

The metrics for all these profilers will be prefixed with the value from the `prefix` argument or its default value: `statsd-jvm-profiler`.

You can enable specific profilers through the `profilers` argument like so:
1. Memory metrics only: `profilers=MemoryProfiler`
2. CPU Tracing metrics only: `profilers=CPUTracingProfiler`
3. JVM/System CPU load metrics only: `profilers=CPULoadProfiler`

Default value: `profilers=MemoryProfiler:CPUTracingProfiler`
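
As a rough illustration of how a colon-delimited `profilers` value maps to a set of profilers, here is a minimal sketch. `ProfilersArgSketch` and its `parse` method are hypothetical names used only for this example. The sketch works on plain strings rather than on the profiler classes themselves.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of statsd-jvm-profiler: it only mirrors the
// documented behaviour of the `profilers` argument using plain strings.
class ProfilersArgSketch {
    // Documented default when the `profilers` argument is not supplied.
    static final String DEFAULT = "MemoryProfiler:CPUTracingProfiler";

    // Splits a colon-delimited value into profiler names, falling back to the default.
    static Set<String> parse(String profilersArg) {
        String value = (profilersArg == null) ? DEFAULT : profilersArg;
        return new LinkedHashSet<>(Arrays.asList(value.split(":")));
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(parse("MemoryProfiler:CPULoadProfiler"));
    }
}
```

Passing `null` stands in for the argument being absent, so `CPULoadProfiler` only appears when it is named explicitly.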

### Garbage Collector and Memory Profiler: `MemoryProfiler`
This profiler will record:

1. Heap and non-heap memory usage
2. Number of GC pauses and GC time
3. Time spent in each function

Assuming you use the default prefix of `statsd-jvm-profiler`, the memory usage metrics will be under `statsd-jvm-profiler.heap` and `statsd-jvm-profiler.nonheap`, the GC metrics will be under `statsd-jvm-profiler.gc`, and the CPU time metrics will be under `statsd-jvm-profiler.cpu.trace`.
Assuming you use the default prefix of `statsd-jvm-profiler`,
the memory usage metrics will be under `statsd-jvm-profiler.heap` and `statsd-jvm-profiler.nonheap`,
the GC metrics will be under `statsd-jvm-profiler.gc`.

Memory and GC metrics are reported once every 10 seconds. The CPU time is sampled every millisecond, but only reported every 10 seconds. The CPU time metrics represent the total time spent in that function.
Memory and GC metrics are reported once every 10 seconds.

Profiling a long-running process or a lot of processes simultaneously will produce a lot of data, so be careful with the capacity of your StatsD instance. The `packageWhitelist` and `packageBlacklist` arguments can be used to limit the number of functions that are reported. Any function whose stack trace contains a function in one of the whitelisted packages will be included.
### CPU Tracing Profiler: `CPUTracingProfiler`
This profiler records the time spent in each function across all Threads.

You can disable either the memory or CPU metrics using the `profilers` argument:
Assuming you use the default prefix of `statsd-jvm-profiler`, the CPU time metrics will be under `statsd-jvm-profiler.cpu.trace`.

1. Memory metrics only: `profilers=MemoryProfiler`
2. CPU metrics only: `profilers=CPUProfiler`
The CPU time is sampled every millisecond, but only reported every 10 seconds.
The CPU time metrics represent the total time spent in that function.

Profiling a long-running process or a lot of processes simultaneously will produce a lot of data, so be careful with the
capacity of your StatsD instance. The `packageWhitelist` and `packageBlacklist` arguments can be used to limit the number
of functions that are reported. Any function whose stack trace contains a function in one of the whitelisted packages will be included.

The `visualization` directory contains some utilities for visualizing the output of this profiler.

### JVM And System CPU Load Profiler: `CPULoadProfiler`

This profiler will record the JVM's and the overall system's CPU load, if the JVM is capable of providing this information.

Assuming you use the default prefix of `statsd-jvm-profiler`, the JVM CPU load metrics will be under `statsd-jvm-profiler.cpu.jvm`,
and the system CPU load metrics will be under `statsd-jvm-profiler.cpu.system`.

The reported metrics will be percentages in the range of [0, 100] with 1 decimal precision.
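
To make the "one decimal" percentage concrete, the following sketch applies the same arithmetic that `CPULoadProfiler` uses in this commit to a load fraction between 0.0 and 1.0. The class and method names are hypothetical and exist only for this example.

```java
// Hypothetical illustration of the percentage conversion described above.
class CpuLoadPercentSketch {
    // Converts a load fraction in [0.0, 1.0] to a percentage with one decimal place.
    // The cast to int truncates toward zero, so digits past the first decimal are dropped.
    static double toPercent(double fraction) {
        return ((int) (fraction * 1000)) / 10.0d;
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.1234));
        System.out.println(toPercent(1.0));
    }
}
```

Because the value is truncated rather than rounded, a reported figure can be up to 0.1 percentage points below the true load.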

## Visualization
CPU load metrics are sampled and reported once every 10 seconds.

The `visualization` directory contains some utilities for visualizing the output of the profiler.
Important notes:
* This Profiler is not enabled by default. To enable it, use the argument `profilers=CPULoadProfiler`.
* This Profiler relies on Sun/Oracle-specific JVM implementations that offer a JMX bean that might not be available in other JVMs.
Even if you are using the right JVM, there's no guarantee this JMX bean will remain there in the future.
* The minimum JVM version that supports this is Java 7.
* See [com.sun.management.OperatingSystemMXBean](https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getProcessCpuLoad())
for more information.
* If the JVM doesn't support the required operations, the metrics above won't be reported at all.
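
The notes above can be sketched as a small standalone reader of the `OperatingSystem` JMX bean. This is not the profiler's code: `CpuLoadReadSketch` and `readCpuLoad` are hypothetical names, and the choice to return an empty map when the bean or its attributes are unavailable is an assumption that mirrors "won't be reported at all". Note that `MBeanServer.getAttributes` returns the values as of the call, so the sketch queries the bean on every read.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical standalone reader; names are illustrative only.
class CpuLoadReadSketch {
    // Returns the metrics that could be read, as percentages in [0, 100].
    // Attributes the JVM does not expose are simply absent from the result.
    static Map<String, Double> readCpuLoad() {
        Map<String, Double> result = new LinkedHashMap<>();
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName os = ObjectName.getInstance("java.lang:type=OperatingSystem");
            // getAttributes omits attributes it cannot read instead of throwing.
            AttributeList list = mbs.getAttributes(os, new String[] {"ProcessCpuLoad", "SystemCpuLoad"});
            for (Attribute att : list.asList()) {
                Object raw = att.getValue();
                if (!(raw instanceof Double)) {
                    continue;
                }
                double load = (Double) raw;
                // A negative value means the load is not available yet; NaN is skipped defensively.
                if (Double.isNaN(load) || load < 0) {
                    continue;
                }
                String metric = att.getName().equals("ProcessCpuLoad") ? "cpu.jvm" : "cpu.system";
                result.put(metric, ((int) (load * 1000)) / 10.0d);
            }
        } catch (JMException e) {
            // Bean not found or not readable on this JVM: report nothing.
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readCpuLoad());
    }
}
```

On a JVM without this bean the map is empty, which is a cheap way to check support before enabling the profiler.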

## Dynamic Loading of Agent

2 changes: 1 addition & 1 deletion pom.xml
@@ -4,7 +4,7 @@

<groupId>com.etsy</groupId>
<artifactId>statsd-jvm-profiler</artifactId>
<version>1.0.3-SNAPSHOT</version>
<version>2.0.0-SNAPSHOT</version>
<packaging>jar</packaging>

<name>statsd-jvm-profiler</name>
4 changes: 2 additions & 2 deletions src/main/java/com/etsy/statsd/profiler/Arguments.java
@@ -1,6 +1,6 @@
package com.etsy.statsd.profiler;

import com.etsy.statsd.profiler.profilers.CPUProfiler;
import com.etsy.statsd.profiler.profilers.CPUTracingProfiler;
import com.etsy.statsd.profiler.profilers.MemoryProfiler;
import com.etsy.statsd.profiler.reporter.Reporter;
import com.etsy.statsd.profiler.reporter.StatsDReporter;
@@ -97,7 +97,7 @@ private Class<? extends Reporter<?>> parserReporterArg(String reporterArg) {
private Set<Class<? extends Profiler>> parseProfilerArg(String profilerArg) {
Set<Class<? extends Profiler>> parsedProfilers = new HashSet<>();
if (profilerArg == null) {
parsedProfilers.add(CPUProfiler.class);
parsedProfilers.add(CPUTracingProfiler.class);
parsedProfilers.add(MemoryProfiler.class);
} else {
for (String p : profilerArg.split(":")) {
12 changes: 10 additions & 2 deletions src/main/java/com/etsy/statsd/profiler/Profiler.java
@@ -48,7 +48,7 @@ public Profiler(Reporter reporter, Arguments arguments) {
public abstract TimeUnit getTimeUnit();

/**
* CPUProfiler can emit some metrics that indicate the upper and lower bound on the length of stack traces
* CPUTracingProfiler can emit some metrics that indicate the upper and lower bound on the length of stack traces
* This is helpful for querying this data for some backends (such as Graphite) that do not have rich query languages
* Reporters can override this to disable these metrics
*
@@ -76,13 +76,21 @@ protected void recordGaugeValue(String key, long value) {
reporter.recordGaugeValue(key, value);
}

/**
* @see #recordGaugeValue(String, long)
*/
protected void recordGaugeValue(String key, double value) {
recordedStats++;
reporter.recordGaugeValue(key, value);
}

/**
* Record multiple gauge values
* This is useful for reporters that can send points in batch
*
* @param gauges A map of gauge names to values
*/
protected void recordGaugeValues(Map<String, Long> gauges) {
protected void recordGaugeValues(Map<String, ? extends Number> gauges) {
recordedStats++;
reporter.recordGaugeValues(gauges);
}
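
The switch from `Map<String, Long>` to `Map<String, ? extends Number>` lets one method accept maps of `Long` and maps of `Double` alike. A minimal sketch of that idea follows, with hypothetical names (`WildcardGaugeSketch`, `sum`) that are not part of the project:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the bounded wildcard used in recordGaugeValues.
class WildcardGaugeSketch {
    // Accepts maps whose values are any Number subtype.
    // Values can be read as Number, but nothing can be put into the map here.
    static double sum(Map<String, ? extends Number> gauges) {
        double total = 0;
        for (Number n : gauges.values()) {
            total += n.doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> longs = new HashMap<>();
        longs.put("heap", 2L);
        Map<String, Double> doubles = new HashMap<>();
        doubles.put("cpu.jvm", 12.5);
        // Both calls compile; with a Map<String, Long> parameter the second would not.
        System.out.println(sum(longs));
        System.out.println(sum(doubles));
    }
}
```

The trade-off is that implementations only see `Number`, so any reporter that needs the concrete type has to inspect each value at runtime.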
105 changes: 105 additions & 0 deletions src/main/java/com/etsy/statsd/profiler/profilers/CPULoadProfiler.java
@@ -0,0 +1,105 @@
package com.etsy.statsd.profiler.profilers;

import com.google.common.collect.ImmutableMap;

import com.etsy.statsd.profiler.Arguments;
import com.etsy.statsd.profiler.Profiler;
import com.etsy.statsd.profiler.reporter.Reporter;

import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.InstanceNotFoundException;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.ReflectionException;

/**
* This profiler retrieves CPU values for the JVM and System from the "OperatingSystem" JMX Bean.
* <p>
* This profiler relies on a JMX bean that might not be available in all JVM implementations.
* We know for sure it's available in Sun/Oracle's JRE 7+, but there are no guarantees it
* will remain there for the foreseeable future.
*
* @see <a href="http://stackoverflow.com/questions/3044841/cpu-usage-mbean-on-sun-jvm">StackOverflow post</a>
*
* @author Alejandro Rivera
*/
public class CPULoadProfiler extends Profiler {

public static final long PERIOD = 10;
private static final Map<String, String> ATTRIBUTES_MAP = ImmutableMap.of("ProcessCpuLoad", "cpu.jvm",
"SystemCpuLoad", "cpu.system");

private AttributeList list;

public CPULoadProfiler(Reporter reporter, Arguments arguments) {
super(reporter, arguments);
try {
MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
ObjectName os = ObjectName.getInstance("java.lang:type=OperatingSystem");
list = mbs.getAttributes(os, ATTRIBUTES_MAP.keySet().toArray(new String[ATTRIBUTES_MAP.size()]));
} catch (InstanceNotFoundException | ReflectionException | MalformedObjectNameException e) {
list = null;
}

}

/**
* Profile JVM and system CPU load
*/
@Override
public void profile() {
recordStats();
}

@Override
public void flushData() {
recordStats();
}

@Override
public long getPeriod() {
return PERIOD;
}

@Override
public TimeUnit getTimeUnit() {
return TimeUnit.SECONDS;
}

@Override
protected void handleArguments(Arguments arguments) { /* No arguments needed */ }

/**
* Records the JVM and system CPU load as gauges
*/
private void recordStats() {
if (list == null) {
return;
}

Attribute att;
Double value;
String metric;
for (Object o : list) {
att = (Attribute) o;
value = (Double) att.getValue();

if (value == null || value == -1.0) {
continue;
}

metric = ATTRIBUTES_MAP.get(att.getName());
if (metric == null) {
continue;
}

value = ((int) (value * 1000)) / 10.0d; // 0-100 with 1-decimal precision
recordGaugeValue(metric, value);
}
}
}
@@ -21,7 +21,7 @@
*
* @author Andrew Johnson
*/
public class CPUProfiler extends Profiler {
public class CPUTracingProfiler extends Profiler {
private static final String PACKAGE_WHITELIST_ARG = "packageWhitelist";
private static final String PACKAGE_BLACKLIST_ARG = "packageBlacklist";

@@ -35,7 +35,7 @@ public class CPUProfiler extends Profiler {
private final long reportingFrequency;


public CPUProfiler(Reporter reporter, Arguments arguments) {
public CPUTracingProfiler(Reporter reporter, Arguments arguments) {
super(reporter, arguments);
traces = new CPUTraces();
profileCount = 0;
@@ -50,24 +50,32 @@ public void recordGaugeValue(String key, long value) {
recordGaugeValues(gauges);
}

/**
* @see #recordGaugeValue(String, long)
*/
@Override
public void recordGaugeValue(String key, double value) {
Map<String, ? extends Number> gauges = ImmutableMap.of(key, value);
recordGaugeValues(gauges);
}

/**
* Record multiple gauge values in InfluxDB
*
* @param gauges A map of gauge names to values
*/
@Override
public void recordGaugeValues(Map<String, Long> gauges) {
public void recordGaugeValues(Map<String, ? extends Number> gauges) {
long time = System.currentTimeMillis();
BatchPoints batchPoints = BatchPoints.database(database)
.build();
for (Map.Entry<String, Long> gauge: gauges.entrySet()) {
BatchPoints batchPoints = BatchPoints.database(database).build();
for (Map.Entry<String, ? extends Number> gauge: gauges.entrySet()) {
batchPoints.point(constructPoint(time, gauge.getKey(), gauge.getValue()));
}
client.write(batchPoints);
}

/**
* InfluxDB has a rich query language and does not need the bounds metrics emitted by CPUProfiler
* InfluxDB has a rich query language and does not need the bounds metrics emitted by CPUTracingProfiler
* As such we can disable emitting these metrics
*
* @return false
@@ -106,7 +114,7 @@ protected void handleArguments(Arguments arguments) {
Preconditions.checkNotNull(database);
}

private Point constructPoint(long time, String key, long value) {
private Point constructPoint(long time, String key, Number value) {
Point.Builder builder = Point.measurement(key)
.time(time, TimeUnit.MILLISECONDS)
.field(VALUE_COLUMN, value);
9 changes: 7 additions & 2 deletions src/main/java/com/etsy/statsd/profiler/reporter/Reporter.java
@@ -32,16 +32,21 @@ public Reporter(Arguments arguments) {
*/
public abstract void recordGaugeValue(String key, long value);

/**
* @see #recordGaugeValue(String, long)
*/
public abstract void recordGaugeValue(String key, double value);

/**
* Record multiple gauge values
* This is useful for reporters that can send points in batch
*
* @param gauges A map of gauge names to values
*/
public abstract void recordGaugeValues(Map<String, Long> gauges);
public abstract void recordGaugeValues(Map<String, ? extends Number> gauges);

/**
* CPUProfiler can emit some metrics that indicate the upper and lower bound on the length of stack traces
* CPUTracingProfiler can emit some metrics that indicate the upper and lower bound on the length of stack traces
* This is helpful for querying this data for some backends (such as Graphite) that do not have rich query languages
* Reporters can override this to disable these metrics
*
@@ -27,16 +27,30 @@ public void recordGaugeValue(String key, long value) {
client.recordGaugeValue(key, value);
}

/**
* @see #recordGaugeValue(String, long)
*/
@Override
public void recordGaugeValue(String key, double value) {
client.recordGaugeValue(key, value);
}

/**
* Record multiple gauge values in StatsD
* This simply loops over calling recordGaugeValue
*
* @param gauges A map of gauge names to values
*/
@Override
public void recordGaugeValues(Map<String, Long> gauges) {
for (Map.Entry<String, Long> gauge : gauges.entrySet()) {
recordGaugeValue(gauge.getKey(), gauge.getValue());
public void recordGaugeValues(Map<String, ? extends Number> gauges) {
for (Map.Entry<String, ? extends Number> gauge : gauges.entrySet()) {
if (gauge.getValue() instanceof Long) {
client.recordGaugeValue(gauge.getKey(), gauge.getValue().longValue());
} else if (gauge.getValue() instanceof Double) {
client.recordGaugeValue(gauge.getKey(), gauge.getValue().doubleValue());
} else {
throw new IllegalArgumentException("Unexpected Number type: " + gauge.getValue().getClass().getSimpleName());
}
}
}
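
The type dispatch in `recordGaugeValues` above can be tried in isolation with the sketch below. It does not use the real StatsD client: `GaugeDispatchSketch.format` is a hypothetical stand-in that returns a line in the StatsD gauge format (`name:value|g`) instead of sending it.

```java
// Hypothetical stand-in for the reporter's dispatch; no StatsD client is involved.
class GaugeDispatchSketch {
    // Formats a gauge from a long value.
    static String gauge(String key, long value) {
        return key + ":" + value + "|g";
    }

    // Formats a gauge from a double value.
    static String gauge(String key, double value) {
        return key + ":" + value + "|g";
    }

    // Picks the overload from the runtime type, rejecting anything but Long and Double.
    static String format(String key, Number value) {
        if (value instanceof Long) {
            return gauge(key, value.longValue());
        } else if (value instanceof Double) {
            return gauge(key, value.doubleValue());
        }
        throw new IllegalArgumentException("Unexpected Number type: " + value.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(format("gc.count", 5L));
        System.out.println(format("cpu.jvm", 12.3));
    }
}
```

One consequence of this approach is that an `Integer` or `Float` gauge is rejected at runtime rather than at compile time, so callers need to stick to `Long` and `Double` values.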

6 changes: 3 additions & 3 deletions src/main/java/com/etsy/statsd/profiler/util/CPUTraces.java
@@ -9,7 +9,7 @@
* @author Andrew Johnson
*/
public class CPUTraces {
private Map<String, Long> traces;
private Map<String, Number> traces;
private int max = Integer.MIN_VALUE;
private int min = Integer.MAX_VALUE;

@@ -33,8 +33,8 @@ public void increment(String traceKey, long inc) {
* It only returns traces that have been updated since the last flush
*
*/
public Map<String, Long> getDataToFlush() {
Map<String, Long> result = traces;
public Map<String, Number> getDataToFlush() {
Map<String, Number> result = traces;
traces = new HashMap<>();
return result;
}
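
`getDataToFlush` hands back the current map and starts a fresh one, so each flush only contains traces recorded since the previous flush. A minimal sketch of that swap-on-flush pattern follows. `TraceBufferSketch` is a hypothetical class, it keeps `Long` values for simplicity (the commit widens the real map to `Number`), and its `increment` body is an assumption since the original is not shown in this hunk.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical buffer illustrating swap-on-flush; not thread-safe.
class TraceBufferSketch {
    private Map<String, Long> traces = new HashMap<>();

    // Adds inc to the running total for traceKey (assumed behaviour).
    void increment(String traceKey, long inc) {
        traces.merge(traceKey, inc, Long::sum);
    }

    // Returns everything recorded since the last flush and resets the buffer.
    Map<String, Long> getDataToFlush() {
        Map<String, Long> result = traces;
        traces = new HashMap<>();
        return result;
    }

    public static void main(String[] args) {
        TraceBufferSketch buffer = new TraceBufferSketch();
        buffer.increment("com.example.Foo.bar", 1);
        buffer.increment("com.example.Foo.bar", 2);
        System.out.println(buffer.getDataToFlush());
        System.out.println(buffer.getDataToFlush());
    }
}
```

Swapping the reference avoids copying the map on every flush; the caller owns the returned map outright.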