---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include=FALSE}
# PNG graphics device from ragg, at higher resolution with sizes in inches
ragg_png <- function(..., res = 192) {
   ragg::agg_png(..., res = res, units = "in")
}
knitr::opts_chunk$set(
   collapse=TRUE,
   warning=FALSE,
   message=FALSE,
   comment="#>",
   fig.path="man/figures/README-"
);
```
# jamma
The goal of jamma is to create MA-plots with several capabilities
intended to provide a more thorough understanding of the data.

The main function provided is `jammaplot()`. It is distinct from similar
MA-plot functions in that it uses a smooth scatter plot by default, and in
fact it inspired the creation of the custom smooth scatter function provided
by `jamba::plotSmoothScatter()`.
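
For readers new to MA-plots, the following is a minimal sketch of how the two axes are commonly derived from a log2 expression matrix: A is the average signal of each row, and M is the difference of each sample from that average. The toy matrix and the names `x`, `A`, and `M` are illustrative assumptions, and `jammaplot()` may center its data differently (for example against a median, or against a chosen set of control samples).

```r
# Toy log2 expression matrix (hypothetical): 5 rows (genes) by 3 samples.
set.seed(1)
x <- matrix(rnorm(15, mean=8, sd=1), nrow=5,
   dimnames=list(paste0("gene", 1:5), paste0("sample", 1:3)))

# A: average log2 signal of each row across all samples.
A <- rowMeans(x)

# M: difference of each sample from the row average.
# A has one value per row, so it is recycled down each column of x.
M <- x - A
```

Plotting one column of `M` against `A` gives the MA-plot panel for that sample.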
## Package Reference
A full online function reference is available via the pkgdown
documentation:
[Full jamma command reference](https://jmw86069.github.io/jamma)
### Example MA-plot
A reasonable example MA-plot can be created using data from the `affydata`
package, if installed.
```{r Dilution, results="hide", fig.height=8, fig.width=8}
library(jamma);
library(jamba);
if (suppressPackageStartupMessages(require(affydata))) {
   data(Dilution);
   # log2-transform the expression values, offset by 1 to avoid log2(0)
   edata <- log2(1+Biobase::exprs(Dilution));
   jammaplot(edata);
}
```
### What is a smooth scatter plot, and why is it important for MA-plots?
MA-plots are typically created for gene expression data, historically
microarray data, which contains tens of thousands of rows.
Most MA-plot tools cope with the number of points either by drawing
single-pixel points (`pch="."` in base R graphics), or by adding transparency.

A secondary issue is that these plots take a while to render when drawing
individual points. The effect is amplified when running on a remote server,
since each individual point is transmitted over the network for rendering.
Also, when saving a figure, certain file types store each
point as a separate object, making the file size surprisingly large. If the
file is printed to paper (ha!) the printer can take a long time to prepare the
image for printing. And the volume of data is not getting smaller
with new technologies.
First, we show the same MA-plot using single pixel points:
```{r pch1, results="hide", fig.height=4, fig.width=8, dependson="Dilution"}
if (exists("edata")) {
   # Replace the smooth scatter function with single-pixel points
   jammaplot(edata[,2:3], ylim=c(-1.5,1.5), titleCexFactor=0.8,
      smoothScatterFunc=function(x, col="navy", ...){
         plot(x=x, pch=".", col="#000077", ...)},
      maintitle="plot(pch='.')");
}
```
The overall range of points is clearly shown, but the density of points is
not clear from that plot. Adding alpha transparency helps somewhat:
```{r pchAlpha, results="hide", fig.height=4, fig.width=8}
if (exists("edata")) {
   # Same points, with an alpha channel appended to the hex color ("11")
   jammaplot(edata[,2:3], ylim=c(-1.5,1.5), titleCexFactor=0.8,
      smoothScatterFunc=function(x, col="navy", ...){
         plot(x=x, pch=".", col="#00007711", ...)},
      maintitle="plot(pch='.', alpha=0.07)");
}
```
The transparency helps visualize the massive number of points in the middle,
but it has now made all the fun outlier points almost invisible. The typical
next step in R is to use `smoothScatter()`, shown below using its default
color ramp:
```{r Smooth, results="hide", fig.height=4, fig.width=8}
if (exists("edata")) {
   jammaplot(edata[,2:3],
      xlim=c(6, 14),
      ylim=c(-1.5,1.5),
      titleCexFactor=0.8,
      # Wrap base smoothScatter(), converting colramp to a color function
      smoothScatterFunc=function(colramp,...){
         smoothScatter(...,colramp=jamba::getColorRamp(colramp, n=NULL))},
      colramp="Blues",
      maintitle="smoothScatter()");
}
```
Again, the visualization is improved, but the default "Blues" color ramp
(credit: Brewer colors from RColorBrewer) could perhaps be improved.
The next plot omits `colramp="Blues"`, so `jammaplot()` applies its own
default color ramp:
```{r Smoove, results="hide", fig.height=4, fig.width=8}
if (exists("edata")) {
   jammaplot(edata[,2:3],
      xlim=c(6, 14),
      ylim=c(-1.5,1.5),
      titleCexFactor=0.8,
      # No colramp argument here, so the jammaplot() default is passed through
      smoothScatterFunc=function(colramp,...){
         smoothScatter(...,colramp=jamba::getColorRamp(colramp, n=NULL))},
      maintitle="smoothScatter()");
}
```
Now the figure depicts the full range of data, while also conveying the
truly massive number of points in the central region. Only two smaller issues
remain.
First, not visible here, the underlying data is plotted using tiny
rectangles. For the reasons described above, a large number of rectangles
can be problematic when saving as a vector image (PDF, SVG), when printing
on paper, or when rendering the figure across a remote network connection.
The solution is to use a rasterized image instead of individual rectangles,
since a raster image can be compressed and resized.
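
The rectangle-versus-raster difference can be illustrated with base R alone. The sketch below is not how `plotSmoothScatter()` is implemented; it only assumes that `image()` draws one rectangle per cell when `useRaster=FALSE` and a single raster image when `useRaster=TRUE`. The simulated points, the 200 x 200 grid, and the helper `draw_grid()` are hypothetical.

```r
# Bin hypothetical points into a 200 x 200 grid of counts.
set.seed(123)
n <- 50000
x <- rnorm(n)
y <- rnorm(n)
z <- table(
   cut(x, breaks=seq(min(x), max(x), length.out=201), include.lowest=TRUE),
   cut(y, breaks=seq(min(y), max(y), length.out=201), include.lowest=TRUE))
z <- matrix(as.numeric(z), nrow=200)

# Draw the same grid to a PDF file, as rectangles or as one raster image.
draw_grid <- function(file, useRaster) {
   pdf(file)
   on.exit(dev.off())
   image(z, col=colorRampPalette(c("white", "navy"))(12),
      useRaster=useRaster)
   invisible(file)
}
file_rect <- draw_grid(tempfile(fileext=".pdf"), useRaster=FALSE)
file_raster <- draw_grid(tempfile(fileext=".pdf"), useRaster=TRUE)

# Compare the resulting file sizes in bytes.
sizes <- file.size(c(file_rect, file_raster))
names(sizes) <- c("rectangles", "raster")
print(sizes)
```

The exact sizes depend on the graphics device and R version, so no particular numbers are claimed here.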
Second, the pixel size used for the point density is flattened horizontally,
because the default density function uses the range of data, and not the
plot visible range. When the density function is applied to plot coordinates,
there is often some distortion. The effect is visually small in one panel, but
with 20 panels onscreen the inconsistency becomes much more obvious.
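
A rough numeric sketch of that distortion follows. It assumes a bandwidth rule of the 5% to 95% quantile spread of each axis divided by 25, which is believed to match the `smoothScatter()` default but should be treated as an assumption. The simulated coordinates `a` and `m` are hypothetical stand-ins for MA-plot data.

```r
set.seed(42)
# Hypothetical MA-like coordinates: wide spread on x, narrow spread on y.
a <- rnorm(5000, mean=10, sd=2)
m <- rnorm(5000, mean=0, sd=0.3)
xy <- cbind(a, m)

# Assumed bandwidth rule, computed per axis from the data range only.
bw <- apply(xy, 2, function(v) {
   diff(quantile(v, c(0.05, 0.95), names=FALSE)) / 25
})

# Visible plot ranges, matching the xlim and ylim used in the examples.
xlim <- c(6, 14)
ylim <- c(-1.5, 1.5)

# Bandwidth as a fraction of each visible axis range.
bw_frac <- bw / c(diff(xlim), diff(ylim))
print(round(bw_frac, 4))
```

When the two fractions differ, the smoothing kernel around a single point covers a different share of the panel width than of the panel height, so on a square panel it is drawn as an ellipse rather than a circle.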
The `plotSmoothScatter()` function resolves both issues described, with some
enhancements. It uses a density function based upon plot space, and it also
adds detail, so smaller features are less blurry.
```{r plotSmoove, results="hide", fig.height=4, fig.width=8}
if (exists("edata")) {
   # Default jammaplot() behavior, which uses plotSmoothScatter()
   jammaplot(edata[,2:3],
      ylim=c(-1.5,1.5),
      titleCexFactor=0.8,
      maintitle="plotSmoothScatter()");
}
```
It looks like a small effect here, but the density around single points is
now circular. When rendering a density map of plotted data points, it should
represent the true density of points as accurately as possible.
The `plotSmoothScatter()` function also fills the complete plot panel with
the correct background color, which is not done by `smoothScatter()`. The
difference is demonstrated here with the "viridis" color ramp:
```{r SmoothViridis, results="hide", fig.height=8, fig.width=8}
if (exists("edata")) {
   # 2x2 layout so both jammaplot() calls share one figure
   par("mfrow"=c(2,2));
   jammaplot(edata[,2:3],
      xlim=c(6, 14),
      ylim=c(-2,2),
      titleCexFactor=0.8,
      colramp="viridis",
      doPar=FALSE,
      smoothScatterFunc=function(colramp,...){
         smoothScatter(...,colramp=jamba::getColorRamp(colramp, n=NULL))},
      maintitle="smoothScatter(colramp='viridis')");
   jammaplot(edata[,2:3], ylim=c(-2,2), titleCexFactor=0.8,
      colramp="viridis", doPar=FALSE,
      maintitle="plotSmoothScatter(colramp='viridis')");
}
```