Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Geom.point #1258

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ This is a log of major changes in Gadfly between releases. It is not exhaustive.
Each release typically has a number of minor bug fixes beyond what is listed here.

# Version 1.1.0
* Speed up `Geom.point` (#1258)
* Add `alpha` aesthetic, `Scale.alpha_continuous` and `Scale.alpha_discrete` (#1252)
* Add `limits=(min= , max= )` to `Stat.histogram` (#1249)
* Add dodged boxplots (#1246)
Expand Down
43 changes: 27 additions & 16 deletions src/geom/point.jl
Original file line number Diff line number Diff line change
Expand Up @@ -47,34 +47,45 @@ function render(geom::PointGeometry, theme::Gadfly.Theme, aes::Gadfly.Aesthetics
Gadfly.assert_aesthetics_equal_length("Geom.point", aes, :x, :y)

default_aes = Gadfly.Aesthetics()
default_aes.shape = Function[Shape.circle]
default_aes.color = discretize_make_ia(RGBA{Float32}[theme.default_color])
default_aes.shape = Function[theme.point_shapes[1]]
default_aes.color = RGBA{Float32}[theme.default_color]
default_aes.size = Measure[theme.point_size]
default_aes.alpha = [theme.alphas[1]]
default_aes.alpha = Float64[theme.alphas[1]]
aes = inherit(aes, default_aes)

if eltype(aes.size) <: Int
aes_size = if eltype(aes.size) <: Int
size_min, size_max = extrema(aes.size)
size_range = size_max - size_min
point_size_range = theme.point_size_max - theme.point_size_min
interpolate_size(x) = theme.point_size_min + (x-size_min) / size_range * point_size_range
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was even the point of creating this function originally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scale.size_continuous should handle the mapping from the size aesthetic values to 0-1 (proportion), as for the e.g. alpha aesthetic, so the scaling for size wasn't properly implemented in grammar of graphics style (unfortunately). This would be good to implement when I do Guide.sizekey.

theme.point_size_min .+ ((aes.size .- size_min) ./ size_range .* point_size_range)
else
aes.size
end

aes_alpha = eltype(aes.alpha) <: Int ? theme.alphas[aes.alpha] : aes.alpha

aes_shape = eltype(aes.shape) <: Function ? aes.shape : theme.point_shapes[aes.shape]
strokef = aes.color_key_continuous != nothing && aes.color_key_continuous ?
theme.continuous_highlight_color : theme.discrete_highlight_color

CT, ST, SZT, AT = eltype(aes.color), eltype(aes_shape), eltype(aes_size), eltype(aes_alpha)
groups = collect(Tuple{CT, SZT, ST, AT}, Compose.cyclezip(aes.color, aes_size, aes_shape, aes_alpha))
ug = unique(groups)
ctx = context()

for (x, y, color, size, shape, alpha) in Compose.cyclezip(aes.x, aes.y, aes.color, aes.size, aes.shape, aes_alpha)
shapefun = typeof(shape) <: Function ? shape : theme.point_shapes[shape]
sizeval = typeof(size) <: Int ? interpolate_size(size) : size
strokecolor = aes.color_key_continuous != nothing && aes.color_key_continuous ?
theme.continuous_highlight_color(color) :
theme.discrete_highlight_color(color)
class = svg_color_class_from_label(aes.color_label([color])[1])
if length(groups)==1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be length(ug) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember thinking about this (and perhaps even testing it) but there was a reason I didn't use length(ug) here, and maybe it was because of my point below (length(groups)==0), so using groups leads to simpler logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But unique should still work on zero-length arrays?

julia> unique([])
0-element Array{Any,1}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example:

p1 = plot(x=rand(1000), y=rand(1000), color=[RGB(1,0,0)], Geom.point)
p2 = plot(x=rand(1000), y=rand(1000), color=fill("a",1000), Geom.point, 
    Scale.color_discrete_manual("red"))

For length(groups), these 2 plots are different: p1 optimises speed (length(groups)==1), p2 is slower but has full SVGJS functionality (length(groups)>1). Using length(ug) here would mean length(ug)==1 for both p1 and p2, thus not giving users the above choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i correct in thinking p2 has a single extra call to context? if so, is that really noticeably slower?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some stats:

gp, theme = Geom.point(), Theme()

aes = Gadfly.Aesthetics()
aes.x, aes.y, aes.color = rand(1000), rand(1000), [RGB(1,0,0)]
@benchmark render($gp, $theme, $aes ) samples=10

BenchmarkTools.Trial: 
  memory estimate:  95.69 KiB
  allocs estimate:  3129
  --------------
  minimum time:     128.000 μs (0.00% GC)
  median time:      144.000 μs (0.00% GC)
  mean time:        153.692 μs (0.00% GC)
  maximum time:     246.724 μs (0.00% GC)
  --------------

aes.color = fill(RGB(1,0,0), 1000)
@benchmark render($gp, $theme, $aes ) samples=10

BenchmarkTools.Trial: 
  memory estimate:  466.22 KiB
  allocs estimate:  13195
  --------------
  minimum time:     2.078 ms (0.00% GC)
  median time:      2.094 ms (0.00% GC)
  mean time:        2.122 ms (0.00% GC)
  maximum time:     2.245 ms (0.00% GC)
  --------------

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And compared to master. The timing is similar for both cases, so I only include the first case:

aes.x, aes.y, aes.color = rand(1000), rand(1000), [RGB(1,0,0)]
@benchmark render($gp, $theme, $aes ) samples=10

BenchmarkTools.Trial: 
  memory estimate:  4.69 MiB
  allocs estimate:  107096
  --------------
  minimum time:     6.578 ms (0.00% GC)
  median time:      6.917 ms (0.00% GC)
  mean time:        13.487 ms (49.08% GC)
  maximum time:     60.599 ms (88.65% GC)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. must be the views then that slow p2 down.

my question was motivated by a desire for full SVGJS support in all cases. the following change gets that at a cost of only 15% in performance according to my benchmarking just now, a small price in my mind:

diff --git a/src/geom/point.jl b/src/geom/point.jl
index 283edc32..51328d78 100644
--- a/src/geom/point.jl
+++ b/src/geom/point.jl
@@ -74,9 +74,10 @@ function render(geom::PointGeometry, theme::Gadfly.Theme, aes::Gadfly.Aesthetics
 
     if length(groups)==1
         color, size, shape, alpha = groups[1]
-        compose!(ctx, (context(),
+        class = svg_color_class_from_label(aes.color_label([color])[1])
+        compose!(ctx, (context(), (context(),
             shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha),
-            svgclass("marker")))
+            svgclass("marker")), svgclass(class)))
     elseif length(groups)>1
         for g in ug
             i = findall(x->isequal(x, g), groups)

you might also consider putting ug = unique(groups) in the elseif clause as that's the only place it is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of where you would want to turn off the points in a "single-color" plot? And if you wanted to do that, why not just use case 2 above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Geom.point combined with any other Geom of a different color (density2d, ellipse, abline you name it). i don't think a small decrement in performance outweighs forcing the user to switch to a different grammer.

color, size, shape, alpha = groups[1]
compose!(ctx, (context(),
(context(), shapefun([x], [y], [sizeval]), svgclass("marker")),
fill(color), stroke(strokecolor), fillopacity(alpha),
svgclass(class)))
shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha),
svgclass("marker")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now svgclass(class) is left out. in the case where there is only one color, but the user adds a key, one would not be able to toggle the visibility of the points dynamically with SVGJS. admittedly a corner case, but the fix is easy: wrap the whole thing in one more context which adds svgclass(class) much as you've done below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should be favoring "corner cases" over speed.

elseif length(groups)>1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't this just be an else clause?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because length(groups)==0 is possible, e.g. in a Geom.subplot_grid plot that has a panel with no points (example).

Copy link
Member

@bjarthur bjarthur Mar 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still don't understand why we need the nearly identical yet separate code blocks in the ifand elseif clauses here. couldn't the entire if block be deleted and the elseif conditional changed to if length(ug)>0? much simpler, equally fast, and the corner case is covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, if i understand things correctly, the elseif conditional could even be removed, leaving just the code block within. if length(groups)==0 then there is no g in ug and so the for loop would just be skipped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, an elseif conditional is necessary here because if length(aes.x) ≠ length(groups), then the code will error later on view(aes.x, i).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. now i understand.

for g in ug
i = findall(x->isequal(x, g), groups)
color, size, shape, alpha = g
class = svg_color_class_from_label(aes.color_label([color])[1])
compose!(ctx, (context(),
(context(), shape(view(aes.x,i), view(aes.y,i), [size]), svgclass("marker")),
fill(color), stroke(strokef(color)), fillopacity(alpha), svgclass(class)))
end
end

compose!(ctx, linewidth(theme.highlight_width))
Expand Down