-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Geom.point #1258
base: master
Are you sure you want to change the base?
Faster Geom.point #1258
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1258 +/- ##
==========================================
+ Coverage 90.33% 90.35% +0.02%
==========================================
Files 36 36
Lines 3943 3952 +9
==========================================
+ Hits 3562 3571 +9
Misses 381 381
Continue to review full report at Codecov.
|
more performance is certainly great, particular of several fold, however this PR messes up zoom in SVGJS. if the zoom box is wide and short, for example, the circles become ellipses. moreover, even for square zoom boxes the circles don't stay the same size, they get bigger. master branch: this PR: the problem is that you dropped i'd suggest saving the SVG to a file, and modifying it directly to see where you can insert the marker class to make it behave properly. this is the relevant code on master:
and in this PR:
dynamic javascript is one of Gadfly's distinguishing features, so it's very important to preserve this capability i feel. also just noticed that two copies of Gadfly/src/gadfly.js are being included in SVGJS files, as well as two copies of Compose/deps/snap.svg-min.js. this is on master branch so nothing to do with this PR. |
The latest update keeps the SVGJS functionality, and is faster than master (SVG: 2.5x, PNG: 5x, for the |
I ran the regression tests. I get differences in plots that have overlapping points (e.g. point_size_numerical.jl), because the order that the points are plotted in is not necessarily the same for "multiple points in a single context" vs. "single points in multiple contexts". But the plot is otherwise exactly the same, so this shouldn't be of any concern. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much better! just a couple comments.
theme.continuous_highlight_color(color) : | ||
theme.discrete_highlight_color(color) | ||
class = svg_color_class_from_label(aes.color_label([color])[1]) | ||
if length(groups)==1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be length(ug)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember thinking about this (and perhaps even testing it) but there was a reason I didn't use length(ug)
here, and maybe it was because of my point below (length(groups)==0)
, so using groups
leads to simpler logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But unique
should still work on zero-length arrays?
julia> unique([])
0-element Array{Any,1}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example:
p1 = plot(x=rand(1000), y=rand(1000), color=[RGB(1,0,0)], Geom.point)
p2 = plot(x=rand(1000), y=rand(1000), color=fill("a",1000), Geom.point,
Scale.color_discrete_manual("red"))
For length(groups)
, these 2 plots are different: p1
optimises speed (length(groups)==1
), p2
is slower but has full SVGJS functionality (length(groups)>1
). Using length(ug)
here would mean length(ug)==1
for both p1
and p2
, thus not giving users the above choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
am i correct in thinking p2
has a single extra call to context
? if so, is that really noticeably slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some stats:
gp, theme = Geom.point(), Theme()
aes = Gadfly.Aesthetics()
aes.x, aes.y, aes.color = rand(1000), rand(1000), [RGB(1,0,0)]
@benchmark render($gp, $theme, $aes ) samples=10
BenchmarkTools.Trial:
memory estimate: 95.69 KiB
allocs estimate: 3129
--------------
minimum time: 128.000 μs (0.00% GC)
median time: 144.000 μs (0.00% GC)
mean time: 153.692 μs (0.00% GC)
maximum time: 246.724 μs (0.00% GC)
--------------
aes.color = fill(RGB(1,0,0), 1000)
@benchmark render($gp, $theme, $aes ) samples=10
BenchmarkTools.Trial:
memory estimate: 466.22 KiB
allocs estimate: 13195
--------------
minimum time: 2.078 ms (0.00% GC)
median time: 2.094 ms (0.00% GC)
mean time: 2.122 ms (0.00% GC)
maximum time: 2.245 ms (0.00% GC)
--------------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And compared to master. The timing is similar for both cases, so I only include the first case:
aes.x, aes.y, aes.color = rand(1000), rand(1000), [RGB(1,0,0)]
@benchmark render($gp, $theme, $aes ) samples=10
BenchmarkTools.Trial:
memory estimate: 4.69 MiB
allocs estimate: 107096
--------------
minimum time: 6.578 ms (0.00% GC)
median time: 6.917 ms (0.00% GC)
mean time: 13.487 ms (49.08% GC)
maximum time: 60.599 ms (88.65% GC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting. must be the view
s then that slow p2
down.
my question was motivated by a desire for full SVGJS support in all cases. the following change gets that at a cost of only 15% in performance according to my benchmarking just now, a small price in my mind:
diff --git a/src/geom/point.jl b/src/geom/point.jl
index 283edc32..51328d78 100644
--- a/src/geom/point.jl
+++ b/src/geom/point.jl
@@ -74,9 +74,10 @@ function render(geom::PointGeometry, theme::Gadfly.Theme, aes::Gadfly.Aesthetics
if length(groups)==1
color, size, shape, alpha = groups[1]
- compose!(ctx, (context(),
+ class = svg_color_class_from_label(aes.color_label([color])[1])
+ compose!(ctx, (context(), (context(),
shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha),
- svgclass("marker")))
+ svgclass("marker")), svgclass(class)))
elseif length(groups)>1
for g in ug
i = findall(x->isequal(x, g), groups)
you might also consider putting ug = unique(groups)
in the elseif
clause as that's the only place it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of where you would want to turn off the points in a "single-color" plot? And if you wanted to do that, why not just use case 2 above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Geom.point combined with any other Geom of a different color (density2d, ellipse, abline you name it). i don't think a small decrement in performance outweighs forcing the user to switch to a different grammer.
svgclass(class))) | ||
shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha), | ||
svgclass("marker"))) | ||
elseif length(groups)>1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't this just be an else
clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because length(groups)==0
is possible, e.g. in a Geom.subplot_grid
plot that has a panel with no points (example).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still don't understand why we need the nearly identical yet separate code blocks in the if
and elseif
clauses here. couldn't the entire if
block be deleted and the elseif
conditional changed to if length(ug)>0
? much simpler, equally fast, and the corner case is covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact, if i understand things correctly, the elseif
conditional could even be removed, leaving just the code block within. if length(groups)==0
then there is no g
in ug
and so the for
loop would just be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, an elseif
conditional is necessary here because if length(aes.x) ≠ length(groups)
, then the code will error later on view(aes.x, i)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. now i understand.
fill(color), stroke(strokecolor), fillopacity(alpha), | ||
svgclass(class))) | ||
shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha), | ||
svgclass("marker"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now svgclass(class)
is left out. in the case where there is only one color, but the user adds a key, one would not be able to toggle the visibility of the points dynamically with SVGJS. admittedly a corner case, but the fix is easy: wrap the whole thing in one more context which adds svgclass(class)
much as you've done below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that we should be favoring "corner cases" over speed.
This is ready to merge, if there are no more comments. |
size_min, size_max = extrema(aes.size) | ||
size_range = size_max - size_min | ||
point_size_range = theme.point_size_max - theme.point_size_min | ||
interpolate_size(x) = theme.point_size_min + (x-size_min) / size_range * point_size_range |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was even the point of creating this function originally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scale.size_continuous
should handle the mapping from the size
aesthetic values to 0-1 (proportion), as for the e.g. alpha
aesthetic, so the scaling for size
wasn't properly implemented in grammar of graphics style (unfortunately). This would be good to implement when I do Guide.sizekey
.
@bjarthur have you tried building the Gadfly docs on master using Documenter 0.22.2? I'm hitting an error which originates from: Reverting to Documenter 0.21.5 works. |
if i activate the Project.toml in Gadfly/docs and then Pkg update, Documenter doesn't go past 0.20.0. haven't figured out what's holding it back yet, but the docs build with that version and julia 1.1. |
is this ready to merge? |
Not yet. |
NEWS.md
This PR:
Geom.point
For this example (with 53940 points):
Compared to master:
time to second plot: 2.7x faster than master for SVG,
and 6x faster than master for PNG.