Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster Geom.point #1258

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

Mattriks
Copy link
Member

@Mattriks Mattriks commented Mar 10, 2019

  • I've added an entry to NEWS.md
  • I've run the regression tests
  • I've built the docs and confirmed these changes don't cause new errors

This PR:

  • Speeds up Geom.point

For this example (with 53940 points):

using RDatasets
Diamonds = dataset("ggplot2","diamonds") 
p = plot(Diamonds, x=:Carat, y=:Price, color=:Cut, Geom.point,
    Theme(discrete_highlight_color=c->nothing, alphas=[0.3])
)
@time draw(SVG(), p)  # to compile @time
@time draw(SVG(), p)
draw(PNG(), p)
@time draw(PNG(), p)

Compared to master:
time to second plot: 2.7x faster than master for SVG,
and 6x faster than master for PNG.

@codecov-io
Copy link

codecov-io commented Mar 10, 2019

Codecov Report

Merging #1258 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1258      +/-   ##
==========================================
+ Coverage   90.33%   90.35%   +0.02%     
==========================================
  Files          36       36              
  Lines        3943     3952       +9     
==========================================
+ Hits         3562     3571       +9     
  Misses        381      381
Impacted Files Coverage Δ
src/geom/point.jl 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9df68de...1b9b0ac. Read the comment docs.

@bjarthur
Copy link
Member

more performance is certainly great, particular of several fold, however this PR messes up zoom in SVGJS. if the zoom box is wide and short, for example, the circles become ellipses. moreover, even for square zoom boxes the circles don't stay the same size, they get bigger.

master branch:

Screen Shot 2019-03-10 at 8 38 07 AM

this PR:

Screen Shot 2019-03-10 at 8 37 50 AM

the problem is that you dropped svgclass("marker").

i'd suggest saving the SVG to a file, and modifying it directly to see where you can insert the marker class to make it behave properly.

this is the relevant code on master:

<g class="geometry" id="img-969b1255-234">
  <g stroke-width="0.3" id="img-969b1255-235">
    <g class="color_RGBA{Float32}(0.0f0,0.74736935f0,1.0f0,1.0f0)" fill-opacity="1" stroke="#FFFFFF" fill="#00BFFF" id="img-969b1255-236">
      <g class="marker" id="img-969b1255-237">
        <g transform="translate(134.42,7)" id="img-969b1255-238">
          <circle cx="0" cy="0" r="0.9" class="primitive"/>
        </g>    
      </g>    
    </g>    
    <g class="color_RGBA{Float32}(0.0f0,0.74736935f0,1.0f0,1.0f0)" fill-opacity="1" stroke="#FFFFFF" fill="#00BFFF" id="img-969b1255-239">
      <g class="marker" id="img-969b1255-240">
        <g transform="translate(76.62,47.66)" id="img-969b1255-241">
          <circle cx="0" cy="0" r="0.9" class="primitive"/>
        </g>    
      </g>    
    </g>    
    <g class="color_RGBA{Float32}(0.0f0,0.74736935f0,1.0f0,1.0f0)" fill-opacity="1" stroke="#FFFFFF" fill="#00BFFF" id="img-969b1255-242">
      <g class="marker" id="img-969b1255-243">
        <g transform="translate(18.83,88.33)" id="img-969b1255-244">
          <circle cx="0" cy="0" r="0.9" class="primitive"/>
        </g>    
      </g>    
    </g>    
  </g>    
</g>    

and in this PR:

<g class="geometry" id="img-969b1255-234">
  <g stroke-width="0.3" id="img-969b1255-235">
    <g class="color_RGBA{Float32}(0.0f0,0.74736935f0,1.0f0,1.0f0)" fill-opacity="1" stroke="#FFFFFF" fill="#00BFFF" id="img-969b1255-236">
      <g transform="translate(18.83,88.33)" id="img-969b1255-237">
        <circle cx="0" cy="0" r="0.9" class="primitive"/>
      </g>    
      <g transform="translate(76.62,47.66)" id="img-969b1255-238">
        <circle cx="0" cy="0" r="0.9" class="primitive"/>
      </g>    
      <g transform="translate(134.42,7)" id="img-969b1255-239">
        <circle cx="0" cy="0" r="0.9" class="primitive"/>
      </g>    
    </g>    
  </g>    
</g>    

dynamic javascript is one of Gadfly's distinguishing features, so it's very important to preserve this capability i feel.

also just noticed that two copies of Gadfly/src/gadfly.js are being included in SVGJS files, as well as two copies of Compose/deps/snap.svg-min.js. this is on master branch so nothing to do with this PR.

@Mattriks
Copy link
Member Author

Missed that - thanks! Geom.beeswarm and boxplot also use a "multiple points/single context" approach, and the svgclass("marker") is within the same context. Relevant lines here and here. I will test that.

@Mattriks
Copy link
Member Author

The latest update keeps the SVGJS functionality, and is faster than master (SVG: 2.5x, PNG: 5x, for the Diamonds example). Still need to do the regression tests.

@Mattriks
Copy link
Member Author

I ran the regression tests. I get differences in plots that have overlapping points (e.g. point_size_numerical.jl), because the order that the points are plotted in is not necessarily the same for "multiple points in a single context" vs. "single points in multiple contexts". But the plot is otherwise exactly the same, so this shouldn't be of any concern.

Copy link
Member

@bjarthur bjarthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better! just a couple comments.

theme.continuous_highlight_color(color) :
theme.discrete_highlight_color(color)
class = svg_color_class_from_label(aes.color_label([color])[1])
if length(groups)==1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be length(ug) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember thinking about this (and perhaps even testing it) but there was a reason I didn't use length(ug) here, and maybe it was because of my point below (length(groups)==0), so using groups leads to simpler logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But unique should still work on zero-length arrays?

julia> unique([])
0-element Array{Any,1}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example:

p1 = plot(x=rand(1000), y=rand(1000), color=[RGB(1,0,0)], Geom.point)
p2 = plot(x=rand(1000), y=rand(1000), color=fill("a",1000), Geom.point, 
    Scale.color_discrete_manual("red"))

For length(groups), these 2 plots are different: p1 optimises speed (length(groups)==1), p2 is slower but has full SVGJS functionality (length(groups)>1). Using length(ug) here would mean length(ug)==1 for both p1 and p2, thus not giving users the above choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am i correct in thinking p2 has a single extra call to context? if so, is that really noticeably slower?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some stats:

gp, theme = Geom.point(), Theme()

aes = Gadfly.Aesthetics()
aes.x, aes.y, aes.color = rand(1000), rand(1000), [RGB(1,0,0)]
@benchmark render($gp, $theme, $aes ) samples=10

BenchmarkTools.Trial: 
  memory estimate:  95.69 KiB
  allocs estimate:  3129
  --------------
  minimum time:     128.000 μs (0.00% GC)
  median time:      144.000 μs (0.00% GC)
  mean time:        153.692 μs (0.00% GC)
  maximum time:     246.724 μs (0.00% GC)
  --------------

aes.color = fill(RGB(1,0,0), 1000)
@benchmark render($gp, $theme, $aes ) samples=10

BenchmarkTools.Trial: 
  memory estimate:  466.22 KiB
  allocs estimate:  13195
  --------------
  minimum time:     2.078 ms (0.00% GC)
  median time:      2.094 ms (0.00% GC)
  mean time:        2.122 ms (0.00% GC)
  maximum time:     2.245 ms (0.00% GC)
  --------------

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And compared to master. The timing is similar for both cases, so I only include the first case:

aes.x, aes.y, aes.color = rand(1000), rand(1000), [RGB(1,0,0)]
@benchmark render($gp, $theme, $aes ) samples=10

BenchmarkTools.Trial: 
  memory estimate:  4.69 MiB
  allocs estimate:  107096
  --------------
  minimum time:     6.578 ms (0.00% GC)
  median time:      6.917 ms (0.00% GC)
  mean time:        13.487 ms (49.08% GC)
  maximum time:     60.599 ms (88.65% GC)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. must be the views then that slow p2 down.

my question was motivated by a desire for full SVGJS support in all cases. the following change gets that at a cost of only 15% in performance according to my benchmarking just now, a small price in my mind:

diff --git a/src/geom/point.jl b/src/geom/point.jl
index 283edc32..51328d78 100644
--- a/src/geom/point.jl
+++ b/src/geom/point.jl
@@ -74,9 +74,10 @@ function render(geom::PointGeometry, theme::Gadfly.Theme, aes::Gadfly.Aesthetics
 
     if length(groups)==1
         color, size, shape, alpha = groups[1]
-        compose!(ctx, (context(),
+        class = svg_color_class_from_label(aes.color_label([color])[1])
+        compose!(ctx, (context(), (context(),
             shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha),
-            svgclass("marker")))
+            svgclass("marker")), svgclass(class)))
     elseif length(groups)>1
         for g in ug
             i = findall(x->isequal(x, g), groups)

you might also consider putting ug = unique(groups) in the elseif clause as that's the only place it is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example of where you would want to turn off the points in a "single-color" plot? And if you wanted to do that, why not just use case 2 above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Geom.point combined with any other Geom of a different color (density2d, ellipse, abline you name it). i don't think a small decrement in performance outweighs forcing the user to switch to a different grammer.

svgclass(class)))
shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha),
svgclass("marker")))
elseif length(groups)>1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't this just be an else clause?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because length(groups)==0 is possible, e.g. in a Geom.subplot_grid plot that has a panel with no points (example).

Copy link
Member

@bjarthur bjarthur Mar 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still don't understand why we need the nearly identical yet separate code blocks in the ifand elseif clauses here. couldn't the entire if block be deleted and the elseif conditional changed to if length(ug)>0? much simpler, equally fast, and the corner case is covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, if i understand things correctly, the elseif conditional could even be removed, leaving just the code block within. if length(groups)==0 then there is no g in ug and so the for loop would just be skipped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, an elseif conditional is necessary here because if length(aes.x) ≠ length(groups), then the code will error later on view(aes.x, i).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it. now i understand.

fill(color), stroke(strokecolor), fillopacity(alpha),
svgclass(class)))
shape(aes.x, aes.y, [size]), fill(color), stroke(strokef(color)), fillopacity(alpha),
svgclass("marker")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now svgclass(class) is left out. in the case where there is only one color, but the user adds a key, one would not be able to toggle the visibility of the points dynamically with SVGJS. admittedly a corner case, but the fix is easy: wrap the whole thing in one more context which adds svgclass(class) much as you've done below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should be favoring "corner cases" over speed.

@Mattriks
Copy link
Member Author

This is ready to merge, if there are no more comments.

size_min, size_max = extrema(aes.size)
size_range = size_max - size_min
point_size_range = theme.point_size_max - theme.point_size_min
interpolate_size(x) = theme.point_size_min + (x-size_min) / size_range * point_size_range
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was even the point of creating this function originally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scale.size_continuous should handle the mapping from the size aesthetic values to 0-1 (proportion), as for the e.g. alpha aesthetic, so the scaling for size wasn't properly implemented in grammar of graphics style (unfortunately). This would be good to implement when I do Guide.sizekey.

@Mattriks
Copy link
Member Author

Mattriks commented Apr 8, 2019

@bjarthur have you tried building the Gadfly docs on master using Documenter 0.22.2? I'm hitting an error which originates from: display_dict(::Context) at .julia\packages\Documenter\8vUAI\src\Utilities\Utilities.jl:651

Reverting to Documenter 0.21.5 works.

@bjarthur
Copy link
Member

bjarthur commented Apr 9, 2019

if i activate the Project.toml in Gadfly/docs and then Pkg update, Documenter doesn't go past 0.20.0. haven't figured out what's holding it back yet, but the docs build with that version and julia 1.1.

@bjarthur
Copy link
Member

is this ready to merge?

@Mattriks
Copy link
Member Author

Not yet.

@Mattriks Mattriks mentioned this pull request Apr 19, 2019
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants