Added inspect_points datashader operation #4794

Merged (42 commits) on Jan 27, 2021


Contributor

@jlstevens jlstevens commented Jan 19, 2021

WIP.

Here is the example code

import numpy as np
import holoviews as hv
from spatialpandas import GeoDataFrame
from spatialpandas.geometry import PointArray
# inspect_sample is the operation's name at this point in the PR (later renamed inspect_points)
from holoviews.operation.datashader import rasterize, dynspread, inspect_sample

hv.extension('bokeh')

def gaussian_df(specs=(1.5, 0, 1.0), num=10000):
    """Sample num points from a 2D Gaussian; specs is (x mean, y mean, std)."""
    np.random.seed(1)
    xs = np.random.normal(specs[0], specs[2], num)
    ys = np.random.normal(specs[1], specs[2], num)
    return xs, ys

xs, ys = gaussian_df()
# One point geometry per sample, plus an integer id column
sdf = GeoDataFrame({'geometry': PointArray((xs, ys)), 'id': list(range(len(xs)))})
rasterized = dynspread(rasterize(hv.Points(sdf)))
# Overlay the inspected samples as red, hoverable markers
rasterized * inspect_sample(rasterized).opts(color='red', tools=['hover'], size=6)

github1

Getting tap behavior is as easy as adding streams=[hv.streams.Tap] to the operation:

github1

TODO

  • Make sure vdims of original data are preserved for hover information
  • Find out why the initial render looks blank until a small zoom in/out occurs
  • Handle normal dataframes (getting the x/y from the raster kdims), geopandas, and spatially indexed spatialpandas dataframes.
  • Tests
  • Documentation

@jlstevens
Contributor Author

I might prefer inspect_point in the end as Points is the return type and it is only suited to datashaded points right now. You can imagine fancier things in future, e.g. inspect_segments, inspect_paths, inspect_spikes, inspect_polygons etc.

Member

@jbednar jbednar left a comment

Looks great so far!

@jbednar
Member

jbednar commented Jan 19, 2021

Can you maybe make the default marker shape an o (open circle) so that the marker doesn't cover up the datashaded data point?

@jlstevens
Contributor Author

jlstevens commented Jan 19, 2021

Operations shouldn't set those kinds of options. It is easy enough to add fill_alpha=0, e.g.:

An X marker is another good choice:

Would be happy to document these as examples of course.

@jlstevens jlstevens changed the title Added inspect_sample datashader operation Added inspect_points datashader operation Jan 19, 2021
@jlstevens
Contributor Author

Here is an example of how I expect the drilldown table to work:

drilldown

@jlstevens
Contributor Author

Added a point_count parameter so this operation can live up to the name inspect_points (i.e. plural). Here is an example with (a maximum of) ten points shown within an expanded mask size of 50:

point_count
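
The selection that point_count describes can be sketched roughly as below. This is only an illustration, not the operation's code: it assumes the mask is an axis-aligned box centred on the cursor and that candidates are ranked by Euclidean distance, and the name `closest_points` and all of its arguments are hypothetical.

```python
import math

def closest_points(points, cursor, half_width, half_height, point_count=1):
    """Return up to `point_count` points inside the mask, nearest first.

    Hypothetical helper for illustration only: `points` is a list of (x, y)
    tuples and the mask is an axis-aligned box centred on `cursor`.
    """
    cx, cy = cursor
    # Keep only the points that fall inside the mask.
    inside = [
        (x, y) for x, y in points
        if abs(x - cx) <= half_width and abs(y - cy) <= half_height
    ]
    # Rank the survivors by Euclidean distance from the cursor.
    inside.sort(key=lambda p: math.hypot(p[0] - cx, p[1] - cy))
    return inside[:point_count]

points = [(0.0, 0.0), (0.2, 0.1), (0.4, -0.3), (3.0, 3.0)]
nearest = closest_points(points, cursor=(0.1, 0.0), half_width=0.5,
                         half_height=0.5, point_count=2)
```

With point_count=1 the same function returns only the single closest sample, and an empty list when nothing falls inside the mask.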

@jlstevens
Contributor Author

I think this PR will probably need some of the fixes planned in #4792

@jlstevens
Contributor Author

One comment Philipp made in #4796 (now closed) is that this operation should handle RGBs as well as RGBAs. I suggest raising an exception unless we want to declare a specific color as a null value.

@jbednar
Member

jbednar commented Jan 21, 2021

I suggest raising an exception unless we want to declare a specific color as a null value.

Is it essential to detect a null value, or just helpful? If it's not essential, I'd just print a helpful warning and then just skip checking for nulls, unless that makes it significantly more complex.

@jlstevens
Contributor Author

Is it essential to detect a null value, or just helpful?

It is not essential but if you can't detect null you can't optimize when inspecting sparse data. A warning seems fine.
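
The warn-and-skip behaviour discussed here might look something like the following sketch. It assumes a pixel is a plain tuple of channel values and that full transparency (alpha of zero) marks an empty RGBA pixel; the helper name `is_empty_pixel` is hypothetical and not taken from the PR.

```python
import warnings

def is_empty_pixel(pixel):
    """Return True if an RGBA pixel is fully transparent, else False.

    Hypothetical helper: an RGB pixel has no alpha channel and so no null
    value to test. In that case a warning is issued and the pixel is treated
    as non-empty, which disables the sparse-data shortcut.
    """
    if len(pixel) == 4:
        return pixel[3] == 0
    warnings.warn("RGB input has no alpha channel; skipping the null check.")
    return False
```

The cost of the RGB path is exactly the one noted above: without a detectable null, every hover has to search the data even where the raster is empty.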

@jlstevens
Contributor Author

To make sure this isn't forgotten, I realized one bug we have: the null test should be over the mask area and not just a point test (i.e use the pixels parameter).

We can also have an optimization where we pick the first sample we find falling within a single pixel (assuming anything after that doesn't matter).
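
To illustrate the difference between the point test and the mask-area test, here is a small sketch. It assumes the raster is a list of rows of floats with NaN marking an empty bin, and that `pixels` is a radius in bins around the cursor's bin; both are assumptions for illustration, and `mask_is_empty` is a hypothetical name.

```python
import math

def mask_is_empty(raster, row, col, pixels):
    """Return True if every bin within `pixels` of (row, col) is NaN.

    Hypothetical helper: `raster` is a list of rows of floats in which NaN
    marks an empty bin, and the window is clipped at the raster edges.
    """
    r0, r1 = max(row - pixels, 0), min(row + pixels + 1, len(raster))
    c0, c1 = max(col - pixels, 0), min(col + pixels + 1, len(raster[0]))
    return all(math.isnan(raster[r][c])
               for r in range(r0, r1) for c in range(c0, c1))

nan = float("nan")
raster = [
    [nan, nan, nan, nan],
    [nan, nan, 2.0, nan],
    [nan, nan, nan, nan],
]
point_test = math.isnan(raster[1][1])        # single-bin test at (1, 1)
area_test = mask_is_empty(raster, 1, 1, 1)   # window of radius 1 around (1, 1)
```

Here the single-bin test reports "empty" at (1, 1) while the window test does not, because the neighbouring bin at (1, 2) holds data; a point-only test would wrongly skip the search.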

@jlstevens
Contributor Author

jlstevens commented Jan 22, 2021

Other things I would like to attempt before merging:

  • Support normal dataframes without a .cx method using normal slicing, issuing a warning recommending spatialpandas/geopandas. Maybe this warning could be disabled by a parameter if people really want to avoid those dependencies.
  • Output a Polygon in the shape of the mask instead of the point. Seeing the actual search area seems really useful and would make things like drilldown histograms more useful.

Other than that (and any unaddressed items above), I am very happy with this new operation! At a later date we can consider inspect_segments, inspect_paths, inspect_polygons etc.

Edit: Glancing at the code the plain dataframe support looks like it might already be there. The idea of a warning remains though...
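
As a rough idea of what the mask-shaped polygon could contain, the sketch below computes the corners of a rectangular search area in data coordinates. It assumes a rectangular mask that extends `pixels` bins either side of the cursor, with the bin sizes given in data units; the function `mask_rectangle` and its arguments are hypothetical.

```python
def mask_rectangle(x, y, pixels, x_sampling, y_sampling):
    """Return the corners of the search area as (x, y) tuples.

    Hypothetical helper: the mask extends `pixels` bins either side of the
    cursor, and `x_sampling` / `y_sampling` are the bin sizes in data units.
    """
    dx, dy = pixels * x_sampling, pixels * y_sampling
    # Corners listed counter-clockwise from the lower-left.
    return [(x - dx, y - dy), (x + dx, y - dy),
            (x + dx, y + dy), (x - dx, y + dy)]

corners = mask_rectangle(x=1.0, y=2.0, pixels=3, x_sampling=0.5, y_sampling=0.25)
```

A list of corners like this is the kind of data a polygon element could be built from, so the displayed shape would match the area actually searched.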

@jlstevens
Contributor Author

Added an inspect_mask operation to show the mask that can be used as follows:

I think it could be improved further: ideally inspector.mask(rasterized) could be simply inspector.mask but then the processed element would need to be linked between the two operations.

@jlstevens
Contributor Author

@jbednar @philippjfr Ready for review. Hoping for some feedback on the API (e.g. parameter names, the mask operation, arrangement of the transformers). Once things settle, I'll add unit tests. Happy to add docs in this PR but can also open a new PR if we want the code merged sooner.

Member

@jbednar jbednar left a comment

This will be very useful. I find the names transformer, hits_transformer, and points_transformer confusing (and verbose), though. E.g. doesn't transformer also transform hits, just as much as hits_transformer does? Maybe match_transform, matches_to_points, matches_to_hits?

It would be nice if this operation can be passed Points as an argument instead of being specialized for points, but I can't see how that could work.

Will it be able to e.g. select an entire connected trajectory, given one point on that trajectory?

Maximum number of points to display within the mask of size
pixels. Points are prioritized by distance from the cursor
point. This means that the default value of one shows the single
closest sample to the cursor.""")
Member

Again, I'm not sure that's well defined; seems like all datapoints falling into the cursor's pixel are equivalent, and they don't need to be the closest (e.g. to the cursor pixel's center) to be appropriate; I'd just take any of them as there's no particular order that can be established on them.

Contributor Author

I'd just take any of them as there's no particular order that can be established on them.

That is a possible optimization and not true in all cases.

If the datashading is occurring at screen resolution and bokeh's hover/tap is at the same resolution then it is true (well, the above is perfectly accurate but limited by the precision supplied by Bokeh), but it is very easy to ask rasterize or datashade to output rasters where that is not true at all (i.e bins much bigger than screen pixels by specifying custom x_sampling and y_sampling).

As currently defined, this inspector will work correctly for all datashaded output and this docstring is correct. For a lot of common datashader output, where things are configured to make datashaded bins match screen pixels automatically, your suggested change would work because Bokeh isn't giving anything past screen-pixel precision. Even that might not hold: for all I know a mouse pointer can specify an x,y position that is finer than a pixel! What happens if you set your screen to 640x480 resolution?

Because of all this, I think the docstring is well defined and correct even if the precision with which the cursor's position is specified is limited.
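
A small worked example of the coarse-bin case may help. It assumes bins 1.0 data units wide along x (the kind of output a large x_sampling could produce) and a cursor position reported more finely than that; the numbers and the helper `bin_index` are made up for illustration.

```python
import math

def bin_index(value, start, sampling):
    """Index of the bin of width `sampling` that contains `value`."""
    return math.floor((value - start) / sampling)

# Assumed setup: bins 1.0 data units wide, cursor reported at x=0.9, y=0.5.
sampling = 1.0
cursor = (0.9, 0.5)
points = [(0.1, 0.5), (0.8, 0.5)]

# Both points share the cursor's bin ...
same_bin = [bin_index(px, 0.0, sampling) == bin_index(cursor[0], 0.0, sampling)
            for px, _ in points]
# ... yet they lie at clearly different distances from the cursor.
distances = [math.hypot(px - cursor[0], py - cursor[1]) for px, py in points]
```

Both points land in the same bin as the cursor, so "any point in the bin" could return the one that is much further away; ranking by distance picks the second point.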

@jlstevens
Contributor Author

jlstevens commented Jan 25, 2021

I've addressed quite a few of your suggestions, thanks! The remaining items need a bit more discussion though.

Will it be able to e.g. select an entire connected trajectory, given one point on that trajectory?

No. This operation is explicitly inspect_points and I've mentioned earlier that there could be inspect_paths, inspect_segments etc. What you suggest will be possible when we flesh out this suite of operations.

Maybe match_transform, matches_to_points, matches_to_hits?

Happy to use these though I don't find them particularly compelling either. Maybe masked_transform, masked_to_points and masked_to_hits? I don't see enough of a difference between the semantics of 'matches' and 'hits' otherwise. @philippjfr any other suggestions? What do you prefer?

@philippjfr
Member

Could then provide a generic inspect operation which wraps around element specific ones much like rasterize does.

@jlstevens
Contributor Author

jlstevens commented Jan 25, 2021

Could then provide a generic inspect operation which wraps around element specific ones much like rasterize does.

I like this but how would inspect know what kind of thing was datashaded/rasterized? Or in other words, what kind of element was turned into a raster? I think this might be possible by inspecting .pipeline but not via .dataset...what would you suggest?
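
The dispatch idea could be sketched as below, leaving open the question of where the element type comes from. The sketch simply assumes the rasterized output carries a tag naming the element that was rasterized; the `Raster` class, its `source_type` field, the registry and the stub inspectors are all hypothetical stand-ins, not HoloViews objects.

```python
from dataclasses import dataclass

@dataclass
class Raster:
    """Stand-in for rasterized output; `source_type` is a hypothetical tag."""
    source_type: str

def inspect_points(raster):
    # Stub standing in for the element-specific points inspector.
    return "points inspector"

def inspect_polygons(raster):
    # Stub standing in for a possible future polygons inspector.
    return "polygons inspector"

# Registry mapping the kind of element that was rasterized to its inspector.
INSPECTORS = {"Points": inspect_points, "Polygons": inspect_polygons}

def inspect(raster):
    """Dispatch to the element-specific inspector, or fail loudly."""
    try:
        inspector = INSPECTORS[raster.source_type]
    except KeyError:
        raise NotImplementedError(
            f"No inspector registered for {raster.source_type!r}") from None
    return inspector(raster)

result = inspect(Raster(source_type="Points"))
```

Whether that tag would be recovered from .pipeline or elsewhere is exactly what is being asked here; the registry only shows what the wrapper would do once the type is known.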

setup.py (outdated review thread)
@jlstevens
Contributor Author

The dask pinning is now in #4803 and we can decide what to do there. Tests on this PR will be failing unless we reintroduce the pin, fix the underlying issue (e.g update the tests?) or rebase after that PR is merged to master.

@jlstevens
Contributor Author

Regarding naming, here are some generic (so that we can support inspect_polygons) semantic names we want to improve:

common_dataframe_transform: Suggestions: transform_element, match_transform
output_element_data_transform: Suggestions: transform, matches_to_points, mask_to_points
output_dataframe_parameter_transform: Suggestions: transform_hits, matches_to_hits, mask_to_hits

@jlstevens
Contributor Author

jlstevens commented Jan 27, 2021

I've rebased off master, which now has the dask testing fixed. Before I can write tests, it looks like dynamic=False needs to be fixed:

@jlstevens
Contributor Author

Ok, dynamic=False works if you apply it to both operations:

@jlstevens
Contributor Author

@jbednar @philippjfr All tests are passing and I am happy with the API. There are a few things left to do but I think we can merge now and cut a dev release:

  • Unit tests for the polygon inspector.
  • Documentation

@jlstevens
Contributor Author

Going ahead and merging.

@jlstevens jlstevens merged commit 4cd9fb1 into master Jan 27, 2021
@philippjfr philippjfr deleted the highlight_operation branch April 25, 2022 14:37
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024
3 participants