Post-order element traversal #80
Comments
I may be out of my depth here, but the data in the R-Tree is only at the leaf nodes, right? So what does a post-order traversal mean for an R-Tree? Also, does locate_with_selection_function help? It lets you selectively open each intermediate bounding box based on a custom predicate. |
I think the idea is that we begin with unions of the smallest subsets (the deepest leaf nodes?) before moving up, and that is what requires the post-order traversal (reverse post-order, moving from right to left, would also work, I think). Unless I'm misunderstanding, that's where the efficiency comes from: the closer you get to the root, the less work there is to do. I'm not sure how locate_with_selection_function would help here, but I've never used it and I may well be being dense about that. |
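To make the ordering concrete, here is a minimal sketch of a post-order walk over a tree whose data lives only in the leaves. The `Node` enum, `post_order`, and `example` are hypothetical stand-ins written for illustration and are not rstar types; the only assumption is that inner nodes do nothing but group children.

```rust
// Hypothetical stand-in for an R-tree: values live only in leaves,
// inner nodes merely group their children.
enum Node {
    Leaf(u32),
    Parent(Vec<Node>),
}

// Post-order walk: all children of a parent are fully processed
// before the parent itself is recorded.
fn post_order(node: &Node, out: &mut Vec<String>) {
    match node {
        Node::Leaf(value) => out.push(format!("leaf {}", value)),
        Node::Parent(children) => {
            for child in children {
                post_order(child, out);
            }
            out.push(format!("parent of {}", children.len()));
        }
    }
}

// Builds a small two-level tree and returns the visit order.
fn example() -> Vec<String> {
    let tree = Node::Parent(vec![
        Node::Parent(vec![Node::Leaf(1), Node::Leaf(2)]),
        Node::Parent(vec![Node::Leaf(3)]),
    ]);
    let mut out = Vec::new();
    post_order(&tree, &mut out);
    out
}
```

The root is recorded last, after both subtrees have been finished, which is the property a cascaded union would rely on.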
Got it. The issue is that the R-Tree itself has no user-data available at the intermediate nodes, so all the iterations have no concept of post-order. I suppose you want to be able to do a map-reduce op on the R-Tree, where each intermediate (aka Here is one suggestion (just thinking aloud): we add a function to the trait SelectionFunction<...> {
fn finished_unpacking_parent(...) {}
} This can(?) then be used to drive a post-order / map-reduce op. Unfortunately, the |
This is my understanding too, yes.
Don't we still have the problem of not knowing where to start, though? |
The I would support introducing a new trait to support this type of map-reduce-fold if we could brainstorm a little bit. There are a couple of different use cases here:
Now, unfortunately, the naive trait definition I could think of to handle the above seems to involve many intermediate |
So something like fn bottom_up_fold_reduce<T, S, I, F, R>(tree: &RTree<T>, mut init: I, mut fold: F, mut reduce: R) -> S
where
T: RTreeObject,
I: FnMut() -> S,
F: FnMut(S, &T) -> S,
R: FnMut(S, S) -> S,
{
fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &mut I, fold: &mut F, reduce: &mut R) -> S
where
T: RTreeObject,
I: FnMut() -> S,
F: FnMut(S, &T) -> S,
R: FnMut(S, S) -> S,
{
parent
.children()
.iter()
.fold(init(), |accum, child| match child {
RTreeNode::Leaf(value) => fold(accum, value),
RTreeNode::Parent(parent) => {
let value = inner(parent, init, fold, reduce);
reduce(accum, value)
}
})
}
inner(tree.root(), &mut init, &mut fold, &mut reduce)
} ? |
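The same init/fold/reduce shape can be tried out without rstar. The following is a minimal sketch, assuming a hypothetical `Node` enum that only mimics the leaf/parent split of `RTreeNode`; `fold_reduce`, `example_tree` and `count_and_sum` are illustrative names, not part of any crate.

```rust
// Hypothetical stand-in for rstar's RTreeNode (assumption: same leaf/parent shape).
enum Node<T> {
    Leaf(T),
    Parent(Vec<Node<T>>),
}

// Bottom-up fold/reduce: leaves are folded into the accumulator of their
// parent, and the finished result of a child subtree is merged via `reduce`.
fn fold_reduce<T, S, I, F, R>(children: &[Node<T>], init: &mut I, fold: &mut F, reduce: &mut R) -> S
where
    I: FnMut() -> S,
    F: FnMut(S, &T) -> S,
    R: FnMut(S, S) -> S,
{
    let mut accum = init();
    for child in children {
        accum = match child {
            Node::Leaf(value) => fold(accum, value),
            Node::Parent(grandchildren) => {
                // The whole subtree is finished before it is merged upwards.
                let sub = fold_reduce(grandchildren, init, fold, reduce);
                reduce(accum, sub)
            }
        };
    }
    accum
}

// A small three-level tree used for illustration.
fn example_tree() -> Vec<Node<i64>> {
    vec![
        Node::Leaf(1),
        Node::Parent(vec![
            Node::Leaf(2),
            Node::Leaf(3),
            Node::Parent(vec![Node::Leaf(4)]),
        ]),
        Node::Leaf(5),
    ]
}

// Counts the leaves and sums their values in a single traversal.
fn count_and_sum(tree: &[Node<i64>]) -> (usize, i64) {
    let mut init = || (0usize, 0i64);
    let mut fold = |(n, s): (usize, i64), v: &i64| (n + 1, s + *v);
    let mut reduce = |a: (usize, i64), b: (usize, i64)| (a.0 + b.0, a.1 + b.1);
    fold_reduce(tree, &mut init, &mut fold, &mut reduce)
}
```

Here `fold` only ever sees leaf values and `reduce` only ever sees finished subtree results, so a caller can plug in an expensive merge (such as a polygon union) without touching the traversal.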
I think what is also nice about this way of doing things is that it has a straightforward parallel version using Rayon, which could come in handy for large data sets: fn parallel_bottom_up_fold_reduce<T, S, I, F, R>(tree: &RTree<T>, init: I, fold: F, reduce: R) -> S
where
T: RTreeObject,
RTreeNode<T>: Send + Sync,
S: Send,
I: Fn() -> S + Send + Sync,
F: Fn(S, &T) -> S + Send + Sync,
R: Fn(S, S) -> S + Send + Sync,
{
fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &I, fold: &F, reduce: &R) -> S
where
T: RTreeObject,
RTreeNode<T>: Send + Sync,
S: Send,
I: Fn() -> S + Send + Sync,
F: Fn(S, &T) -> S + Send + Sync,
R: Fn(S, S) -> S + Send + Sync,
{
parent
.children()
.into_par_iter()
.fold(init, |accum, child| match child {
RTreeNode::Leaf(value) => fold(accum, value),
RTreeNode::Parent(parent) => {
let value = inner(parent, init, fold, reduce);
reduce(accum, value)
}
})
.reduce(init, reduce)
}
inner(tree.root(), &init, &fold, &reduce)
} |
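For readers without Rayon at hand, the bounds above can be exercised with the standard library alone. This is a rough sketch that swaps Rayon's work-stealing iterators for `std::thread::scope` and spawns one thread per child subtree, so it illustrates the `Fn + Sync` requirements rather than a practical scheduler. `Node`, `parallel_fold_reduce`, `example_tree` and `parallel_sum` are hypothetical names. As with Rayon's `fold`/`reduce`, the sketch assumes `reduce` is associative and `init` yields an identity, because the merge order differs from the serial version.

```rust
use std::thread;

// Hypothetical stand-in for rstar's RTreeNode (assumption: same leaf/parent shape).
enum Node<T> {
    Leaf(T),
    Parent(Vec<Node<T>>),
}

// Each child subtree is processed on its own scoped thread; leaves of the
// current node are folded on the calling thread while the workers run.
fn parallel_fold_reduce<T, S, I, F, R>(children: &[Node<T>], init: &I, fold: &F, reduce: &R) -> S
where
    T: Sync,
    S: Send,
    I: Fn() -> S + Sync,
    F: Fn(S, &T) -> S + Sync,
    R: Fn(S, S) -> S + Sync,
{
    thread::scope(|scope| {
        let mut handles = Vec::new();
        let mut accum = init();
        for child in children {
            match child {
                Node::Leaf(value) => accum = fold(accum, value),
                Node::Parent(grandchildren) => {
                    handles.push(
                        scope.spawn(move || parallel_fold_reduce(grandchildren, init, fold, reduce)),
                    );
                }
            }
        }
        // Merge the finished subtree results into the local accumulator.
        for handle in handles {
            let sub = handle.join().expect("worker thread panicked");
            accum = reduce(accum, sub);
        }
        accum
    })
}

// A small three-level tree used for illustration.
fn example_tree() -> Vec<Node<u64>> {
    vec![
        Node::Leaf(1),
        Node::Parent(vec![Node::Leaf(2), Node::Parent(vec![Node::Leaf(3), Node::Leaf(4)])]),
        Node::Leaf(5),
    ]
}

// Sums all leaf values using the parallel traversal.
fn parallel_sum(tree: &[Node<u64>]) -> u64 {
    let init = || 0u64;
    let fold = |acc: u64, v: &u64| acc + *v;
    let reduce = |a: u64, b: u64| a + b;
    parallel_fold_reduce(tree, &init, &fold, &reduce)
}
```

Spawning a thread per subtree would be wasteful on a real tree; a thread pool such as Rayon's is the more sensible choice there.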
Thanks for fleshing out the details, @adamreichold! This looks quite on track, and I like the easy Rayon support. Just a few minor comments / points to discuss:
|
👍
Both std and rayon seem to pass the accumulator by value. I think, if it was really expensive to move and not optimized properly, one would box it?
I do not know the details of those auxiliary data structures, but maybe moving them into the closures would help? (At least in the serial case. Not sure whether thread-local storage is worth that in the parallel case.) Maybe the above could also accommodate this differently: In any case, I think the main takeaway could also be that we do not really need to make these decisions in a general context, as the traversal/reduction can be written against the existing rstar API (and I don't see any obvious performance gains from having access to its internals). So when geo implements cascaded unions, it can use code like the above, but tailored to its specific use case. Or would you rather say that this is important/complex enough to be exposed by rstar itself? Should it then also optionally provide the parallel variant? |
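As an illustration of moving auxiliary data into a closure in the serial case, here is a minimal sketch of a `reduce`-style `FnMut` that owns a scratch buffer and reuses it across calls. `make_merge` is a hypothetical helper, and merging sorted vectors merely stands in for whatever the real reduction needs.

```rust
// Returns a reduce-style closure that merges two sorted vectors.
// The closure owns `scratch`, so the buffer persists between calls.
fn make_merge() -> impl FnMut(Vec<i32>, Vec<i32>) -> Vec<i32> {
    let mut scratch: Vec<i32> = Vec::new();
    move |mut a: Vec<i32>, b: Vec<i32>| {
        scratch.clear();
        // Standard two-pointer merge of `a` and `b` into `scratch`.
        let (mut i, mut j) = (0, 0);
        while i < a.len() && j < b.len() {
            if a[i] <= b[j] {
                scratch.push(a[i]);
                i += 1;
            } else {
                scratch.push(b[j]);
                j += 1;
            }
        }
        scratch.extend_from_slice(&a[i..]);
        scratch.extend_from_slice(&b[j..]);
        // Swap so the merged result is returned in `a`, while `a`'s old
        // allocation becomes the scratch buffer for the next call.
        std::mem::swap(&mut a, &mut scratch);
        a
    }
}
```

Because the closure mutates captured state, it satisfies `FnMut` but not `Fn`, which is why this trick fits the serial signature and not the parallel one.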
Adding a vote for this feature https://discord.com/channels/598002550221963289/598002550221963291/1068880307664724060 Use case is similar to geopolars/geopolars#20 |
The single-threaded implementation of this will land in |
I didn't investigate the AABB hint or auxiliary allocation for Here's what I ended up with: fn bottom_up_fold_reduce<T, S, I, F, R>(
tree: &RTree<T>,
mut init: I,
mut fold: F,
mut reduce: R,
) -> S
where
T: RTreeObject,
I: FnMut() -> S,
F: FnMut(S, &T) -> S,
R: FnMut(S, S) -> S,
{
fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &mut I, fold: &mut F, reduce: &mut R) -> S
where
T: RTreeObject,
I: FnMut() -> S,
F: FnMut(S, &T) -> S,
R: FnMut(S, S) -> S,
{
parent
.children()
.iter()
.fold(init(), |accum, child| match child {
RTreeNode::Leaf(value) => fold(accum, value),
RTreeNode::Parent(parent) => {
let value = inner(parent, init, fold, reduce);
reduce(accum, value)
}
})
}
inner(tree.root(), &mut init, &mut fold, &mut reduce)
} with let init = || MultiPolygon::<T>::new(vec![]);
let fold = |mut accum: MultiPolygon<T>, poly: &Polygon<T>| -> MultiPolygon<T> {
accum = accum.union(poly);
accum
};
let reduce = |accum1: MultiPolygon<T>, accum2: MultiPolygon<T>| -> MultiPolygon<T> {
accum1.union(&accum2)
}; I'm very happy to keep the implementation in consumer libraries, but it would be good to capture the mechanism for this kind of full traversal and intermediate processing somewhere in the crate's docs: it's complex, and deep knowledge of rstar's internal API is rare. |
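The role of the three closures can be seen on a one-dimensional analogue, where a sorted list of disjoint closed intervals stands in for `MultiPolygon` and a single interval for `Polygon`. This is only a sketch under that assumption: `union_intervals` and `cascaded_union` are hypothetical names, and the pre-grouped input plays the part of the tree's parent nodes.

```rust
// Union of two interval sets; returns sorted, non-overlapping closed intervals.
fn union_intervals(a: &[(i32, i32)], b: &[(i32, i32)]) -> Vec<(i32, i32)> {
    let mut all: Vec<(i32, i32)> = a.iter().chain(b.iter()).copied().collect();
    all.sort();
    let mut out: Vec<(i32, i32)> = Vec::new();
    for (lo, hi) in all {
        if let Some(last) = out.last_mut() {
            if lo <= last.1 {
                // Overlapping or touching: extend the previous interval.
                last.1 = last.1.max(hi);
                continue;
            }
        }
        out.push((lo, hi));
    }
    out
}

// Two-level cascaded union: fold within each group, then reduce across groups.
fn cascaded_union(groups: &[Vec<(i32, i32)>]) -> Vec<(i32, i32)> {
    let init = || -> Vec<(i32, i32)> { Vec::new() };
    let fold = |accum: Vec<(i32, i32)>, item: &(i32, i32)| union_intervals(&accum, &[*item]);
    let reduce = |a: Vec<(i32, i32)>, b: Vec<(i32, i32)>| union_intervals(&a, &b);
    groups
        .iter()
        .map(|group| group.iter().fold(init(), &fold))
        .fold(init(), reduce)
}
```

Each group is collapsed on its own before the group results are merged, which mirrors how nearby leaves are unioned first and the already simplified results are combined higher up.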
Cascaded unions are a desirable feature for the geo library. However, the algorithm requires post-order traversal of the tree in order to work, but iteration order is currently not specified.