The current QuadTree implementation (the core of gplt.quadtree) only performs a split if every sub-partition satisfies the splitting rules. This was done because the algorithm is easy to implement this way.
However, this has serious limitations for visualization purposes. For interpretability, the best splitting rule to use would be nmin, which would guarantee that every sub-partition contains at least n observations. Combined with the current all-or-nothing splitting behavior, this works fine if the data is relatively homogeneously distributed. However, when there are gaps in the data due to the underlying topography, the plot misses splits, which ruins the whole "more splits means more data" thing gplt.quadtree has going on.
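To make the failure mode concrete, here is a minimal sketch of an all-or-nothing split check under an nmin rule. The names `quadrant_counts` and `should_split` are hypothetical and are not taken from the geoplot source; the sketch assumes points are an `(n, 2)` NumPy array lying inside the given `(xmin, xmax, ymin, ymax)` bounds.

```python
import numpy as np

def quadrant_counts(points, bounds):
    """Count points in each quadrant of bounds = (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = bounds
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    x, y = points[:, 0], points[:, 1]
    west, south = x < xmid, y < ymid
    return {
        "sw": int(np.sum(west & south)),
        "se": int(np.sum(~west & south)),
        "nw": int(np.sum(west & ~south)),
        "ne": int(np.sum(~west & ~south)),
    }

def should_split(points, bounds, nmin):
    """All-or-nothing rule: split only if every quadrant holds >= nmin points."""
    return all(c >= nmin for c in quadrant_counts(points, bounds).values())

# Three well-populated quadrants and one empty one (e.g. a body of water).
rng = np.random.default_rng(0)
pts = np.vstack([
    rng.uniform([0.0, 0.0], [0.5, 0.5], size=(50, 2)),  # sw
    rng.uniform([0.5, 0.0], [1.0, 0.5], size=(50, 2)),  # se
    rng.uniform([0.0, 0.5], [0.5, 1.0], size=(50, 2)),  # nw
])
print(should_split(pts, (0, 1, 0, 1), nmin=10))  # False: the empty "ne" quadrant blocks the split
```

Even though 150 points sit in this cell, the single empty quadrant vetoes the split, so the cell is drawn as if it were sparse.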
The solution is to change the partitioning algorithm. Instead of splitting only when all four sub-partitions meet the splitting criteria, consolidate contiguous sub-partitions that don't meet the criteria until you have n >= 2 that do, e.g. by generating rectangular and/or L-shaped partitions.
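One possible reading of this consolidation step, sketched on quadrant counts alone: repeatedly merge the smallest failing group into its smallest edge-adjacent neighbor until every group meets nmin, and split only if at least two groups remain. The merge order (smallest into smallest) and the names `ADJACENT`, `touches`, and `consolidate` are assumptions made for illustration, not a settled design.

```python
# Edge adjacency on the 2x2 quadrant grid (diagonals do not count as contiguous).
ADJACENT = {
    "sw": {"se", "nw"},
    "se": {"sw", "ne"},
    "nw": {"sw", "ne"},
    "ne": {"se", "nw"},
}

def touches(cells_a, cells_b):
    """True if any quadrant in cells_a shares an edge with one in cells_b."""
    return any(ADJACENT[c] & cells_b for c in cells_a)

def consolidate(counts, nmin):
    """Merge failing quadrants into neighbors; return groups, or None if no split."""
    groups = [(frozenset([name]), n) for name, n in counts.items()]
    while len(groups) > 1:
        failing = [g for g in groups if g[1] < nmin]
        if not failing:
            break
        small = min(failing, key=lambda g: g[1])
        neighbors = [g for g in groups if g is not small and touches(small[0], g[0])]
        partner = min(neighbors, key=lambda g: g[1])
        groups = [g for g in groups if g is not small and g is not partner]
        groups.append((small[0] | partner[0], small[1] + partner[1]))
    # A single remaining group means the cell cannot be split at all.
    return groups if len(groups) > 1 else None

# Empty "ne" quadrant: it is absorbed by a neighbor, leaving two squares and a rectangle.
print(consolidate({"sw": 50, "se": 50, "nw": 50, "ne": 0}, nmin=10))
# Two sparse quadrants on the east side: they end up in an L-shaped group.
print(consolidate({"sw": 50, "se": 3, "nw": 50, "ne": 3}, nmin=10))
```

Working on counts keeps the sketch small; a real implementation would also need to carry the geometry of each merged group.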
There are a couple of reasons why this is an involved change:
The current code passes around (xmin, xmax, ymin, ymax) values; we now have to design an abstraction that handles irregularly shaped partitions instead.
This is compute-intensive code. The result has to be relatively well-optimized.
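As a rough sketch of what such an abstraction might look like, an irregular partition can be represented as a union of non-overlapping axis-aligned rectangles, each still expressed as an (xmin, xmax, ymin, ymax) tuple. The `Partition` class below is hypothetical and not part of geoplot; membership tests are vectorized with NumPy to keep the per-point cost low.

```python
from dataclasses import dataclass

import numpy as np

@dataclass(frozen=True)
class Partition:
    """Union of non-overlapping rectangles, each (xmin, xmax, ymin, ymax)."""
    rects: tuple

    @property
    def area(self):
        return sum((x1 - x0) * (y1 - y0) for x0, x1, y0, y1 in self.rects)

    def contains(self, points):
        """Boolean mask of which rows of an (n, 2) array fall inside the partition."""
        x, y = points[:, 0], points[:, 1]
        mask = np.zeros(len(points), dtype=bool)
        for x0, x1, y0, y1 in self.rects:
            mask |= (x >= x0) & (x < x1) & (y >= y0) & (y < y1)
        return mask

    def merge(self, other):
        """Combine two partitions (assumed contiguous and non-overlapping)."""
        return Partition(self.rects + other.rects)

# L-shaped partition: the sw, se and ne quadrants of the unit square.
l_shape = Partition(((0.0, 0.5, 0.0, 0.5),)).merge(
    Partition(((0.5, 1.0, 0.0, 0.5), (0.5, 1.0, 0.5, 1.0)))
)
probe = np.array([[0.25, 0.25], [0.25, 0.75], [0.75, 0.75]])
print(l_shape.area)             # 0.75
print(l_shape.contains(probe))  # [ True False  True]
```

A rectangular partition is just the one-rectangle case, so code that today takes a single bounds tuple could in principle iterate over `rects` instead.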
The documentation currently works around the missed-splits problem by recommending the nmax splitting rule instead, but nmax is just not that good a splitting rule.