Investigate PQ speed ups by reducing distance calculations #124

jkni · 2023-10-12T15:21:46Z

At this point, profiles of our PQ look like it's almost entirely using distance work. Barring large parameter changes or a paradigm shift in how we quantize, it seems like the next avenue is reducing distance calculations. There are a few approaches out there that focus on this. Several use kd-trees of different types (main tradeoff is how the tree is split, stopping points, and how entangled the clustering is with the construction of the kd-tree). There also exist kd-tree approaches for choosing initial centroids by leaf destiny. Interesting papers in this direction include "Fast K-Means Clustering Via K-D Trees, Sampling, and Parallelism" by Crowell which provides a good survey of the field and "Tree-Based Algorithm for Stable and Efficient Data Clustering" by Aljabbouli, Albizri, and Harfouche.

An alternative approach that might require fewer modifications is presented in "A fast k-means clustering algorithm using cluster center displacement" (Jim Z.C. Lai, Tsung-Jen Huang, Yi-Ching Liaw), which also focuses on reducing distance calculations, but it does not require a specialized data structure.

These approaches are unlikely to move the needle on quality of our quantization and should be considered primarily if we need efficiency improvements on top of our current approach.

jkni changed the title ~~Investigate PQ speed ups~~ Investigate PQ speed ups by reducing distance calculations Oct 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate PQ speed ups by reducing distance calculations #124

Investigate PQ speed ups by reducing distance calculations #124

jkni commented Oct 12, 2023

Investigate PQ speed ups by reducing distance calculations #124

Investigate PQ speed ups by reducing distance calculations #124

Comments

jkni commented Oct 12, 2023