You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
At this point, profiles of our PQ look like it's almost entirely using distance work. Barring large parameter changes or a paradigm shift in how we quantize, it seems like the next avenue is reducing distance calculations. There are a few approaches out there that focus on this. Several use kd-trees of different types (main tradeoff is how the tree is split, stopping points, and how entangled the clustering is with the construction of the kd-tree). There also exist kd-tree approaches for choosing initial centroids by leaf destiny. Interesting papers in this direction include "Fast K-Means Clustering Via K-D Trees, Sampling, and Parallelism" by Crowell which provides a good survey of the field and "Tree-Based Algorithm for Stable and Efficient Data Clustering" by Aljabbouli, Albizri, and Harfouche.
An alternative approach that might require fewer modifications is presented in "A fast k-means clustering algorithm using cluster center displacement" (Jim Z.C. Lai, Tsung-Jen Huang, Yi-Ching Liaw), which also focuses on reducing distance calculations, but it does not require a specialized data structure.
These approaches are unlikely to move the needle on quality of our quantization and should be considered primarily if we need efficiency improvements on top of our current approach.
The text was updated successfully, but these errors were encountered:
jkni
changed the title
Investigate PQ speed ups
Investigate PQ speed ups by reducing distance calculations
Oct 12, 2023
At this point, profiles of our PQ look like it's almost entirely using distance work. Barring large parameter changes or a paradigm shift in how we quantize, it seems like the next avenue is reducing distance calculations. There are a few approaches out there that focus on this. Several use kd-trees of different types (main tradeoff is how the tree is split, stopping points, and how entangled the clustering is with the construction of the kd-tree). There also exist kd-tree approaches for choosing initial centroids by leaf destiny. Interesting papers in this direction include "Fast K-Means Clustering Via K-D Trees, Sampling, and Parallelism" by Crowell which provides a good survey of the field and "Tree-Based Algorithm for Stable and Efficient Data Clustering" by Aljabbouli, Albizri, and Harfouche.
An alternative approach that might require fewer modifications is presented in "A fast k-means clustering algorithm using cluster center displacement" (Jim Z.C. Lai, Tsung-Jen Huang, Yi-Ching Liaw), which also focuses on reducing distance calculations, but it does not require a specialized data structure.
These approaches are unlikely to move the needle on quality of our quantization and should be considered primarily if we need efficiency improvements on top of our current approach.
The text was updated successfully, but these errors were encountered: