write distributed_kmeans centroids and assignments to hive tables (facebookresearch#4017)

Summary:
Pull Request resolved: facebookresearch#4017

Expose an option to write kmeans centroids and assignments to Hive tables, which should bring us closer to parity with Digraph's Kmeans API. This is needed for cluster-balance data quality checks on large-scale centroids.

Reviewed By: kuarora

Differential Revision: D64835789

fbshipit-source-id: 95cbea00bb6b4733c03836049bc379be813bf9e5
mengdilin authored and facebook-github-bot committed Nov 6, 2024
1 parent a11c1db commit cfd4804
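
The summary mentions cluster-balance data quality checks but does not show one. The following is a minimal sketch of what such a check might compute once assignments are available, not the code from this commit. The function name `cluster_balance` and the returned keys are hypothetical; the imbalance formula (`k * sum(n_i^2) / (sum(n_i))^2`, equal to 1.0 for perfectly even clusters) is one common choice and is an assumption here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Union


def cluster_balance(
    assignments: Sequence[int], k: int
) -> Dict[str, Union[List[int], int, float]]:
    """Summarize how evenly points are spread over k centroids.

    Hypothetical helper: `assignments` holds one centroid id per point,
    e.g. as read back from an assignments table.
    """
    counts = Counter(assignments)
    # Size of every cluster, including clusters that received no points.
    sizes = [counts.get(c, 0) for c in range(k)]
    n = sum(sizes)
    if n == 0:
        raise ValueError("no assignments fall in range(k)")
    # 1.0 means perfectly balanced; k means every point is in one cluster.
    imbalance = k * sum(s * s for s in sizes) / (n * n)
    empty = sum(1 for s in sizes if s == 0)
    return {"sizes": sizes, "empty": empty, "imbalance": imbalance}
```

For example, `cluster_balance([0, 0, 1, 1], k=2)` reports an imbalance of 1.0 and no empty clusters, while putting all four points in cluster 0 gives 2.0 and one empty cluster.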
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions benchs/bench_fw/descriptors.py
@@ -83,6 +83,8 @@ class DatasetDescriptor:

embedding_column: Optional[str] = None

embedding_id_column: Optional[str] = None

sampling_rate: Optional[float] = None

# sampling column for xdb
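
To illustrate how the fields in this hunk might be used together, here is a small self-contained sketch. It assumes a dataclass-style descriptor and reproduces only the three fields visible above; `DatasetDescriptorSketch` and `columns_to_read` are hypothetical names for illustration and are not part of `benchs/bench_fw/descriptors.py`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetDescriptorSketch:
    """Illustrative stand-in with only the fields shown in the hunk."""

    embedding_column: Optional[str] = None
    embedding_id_column: Optional[str] = None
    sampling_rate: Optional[float] = None


def columns_to_read(desc: DatasetDescriptorSketch) -> List[str]:
    """Hypothetical helper: list the table columns a reader would select.

    The id column comes first so each embedding can be matched to its
    row when assignments are written back out.
    """
    cols = []
    if desc.embedding_id_column is not None:
        cols.append(desc.embedding_id_column)
    if desc.embedding_column is not None:
        cols.append(desc.embedding_column)
    return cols
```

Because every field defaults to `None`, descriptors created without `embedding_id_column` keep working unchanged, which is consistent with the change being purely additive.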