Implement support for COSINE in fused ADC #329
@@ -132,7 +132,8 @@ static void runOneGraph(List<? extends Set<FeatureId>> featureSets,
}

indexes.forEach((features, index) -> {
-try (var cs = new ConfiguredSystem(ds, index, cv)) {
+try (var cs = new ConfiguredSystem(ds, index instanceof OnDiskGraphIndex ? new CachingGraphIndex((OnDiskGraphIndex) index) : index, cv,
should we move this logic to where indexes are created?
I thought about this, but I hesitated because it pushes the compression grid into the build methods, which otherwise don't care about these compression configurations. I don't really have strong feelings either way given that change, so happy to go with whatever you'd prefer here.
Makes sense, this WFM
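For readers following the thread, the wrap-at-the-call-site choice discussed above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `GraphIndexStandIn`, `DiskIndexStandIn`, `CachingIndexStandIn`, and `WrapSketch` are hypothetical stand-ins for the real index types, and the cache here is a plain map chosen only to keep the example small.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a graph index; not the library's interface.
interface GraphIndexStandIn {
    int[] neighbors(int node);
}

// Hypothetical on-disk index that counts how often it is actually read.
final class DiskIndexStandIn implements GraphIndexStandIn {
    int reads = 0;

    @Override
    public int[] neighbors(int node) {
        reads++; // pretend this is an expensive disk read
        return new int[] {node + 1, node + 2};
    }
}

// Hypothetical caching decorator that memoizes neighbor lists per node.
final class CachingIndexStandIn implements GraphIndexStandIn {
    private final DiskIndexStandIn delegate;
    private final Map<Integer, int[]> cache = new HashMap<>();

    CachingIndexStandIn(DiskIndexStandIn delegate) {
        this.delegate = delegate;
    }

    @Override
    public int[] neighbors(int node) {
        return cache.computeIfAbsent(node, delegate::neighbors);
    }
}

final class WrapSketch {
    // Wrap only on-disk indexes; anything else is passed through unchanged.
    // Doing this at the call site keeps the build code unaware of caching.
    static GraphIndexStandIn maybeCache(GraphIndexStandIn index) {
        return index instanceof DiskIndexStandIn
                ? new CachingIndexStandIn((DiskIndexStandIn) index)
                : index;
    }
}
```

Wrapping where the index is consumed means the code that builds indexes never sees the cache; the trade-off is the `instanceof` check at the consumer.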
*/
I am impressed that this worked first try!
@@ -131,11 +131,71 @@ default void bulkShuffleQuantizedSimilarity(ByteSequence<?> shuffles, int codebo
}
}

// default implementation used here because Panama SIMD can't express necessary SIMD operations and degrades to scalar
unfortunate!
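The pattern the code comment describes (a scalar default on the interface, overridden only by backends that can do better) might look roughly like this. The names `SimilaritySupport`, `PanamaLikeSupport`, `NativeLikeSupport`, and `sumSelected` are hypothetical and chosen for illustration; the real method signatures in the PR are not reproduced here.

```java
// Hypothetical support interface; the default method is the scalar fallback.
interface SimilaritySupport {
    // Sums the lookup-table entries selected by each unsigned byte code.
    default float sumSelected(float[] table, byte[] codes) {
        float sum = 0f;
        for (byte code : codes) {
            sum += table[code & 0xFF]; // treat the byte as unsigned
        }
        return sum;
    }
}

// A backend that cannot express the needed SIMD operations simply
// inherits the scalar default instead of providing a slower "vector" path.
final class PanamaLikeSupport implements SimilaritySupport {
}

// A backend with a faster path overrides the default. The "native" call
// is simulated here by delegating to the scalar code and counting calls.
final class NativeLikeSupport implements SimilaritySupport {
    int nativeCalls = 0;

    @Override
    public float sumSelected(float[] table, byte[] codes) {
        nativeCalls++;
        return SimilaritySupport.super.sumSelected(table, codes);
    }
}
```

Keeping the fallback as a default method means a backend opts in to acceleration per operation, rather than having to implement every operation itself.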
LGTM.
How does performance compare to DP?
There's some overhead, comparable to the overhead of regular PQ cosine relative to regular PQ dot product. On openai-v3-large-1536-100k with PQ(192,256) and LVQ/Fused ADC, as a fairly representative run based on what I've seen locally:
[Results table with COSINE and DOT_PRODUCT columns; values not preserved]
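One way to see where the extra cosine cost comes from: with asymmetric distance computation over PQ codes, dot product needs one table lookup per subspace, while cosine also has to accumulate the squared magnitude of the decoded vector. The sketch below is plain, unfused, scalar ADC under simplifying assumptions (equal-sized subspaces, float tables), not the fused implementation in this PR; `AdcCosineSketch` and all of its method names are hypothetical.

```java
// Scalar ADC sketch: codebooks[m][k] is centroid k of subspace m.
final class AdcCosineSketch {
    // Per-subspace table of query-to-centroid partial dot products.
    static float[][] partialDots(float[][][] codebooks, float[] query) {
        float[][] table = new float[codebooks.length][];
        for (int m = 0; m < codebooks.length; m++) {
            int subDim = codebooks[m][0].length;
            table[m] = new float[codebooks[m].length];
            for (int k = 0; k < codebooks[m].length; k++) {
                float sum = 0f;
                for (int d = 0; d < subDim; d++) {
                    sum += codebooks[m][k][d] * query[m * subDim + d];
                }
                table[m][k] = sum;
            }
        }
        return table;
    }

    // Per-subspace table of centroid squared magnitudes (query independent).
    static float[][] partialSquaredNorms(float[][][] codebooks) {
        float[][] table = new float[codebooks.length][];
        for (int m = 0; m < codebooks.length; m++) {
            table[m] = new float[codebooks[m].length];
            for (int k = 0; k < codebooks[m].length; k++) {
                float sum = 0f;
                for (float v : codebooks[m][k]) {
                    sum += v * v;
                }
                table[m][k] = sum;
            }
        }
        return table;
    }

    // Dot product: one lookup and one add per subspace.
    static float dotProduct(byte[] code, float[][] dots) {
        float dot = 0f;
        for (int m = 0; m < code.length; m++) {
            dot += dots[m][code[m] & 0xFF];
        }
        return dot;
    }

    // Cosine: two lookups and two adds per subspace, plus a sqrt and divide.
    static float cosine(byte[] code, float[][] dots, float[][] norms, float queryNormSq) {
        float dot = 0f;
        float normSq = 0f;
        for (int m = 0; m < code.length; m++) {
            int k = code[m] & 0xFF;
            dot += dots[m][k];
            normSq += norms[m][k];
        }
        return (float) (dot / Math.sqrt((double) normSq * queryNormSq));
    }
}
```

In this sketch the cosine loop does roughly twice the table traffic of the dot-product loop, which is consistent with the overhead described above, though the actual ratio in a fused, quantized implementation will differ.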
Ship it!
This PR also renames QuickADCPQDecoder to FusedADCPQDecoder for consistency with naming elsewhere, flattens the type hierarchy in FusedADCPQDecoder, and reduces duplicated C code with some force-inlined functions.