Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for COSINE in fused ADC #329

Merged
merged 1 commit into from
May 24, 2024
Merged

Implement support for COSINE in fused ADC #329

merged 1 commit into from
May 24, 2024

Conversation

jkni
Copy link
Collaborator

@jkni jkni commented May 24, 2024

This PR also renamed QuickADCPQDecoder to FusedADCPQDecoder for better consistency with naming elsewhere, flattens the type hierarchy in FusedADCPQDecoder, and reduces duplicated C code with some force-inlined functions.

@jkni jkni requested a review from jbellis May 24, 2024 19:20
@@ -132,7 +132,8 @@ static void runOneGraph(List<? extends Set<FeatureId>> featureSets,
}

indexes.forEach((features, index) -> {
try (var cs = new ConfiguredSystem(ds, index, cv)) {
try (var cs = new ConfiguredSystem(ds, index instanceof OnDiskGraphIndex ? new CachingGraphIndex((OnDiskGraphIndex) index) : index, cv,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this logic to where indexes are created?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but I hesitated because it pushes the compression grid into the build methods, which otherwise don't care about these compression configurations. I don't really have strong feelings either way given that change, so happy to go with whatever you'd prefer here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, this WFM

*/


Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am impressed that this worked first try!

@@ -131,11 +131,71 @@ default void bulkShuffleQuantizedSimilarity(ByteSequence<?> shuffles, int codebo
}
}

// default implementation used here because Panama SIMD can't express necessary SIMD operations and degrades to scalar
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunate!

Copy link
Owner

@jbellis jbellis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

How does performance compare to DP?

@jkni
Copy link
Collaborator Author

jkni commented May 24, 2024

LGTM.

How does performance compare to DP?

There's some overhead, comparable to the overhead of regular PQ cosine to regular PQ dot product. On openai-v3-large-1536-100k, PQ(192,256), LVQ/Fused ADC, as a fairly representative run based on what I've seen locally:

COSINE

 Query top 100/200 recall 0.9798 in 3.90s after 42,044,504 nodes visited

DOT_PRODUCT

 Query top 100/200 recall 0.9770 in 3.75s after 42,250,616 nodes visited

@jbellis
Copy link
Owner

jbellis commented May 24, 2024

Ship it!

@jkni jkni merged commit d8a2b49 into main May 24, 2024
6 checks passed
@jkni jkni deleted the cosine-fused-adc branch May 24, 2024 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants