[Experimental][StarCode] KV Cache Injection #2080
Open
Feature Description
The results of my experimentation with the `tiny_starcoder` model.

Findings:
- The KV cache is not exposed as separate `past_key_values.{attn_block_id}.values` and `past_key_values.{attn_block_id}.keys`, but as a joint array of keys and values. I did not get to look into breaking those two down, but from analyzing the ONNX graph I do not see why we could not do it.
- [...] `causal_mask` input, which applies the appropriate permutation to the input to patch this.

This is an experimental branch on which I will stop development for now due to other priorities. To revisit in the future.