[Cherry-Pick][Text Generation] Terminate the inference when kv cache is full (#1447)

* [Fix] Remove erroneous LIB.kv_cache input when using external kv cache management (#1337)

* initial commit

* initial commit

* cleanup

* cleanup2

* initial commit

* initial commit

* Needs to be >=
dbogunowicz authored Dec 1, 2023
1 parent 39e21d3 commit e94dcac
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions src/deepsparse/transformers/pipelines/text_generation.py
@@ -829,6 +829,11 @@ def engine_forward(
     generated_tokens.append(token)
     generated_logits.append(logits)
 
+    if session.total_num_processed_tokens >= session.capacity:
+        # if the kv cache is full, stop generation
+        finished_reason.append(FinishReason.CAPACITY)
+        break
+
     if (
         token == self.tokenizer.eos_token_id
         and not self.force_max_tokens
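
For readers who want to see the effect of the added check in isolation, here is a minimal, self-contained sketch of a generation loop that stops once the KV cache is full. It is not the DeepSparse implementation: `ToySession`, `generate`, and the `next_token` callable are hypothetical stand-ins, and of the enum members only `CAPACITY` appears in this commit (`STOP` and `LENGTH` are placeholders for illustration). The sketch also assumes the capacity check runs before the EOS check, as in the hunk above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class FinishReason(Enum):
    # Stand-in enum. Only CAPACITY is taken from the commit;
    # STOP and LENGTH are assumed names for illustration.
    STOP = "stop"
    LENGTH = "length"
    CAPACITY = "capacity"


@dataclass
class ToySession:
    """Hypothetical stand-in for the KV cache session object."""

    capacity: int
    total_num_processed_tokens: int = 0


def generate(
    next_token: Callable[[int], int],
    session: ToySession,
    prompt_len: int,
    max_tokens: int,
    eos_token_id: int,
) -> Tuple[List[int], FinishReason]:
    # The prompt already occupies part of the cache.
    session.total_num_processed_tokens = prompt_len
    generated: List[int] = []

    while len(generated) < max_tokens:
        token = next_token(len(generated))
        session.total_num_processed_tokens += 1
        generated.append(token)

        # Same comparison as in the commit: ">=" rather than "==", so the
        # loop also stops if the counter has already reached or passed
        # the capacity.
        if session.total_num_processed_tokens >= session.capacity:
            return generated, FinishReason.CAPACITY

        if token == eos_token_id:
            return generated, FinishReason.STOP

    return generated, FinishReason.LENGTH
```

With a capacity of 8 and a 5-token prompt, this loop keeps the third generated token and then reports `FinishReason.CAPACITY`, even if `max_tokens` would have allowed more.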