
[Text Generation][V2] NonKVCachePipeline #1417

Closed
wants to merge 7 commits

Conversation

dbogunowicz
Contributor

@dbogunowicz dbogunowicz commented Nov 17, 2023

Feature Description

Added the TextGenerationPipelineNoCache. This pipeline processes the prompt and returns the single newly generated token; that's it.
Its main functionality is mapping prompt tokens to logits, which is instrumental for computing the perplexity of the model on a dataset.
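For context, the "prompt tokens to logits" mapping feeds a perplexity computation along these lines. This is an illustrative sketch, not the deepsparse implementation: `perplexity_from_logits` and its argument layout are assumptions made for the example.

```python
import numpy as np


def perplexity_from_logits(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Perplexity of a token sequence given per-position logits.

    logits: (seq_len, vocab_size), where logits[i] scores the token
            observed at position i (illustrative layout, not the
            pipeline's exact contract)
    token_ids: (seq_len,) ground-truth token ids
    """
    # numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # negative log-likelihood of each observed token
    nll = -log_probs[np.arange(len(token_ids)), token_ids]
    # perplexity = exp(mean NLL)
    return float(np.exp(nll.mean()))
```

With uniform logits over a vocabulary of size V, the perplexity is exactly V, which is a quick sanity check for the formula.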

Testing

Updated the integration tests to cover the case of non-kv-cache inference.

Example Use

from deepsparse.v2.text_generation import TextGenerationPipelineNoCache

prompt = ["Some funny prompt", "Why are you so"]

pipeline = TextGenerationPipelineNoCache(model_path="hf:mgoin/TinyStories-1M-ds",
                                         onnx_model_name="model-orig.onnx",
                                         sequence_length=20)

out = pipeline(prompt=prompt,
               include_prompt_logits=True,
               generation_kwargs=dict(output_scores=True))

for gen in out.generations:
    print(gen)
text='.' score=array([[ 2.9344807 , -0.03345669, -4.11256   , ..., -6.9316325 ,
        -4.6005425 ,  1.1827914 ],
       [ 7.008805  , -0.11603884, -7.1837015 , ..., -7.0405912 ,
        -2.386351  , -2.2007818 ],
       [ 6.348213  , -2.2960157 , -6.433192  , ..., -6.5930486 ,
        -5.8315077 , -0.58804405],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ]], dtype=float32) finished=True finished_reason='length' # note: the logits are zero-padded at the end, because all outputs in the batch must share the same shape (the length of the longest prompt in the input + 1)
text=' sad' score=array([[ 2.560934 ,  1.1993233, -6.670935 , ..., -7.3002615, -3.823823 ,
         1.8125833],
       [-1.1050931, -2.4256568, -7.3015127, ..., -6.1500154, -4.074909 ,
         1.8155754],
       [ 6.172593 , -2.2252593, -9.146653 , ..., -7.70834  , -4.810748 ,
         0.3985293],
       [ 1.4988875,  1.0973434, -4.4714937, ..., -4.8026247, -1.1791464,
         1.6924176]], dtype=float32) finished=True finished_reason='length'
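Since shorter prompts get trailing all-zero rows (as in the first generation above), a consumer of these scores may want to drop the padding before computing anything. A minimal sketch, assuming padding rows are exactly zero (the helper name is made up for this example):

```python
import numpy as np


def trim_zero_padding(scores: np.ndarray) -> np.ndarray:
    """Drop trailing all-zero rows appended to equalize logit shapes
    across prompts of different lengths in the same batch."""
    nonzero = np.any(scores != 0, axis=-1)  # per-row: any real logit?
    if not nonzero.any():
        return scores[:0]  # nothing but padding
    # index of the last row that contains real logits
    last = len(nonzero) - 1 - int(np.argmax(nonzero[::-1]))
    return scores[: last + 1]
```

Note this is a heuristic: a genuine all-zero logit row would be trimmed too, so a robust implementation would track prompt lengths explicitly instead.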

Next steps

  • Create a parent TextGenerationPipeline operator that can choose either the kv-cache or the non-kv-cache version of the pipeline, depending on the topology of the ONNX model
  • Move the overwriting of the transformer inputs to some high-level function
  • Use the V2 pipeline for Perplexity calculation
  • Swap GraphRouter for LinearRouter in TextGenerationPipelineNoCache
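The dispatch in the first bullet could key off the ONNX graph's input names: kv-cache exports typically expose cache inputs (commonly prefixed `past_key_values`), while no-cache exports do not. A sketch of that check, with the name pattern and helper being assumptions rather than a deepsparse guarantee:

```python
def has_kv_cache_inputs(input_names: list[str]) -> bool:
    """Heuristic topology check for choosing between the kv-cache and
    no-cache pipelines: look for cache-style input names on the graph.
    The 'past_key_values' prefix is a common convention for HF-exported
    ONNX models, not a universal rule."""
    return any(name.startswith("past_key_values") for name in input_names)
```

In practice the parent operator would read these names from the loaded ONNX model (e.g. its `graph.input`) and instantiate the matching pipeline.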

@dbogunowicz dbogunowicz changed the base branch from main to v2 November 17, 2023 15:46
@dbogunowicz dbogunowicz changed the base branch from v2 to feature/damian/v2/factor_out_transformation_utils November 20, 2023 13:31
@dbogunowicz dbogunowicz changed the base branch from feature/damian/v2/factor_out_transformation_utils to v2 November 20, 2023 13:33
@dbogunowicz dbogunowicz changed the title [Text Generation][V2] NonKVCachePipeline [WiP][Text Generation][V2] NonKVCachePipeline Nov 27, 2023
@dbogunowicz dbogunowicz marked this pull request as ready for review November 27, 2023 14:11
@dbogunowicz dbogunowicz changed the title [WiP][Text Generation][V2] NonKVCachePipeline [Text Generation][V2] NonKVCachePipeline Nov 28, 2023
Base automatically changed from v2 to main December 6, 2023 15:37
bfineran and others added 6 commits December 18, 2023 16:08
… router and image classification pipeline/operators/example (#1325)

* initial functionality and working example with image classification

* remove testing image

* update args

* initial functionality and working example with image classification

* remove testing image

* pr comments

* defines schemas for operators and test

* add image classification test, PR comments

* fix input/output handling in pipeline and operator base classes to be more generic; remove context

* add additional operator input message

* typo fix
* [v2] EngineOperator updates to make continuous batching easier

* test fixes
…ity (#1348)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings
…generation functionality (#1356)

* initial functionality and working example with image classification

* remove testing image

* rebase fixes

* initial functionality and working example with image classification

* text gen

* updates func

* prompt inference, initial functionality

* remove image; update state docstring

* Fix typo

* add todo for split/join

* remove context, clean-up args, remove prefill_preprocess_operaator

* fix docstrings

* initial functionality and working example with image classification

* updates func

* prompt inference, initial functionality

* finish generation operators and update routes

* further breakdown operators

* add operators

* fix can_operate condition

* update can_operate to not rely on the inference_state

* rebase + update

* fix condition

* fix capacity settting again

* typo fixes