This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Group Index Conditioning #405

Merged (6 commits, Sep 3, 2024)

@kylesayrs commented Aug 30, 2024

Purpose

  • Support models quantized with `actorder="weight"`, a variant of activation ordering that reorders the weight columns but not the quantization groups, so no `g_idx` (group index) tensor is needed
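
A minimal NumPy sketch of why the two modes differ, assuming GPTQ-style quantization in which `perm` is the activation-order permutation of the input columns. The helpers `group_index` and `is_trivial` are hypothetical illustrations, not vLLM or llm-compressor code. When groups are formed in activation order, each column's group (`g_idx`) is an arbitrary mapping that has to be stored with the checkpoint; when only the weight processing order changes, column `j` stays in group `j // group_size`, so the mapping follows from `group_size` alone.

```python
import numpy as np


def group_index(num_columns: int, group_size: int, perm=None) -> np.ndarray:
    """Map each original input column to its quantization group (hypothetical helper).

    perm: activation-order permutation, where perm[i] is the original index of
    the i-th column processed. Pass None when groups follow the original column
    order, as with actorder="weight" or no activation ordering.
    """
    if perm is None:
        # Groups are contiguous in the original order: column j -> j // group_size.
        return np.arange(num_columns) // group_size
    # Groups are contiguous in activation order, so the column processed
    # i-th lands in group i // group_size.
    g_idx = np.empty(num_columns, dtype=np.int64)
    g_idx[np.asarray(perm)] = np.arange(num_columns) // group_size
    return g_idx


def is_trivial(g_idx: np.ndarray, group_size: int) -> bool:
    """True when g_idx carries no information beyond group_size."""
    return np.array_equal(g_idx, np.arange(len(g_idx)) // group_size)


perm = np.array([3, 0, 2, 1])   # hypothetical activation order
print(group_index(4, 2, perm))  # [0 1 1 0]: non-trivial, must be stored
print(group_index(4, 2))        # [0 0 1 1]: implied by group_size
```

Only the first mapping needs to be shipped as a tensor; the second can be reconstructed from `group_size`, which is what lets weight-only ordering skip `g_idx`.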

Changes

  • Update the activation-ordering config model to support the new config options while remaining backwards compatible with `actorder=True`/`False`
  • Replace logic conditioned on `actorder == True` with logic conditioned on `has_g_idx == True`
    • This handles the `actorder="weight"` case, since such checkpoints do not load `g_idx` tensors
  • Add e2e tests for activation-ordered models
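
The two config-side changes can be sketched with the standard library alone. The option values mirror this PR's "group" and "weight" variants; the mapping of the legacy booleans (`True` to group ordering, `False` to none), the checkpoint key suffix `weight_g_idx`, and the names `ActivationOrdering`, `normalize_actorder`, and `has_g_idx` are assumptions for illustration rather than vLLM's actual API.

```python
from enum import Enum
from typing import Iterable, Optional, Union


class ActivationOrdering(str, Enum):
    GROUP = "group"    # reorder weights and groups; g_idx is stored
    WEIGHT = "weight"  # reorder weights only; g_idx is not stored


def normalize_actorder(
    value: Union[bool, str, None, ActivationOrdering],
) -> Optional[ActivationOrdering]:
    """Accept both the legacy boolean flag and the new string options."""
    if isinstance(value, bool):
        # Assumed legacy mapping: True meant full (group) activation ordering.
        return ActivationOrdering.GROUP if value else None
    if value is None:
        return None
    return ActivationOrdering(value)  # raises ValueError on unknown strings


def has_g_idx(checkpoint_keys: Iterable[str], layer_prefix: str) -> bool:
    """Condition on what the checkpoint contains, not on the actorder flag.

    Assumes the group index is saved as '<layer_prefix>.weight_g_idx'.
    """
    return f"{layer_prefix}.weight_g_idx" in set(checkpoint_keys)


# Hypothetical checkpoint keys for an actorder="weight" layer: no g_idx saved.
keys = ["model.layers.0.mlp.down_proj.weight_packed"]
assert normalize_actorder(True) is ActivationOrdering.GROUP
assert normalize_actorder("weight") is ActivationOrdering.WEIGHT
assert not has_g_idx(keys, "model.layers.0.mlp.down_proj")
```

Conditioning on the presence of `g_idx`, rather than on the flag, lets `actorder="weight"` checkpoints take the same loading path as checkpoints without activation ordering.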

Testing

Accuracy

Full Precision

vllm (pretrained=Qwen/Qwen2-0.5B-Instruct,add_bos_token=True), gen_kwargs: (None), limit: 1000.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.394|±  |0.0155|
|     |       |strict-match    |     5|exact_match|↑  |0.393|±  |0.0155|

No Activation Ordering

vllm (pretrained=/home/ksayers/llm-compressor/qwen_group_only,add_bos_token=True), gen_kwargs: (None), limit: 1000.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.228|±  |0.0133|
|     |       |strict-match    |     5|exact_match|↑  |0.217|±  |0.0130|

Group Activation Ordering

vllm (pretrained=/home/ksayers/llm-compressor/qwen_actorder_group,add_bos_token=True), gen_kwargs: (None), limit: 1000.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.241|±  |0.0135|
|     |       |strict-match    |     5|exact_match|↑  |0.236|±  |0.0134|

Weight-only Activation Ordering

vllm (pretrained=/home/ksayers/llm-compressor/qwen_actorder_weight,add_bos_token=True), gen_kwargs: (None), limit: 1000.0, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.254|±  |0.0138|
|     |       |strict-match    |     5|exact_match|↑  |0.213|±  |0.0130|
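
The accuracy tables can be condensed into recovery relative to full precision. This small sketch only re-tabulates the reported gsm8k exact_match values; the `recovery` helper and the dictionary keys are illustrative names, not part of lm-eval or vLLM.

```python
# gsm8k exact_match values copied from the tables above (limit=1000, 5-shot).
RESULTS = {
    "full_precision": {"flexible-extract": 0.394, "strict-match": 0.393},
    "no_actorder": {"flexible-extract": 0.228, "strict-match": 0.217},
    "group_actorder": {"flexible-extract": 0.241, "strict-match": 0.236},
    "weight_actorder": {"flexible-extract": 0.254, "strict-match": 0.213},
}


def recovery(results, baseline="full_precision"):
    """Each variant's score as a fraction of the baseline score, per filter."""
    base = results[baseline]
    return {
        name: {f: score / base[f] for f, score in scores.items()}
        for name, scores in results.items()
        if name != baseline
    }


for name, by_filter in recovery(RESULTS).items():
    cells = ", ".join(f"{f}: {r:.1%}" for f, r in by_filter.items())
    print(f"{name:16s} {cells}")
```

With these inputs, flexible-extract recovery is about 57.9% with no ordering, 61.2% with group ordering, and 64.5% with weight-only ordering, while on strict-match group ordering leads at about 60.1%. The spread between the quantized variants (at most 0.026 absolute) is only about twice the per-run stderr of roughly 0.013, so the ranking is suggestive rather than conclusive.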

Latency

Full Precision

Avg latency: 0.612304248350362 seconds
10% percentile latency: 0.6050410680472851 seconds
25% percentile latency: 0.6059907954186201 seconds
50% percentile latency: 0.6098776273429394 seconds
75% percentile latency: 0.6107742763124406 seconds
90% percentile latency: 0.6114723403006792 seconds
99% percentile latency: 0.6905647566355766 seconds

No Activation Ordering

Avg latency: 0.451185735501349 seconds
10% percentile latency: 0.4459945809096098 seconds
25% percentile latency: 0.44648625515401363 seconds
50% percentile latency: 0.4473606375977397 seconds
75% percentile latency: 0.4487249795347452 seconds
90% percentile latency: 0.45016821939498186 seconds
99% percentile latency: 0.5247877304255963 seconds

Group Activation Ordering

Avg latency: 0.47711189221590755 seconds
10% percentile latency: 0.47196690943092107 seconds
25% percentile latency: 0.4725433448329568 seconds
50% percentile latency: 0.47307876218110323 seconds
75% percentile latency: 0.4750088737346232 seconds
90% percentile latency: 0.4756721451878548 seconds
99% percentile latency: 0.5509468417987229 seconds

Weight-only Activation Ordering

Avg latency: 0.4507347485671441 seconds
10% percentile latency: 0.4456333613023162 seconds
25% percentile latency: 0.446005588863045 seconds
50% percentile latency: 0.44688841979950666 seconds
75% percentile latency: 0.44848238583654165 seconds
90% percentile latency: 0.4493972914293408 seconds
99% percentile latency: 0.5238330581225455 seconds
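
To compare the latency runs at a glance, this sketch computes each configuration's average latency relative to the unordered quantized model. The numbers are copied from the outputs above; `relative_overhead` is a hypothetical helper, not part of vLLM's benchmark scripts.

```python
# Average latencies (seconds) copied from the benchmark output above.
AVG_LATENCY_S = {
    "full_precision": 0.612304248350362,
    "no_actorder": 0.451185735501349,
    "group_actorder": 0.47711189221590755,
    "weight_actorder": 0.4507347485671441,
}


def relative_overhead(latencies, baseline):
    """Percent change in average latency versus `baseline` (positive = slower)."""
    base = latencies[baseline]
    return {name: 100.0 * (t / base - 1.0) for name, t in latencies.items()}


for name, pct in relative_overhead(AVG_LATENCY_S, "no_actorder").items():
    print(f"{name:16s} {pct:+6.1f}% vs. no activation ordering")
```

Relative to the unordered baseline, group activation ordering comes out about 5.7% slower on average, weight-only ordering is within about 0.1%, and full precision is about 35.7% slower. This is consistent with weight-only ordering loading no `g_idx` tensor.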


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run the full CI, as it is required to merge (or just use auto-merge).

To run the full CI, do one of the following:

  • Comment /ready on the PR
  • Add ready label to the PR
  • Enable auto-merge.

🚀

@kylesayrs changed the base branch from act-order to main September 1, 2024 17:18
@kylesayrs changed the base branch from main to act-order September 1, 2024 17:18
@kylesayrs changed the base branch from act-order to main September 1, 2024 17:20
@kylesayrs changed the base branch from main to act-order September 1, 2024 17:20
@kylesayrs changed the title from "Kylesayrs/g idx act order" to "Group Index Conditioning" Sep 1, 2024
@kylesayrs self-assigned this Sep 1, 2024
@kylesayrs merged commit cc2c9ab into act-order Sep 3, 2024
1 check passed
@kylesayrs deleted the kylesayrs/g-idx-act-order branch September 3, 2024 20:58