Support for targets and ignore in Sparsity Compressors #182

Open · wants to merge 3 commits into main from add-targets-and-ignore-support

Conversation

rahul-tuli (Member) commented Oct 6, 2024

This PR adds support for using targets and ignore in sparsity compressors.

  • All BaseSparsity.compress(...) methods now accept a compression_targets argument.
  • The compression_targets argument is populated directly by the ModelCompressor.
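For context, a minimal sketch of the interface this description implies; beyond the names visible in the PR (compress, compression_targets), the exact types and defaults here are assumptions, not the PR's code.

from typing import Dict, Optional, Set

import torch


class BaseSparsityCompressor:  # hypothetical stand-in for the real base class
    def compress(
        self,
        model_state: Dict[str, torch.Tensor],
        compression_targets: Optional[Set[str]] = None,
    ) -> Dict[str, torch.Tensor]:
        """
        :param model_state: state dict of the uncompressed model
        :param compression_targets: module prefixes to compress
            (e.g. "model.layers.0.self_attn.q_proj"); when None,
            all layers are compressed as before
        """
        raise NotImplementedError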

Verification

The functionality has been verified using the following script:

Verification Script
from transformers import AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.sparsity_config import SparsityConfigMetadata

MODEL_ID = "nm-testing/llama2.c-stories42M-pruned2.4"

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def check_first_layer(save_dir, check_compressed=True):
    from safetensors import safe_open
    with safe_open(f"{save_dir}/model.safetensors", framework="pt", device=0) as f:
        layer_0_keys = [key for key in f.keys() if "model.layers.0" in key]
        if check_compressed:
            assert any("compressed" in key for key in layer_0_keys)
        else:
            assert not any("compressed" in key for key in layer_0_keys)

sparsity_config = SparsityConfigMetadata.from_pretrained(model, compress=True)
SAVE_DIR = MODEL_ID.split("/")[1] + "-2of4-compressed"
model.save_pretrained(SAVE_DIR, sparsity_config=sparsity_config)
tokenizer.save_pretrained(SAVE_DIR)
# First layer is compressed
check_first_layer(SAVE_DIR, check_compressed=True)

sparsity_config.ignore.append("re:model.layers.0.*")
SAVE_DIR = MODEL_ID.split("/")[1] + "-2of4-ignored-first-layer"
model.save_pretrained(SAVE_DIR, sparsity_config=sparsity_config)
tokenizer.save_pretrained(SAVE_DIR)
# First layer is not compressed
check_first_layer(SAVE_DIR, check_compressed=False)

The script runs to completion without any assertion failures.

@rahul-tuli marked this pull request as ready for review on October 7, 2024 13:59
:return: compressed state dict
"""
compressed_dict = {}
_LOGGER.debug(
    f"Compressing model with {len(model_state)} parameterized layers..."
)
for name, value in tqdm(model_state.items(), desc="Compressing model"):
    prefix = name.rsplit(".", 1)[0]
    if compression_targets and prefix not in compression_targets:

Contributor

So if a weight is not in the list of targets, it's not recorded in the state dict at all?

Member Author

Good catch!

kylesayrs (Contributor) left a comment

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

@@ -276,6 +277,29 @@ def find_name_or_class_matches(
    return matches


def find_compression_targets(

Contributor

Can this be swapped into the quantization lifecycle as well?
If that is not the case, I would call this find_sparsity_targets to make it clear that it is not used for quantization targets.

Member

Agree on the naming; this seems like a general utility function that, given a module input, resolves all nested modules that match it. Ideally the name should reflect that, and we should keep this in a general utility package.

Member Author

This is a more general function: it takes a targets list, for example ["Linear"], and expands it to all Linear modules while ignoring entries in the ignore list, for example ["lm_head"]. It is not tied to sparsity or quantization. I have changed the name to expand_targets, which more closely matches what the function does. The thought behind putting it in this file is that it already contains similar general functions, such as find_name_or_class_matches.
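
A rough sketch of the behavior described here, assuming a simple class-name / exact-name matcher; the helper in the PR (including its "re:" regex-style patterns seen in the verification script) may differ in signature and matching rules.

from typing import Iterable, Set

import torch.nn as nn


def expand_targets_sketch(
    model: nn.Module, targets: Iterable[str], ignore: Iterable[str]
) -> Set[str]:
    """Expand targets like ["Linear"] to the names of all matching submodules,
    skipping anything named in ignore (e.g. ["lm_head"])."""
    expanded = set()
    for name, module in model.named_modules():
        if type(module).__name__ not in targets:  # match on class name
            continue
        if any(name == ig or name.endswith(ig) for ig in ignore):  # skip ignored modules
            continue
        expanded.add(name)
    return expanded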

@@ -276,8 +277,9 @@ def compress(
)

if self.sparsity_compressor is not None:
    compression_targets = self._find_sparse_compression_targets(model=model)

dsikka (Contributor) commented Oct 7, 2024

Do we not care about adding something like a sparsity scheme, similar to a quantization scheme, for each layer we want to target, describing how the layer was targeted? How is this list applied during compress / in the config file?

submodule.quantization_scheme = _scheme_from_targets(

Member

Agree we should be adding a sparsity config -- ideally we would have one compression config that represents the state of a given module/param and saves how it was encoded and therefore how to load it

Member Author

Right now we have just one sparsity config for the entire model, i.e. one kind of sparse compression is applied to the whole model (we do not support different sparse compressors for different layers). If we want to attach a SparsityConfig to each submodule (which would enable a potentially different SparseCompressor per layer), that is a bigger change we should talk about.

@dsikka the name was misleading before; during compression, the set of targets found here is used to determine whether compress_weight should be called for a given module. For example, if lm_head is not in this list, it will be ignored during sparse compression. Let's take this offline if you need more context.

@markurtz markurtz self-requested a review October 14, 2024 13:35

"""
Compresses a dense state dict using bitmask compression

:param model_state: state dict of uncompressed model
:param compression_targets: optional set of layer prefixes to compress, if None
    compress all layers (for backwards compatibility)

Member

What are we holding backwards compatibility with? Ideally this should default to only compressing models for which we detect the 50% sparsity threshold.

Member Author

For older configs we will not have targets; this handles those cases.

For the newer flow, compression_targets will only contain modules that are over 50% sparse once vllm-project/llm-compressor#822 lands.
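
Purely as an illustration of that newer flow (the actual selection logic lives in #822 and may differ), a module-level sparsity check could look like:

import torch.nn as nn


def is_sparse_enough(module: nn.Module, threshold: float = 0.5) -> bool:
    """Return True if the module's weight is at least `threshold` sparse."""
    weight = getattr(module, "weight", None)
    if weight is None:
        return False
    sparsity = (weight == 0).float().mean().item()
    return sparsity >= threshold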

@rahul-tuli force-pushed the add-targets-and-ignore-support branch from 400c6c3 to e5bfd8a on October 23, 2024 14:50

rahul-tuli (Member Author) commented Oct 23, 2024

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

Point 1: Decompression takes care of that using COMPRESSION_PARAM_NAMES.
Point 2: Fixed.

It is listed as a dependency for #822 because without this change we cannot enable sparse compression together with quantization compression; these changes are needed for #822 to work correctly.
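
To make the two points concrete, here is an assumed-only sketch (not the PR's code) of how non-targeted weights can be passed through untouched, while compressed modules stay recognizable at load time purely from the parameter-name suffixes the compressor writes out (the values in its COMPRESSION_PARAM_NAMES):

from typing import Dict, Optional, Set

import torch


def compress_weight(name: str, value: torch.Tensor) -> Dict[str, torch.Tensor]:
    # Placeholder for the compressor-specific encoding; the real compressor emits
    # parameters whose suffixes come from COMPRESSION_PARAM_NAMES, which is what
    # decompression keys on, so no targets list is needed at load time.
    return {f"{name}.compressed": value}


def compress_state_dict(
    model_state: Dict[str, torch.Tensor],
    compression_targets: Optional[Set[str]],
) -> Dict[str, torch.Tensor]:
    compressed: Dict[str, torch.Tensor] = {}
    for name, value in model_state.items():
        prefix = name.rsplit(".", 1)[0]
        if compression_targets and prefix not in compression_targets:
            # "Point 2" fix: keep the weight as-is instead of dropping it
            compressed[name] = value
            continue
        compressed.update(compress_weight(name, value))
    return compressed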
