Support for targets and ignore in Sparsity Compressors #182

Open · wants to merge 3 commits into main from add-targets-and-ignore-support

Conversation

rahul-tuli (Member) commented Oct 6, 2024

This PR adds support for using targets and ignore in sparsity compressors.

  • All BaseSparsity.compress(...) methods now accept a compression_targets argument.
  • The compression_targets argument is populated directly by the ModelCompressor.
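For context, a minimal sketch of the interface this description implies; beyond the names visible in the PR (compress, compression_targets), the exact types and defaults here are assumptions, not the PR's code.

from typing import Dict, Optional, Set

import torch


class BaseSparsityCompressor:  # hypothetical stand-in for the real base class
    def compress(
        self,
        model_state: Dict[str, torch.Tensor],
        compression_targets: Optional[Set[str]] = None,
    ) -> Dict[str, torch.Tensor]:
        """
        :param model_state: state dict of the uncompressed model
        :param compression_targets: module prefixes to compress
            (e.g. "model.layers.0.self_attn.q_proj"); when None,
            all layers are compressed as before
        """
        raise NotImplementedError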

Verification

The functionality has been verified using the following script:

Verification Script
from transformers import AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.sparsity_config import SparsityConfigMetadata

MODEL_ID = "nm-testing/llama2.c-stories42M-pruned2.4"

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def check_first_layer(save_dir, check_compressed=True):
    from safetensors import safe_open
    with safe_open(f"{save_dir}/model.safetensors", framework="pt", device=0) as f:
        layer_0_keys = [key for key in f.keys() if "model.layers.0" in key]
        if check_compressed:
            assert any("compressed" in key for key in layer_0_keys)
        else:
            assert not any("compressed" in key for key in layer_0_keys)

sparsity_config = SparsityConfigMetadata.from_pretrained(model, compress=True)
SAVE_DIR = MODEL_ID.split("/")[1] + "-2of4-compressed"
model.save_pretrained(SAVE_DIR, sparsity_config=sparsity_config)
tokenizer.save_pretrained(SAVE_DIR)
# First layer is compressed
check_first_layer(SAVE_DIR, check_compressed=True)

sparsity_config.ignore.append("re:model.layers.0.*")
SAVE_DIR = MODEL_ID.split("/")[1] + "-2of4-ignored-first-layer"
model.save_pretrained(SAVE_DIR, sparsity_config=sparsity_config)
tokenizer.save_pretrained(SAVE_DIR)
# First layer is not compressed
check_first_layer(SAVE_DIR, check_compressed=False)

The script runs to completion without any assertion failures.

@rahul-tuli marked this pull request as ready for review on October 7, 2024 13:59
:return: compressed state dict
"""
compressed_dict = {}
_LOGGER.debug(
    f"Compressing model with {len(model_state)} parameterized layers..."
)
for name, value in tqdm(model_state.items(), desc="Compressing model"):
    prefix = name.rsplit(".", 1)[0]
    if compression_targets and prefix not in compression_targets:

Contributor

So if a weight is not in the list of targets, it's not recorded in the state dict at all?

Member Author

Good catch!

kylesayrs (Contributor) left a comment

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

@@ -276,6 +277,29 @@ def find_name_or_class_matches(
    return matches


def find_compression_targets(

Contributor

Can this be swapped into the quantization lifecycle as well?
If that is not the case, I would call this find_sparsity_targets to make it clear that it is not used for quantization targets.

Member

Agree on the naming; this seems like a general utility function that, given a module input, resolves all nested modules that match it. Ideally the name should reflect that, and we should keep this in a general utility package.

Member Author

This is a more general function: it takes a targets list, for example ["Linear"], and expands it to all Linear modules while ignoring entries in the ignore list, for example ["lm_head"]. It is not tied to sparsity or quantization. I have changed the name to expand_targets, which more closely matches what the function does. The thought behind putting it in this file is that it already contains similar general functions, such as find_name_or_class_matches.
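
A rough sketch of the behavior described here, assuming a simple class-name / exact-name matcher; the helper in the PR (including its "re:" regex-style patterns seen in the verification script) may differ in signature and matching rules.

from typing import Iterable, Set

import torch.nn as nn


def expand_targets_sketch(
    model: nn.Module, targets: Iterable[str], ignore: Iterable[str]
) -> Set[str]:
    """Expand targets like ["Linear"] to the names of all matching submodules,
    skipping anything named in ignore (e.g. ["lm_head"])."""
    expanded = set()
    for name, module in model.named_modules():
        if type(module).__name__ not in targets:  # match on class name
            continue
        if any(name == ig or name.endswith(ig) for ig in ignore):  # skip ignored modules
            continue
        expanded.add(name)
    return expanded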

@@ -276,8 +277,9 @@ def compress(
)

if self.sparsity_compressor is not None:
    compression_targets = self._find_sparse_compression_targets(model=model)

dsikka (Contributor) commented Oct 7, 2024

Do we not care about adding something like a sparsity scheme, similar to a quantization scheme, for each layer we want to target, describing how the layer was targeted? How is this list applied during compress / in the config file?

submodule.quantization_scheme = _scheme_from_targets(

Member

Agree we should be adding a sparsity config -- ideally we would have one compression config that represents the state of a given module/param and saves how it was encoded and therefore how to load it

Member Author

Right now we have just one sparsity config for the entire model, i.e. one kind of sparse compression is applied to the whole model (we do not support different sparse compressors for different layers). If we want to attach a SparsityConfig to each submodule (which would enable a potentially different SparseCompressor per layer), that is a bigger change we should talk about.

@dsikka the name was misleading before; during compression, the set of targets found here is used to determine whether compress_weight should be called for a given module. For example, if lm_head is not in this list, it will be ignored during sparse compression. Let's take this offline if you need more context.

@markurtz markurtz self-requested a review October 14, 2024 13:35

"""
Compresses a dense state dict using bitmask compression

:param model_state: state dict of uncompressed model
:param compression_targets: optional set of layer prefixes to compress, if None
    compress all layers (for backwards compatibility)

Member

What are we holding backwards compatibility with? Ideally this should default to only compressing models for which we detect the 50% sparsity threshold.

Member Author

For older configs we will not have targets; this handles those cases.

For the newer flow, compression_targets will only contain modules that are over 50% sparse once vllm-project/llm-compressor#822 lands.
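
Purely as an illustration of that newer flow (the actual selection logic lives in #822 and may differ), a module-level sparsity check could look like:

import torch.nn as nn


def is_sparse_enough(module: nn.Module, threshold: float = 0.5) -> bool:
    """Return True if the module's weight is at least `threshold` sparse."""
    weight = getattr(module, "weight", None)
    if weight is None:
        return False
    sparsity = (weight == 0).float().mean().item()
    return sparsity >= threshold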

@rahul-tuli force-pushed the add-targets-and-ignore-support branch from 400c6c3 to e5bfd8a on October 23, 2024 14:50

rahul-tuli (Member Author) commented Oct 23, 2024

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

Point 1: Decompression takes care of that using COMPRESSION_PARAM_NAMES.
Point 2: Fixed.

It is listed as a dependency for #822 because without this change we cannot enable sparse compression together with quantization compression; these changes are needed for #822 to work correctly.
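
To make the two points concrete, here is an assumed-only sketch (not the PR's code) of how non-targeted weights can be passed through untouched, while compressed modules stay recognizable at load time purely from the parameter-name suffixes the compressor writes out (the values in its COMPRESSION_PARAM_NAMES):

from typing import Dict, Optional, Set

import torch


def compress_weight(name: str, value: torch.Tensor) -> Dict[str, torch.Tensor]:
    # Placeholder for the compressor-specific encoding; the real compressor emits
    # parameters whose suffixes come from COMPRESSION_PARAM_NAMES, which is what
    # decompression keys on, so no targets list is needed at load time.
    return {f"{name}.compressed": value}


def compress_state_dict(
    model_state: Dict[str, torch.Tensor],
    compression_targets: Optional[Set[str]],
) -> Dict[str, torch.Tensor]:
    compressed: Dict[str, torch.Tensor] = {}
    for name, value in model_state.items():
        prefix = name.rsplit(".", 1)[0]
        if compression_targets and prefix not in compression_targets:
            # "Point 2" fix: keep the weight as-is instead of dropping it
            compressed[name] = value
            continue
        compressed.update(compress_weight(name, value))
    return compressed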
