Support for `targets` and `ignore` in Sparsity Compressors (#182)
base: main

Conversation
```python
    :return: compressed state dict
    """
    compressed_dict = {}
    _LOGGER.debug(
        f"Compressing model with {len(model_state)} parameterized layers..."
    )
    for name, value in tqdm(model_state.items(), desc="Compressing model"):
        prefix = name.rsplit(".", 1)[0]
        if compression_targets and prefix not in compression_targets:
```
So if a weight is not in the list of targets, it's not recorded in the state dict at all?
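To make the concern concrete, here is a minimal pure-Python sketch of the filtering logic in the diff above, using a plain dict in place of a torch state dict. `compress_state` is a hypothetical stand-in for the compressor's `compress` method, not the actual library code:

```python
def compress_state(model_state, compression_targets=None):
    """Sketch: keep only entries whose layer prefix is targeted."""
    compressed = {}
    for name, value in model_state.items():
        # "layers.0.weight" -> prefix "layers.0"
        prefix = name.rsplit(".", 1)[0]
        if compression_targets and prefix not in compression_targets:
            # as written in the diff, the weight is dropped from the
            # output state dict entirely, not passed through uncompressed
            continue
        compressed[name] = value  # real code would bitmask-compress here
    return compressed
```

Run against a toy state dict, any non-targeted weight (e.g. `lm_head.weight`) simply disappears from the result, which is exactly the behavior the reviewer is asking about.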
Good catch!
I'm not sure how/if this is related to #822 (it's listed as a dependency)
- Doesn't this list of targets need to be accounted for during decompression?
- Don't these changes throw away any weights which are not targeted for sparse compression?
```diff
@@ -276,6 +277,29 @@ def find_name_or_class_matches(
     return matches


+def find_compression_targets(
```
Can this be swapped into the quantization lifecycle as well? If that is not the case, I would call this `find_sparsity_targets` to make it clear that it is not used for quantization targets.
Agree on the naming; this seems like a general utility function that, given a module input, resolves all nested modules that match it. Ideally the name should reflect that, and it should live in a general utility package.
This is a more general function: it takes in a targets list, for example `["Linear"]`, and expands it to all Linear modules, while ignoring entries in the ignore list, for example `["lm_head"]`. It is not tied to sparsity or quantization, so I have changed the name to `expand_targets`, which more closely matches what the function does. The thought behind putting it in this file is that it already contains similar general functions, like `find_name_or_class_matches`.
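The expansion described above can be sketched in a few lines. This is an illustration of the idea only, using `(name, class_name)` pairs in place of real torch modules; the actual `expand_targets` signature in the library may differ:

```python
def expand_targets(modules, targets, ignore=()):
    """Sketch: expand class names or module names to a set of module names.

    modules: iterable of (qualified_name, class_name) pairs
    targets: class names (e.g. "Linear") or exact module names to match
    ignore:  module names to always exclude (e.g. "lm_head")
    """
    expanded = set()
    for name, cls_name in modules:
        if name in ignore:
            continue  # ignore list always wins
        if cls_name in targets or name in targets:
            expanded.add(name)
    return expanded
```

With targets `["Linear"]` and ignore `["lm_head"]`, every Linear submodule except `lm_head` ends up in the expanded set, matching the example in the comment above.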
```diff
@@ -276,8 +277,9 @@ def compress(
     )

     if self.sparsity_compressor is not None:
+        compression_targets = self._find_sparse_compression_targets(model=model)
```
Do we not care about adding something like a sparsity scheme for each layer we want to target, similar to a quantization scheme, describing how the layer was targeted? How is this list applied during compress / in the config file?
```python
submodule.quantization_scheme = _scheme_from_targets(
```
Agree we should be adding a sparsity config -- ideally we would have one compression config that represents the state of a given module/param and saves how it was encoded and therefore how to load it
Right now we have just one sparsity config for the entire model, i.e. one kind of sparse compression is applied to the whole model (we do not support different sparse compressors for different layers). If we want to attach a `SparsityConfig` to each submodule (which would enable a potentially different `SparseCompressor` per layer), that is a bigger change we should talk about.
@dsikka the name was misleading before; during compression, the set of targets found here is used to determine whether `compress_weight` should be called for a given module. For example, if `lm_head` is not in this list, it will be ignored during sparse compression. Let's take this offline if you need more context.
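The gating described in this reply can be sketched as follows. `should_compress` is a hypothetical helper name, not the library API; it captures both the per-module check and the backwards-compatible `None` case discussed elsewhere in this thread:

```python
def should_compress(module_name, compression_targets):
    """Sketch: decide whether compress_weight runs for this module."""
    if compression_targets is None:
        # older configs record no targets: compress every layer
        return True
    # newer flow: only modules explicitly found as targets are compressed
    return module_name in compression_targets
```

So with targets `{"layers.0"}`, `lm_head` is skipped during sparse compression, while a `None` targets set preserves the old compress-everything behavior.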
src/compressed_tensors/compressors/model_compressors/model_compressor.py (outdated comment, resolved)
```python
    """
    Compresses a dense state dict using bitmask compression

    :param model_state: state dict of uncompressed model
    :param compression_targets: optional set of layer prefixes to compress, if None
        compress all layers (for backwards compatibility)
```
What are we holding backwards compatibility with? Ideally this should default to only compressing layers for which we detect the 50% sparsity threshold.
For older configs we will not have targets; this handles those cases.
For the newer flow, compression_targets will only contain modules that are over 50% sparse, once this lands: vllm-project/llm-compressor#822
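The newer flow described here (only modules above a 50% sparsity threshold become targets) can be sketched with a pure-Python stand-in for the weight tensors. The function names and the exact threshold semantics are illustrative assumptions, not the library implementation:

```python
def sparsity(values):
    """Fraction of zero entries in a flat list of weight values."""
    return sum(v == 0 for v in values) / len(values)

def select_sparse_targets(named_weights, threshold=0.5):
    """Sketch: pick layer names whose weights exceed the sparsity threshold."""
    return {
        name
        for name, weights in named_weights.items()
        if sparsity(weights) > threshold
    }
```

A layer that is 75% zeros would be selected as a compression target, while a 25%-sparse layer would be left dense.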
Force-pushed from 400c6c3 to e5bfd8a (compare)
Point 1: decompression takes care of that using COMPRESSION_PARAM_NAMES. This PR is listed as a dependency for #822 because without it we cannot enable sparse compression together with quantization compression; these changes are needed for #822 to work correctly.
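The decompression-side detection mentioned here can be sketched as a suffix check: parameters whose names end in a known compression suffix are treated as compressed, so no explicit targets list is needed at load time. The suffix values below are illustrative stand-ins for whatever COMPRESSION_PARAM_NAMES actually contains:

```python
# hypothetical suffixes; the real COMPRESSION_PARAM_NAMES may differ
COMPRESSION_PARAM_NAMES = ("compressed", "bitmask", "shape", "row_offsets")

def is_compressed_param(param_name):
    """Sketch: a param is compressed iff its final name component
    matches a known compression suffix."""
    return param_name.rsplit(".", 1)[-1] in COMPRESSION_PARAM_NAMES
```

Under this scheme a checkpoint entry like `layers.0.weight.bitmask` is recognized as compressed data, while `layers.0.weight` is loaded as an ordinary dense parameter.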
This PR adds support for using `targets` and `ignore` in sparsity compressors.
- `BaseSparsity.compress(...)` methods now accept a `compression_targets` argument.
- The `compression_targets` argument is populated directly by the `ModelCompressor`.

Verification
The functionality has been verified using the following script:

Verification Script

The script passes without any assertion failures.