I am using only the Anonymize scanner with the LLM Guard API, and I noticed a significant memory (RAM) increase when processing larger inputs, with the memory usage never decreasing afterward. For example, when processing small inputs, memory consumption is around 2GB. However, when I pass inputs of around 4-5k characters, the memory usage increases to 4-5GB and stays at that level even after processing is complete. If I input something excessive, like 15k characters, memory usage spikes to 240GB (at which point it is likely swapping to disk).
I see this behavior with all default settings, except for removing some scanners from the scanners.yaml file. Is this expected behavior, or is there an issue with memory management when using the Anonymize scanner?
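For reference, a minimal reproduction along these lines shows the growth. I'm assuming the default Vault/Anonymize setup from the Python package; the sample text and sizes are just placeholders:

```python
# Sketch of a reproduction: scan increasingly long inputs with the default
# Anonymize scanner and print the process RSS after each call.
import os

import psutil
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

def rss_gb() -> float:
    # Resident set size of the current process, in GB.
    return psutil.Process(os.getpid()).memory_info().rss / 1024**3

vault = Vault()
scanner = Anonymize(vault)

base = "John Doe lives at 123 Main Street. "  # placeholder PII-bearing text
for size in (500, 5_000, 15_000):
    prompt = base * (size // len(base) + 1)
    sanitized, is_valid, risk_score = scanner.scan(prompt)
    # RSS stays at its peak even after scan() returns.
    print(f"input={len(prompt):>6} chars  rss={rss_gb():.2f} GB")
```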
From my use of LLM Guard's Anonymize scanner, I can confirm the behavior described above: for longer inputs, enormous amounts of memory are allocated. I checked the source code, but there seems to be no way to influence this behavior via a parameter to the Anonymize constructor.
I do wonder, though, whether the chunk_size key in the model config in ner_mapping.py is actually respected when the models run. I would expect a fixed chunk size of 600, as set in the config, not to lead to such extensive memory usage. But maybe that assumption is wrong and memory still scales with the original input's size.
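As a purely hypothetical workaround (scan_in_chunks is not a library function), one could bound per-call memory by chunking the input manually before scanning, e.g.:

```python
# Hypothetical workaround sketch: split the input into fixed-size chunks so
# each scan() call only sees a bounded span of text. NOT a library feature;
# chunk boundaries may split entities, so detection quality can suffer.
CHUNK_SIZE = 600  # mirrors the chunk_size value seen in ner_mapping.py

def scan_in_chunks(scanner, text: str, chunk_size: int = CHUNK_SIZE) -> str:
    pieces = []
    for start in range(0, len(text), chunk_size):
        sanitized, _is_valid, _risk = scanner.scan(text[start : start + chunk_size])
        pieces.append(sanitized)
    return "".join(pieces)
```

If this keeps memory flat, it would suggest the configured chunk_size is not being applied internally and allocation still scales with the full input length.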