[Bug]: AUPRO metric in Anomalib engine throws torchmetric error #2424

marietteschonfeld · 2024-11-19T11:34:11Z

Describe the bug

I'm trying to use the AUPRO metric within the engine to measure segmentation performance. I use the following code:

from src.anomalib import TaskType
from src.anomalib.engine import Engine
from src.anomalib.models import Padim
from src.anomalib.data import MVTec

category = "grid"

# Create the datamodule
datamodule = MVTec(num_workers=0, category=category)
datamodule.prepare_data()  # Downloads the dataset if it's not in the specified `root` directory
datamodule.setup()

model = Padim(backbone="resnet18")

# Start training, with AUROC as the image-level metric and AUPRO as the pixel-level metric
engine = Engine(task=TaskType.SEGMENTATION, image_metrics=["AUROC"], pixel_metrics=["AUPRO"])
engine.fit(model=model, datamodule=datamodule)

# Load the best model from the checkpoint before evaluating
test_results = engine.test(
    model=model,
    datamodule=datamodule,
    ckpt_path=engine.trainer.checkpoint_callback.best_model_path,
)

When adding "AUPRO" to the list of segmentation metrics, I get the following error:

Traceback (most recent call last):
  File "/Users/marietteschonfeld/anomalib/anomalib_runner.py", line 22, in <module>
    test_results = engine.test(
  File "/Users/marietteschonfeld/anomalib/src/anomalib/engine/engine.py", line 696, in test
    return self.trainer.test(model, dataloaders, ckpt_path, verbose, datamodule)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 748, in test
    return call._call_and_handle_interrupt(
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 47, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 788, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run
    self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 398, in _restore_modules_and_callbacks
    self.restore_model()
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 275, in restore_model
    self.trainer.strategy.load_model_state_dict(
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 371, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/Users/marietteschonfeld/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 185, in load_state_dict
    self._load_metrics(state_dict)
  File "/Users/marietteschonfeld/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 191, in _load_metrics
    self._add_metrics("pixel", state_dict)
  File "/Users/marietteschonfeld/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 215, in _add_metrics
    metrics.add_metrics(metrics_cls())
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/torchmetrics/collections.py", line 476, in add_metrics
    raise ValueError(f"Encountered two metrics both named {name}")
ValueError: Encountered two metrics both named AUPRO
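
For context, the last frame is a duplicate-name check: the collection refuses to hold two metrics under the same name. My reading of the traceback is that the model passed to `engine.test` already holds its pixel metrics from training, and restoring the checkpoint tries to add `AUPRO` again. The sketch below only models that mechanism in plain Python. `TinyMetricCollection` and the `AUPRO` stub are hypothetical stand-ins, not the torchmetrics or anomalib code, and keying by class name is an assumption.

```python
class AUPRO:
    """Stub standing in for a metric class (hypothetical, no real computation)."""


class TinyMetricCollection(dict):
    """Simplified model of a name-keyed metric collection (assumption)."""

    def add_metrics(self, metric):
        name = type(metric).__name__  # assumed: metrics are keyed by class name
        if name in self:
            raise ValueError(f"Encountered two metrics both named {name}")
        self[name] = metric


# The model that was just trained already holds its pixel metrics.
pixel_metrics = TinyMetricCollection()
pixel_metrics.add_metrics(AUPRO())

# Restoring from the checkpoint adds the same metric a second time.
try:
    pixel_metrics.add_metrics(AUPRO())
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)  # Encountered two metrics both named AUPRO
```

If this reading is right, the failure depends on the collection already being populated when the checkpoint is restored, not on AUPRO itself.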

I am working in a forked repository, but I was able to reproduce this error when starting with a fresh environment and installing from source. Installing with pip in a fresh environment gave a host of different errors that seemed unrelated to this problem.

Dataset

MVTec

Model

PADiM

Steps to reproduce the behavior

Install anomalib from source:

yes | conda create -n anomalib_env python=3.10
conda activate anomalib_env
git clone https://github.com/openvinotoolkit/anomalib.git
cd anomalib
pip install -e .
anomalib install

After that I still had to install torch and lightning manually. I wonder whether running on macOS is the root cause.
I also tried renaming the AUPRO files, but the error only changed to:
ValueError: Encountered two metrics both named temp_AUPRO

OS information

OS: macOS Sonoma 14.4
Python version: 3.10.15
Anomalib version: 2.0.0.dev0
PyTorch version: 2.5.1
Torchmetrics version: 1.5.2
Lightning version: 2.4.0
GPU models and configuration: MPS

Expected behavior

I expected either a warning that AUPRO cannot yet be used as a metric inside the engine, or for the metric to be computed the same way it is for AUROC.
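
One way to get non-crashing behaviour would be to skip a metric that the collection already holds and warn instead of raising. The guard below is only a sketch under assumptions: `add_metric_if_absent` is a hypothetical helper, the collection is modeled as a plain dict keyed by class name, and nothing here is taken from anomalib or torchmetrics.

```python
import warnings


class AUPRO:
    """Stub metric class (hypothetical, no real computation)."""


def add_metric_if_absent(collection: dict, metric: object) -> bool:
    """Add `metric` under its class name unless that name is already taken.

    Returns True when the metric was added, False when it was skipped.
    """
    name = type(metric).__name__
    if name in collection:
        warnings.warn(
            f"Metric {name} is already registered; keeping the existing one.",
            stacklevel=2,
        )
        return False
    collection[name] = metric
    return True


pixel_metrics = {}
first = add_metric_if_absent(pixel_metrics, AUPRO())  # added

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    second = add_metric_if_absent(pixel_metrics, AUPRO())  # skipped, with a warning

print(first, second, len(pixel_metrics), len(caught))  # True False 1 1
```

Whether keeping the existing metric object is the right choice depends on whether the checkpoint carries metric state that needs to be restored. I have not checked that.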

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

No response

Configuration YAML

None

Logs

`/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
INFO:src.anomalib.data.image.mvtec:Found the dataset.
INFO:anomalib.models.components.base.anomaly_module:Initializing Padim model.
INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/resnet18.a1_in1k)
INFO:timm.models._hub:[timm/resnet18.a1_in1k] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.
INFO:timm.models._builder:Missing keys (fc.weight, fc.bias) discovered while loading pretrained weights. This is expected if model is being adapted.
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
INFO:src.anomalib.data.image.mvtec:Found the dataset.
/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py:182: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer

  | Name                  | Type                     | Params | Mode 
---------------------------------------------------------------------------
0 | model                 | PadimModel               | 2.8 M  | train
1 | _transform            | Compose                  | 0      | train
2 | normalization_metrics | MetricCollection         | 0      | train
3 | image_threshold       | F1AdaptiveThreshold      | 0      | train
4 | pixel_threshold       | F1AdaptiveThreshold      | 0      | train
5 | image_metrics         | AnomalibMetricCollection | 0      | train
6 | pixel_metrics         | AnomalibMetricCollection | 0      | train
---------------------------------------------------------------------------
2.8 M     Trainable params
0         Non-trainable params
2.8 M     Total params
11.131    Total estimated model params size (MB)
15        Modules in train mode
69        Modules in eval mode
/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=7` in the `DataLoader` to improve performance.
Epoch 0:   0%|          | 0/9 [00:00<?, ?it/s]
/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py:132: `training_step` returned `None`. If this was on purpose, ignore this warning...
Epoch 0: 100%|██████████| 9/9 [00:04<00:00,  2.18it/s]
INFO:src.anomalib.models.image.padim.lightning_model:Aggregating the embedding extracted from the training set.
INFO:src.anomalib.models.image.padim.lightning_model:Fitting a Gaussian to the embedding collected from the training set.
Epoch 0: 100%|██████████| 9/9 [00:37<00:00,  0.24it/s, pixel_AUPRO=0.830]
`Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 9/9 [00:37<00:00,  0.24it/s, pixel_AUPRO=0.830]
INFO:anomalib.callbacks.timer:Training took 37.44 seconds
/Users/marietteschonfeld/anomalib/src/anomalib/engine/engine.py:391: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(ckpt_path, map_location=model.device)
INFO:src.anomalib.data.image.mvtec:Found the dataset.
Restoring states from the checkpoint path at /Users/marietteschonfeld/anomalib/results/Padim/MVTec/grid/v2/weights/lightning/model.ckpt
INFO:anomalib.models.components.base.anomaly_module:Loading AUPRO metrics from state dict
Traceback (most recent call last):
  File "/Users/marietteschonfeld/anomalib/anomalib_runner.py", line 22, in <module>
    test_results = engine.test(
  File "/Users/marietteschonfeld/anomalib/src/anomalib/engine/engine.py", line 696, in test
    return self.trainer.test(model, dataloaders, ckpt_path, verbose, datamodule)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 748, in test
    return call._call_and_handle_interrupt(
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 47, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 788, in _test_impl
    results = self._run(model, ckpt_path=ckpt_path)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 950, in _run
    self._checkpoint_connector._restore_modules_and_callbacks(ckpt_path)
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 398, in _restore_modules_and_callbacks
    self.restore_model()
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py", line 275, in restore_model
    self.trainer.strategy.load_model_state_dict(
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 371, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"], strict=strict)
  File "/Users/marietteschonfeld/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 185, in load_state_dict
    self._load_metrics(state_dict)
  File "/Users/marietteschonfeld/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 191, in _load_metrics
    self._add_metrics("pixel", state_dict)
  File "/Users/marietteschonfeld/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 215, in _add_metrics
    metrics.add_metrics(metrics_cls())
  File "/opt/anaconda3/envs/anomalib_env_aupro/lib/python3.10/site-packages/torchmetrics/collections.py", line 476, in add_metrics
    raise ValueError(f"Encountered two metrics both named {name}")
ValueError: Encountered two metrics both named AUPRO`

Code of Conduct

I agree to follow this project's Code of Conduct
