[Auto] Training of Auto* models always fails with Expected a parent error on macOS #1203

Open
yarnabrina opened this issue Nov 17, 2024 · 4 comments

@yarnabrina
Contributor

What happened + What you expected to happen

If I try to train an Auto* model on my office MacBook, every trial fails, and the run finally aborts with an error saying that no trial succeeded.

Final Error

RuntimeError: No best trial found for the given metric: loss. This means that no trial has reported this metric, or all values reported for this metric are NaN. To not ignore NaN values, you can set the filter_nan_and_inf arg to False.

Before this, each individual trial reports something like this:

(_train_tune pid=14003) /path/to/venv/lib/python3.10/site-packages/ray/tune/integration/pytorch_lightning.py:198: `ray.tune.integration.pytorch_lightning.TuneReportCallback` is deprecated. Use `ray.tune.integration.pytorch_lightning.TuneReportCheckpointCallback` instead.
(_train_tune pid=14003) Seed set to 4
2024-11-17 21:10:46,358    ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_3968b_00000
Traceback (most recent call last):
  File "/path/to/venv/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/path/to/venv/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/path/to/venv/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/path/to/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 2753, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/path/to/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 904, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=14003, ip=127.0.0.1, actor_id=6e8fabc6440a5f0593a2936d01000000, repr=_train_tune)
  File "/path/to/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 331, in train
    raise skipped from exception_cause(skipped)
  File "/path/to/venv/lib/python3.10/site-packages/ray/air/_internal/util.py", line 104, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "/path/to/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 45, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "/path/to/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 250, in _trainable_func
    output = fn()
  File "/path/to/venv/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 130, in inner
    return trainable(config, **fn_kwargs)
  File "/path/to/venv/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py", line 214, in _train_tune
    _ = self._fit_model(
  File "/path/to/venv/lib/python3.10/site-packages/neuralforecast/common/_base_auto.py", line 362, in _fit_model
    model = model.fit(
  File "/path/to/venv/lib/python3.10/site-packages/neuralforecast/common/_base_recurrent.py", line 535, in fit
    return self._fit(
  File "/path/to/venv/lib/python3.10/site-packages/neuralforecast/common/_base_model.py", line 355, in _fit
    trainer = pl.Trainer(**model.trainer_kwargs)
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 425, in __init__
    self._callback_connector.on_trainer_init(
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 79, in on_trainer_init
    _validate_callbacks_list(self.trainer.callbacks)
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in _validate_callbacks_list
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in <listcomp>
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/model_helpers.py", line 42, in is_overridden
    raise ValueError("Expected a parent")
ValueError: Expected a parent

This traceback is from the AutoRNN snippet provided below, but the same error occurs with AutoLSTM, AutoTFT, etc. as well. Non-Auto models such as RNN or LSTM work fine. The same snippet also works on my personal laptop, which runs Ubuntu under WSL on Windows 11; it fails only on the MacBook.
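The last frames of the traceback show `is_overridden("state_dict", instance=cb)` raising `ValueError("Expected a parent")` while the Trainer validates its callbacks. One way such a check can fail is when a callback does not derive from the base class the check knows about, for instance a same-named class from another package. The sketch below illustrates that mechanism only: `TrainerSideCallback`, `ForeignCallback`, `ReportCallback`, `is_overridden_sketch` and `mro_names` are hypothetical stand-ins, not pytorch_lightning code, and this thread does not establish which base class the failing callback actually has.

```python
"""Simplified stand-in for a parent check like the one in the traceback.

Nothing here imports pytorch_lightning; every name is hypothetical.
"""


class TrainerSideCallback:
    """Stand-in for the callback base class the check knows about."""

    def state_dict(self):
        return {}


class ForeignCallback:
    """Stand-in for a similar base class that lives in another package."""

    def state_dict(self):
        return {}


class ReportCallback(ForeignCallback):
    """A callback built on the base class the check does NOT know about."""


def is_overridden_sketch(method_name, instance):
    """Return True if `instance` overrides `method_name` of the known parent.

    Raises ValueError("Expected a parent") when the instance does not
    derive from the known base class, so no parent can be determined.
    """
    if not isinstance(instance, TrainerSideCallback):
        raise ValueError("Expected a parent")
    parent_method = getattr(TrainerSideCallback, method_name)
    instance_method = getattr(type(instance), method_name)
    return instance_method is not parent_method


def mro_names(obj):
    """List the fully qualified names of all classes in the object's MRO."""
    return [f"{cls.__module__}.{cls.__qualname__}" for cls in type(obj).__mro__]
```

Calling a helper like `mro_names` on each callback handed to the Trainer in the failing environment would show which package each callback's base classes come from, which may help narrow down the cause.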

I expect the code snippet to always work.

Versions / Dependencies

Neural Forecast 1.7.5
Python 3.10.15
macOS Sequoia 15.1

Reproduction script

import pandas
from neuralforecast import NeuralForecast
from neuralforecast.auto import AutoRNN

actuals = pandas.DataFrame(
    {
        "product": [1, 1, 1, 1, 2, 2, 2, 2],
        "date": pandas.to_datetime(
            [
                "2024-01-01",
                "2024-01-02",
                "2024-01-03",
                "2024-01-04",
                "2024-01-01",
                "2024-01-02",
                "2024-01-03",
                "2024-01-04",
            ]
        ),
        "sales": [1, 2, 3, 4, 5, 6, 7, 8],
    }
)

model = NeuralForecast([AutoRNN(2, refit_with_val=True)], freq="D")

model.fit(df=actuals, val_size=2, id_col="product", time_col="date", target_col="sales")

Issue Severity

High: It blocks me from completing my task.

@yarnabrina yarnabrina added the bug label Nov 17, 2024
@marcopeix
Contributor

Hello! Can you try again in a clean environment with a fresh PyTorch installation? I also run on a Mac and can't reproduce this error on my end, so a clean install should fix it.

@yarnabrina
Contributor Author

Observation 1

I created a fresh Python 3.10 environment and ran the following command:

> python3 -m pip install -U neuralforecast --extra-index-url https://download.pytorch.org/whl/cpu

This installs numpy 2.1.3, and the following import then fails with a complaint that a module compiled against NumPy 1.x cannot run under NumPy 2.

>>> from neuralforecast import NeuralForecast

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.1.3 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/venv/lib/python3.10/site-packages/neuralforecast/__init__.py", line 3, in <module>
    from .core import NeuralForecast
  File "/path/to/venv/lib/python3.10/site-packages/neuralforecast/core.py", line 17, in <module>
    import pytorch_lightning as pl
  File "/path/to/venv/lib/python3.10/site-packages/pytorch_lightning/__init__.py", line 25, in <module>
    from lightning_fabric.utilities.seed import seed_everything  # noqa: E402
  File "/path/to/venv/lib/python3.10/site-packages/lightning_fabric/__init__.py", line 30, in <module>
    from lightning_fabric.fabric import Fabric  # noqa: E402
  File "/path/to/venv/lib/python3.10/site-packages/lightning_fabric/fabric.py", line 35, in <module>
    import torch
  File "/path/to/venv/lib/python3.10/site-packages/torch/__init__.py", line 1477, in <module>
    from .functional import *  # noqa: F403
  File "/path/to/venv/lib/python3.10/site-packages/torch/functional.py", line 9, in <module>
    import torch.nn.functional as F
  File "/path/to/venv/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/path/to/venv/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder, \
  File "/path/to/venv/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/path/to/venv/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
>>>

Observation 2

I deleted that environment, created a new one, and installed neuralforecast with the additional constraint numpy<2. In this environment, I could not reproduce the reported error.
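Whether an environment ended up on NumPy 1.x or 2.x can be checked without importing numpy or torch at all. The following is a minimal sketch using only the standard library; the helper names `major_version` and `installed_version` are made up for illustration.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.1.3'."""
    return int(version.split(".", 1)[0])


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    numpy_version = installed_version("numpy")
    if numpy_version is None:
        print("numpy is not installed in this environment")
    elif major_version(numpy_version) >= 2:
        print(f"numpy {numpy_version} is installed (2.x series)")
    else:
        print(f"numpy {numpy_version} is installed (1.x series)")
```

Running this in each environment before importing neuralforecast shows which NumPy series pip actually resolved.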

Observation 3

Since I cannot work with neuralforecast alone and need other dependencies for my work, I installed those as well (sktime, xgboost, lightgbm, catboost, fastapi, etc.) in this fresh environment, and the error came back.
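Because the error disappears in a minimal environment and returns once more packages are installed, comparing the two sets of installed packages is one way to narrow down what changed. This is a rough sketch using only the standard library; `snapshot` and `diff_snapshots` are hypothetical helper names, and the idea is to save a snapshot from each environment and diff them.

```python
from importlib import metadata


def snapshot() -> dict:
    """Map each installed distribution name (lowercased) to its version."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that carry no name
            found[name.lower()] = dist.version
    return found


def diff_snapshots(before: dict, after: dict) -> dict:
    """Report packages added, removed, or changed between two snapshots."""
    added = {n: v for n, v in after.items() if n not in before}
    removed = {n: v for n, v in before.items() if n not in after}
    changed = {
        n: (before[n], after[n])
        for n in before.keys() & after.keys()
        if before[n] != after[n]
    }
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Print the current environment in a stable order for saving to a file.
    for name, version in sorted(snapshot().items()):
        print(f"{name}=={version}")
```

Entries under "added" and "changed" are the candidates to reinstall one at a time in the working environment until the error reappears.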

I am using an Intel-chip MacBook, in case that matters.

@elephaint
Contributor

How did you create the new environment?

I'd install with just pip install neuralforecast

@yarnabrina
Contributor Author

I used venv.

python3.10 -m venv neuralforecast_auto_test

3 participants