
[DeepAR] Model training error when using deepAR #837

Closed
Layonhuuu opened this issue Dec 8, 2023 · 16 comments
Layonhuuu commented Dec 8, 2023

What happened + What you expected to happen

When using probabilistic forecasting, I reviewed the tutorial documentation for the corresponding module and successfully ran the tutorial code with the NHITS model. I then tried the DeepAR model on my own data, but training failed with an error.
My code is as follows:

import pandas as pd

from neuralforecast import NeuralForecast
from neuralforecast.models import DeepAR

data = pd.read_csv("test9.csv")
data['ds'] = pd.to_datetime(data['ds'], unit='s')
# data shape : [561999 rows x 17 columns]
data.head(4)
#    unique_id                  ds         y  ...      square_root_amplitudes     abs_means     rmss
# 0          1 1970-01-01 00:00:01 -0.057770  ...  0.304688  0.247314  2.43750
# 1          1 1970-01-01 00:00:02 -0.052124  ...  0.304932  0.247681  2.43945
# 2          1 1970-01-01 00:00:03 -0.013611  ...  0.304443  0.247070  2.43555
# 3          1 1970-01-01 00:00:04  0.005581  ...  0.300781  0.241943  2.40625


horizon = 24
models = [DeepAR(input_size=2,
                 h=horizon,
                 futr_exog_list=['trend', 'season', 'resid', 'V',
                                 'mean_values', 'variances', 'max_values', 'min_values',
                                 'peak_to_peaks', 'kurtosiss', 'skewnesss',
                                 'square_root_amplitudes', 'abs_means ', 'rmss'],
                 hist_exog_list=None,
                 stat_exog_list=None,
                 lstm_n_layers=2,
                 lstm_hidden_size=128,
                 lstm_dropout=0.1,
                 scaler_type='standard',
                 random_seed=1)]
nf = NeuralForecast(models=models, freq='s')
nf.fit(df=data, val_size=2048)

Versions / Dependencies

neuralforecast version = 1.6.4

Python Version = 3.9.18

Reproduction script

The error is as follows:

  File f:\postguaduate\vibration\various model test\nixtla\2023-12-8-1-deepar-problem.py:39
    nf.fit(df = data,

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\neuralforecast\core.py:274 in fit
    model.fit(self.dataset, val_size=val_size)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\neuralforecast\common\_base_windows.py:734 in fit
    trainer.fit(self, datamodule=datamodule)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\trainer.py:544 in fit
    call._call_and_handle_interrupt(

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\call.py:44 in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\trainer.py:580 in _fit_impl
    self._run(model, ckpt_path=ckpt_path)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\trainer.py:989 in _run
    results = self._run_stage()

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\trainer.py:1035 in _run_stage
    self.fit_loop.run()

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\fit_loop.py:202 in run
    self.advance()

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\fit_loop.py:359 in advance
    self.epoch_loop.run(self._data_fetcher)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\training_epoch_loop.py:136 in run
    self.advance(data_fetcher)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\training_epoch_loop.py:240 in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\optimization\automatic.py:187 in run
    self._optimizer_step(batch_idx, closure)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\optimization\automatic.py:265 in _optimizer_step
    call._call_lightning_module_hook(

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\call.py:157 in _call_lightning_module_hook
    output = fn(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\core\module.py:1282 in optimizer_step
    optimizer.step(closure=optimizer_closure)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\core\optimizer.py:151 in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\strategies\strategy.py:230 in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\plugins\precision\precision.py:117 in optimizer_step
    return optimizer.step(closure=closure, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\optim\lr_scheduler.py:68 in wrapper
    return wrapped(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\optim\optimizer.py:373 in wrapper
    out = func(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\optim\optimizer.py:76 in _use_grad
    ret = func(self, *args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\optim\adam.py:143 in step
    loss = closure()

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\plugins\precision\precision.py:104 in _wrap_closure
    closure_result = closure()

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\optimization\automatic.py:140 in __call__
    self._result = self.closure(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context
    return func(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\optimization\automatic.py:126 in closure
    step_output = self._step_fn()

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\loops\optimization\automatic.py:315 in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\trainer\call.py:309 in _call_strategy_hook
    output = fn(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\pytorch_lightning\strategies\strategy.py:382 in training_step
    return self.lightning_module.training_step(*args, **kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\neuralforecast\models\deepar.py:276 in training_step
    loss = self.loss(y=outsample_y, distr_args=distr_args, mask=mask)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\neuralforecast\losses\pytorch.py:1110 in __call__
    distr = self.get_distribution(distr_args=distr_args, **self.distribution_kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\neuralforecast\losses\pytorch.py:1033 in get_distribution
    distr = self._base_distribution(*distr_args, **distribution_kwargs)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\distributions\studentT.py:61 in __init__
    self._chi2 = Chi2(self.df)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\distributions\chi2.py:25 in __init__
    super().__init__(0.5 * df, 0.5, validate_args=validate_args)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\distributions\gamma.py:58 in __init__
    super().__init__(batch_shape, validate_args=validate_args)

  File D:\Anaconda3\Conda\envs\nixtla\lib\site-packages\torch\distributions\distribution.py:68 in __init__
    raise ValueError(

ValueError: Expected parameter df (Tensor of shape (1024, 1)) of distribution Chi2() to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        ...,
        [nan]], device='cuda:0', grad_fn=<MulBackward0>)

Issue Severity

High: It blocks me from completing my task.

Layonhuuu added the bug label Dec 8, 2023
jmoralez (Member) commented Dec 8, 2023

Hey @Layonhuuu, thanks for using neuralforecast. Can you verify if you have missing values in your target? e.g. data['y'].isnull().sum() should yield 0.

Layonhuuu (Author):

Thank you very much for your reply. Following your example, the code returns:
In [4]: data['y'].isnull().sum()
Out[4]: 0

Layonhuuu (Author):

Will it have an impact if all the unique_id values in my data are 1?

Layonhuuu (Author):

I found that there was a problem with one of the exogenous variable columns, and the issue has been resolved. Thank you very much for your reply!

cchallu closed this as completed Dec 11, 2023
jmoralez (Member):

@Layonhuuu so the problem was missing values in the exogenous features?

Layonhuuu (Author):

Sorry to bother you again. Recently I tried the Detect Demand Peaks tutorial and found that when the code ran, it gave the following error message, causing Python to crash directly.
The code:

import pandas as pd
import matplotlib.pyplot as plt

from neuralforecast.core import NeuralForecast
from neuralforecast.auto import AutoNHITS

Y_df = pd.read_csv('ERCOT-clean.csv', parse_dates=['ds'])
Y_df = Y_df.query("ds >= '2022-01-01' & ds <= '2022-10-01'")
Y_df.plot(x='ds', y='y', figsize=(20, 7))

models = [AutoNHITS(h=24,
                    config=None,  # uses the default config
                    num_samples=10)]
nf = NeuralForecast(models=models, freq='H')

crossvalidation_df = nf.cross_validation(df=Y_df,
                                         step_size=24,
                                         n_windows=30)

The error:
2023-12-13 16:00:26,180 INFO worker.py:1673 -- Started a local Ray instance.

[symbolize_win32.inc : 53] RAW: SymInitialize() failed: 87

Fatal Python error: Aborted

Thread 0x000036ac (most recent call first):
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\zmq\utils\garbage.py", line 47 in run
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\threading.py", line 980 in _bootstrap_inner
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\threading.py", line 937 in _bootstrap

Main thread:
Current thread 0x000011f4 (most recent call first):
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\_private\worker.py", line 2284 in connect
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\_private\worker.py", line 1675 in init
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\_private\client_mode_hook.py", line 103 in wrapper
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\tune\tune.py", line 219 in _ray_auto_init
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\tune\tune.py", line 511 in run
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\tune\impl\tuner_internal.py", line 645 in _fit_internal
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\tune\impl\tuner_internal.py", line 526 in fit
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\ray\tune\tuner.py", line 364 in fit
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\neuralforecast\common\_base_auto.py", line 259 in _tune_model
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\neuralforecast\common\_base_auto.py", line 361 in fit
File "D:\Anaconda3\Conda\envs\nixtla_auto\lib\site-packages\neuralforecast\core.py", line 520 in cross_validation
File "C:\Users\Windows 10\AppData\Local\Temp\ipykernel_17572\512784536.py", line 1 in <module>

Restarting kernel...

Layonhuuu (Author):

> @Layonhuuu so the problem was missing values in the exogenous features?

I tested the exogenous columns with your method and they all still returned 0. I found the problematic column by checking each exogenous variable one by one.
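For anyone hitting the same NaN loss, a quick way to locate an offending feature is to scan every exogenous column for NaN and inf values at once. The sketch below is illustrative only (`find_bad_columns` is a hypothetical helper, not part of neuralforecast):

```python
import numpy as np
import pandas as pd

def find_bad_columns(df, cols):
    """Return {column: {'nan': n, 'inf': n}} for every column in `cols`
    that contains NaN or +/-inf after coercing to numeric."""
    bad = {}
    for col in cols:
        vals = pd.to_numeric(df[col], errors="coerce")
        n_nan = int(vals.isna().sum())   # missing or non-numeric entries
        n_inf = int(np.isinf(vals).sum())  # +/-inf entries
        if n_nan or n_inf:
            bad[col] = {"nan": n_nan, "inf": n_inf}
    return bad
```

A zero-variance (constant) column can also produce NaNs when combined with scaler_type='standard', since standardizing divides by a zero standard deviation, so constant columns are worth checking too.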

jmoralez (Member):

Seems to be a problem with ray. Can you try setting:

models = [AutoNHITS(h=24,
                    config=None,  # uses the default config
                    num_samples=10,
                    backend='optuna')]

Layonhuuu (Author):

With your method there's no problem. What's the reason? Could you please let me know?

jmoralez (Member):

Specifying backend='optuna' performs the search with Optuna instead of Ray Tune, so Ray isn't used and that's why it doesn't crash.

Layonhuuu (Author):

Sorry to bother you again. QAQ
When using AutoDeepAR, the console prints train_loss_epoch and valid_loss at the end of each trial, like this:

[I 2023-12-19 19:44:40,883] Trial 9 finished with value: 0.008946074172854424 and parameters: {'lstm_hidden_size': 256, 'lstm_n_layers': 2, 'lstm_dropout': 0.44937084229182345, 'learning_rate': 0.011507805893968093, 'scaler_type': 'robust', 'max_steps': 400, 'batch_size': 16, 'windows_batch_size': 64, 'random_seed': 8, 'input_size': 8, 'step_size': 1}. Best is trial 8 with value: 0.0034219541121274233.
Seed set to 11
Epoch 99: 100%|██████████| 1/1 [00:00<00:00, 28.57it/s, v_num=132, train_loss_step=-2.51, train_loss_epoch=-2.51]
Validation DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 3.82it/s]
Epoch 199: 100%|██████████| 1/1 [00:00<00:00, 37.04it/s, v_num=132, train_loss_step=-2.65, train_loss_epoch=-2.79, valid_loss=0.00571]
Validation DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 4.00it/s]

I would like to ask whether NeuralForecast can save train_loss_epoch and valid_loss during training, so that I can plot the loss curve for each trial. Could you provide a small example? I noticed the NeuralForecast tutorials don't seem to cover this feature.
I sincerely look forward to your reply, which would greatly benefit me.

Layonhuuu (Author):

The code is as follows:

import pandas as pd
import warnings
warnings.filterwarnings("ignore")

from neuralforecast.core import NeuralForecast
from neuralforecast.auto import AutoDeepAR

data = pd.read_csv("IMF1.csv")
data['ds'] = pd.to_datetime(data['ds'], unit='s')

val_size = 64320
test_size = 64320

horizon = 1  # one-step-ahead forecast

models = [AutoDeepAR(h=horizon, num_samples=10, backend='optuna')]
nf = NeuralForecast(models=models, freq='s')
Y_hat_df = nf.cross_validation(df=data, val_size=val_size,
                               test_size=test_size, n_windows=None)

cchallu reopened this Dec 19, 2023
jmoralez (Member) commented Dec 19, 2023

You should be able to do this with callbacks (introduced in #795), which will be in the next release. Or you can use them now by installing from GitHub.
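In the meantime, one workaround is to attach PyTorch Lightning's CSVLogger (regular NeuralForecast models forward extra trainer kwargs such as `logger` to the underlying Lightning Trainer; for Auto models this is less direct) and parse the resulting metrics.csv afterwards. The helper below is a sketch under that assumption; `extract_loss_curves` is a hypothetical name, and the file layout assumed is CSVLogger's one-row-per-logged-metric format:

```python
import pandas as pd

def extract_loss_curves(metrics_df, metrics=("train_loss_epoch", "valid_loss")):
    """Turn a Lightning CSVLogger metrics table into one (epoch, value)
    series per metric. Lightning logs each metric on its own row, so the
    other metric columns are NaN there and must be dropped per metric."""
    curves = {}
    for name in metrics:
        if name in metrics_df.columns:
            sub = metrics_df[["epoch", name]].dropna()
            curves[name] = list(zip(sub["epoch"].astype(int), sub[name]))
    return curves

# Typical use (the path is an assumption based on CSVLogger's default layout):
# curves = extract_loss_curves(pd.read_csv("lightning_logs/version_0/metrics.csv"))
```

Each series in the returned dict can then be passed straight to matplotlib to draw the per-trial loss curves.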

Layonhuuu (Author):

I am very grateful for your reply!

jmoralez (Member):

@Layonhuuu seems like we need a different approach for what you want. Can you please open a new issue asking for that (how to save the train and validation loss in auto models)? So that we can reply there and other people can find it in case they run into that issue as well.

Layonhuuu (Author):

OK! My apologies. I solved the previous problem using TensorBoard. Thank you very much, but I have run into another difficult problem; this time I will open a new issue.
