forecast/predict method #6

Open
EraylsonGaldino opened this issue Aug 31, 2020 · 3 comments

Comments

@EraylsonGaldino

Why does the forecast method need the target value?
mdl1.predict(x, y, step=3)

@jxx123
Owner

jxx123 commented Oct 8, 2020

Historical target values (e.g. y(t-1), y(t-2)) are used as features to predict the future target value. For example, if you build a NARX(RandomForestRegressor(), auto_order=2, exog_order=[1], exog_delay=[0]), the predicted value is y(t + 1) = f(y(t), y(t - 1), x(t)), and the target series y is needed to supply y(t) and y(t - 1). The model does not cheat by using any future values in the prediction. I know it looks a bit odd to pass in the target values, but it is necessary.
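To make the explanation above concrete, here is a minimal numpy sketch of how lagged features for auto_order=2, exog_order=[1], exog_delay=[0] could be assembled and fitted. It is not fireTS's implementation: the helper `narx_features` is hypothetical, and ordinary least squares stands in for RandomForestRegressor so that the fitted coefficients can be checked against the simulated ones. Each feature row uses only y(t), y(t-1) and x(t); only the label y(t+1) looks ahead.

```python
import numpy as np

def narx_features(y, x, auto_order=2, exog_order=1, exog_delay=0):
    """Hypothetical helper (not fireTS code): pair each feature row
    [y(t), ..., y(t-auto_order+1), x(t-exog_delay), ..., x(t-exog_delay-exog_order+1)]
    with the one-step-ahead target y(t+1)."""
    start = max(auto_order - 1, exog_delay + exog_order - 1)  # first t with full history
    rows, targets = [], []
    for t in range(start, len(y) - 1):
        auto = [y[t - i] for i in range(auto_order)]               # past/present targets
        exog = [x[t - exog_delay - j] for j in range(exog_order)]  # past/present inputs
        rows.append(auto + exog)
        targets.append(y[t + 1])                                   # only the label looks ahead
    return np.array(rows), np.array(targets)

# Simulate y(t+1) = 0.5*y(t) - 0.2*y(t-1) + 0.3*x(t) without noise.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
y[1] = 1.0
for t in range(1, n - 1):
    y[t + 1] = 0.5 * y[t] - 0.2 * y[t - 1] + 0.3 * x[t]

X, target = narx_features(y, x)
# Ordinary least squares stands in for RandomForestRegressor here.
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
print(coef)  # approximately [0.5, -0.2, 0.3]
```

Because the simulated data is noise-free and the feature matrix has full rank, the least-squares fit recovers the generating coefficients, which shows that the lagged target values really are the regressors.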

Please see this FAQ https://github.com/jxx123/fireTS#faq for a more detailed explanation.

@neerajnj10

neerajnj10 commented Dec 30, 2020

Hi @jxx123, I still do not understand why the target variable is passed to the predict function. At time t, I want to forecast time t+1, but the model does not seem to do that. Instead, it appears to predict time t itself using the values at time t and t-1, t-2, etc. That is not useful from a time-series forecasting point of view: at the present time I want to forecast the future, not what is happening at time t.

For example, here is the output of predict from the model. The last timestamp of ypred is the same as that of ytest, which makes sense. What is confusing is that the prediction at time t is aligned with ytest(t), not with ytest(t+1) or ytest(t+6). That is what any supervised model does, i.e. an "on-point" prediction at time t. Unless I convert the time series to a supervised learning format (shifting the variables by the chosen lag period, so that x(t-1), x(t), y(t-1) are the inputs used to predict y(t+1); see https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/), it does not make sense to me.
Does the model expect the input and output to be prepared in the format described in the link above? If not, the example notebooks do not explain this, beyond saying that the model parameters are learned from lagged inputs:
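For reference, the "supervised learning format" from the linked article can be sketched with pandas `shift`, using exactly the lag layout described in the comment (inputs x(t-1), x(t), y(t-1); target y(t+1)). The toy data and column names are illustrative assumptions. According to the maintainer's replies in this thread, NARX builds its lagged target features internally, so this manual framing is shown only to clarify the idea, not as a required preprocessing step for fireTS.

```python
import numpy as np
import pandas as pd

# Toy data standing in for one exogenous input x and the target y.
df = pd.DataFrame({"x": np.arange(10, dtype=float),
                   "y": np.arange(100, 110, dtype=float)})

# Supervised framing as described in the comment:
# inputs x(t-1), x(t), y(t-1)  ->  target y(t+1)
sup = pd.DataFrame({
    "x(t-1)": df["x"].shift(1),   # value one step in the past
    "x(t)": df["x"],
    "y(t-1)": df["y"].shift(1),
    "y(t+1)": df["y"].shift(-1),  # value one step in the future (the label)
}).dropna()                       # drop edge rows that lack a lag or a future value
print(sup.head(3))
```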

2020-10-06 11:15:00 -0.002697
2020-10-06 11:30:00 -0.003967
2020-10-06 11:45:00 0.000830
2020-10-06 12:00:00 0.002199
2020-10-06 12:15:00 0.002574

Essentially, the prediction at 2020-10-06 11:15:00 should equal the actual value at 2020-10-06 11:30:00, and therefore the prediction at 2020-10-06 12:15:00 should be for 2020-10-06 12:30:00. That is not the case, and if it is not, then this is really NOT time-series forecasting but plain supervised machine-learning prediction. If we need to modify the data format to create the lags ourselves, I am having a hard time understanding it.

Can you provide more clarification? I am using this model from your example notebook, tuned with grid search:

import pandas as pd
from sklearn.svm import SVR
from fireTS.models import NARX

tsmdl = NARX(auto_order=6, base_estimator=SVR(C=100, epsilon=0.015, gamma=0.003), exog_delay=[0, 0], exog_order=[3, 3])
tsmdl.fit(Xtrain, ytrain)

ypred = tsmdl.predict(Xtest, ytest, step=6)
ypred = pd.Series(ypred, index=ytest.index)

Thanks!

@jxx123
Owner

jxx123 commented Jan 8, 2021

@neerajnj10 sorry for the late reply. The prediction actually works as you expect. For example, if the prediction step is 6, it predicts, say, y(10) based on y(4) and y(3) (if auto_order is 2). I just aligned the predicted values with the actual values (so that it is easier to compute the MSE score, etc.): the output ypred has the same shape as ytest, and ypred[10] is the predicted value for ytest[10]. Note that ypred[10] is based only on ytest[4] and ytest[3].
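The alignment described above can be sketched in numpy for a pure autoregressive model (no exogenous input, to keep it short). This sketch assumes a recursive strategy, i.e. one-step predictions are fed back until `step` steps ahead; the thread does not state which multi-step strategy fireTS uses, and `predict_aligned` and `coef` are hypothetical names. With step=6 and auto_order=2, ypred[10] is computed only from y[4] and y[3], and the output is NaN-padded so it lines up with y.

```python
import numpy as np

def predict_aligned(coef, y, step, auto_order=2):
    """Hypothetical sketch: recursive `step`-ahead forecasts of an AR(auto_order)
    model y(t+1) = coef . [y(t), ..., y(t-auto_order+1)], aligned so that
    ypred[k] is the forecast of y[k] made from observations up to y[k - step]."""
    n = len(y)
    ypred = np.full(n, np.nan)  # positions without enough history stay NaN
    for origin in range(auto_order - 1, n - step):
        history = list(y[origin - auto_order + 1 : origin + 1])  # observed values only
        for _ in range(step):
            lags = history[::-1][:auto_order]          # most recent value first
            history.append(float(np.dot(coef, lags)))  # feed the prediction back in
        ypred[origin + step] = history[-1]             # store at the target's index
    return ypred

coef = [0.5, -0.2]
y = np.sin(np.arange(30) / 3.0)
ypred = predict_aligned(coef, y, step=6)

# Because ypred and y share an index, scoring just skips the NaN prefix.
mask = ~np.isnan(ypred)
mse = np.mean((ypred[mask] - y[mask]) ** 2)
```

Changing any value after y[4] leaves ypred[10] untouched, which is one way to convince yourself that passing the target series to predict does not leak future information.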

Since the predictions are aligned with the actual values, you will notice that the first pred_step + max(auto_order - 1, max(exog_order + exog_delay) - 1) values of the output ypred are NaN: for those time steps there is not enough history to build the lagged features, so no prediction is available.
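As a worked example of the formula above, the helper below simply evaluates it; it is not taken from the fireTS source. It assumes exog_order + exog_delay is meant element-wise (one sum per exogenous input), which is an interpretation on my part. For the configuration in the first reply (step=6, auto_order=2, exog_order=[1], exog_delay=[0]) it gives 7, and for the grid-searched model above (step=6, auto_order=6, exog_order=[3, 3], exog_delay=[0, 0]) it gives 11.

```python
def n_leading_nans(step, auto_order, exog_order, exog_delay):
    """Evaluate the formula quoted in this thread:
    step + max(auto_order - 1, max(exog_order + exog_delay) - 1),
    with exog_order + exog_delay taken element-wise per exogenous input (assumption)."""
    exog_span = max(o + d for o, d in zip(exog_order, exog_delay))
    return step + max(auto_order - 1, exog_span - 1)

# Configuration from the first reply.
print(n_leading_nans(6, 2, [1], [0]))        # 7
# Configuration from neerajnj10's comment.
print(n_leading_nans(6, 6, [3, 3], [0, 0]))  # 11
```

In the second case the autoregressive lags (auto_order=6) dominate the exogenous lags (order 3, delay 0), so the exogenous settings do not change the NaN count.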

For more details, see my documentation here https://firets.readthedocs.io/en/latest/models.html#models.NARX.predict
