pySpark parallel processing for cross_validation using external regressor #681

mondalankur · 2023-10-30T15:04:36Z

Currently I am using a PySpark DataFrame that has these columns:
sample_train_data = ["unique_id", "ds", "y", "external_reg_1"]

I am creating a StatsForecast object with 5 different models, n_jobs=-1, and freq='W'.

I am passing the above train_data as a spark dataframe to the cross_validation function.

sdf = spark.createDataFrame(sample_train_data)

cv = model.cross_validation(df=sdf, h=4, step_size=1, n_windows=3)
cv.display()  # This throws the following error:

PythonException: 'KeyError: "['external_reg_1'] not in index"'

Please let me know if there is another way to solve this problem. The same code works fine in pandas, but my time series data is huge, on the order of millions, and I am restricted to using PySpark.
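For readers unfamiliar with the parameters in the call above, here is a minimal pure-Python sketch of how h, step_size, and n_windows could translate into train/test splits for one series. It assumes the last window ends at the final observation, which is my understanding of how StatsForecast lays out its windows; the helper name `cv_windows` and the series length of 52 are hypothetical and not part of the library.

```python
def cv_windows(n_obs, h, step_size, n_windows):
    """Return one (train_end, test_end) pair per cross-validation window.

    Training data covers positions [0, train_end) and the forecast
    horizon covers [train_end, test_end). Hypothetical helper, not a
    StatsForecast function.
    """
    # Total number of trailing observations touched by any test window.
    test_size = h + step_size * (n_windows - 1)
    if test_size >= n_obs:
        raise ValueError("series is too short for the requested windows")
    first_cutoff = n_obs - test_size
    return [
        (first_cutoff + i * step_size, first_cutoff + i * step_size + h)
        for i in range(n_windows)
    ]


# Same settings as the call above, on an assumed series of 52 weekly points.
for train_end, test_end in cv_windows(n_obs=52, h=4, step_size=1, n_windows=3):
    print(f"train on [0, {train_end}), forecast [{train_end}, {test_end})")
```

If this reading is right, every forecast span lies inside the data already passed in, so the values of external_reg_1 for each test window would come from the same DataFrame as the training rows.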

jmoralez (Maintainer) · 2023-11-01T18:01:16Z

Hey. This was fixed in #638 and will be in the next release. If you're able to install from the main branch that'd fix it as well.
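As a rough sketch of what installing from the main branch could look like, the snippet below builds a pip command that targets the repository directly. The repository URL (github.com/Nixtla/statsforecast) and the helper name `pip_install_command` are assumptions for illustration; the snippet only prints the command rather than running it.

```python
import sys

# Assumption: the project is hosted at github.com/Nixtla/statsforecast
# and the development branch is named "main", as mentioned above.
REPO_URL = "git+https://github.com/Nixtla/statsforecast.git@main"


def pip_install_command(repo_url=REPO_URL):
    """Build a pip command that installs straight from a git branch.

    Hypothetical helper; it returns the argument list without running it.
    """
    return [sys.executable, "-m", "pip", "install", "--upgrade", repo_url]


cmd = pip_install_command()
print(" ".join(cmd))
# To actually run the install from Python:
#   import subprocess
#   subprocess.check_call(cmd)
```

On a Spark cluster the same version would also need to be available on the worker nodes, not only on the driver, since the forecasting code runs there.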

mondalankur (Author) · 2023-11-03T12:09:44Z

Thanks!
