pySpark parallel processing for cross_validation using external regressor #681
Replies: 2 comments
-
Hey. This was fixed in #638 and will be in the next release. If you're able to install from the main branch, that will fix it as well.
-
Currently I am using a PySpark DataFrame that has these columns:
sample_train_data = ["unique_id", "ds", "y", "external_reg_1"]
I am creating a StatsForecast object with 5 different models, with n_jobs=-1 and freq='W'.
I am passing the above train data as a Spark DataFrame to the cross_validation function:
sdf = spark.createDataFrame(sample_train_data)
cv = model.cross_validation(df=sdf, h=4, step_size=1, n_windows=3)
cv.display()  # This throws the following error:
PythonException: 'KeyError: "['external_reg_1'] not in index"'
Please tell me if there is any other way to solve this problem. It works fine in pandas, but I have a huge time series dataset, on the order of millions. Also, I am restricted to using PySpark only.
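For readers unsure what the call with h=4, step_size=1, n_windows=3 evaluates, here is a minimal pure-Python sketch of the window layout. It assumes the windows are anchored at the end of each series, so that the last window ends at the last observation. The helper name `cv_windows` and the series length of 52 weekly observations are hypothetical and only for illustration; this is not the library's code.

```python
def cv_windows(n_obs, h, step_size, n_windows):
    """Return (train_end, test_end) index pairs, one per window.

    Hypothetical helper: training data is rows [0, train_end) and the
    test horizon is rows [train_end, test_end). Assumes the last window
    ends exactly at the last observation of the series.
    """
    # Rows held out across all windows combined.
    test_size = h + step_size * (n_windows - 1)
    if n_obs <= test_size:
        raise ValueError("series is too short for the requested windows")
    windows = []
    for i in range(n_windows):
        # Each window's cutoff moves forward by step_size rows.
        train_end = n_obs - test_size + i * step_size
        windows.append((train_end, train_end + h))
    return windows


# Assumed example: one weekly series with 52 observations.
windows = cv_windows(n_obs=52, h=4, step_size=1, n_windows=3)
for train_end, test_end in windows:
    print(f"train rows [0, {train_end}), test rows [{train_end}, {test_end})")
```

Under this assumption each series needs more than h + step_size * (n_windows - 1) = 6 observations, and the exogenous column (here external_reg_1) has to be available for every row in each training and test slice.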