Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARIMA] Invalid memory accesses, integer overflows #937

Closed
filipcacky opened this issue Nov 13, 2024 · 3 comments · Fixed by #939
Closed

[ARIMA] Invalid memory accesses, integer overflows #937

filipcacky opened this issue Nov 13, 2024 · 3 comments · Fixed by #939
Labels

Comments

@filipcacky
Copy link
Contributor

filipcacky commented Nov 13, 2024

What happened

Fitting a SARIMA model with long seasonality and include_mean==True may cause invalid memory accesses and iterator overflows.

The void getQ0(const py::array_t<double> phiv, const py::array_t<double> thetav, py::array_t<double> resv) (and all other c++ functions) uses int as an iterator, std::vector<T>::size() is always downcasted to int.

The function creates a vector of size==nrbar.

  int r = std::max(p, q + 1);
  int np = r * (r + 1) / 2;
  int nrbar = np * (np - 1) / 2;

This puts a limit on seasonal_length to about ~330, after which the integer overflows, making all code after UB. If it overflows to negative, an exception is called, if it overflows to positive, the code will eventually segfault on invalid accesses in inclu2, which are not checked.

I recommend removing all of the static_cast<int>(vector::size()), using size_t instead on all index/iterator variables.

Using std containers instead of raw pointers, example:

void inclu2(int np, const double *xnext, double *xrow, double ynext, double *d, double *rbar, double *thetab);

Should really be:

void inclu2(const std::vector<double>& xnext, std::vector<double>& xrow, double ynext, std::span<double> d, std::vector<double>& rbar, std::vector<double>& thetab) {
  const size_t np = xnext.size();
  ASSERT(np == xrow.size());
  ...
}

While std::vector::operator [] doesn't do bounds checking, most library implementations implement it via std::vector::at in debug builds, which does. This would have no negative performance implications.

Also, is there any way the space complexity for getQ0 could be reduced? ~O(n**4) seems a bit excessive, I'm not really familiar with the algos used.

I would be happy to help with this myself 🙂 Although I'm a bit strapped for time, so it might take a while.

Versions / Dependencies

Any statsforecast version after commit 4bd4364 feat: migrate arima to c++ (#895)

Reproducible example

import numpy as np
from statsforecast.models import ARIMA

model = ARIMA(
    order=(1, 0, 0),
    seasonal_order=(1, 0, 0),
    season_length=660,
    include_mean=True,
)

model.fit(np.zeros(12000))

Issue Severity

Low: It annoys or frustrates me.

@filipcacky filipcacky added the bug label Nov 13, 2024
@jmoralez
Copy link
Member

Hey, thanks a lot for the detailed report and sorry for the troubles. Initially I was using ctypes so everything was c arrays and when I changed to pybind11 I didn't properly update everything, I'll take some time to clean up this code.

@filipcacky
Copy link
Contributor Author

filipcacky commented Nov 14, 2024

@jmoralez Hey, i actually got a bit of time to take a look at it and refactored the code a bit :) Can't help with the O(n**4) situation though.

@jmoralez
Copy link
Member

Awesome, thanks a lot! I'll review it shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants