UnobservedComponentsGAS.jl is an innovative open-source package developed entirely in Julia. It facilitates the modeling of time series data by enabling users to define a diverse range of score-driven models with customizable parameter dynamics specified through unobserved components, akin to state-space models. By representing these models as mathematical programming problems, the package leverages state-of-the-art optimization techniques and incorporates features from the field of optimization into the model formulation. This integration enhances predictive performance and makes it possible to embed robustness techniques directly in the estimation, yielding more reliable results.
- Score-driven models as optimization problems
- Implemented dynamics
- Robustness techniques
- Illustrative example
- Next steps
- References
As noted earlier, this package introduces a novel approach to model representation. The formulation of a score-driven model as an optimization problem is elegantly straightforward and relatively easy to grasp. Its core concept entails defining both fixed and time-varying parameters, along with components, as decision variables within the optimization framework. The temporal dynamics and any potential constraints on the fixed parameters' domain are enforced through constraints integrated into the problem's formulation. Lastly, an objective function is introduced to minimize the negative logarithm of the likelihood of the predictive distribution, a crucial step given that score-driven models are typically estimated using maximum likelihood estimation.
For enhanced clarity, let us denote by $p(y_t \mid f_t; \theta)$ the predictive density of the observation $y_t$, where $f_t$ collects the time-varying parameters and $\theta$ the fixed parameters. A generic score-driven model can then be written as the optimization problem in Equation (1):

$$
\begin{aligned}
\min_{\theta,\, \{f_t\}} \quad & -\sum_{t=1}^{T} \log p(y_t \mid f_t;\, \theta) \\
\text{s.t.} \quad & f_{t} = \omega + A\, s_{t-1} + B\, f_{t-1}, \qquad t = 2, \dots, T.
\end{aligned}
$$

It is essential to emphasize that, in the equation above, the scaled score $s_t$ is not an additional decision variable: it is an expression of the remaining variables, obtained as the gradient of $\log p(y_t \mid f_t; \theta)$ with respect to $f_t$, scaled by the Fisher information raised to the power $-d$. Because the score is, in general, a nonlinear function of the parameters, the resulting problem is a nonlinear program.
At this juncture, an important question arises: while the conceptual representation of the model as an optimization problem is straightforward, is the same true for its implementation? The answer to this question is "Yes," and the justification lies in one of the primary reasons for opting to develop this package using the Julia language.
In addition to providing a syntax that facilitates efficient development while maintaining computational performance, Julia offers a powerful toolset for tackling optimization challenges. JuMP.jl, an open-source modeling language, simplifies the formulation of various optimization problems, passing the model to a designated solver and presenting results in a user-friendly format. It is crucial to highlight that JuMP acts as a bridge between user specifications and solvers. By selecting an appropriate solver, JuMP empowers users to handle diverse classes of optimization problems, including linear, mixed-integer, semidefinite, nonlinear, and more. The proposed package utilizes JuMP's modeling language to formulate various score-driven models as optimization problems, similar to what is demonstrated in Equation (1).
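To give a flavor of how such problems look in JuMP, the snippet below is a minimal sketch, not taken from the package, that estimates the mean and standard deviation of synthetic Gaussian data by minimizing the negative log-likelihood. It assumes the Ipopt.jl solver is installed; any nonlinear solver supported by JuMP would work.

```julia
using JuMP, Ipopt

# Synthetic data: 100 draws from a Normal(5, 2) distribution
y = 5.0 .+ 2.0 .* randn(100)
T = length(y)

model = Model(Ipopt.Optimizer)
@variable(model, μ, start = 0.0)           # fixed parameter: mean
@variable(model, σ >= 1e-6, start = 1.0)   # fixed parameter: standard deviation

# Negative Gaussian log-likelihood (up to an additive constant)
@NLobjective(model, Min, T * log(σ) + sum((y[t] - μ)^2 for t in 1:T) / (2 * σ^2))

optimize!(model)
println("estimated mean: ", value(μ), "   estimated std: ", value(σ))
```

Time-varying parameters and their dynamics enter in the same way: as additional decision variables linked by constraints, exactly as sketched in Equation (1).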
As mentioned earlier, UnobservedComponentsGAS.jl enables users to specify the dynamics of the time-varying parameters by exploring various formulations of the latent components. Below, we will outline the components that can be included and the different versions implemented for each of them.
- Level:
  - Random Walk;
  - Random Walk + Slope;
  - AR(1).
- Seasonality:
  - Stochastic seasonality via trigonometric terms;
  - Deterministic seasonality via trigonometric terms.
Given these components and the option to consider explanatory variables, the package facilitates the definition of various models, rendering it applicable across different scenarios.
One of the primary advantages of formulating a model as a mathematical programming problem is the ability to combine the model's formulation with various techniques from the optimization field. Among these, robustness techniques are an intriguing way to enhance model performance. In this regard, the package enables the inclusion of two distinct robustness techniques directly in the model formulation, both briefly discussed below.
The first, sample robustness, builds upon the work of Bertsimas & Paskov (2020), who successfully enhanced the robustness of an AR(p) model against regime changes in the time series. These regime changes manifest as disruptions in the typical patterns of the series and can stem from events such as economic crises and pandemics, which typically have a significant impact on model performance. A model capable of performing satisfactorily even during such periods is therefore highly relevant.
Before delving into how this feature works, it is necessary to understand how a generic time series model can be expressed as an optimization problem. Equation (2) demonstrates this formulation:

$$
\min_{\theta} \; \sum_{t=1}^{T} \ell\left(y_t,\, \hat{y}_t(\theta)\right)
$$

In this equation, $y_t$ denotes the observation at time $t$, $\hat{y}_t(\theta)$ the value fitted by the model given the parameter vector $\theta$, and $\ell$ a loss function, such as the squared error or the negative log-likelihood used in Equation (1).
The primary concept behind this technique is to employ optimization to identify the worst sub-sample of the data according to a certain criterion and then estimate the model to be robust against all potential worst sub-samples. In this context, the criterion used is the worst sub-sample of a pre-specified length, denoted here by $K$, which leads to the min-max problem in Equation (3):

$$
\min_{\theta} \; \max_{z} \; \sum_{t=1}^{T} z_t\, \ell\left(y_t,\, \hat{y}_t(\theta)\right)
\quad \text{s.t.} \quad \sum_{t=1}^{T} z_t = K, \qquad 0 \le z_t \le 1.
$$

The concept behind Equation (3) is to train the model using the observed values that maximize the error. In other words, for a given model with optimized parameters, the inner maximization problem selects the sub-sample that yields the highest error for this model. This selection is governed by the variables $z_t$, which indicate whether observation $t$ belongs to the selected sub-sample.
While this formulation clarifies the intuition behind the technique, it does not lend itself to an efficient resolution of the problem. To address this challenge, a series of manipulations involving duality must be applied to the problem depicted in Equation (3). Further details regarding these manipulations can be found in Bertsimas & Paskov (2020); here, only the final formulation is presented, in Equation (4):

$$
\min_{\theta,\, \lambda,\, u \ge 0} \; K \lambda + \sum_{t=1}^{T} u_t
\quad \text{s.t.} \quad u_t \ge \ell\left(y_t,\, \hat{y}_t(\theta)\right) - \lambda, \qquad t = 1, \dots, T.
$$

The $K$ parameter determines the size of the worst sub-sample against which the model is protected and therefore controls the degree of robustness of the estimation. However, there is no consensus in the literature regarding an appropriate upper bound for the value of $K$; in practice, it is treated as a hyperparameter to be chosen by the user, for instance through validation.
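To make the dualized formulation concrete, the snippet below is a small illustrative JuMP sketch, not the package's internal implementation, that applies Equation (4) to an AR(1) model with squared-error loss. The series y, the sub-sample length K, and the choice of the Ipopt solver are all assumptions made for the example.

```julia
using JuMP, Ipopt

y = randn(100)      # placeholder series; replace with real data
T = length(y)
K = 20              # assumed size of the worst sub-sample

model = Model(Ipopt.Optimizer)
@variable(model, c)                       # AR(1) intercept
@variable(model, -0.999 <= ϕ <= 0.999)    # AR(1) coefficient
@variable(model, λ)                       # dual variable of the cardinality constraint
@variable(model, u[2:T] >= 0)             # dual variables of the bounds 0 ≤ z_t ≤ 1

# Epigraph constraints: u_t ≥ loss_t − λ, with squared one-step-ahead errors as loss
@constraint(model, [t = 2:T], u[t] >= (y[t] - c - ϕ * y[t-1])^2 - λ)
@objective(model, Min, K * λ + sum(u))

optimize!(model)
```

Solving this problem yields AR(1) coefficients hedged against the $K$ observations on which the model performs worst.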
The second robustness feature uses regularization to control the variance of the latent components. Given that the objective here does not entail selecting which components should be treated as stochastic or deterministic, the regularization term adopts an $\ell_2$ (ridge-type) norm, applied to the parameters that scale the score innovations of each component, denoted here by $\kappa$. Equation (5) sketches the resulting problem:

$$
\min_{\theta,\, \{f_t\}} \; -\sum_{t=1}^{T} \log p(y_t \mid f_t;\, \theta) \; + \; \alpha \, \lVert \kappa \rVert_2^2,
$$

subject to the same dynamics constraints as Equation (1). In this formulation, $\alpha \ge 0$ is the penalty factor: larger values shrink the scaling parameters toward zero, producing smoother components that behave closer to deterministic ones, while $\alpha = 0$ recovers the unpenalized model. Despite the potential advantages of this feature, determining the penalty factor $\alpha$ is not straightforward, and the package leaves this choice to the user.
It is important to highlight that both robustness methodologies discussed above can be combined in the same model, enabling the definition of models that are robust along several dimensions at once. To illustrate how both features can be applied simultaneously, consider the model defined in Equation (1). Equation (6) shows the formulation of a model that employs both the sample robustness method and regularization as a component variance controller:

$$
\min_{\theta,\, \{f_t\},\, \lambda,\, u \ge 0} \; K \lambda + \sum_{t=1}^{T} u_t + \alpha \, \lVert \kappa \rVert_2^2
\quad \text{s.t.} \quad u_t \ge -\log p(y_t \mid f_t;\, \theta) - \lambda, \qquad t = 1, \dots, T,
$$

together with the dynamics constraints of Equation (1). An example fit call that activates both features is shown in the illustrative example below.
This section aims to illustrate how this package can be used for time series modeling. To accomplish this, the first step is to correctly install the package, which can be easily done by using the following code.
import Pkg;
Pkg.add(url="https://github.com/LAMPSPUC/UnobservedComponentsGAS.git")
using UnobservedComponentsGAS
To carry out this example, we will consider sales data for new single-family homes in the US. Let's assume the data can be loaded using the code below.
Pkg.add("CSV") #if not already installed
Pkg.add("DataFrames") #if not already installed
using CSV, DataFrames
data = CSV.read("Data/hsales.csv", DataFrame)
y_train = data[1:end-12, 2]
y_val = data[end-11:end, 2]
dates_train = data[1:end-12, 1]
dates_val = data[end-11:end, 1]
Note that the preceding code also partitions the data into training and validation sets.
At this point, the next step is to define the model to be estimated. The following code demonstrates how to specify a score-driven model based on a t-LocationScale distribution in which only the mean parameter is time-varying, following a random walk plus deterministic seasonality dynamic. Note that the scale and degrees-of-freedom parameters are kept constant over time, which is why the corresponding entries of time_varying_params are set to false and their dynamics are left empty.
dist = UnobservedComponentsGAS.tLocationScaleDistribution();
time_varying_params = [true, false, false];  # only the mean (location) parameter is time-varying
d = 1.0;                                     # power of the inverse Fisher information used to scale the score
level = ["random walk", "", ""];             # random walk dynamic for the mean parameter
seasonality = ["deterministic 12", "", ""];  # deterministic seasonality with period 12
ar = missing;                                # no autoregressive component
sample_robustness = false;                   # do not use the sample robustness feature
model = UnobservedComponentsGAS.GASModel(dist, time_varying_params, d, level, seasonality, ar)
Once specified, you can initiate the optimization process to actually estimate the model. This is accomplished using the fit function, which takes as arguments the previously defined model (a GASModel object) and the estimation data. Furthermore, you'll notice the use of the keyword arguments α, robust, and initial_values: α sets the weight of the regularization term discussed earlier, robust indicates whether the sample robustness feature should be used, and initial_values allows the user to provide starting values for the optimization (left as missing here).
fitted_model = UnobservedComponentsGAS.fit(model, y_train; α = 0.0, robust = sample_robustness, initial_values = missing);
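As discussed in the robustness section, both features can be combined in a single model. With the same fit signature, that amounts to enabling the robustness flag together with a nonzero regularization weight; the value of α below is an arbitrary, illustrative choice.

```julia
# Illustrative only: combine sample robustness with the regularization penalty
robust_fitted_model = UnobservedComponentsGAS.fit(model, y_train; α = 0.5, robust = true, initial_values = missing);
```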
After completing the model estimation process, you can access some of its results using the codes below. It is worth noting that these results should be used to evaluate the adequacy of the defined model, which is a crucial step in time series modeling.
fitted_model.fit_in_sample  # in-sample fitted values
fitted_model.fitted_params  # estimated fixed and time-varying parameters
fitted_model.components["param_1"]["seasonality"]  # access the seasonality component of the mean parameter
fitted_model.residuals  # model residuals
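A quick visual check of the in-sample fit can also help in assessing adequacy. The snippet below is a small illustrative sketch, assuming Plots.jl is installed and that fit_in_sample holds the in-sample fitted values, as its name suggests.

```julia
Pkg.add("Plots") # if not already installed
using Plots

# Compare the observed training data with the model's in-sample fit
plot(dates_train, y_train, label = "observed", xlabel = "date", ylabel = "sales")
plot!(dates_train, fitted_model.fit_in_sample, label = "fitted")
```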
Finally, once the adequacy of the model has been checked, the last step involves using it to make predictions. The predict function handles this task and accepts arguments such as the defined and estimated model, training data, the number of steps forward for prediction, and the number of scenarios considered in the simulation. It is important to highlight that this function provides both point and probabilistic predictions, allowing users to specify the intervals of interest through their desired confidence level using the argument probabilistic_intervals.
steps_ahead = 12;
num_scenarios = 500;
forec = UnobservedComponentsGAS.predict(model, fitted_model, y_train, steps_ahead, num_scenarios; probabilistic_intervals = [0.8, 0.95])
Plotting the forecasts against the held-out data shows that the estimated model successfully captured the dynamics of the series during the training period and produced predictions, both point estimates and intervals, that closely align with the observed data.
Please refer to the examples folder to dive deeper into the package features.
- Broaden the range of available distributions, encompassing both continuous and discrete options.
- Expand the scope of covered dynamics, offering more possibilities.
- Enable the variation of effects from explanatory variables over time.
- Assess and incorporate new extensions associated with the field of optimization.
- BERTSIMAS, D.; PASKOV, I. Time series that are robust to regime changes. Journal of Machine Learning Research, v. 21, p. 1–31, 2020.