Merge pull request #29 from Nixtla/refactor/generate_output_dates
feat: vectorized date generation
MMenchero authored Oct 10, 2024
2 parents 9c6522c + 492bf95 commit 007ea63
Showing 2 changed files with 22 additions and 23 deletions.
37 changes: 19 additions & 18 deletions R/generate_output_dates.R
@@ -14,18 +14,15 @@
#' dates_df <- .generate_output_dates(df_info, freq, h)
#' }
#'
.generate_output_dates <- function(df_info, freq, h){

new_dates <- vector("list", nrow(df_info))
r_freq <- .r_frequency(freq)

for(i in 1:nrow(df_info)){
.generate_output_dates <- function(df_info, freq, h) {
new_dates <- lapply(1:nrow(df_info), function(i) {
start_date <- df_info$dates[i]
r_freq <- .r_frequency(freq)

if(freq %in% c("QE", "Q")){
dt <- seq(from = start_date, by = r_freq, length.out = h+1)
month <- lubridate::month(start_date)
if (freq %in% c("QE", "Q")) {
# End of quarter dates are: "YYYY-03-31", "YYYY-06-30", "YYYY-09-30", "YYYY-12-31".
dt <- seq(from = start_date, by = "quarter", length.out = h+1)
month <- lubridate::month(start_date)

# Calendar adjustments
if (month %in% c(3, 12)) {
@@ -44,21 +41,25 @@
}
}
}

}else if(freq %in% c("ME", "M")){
start_date <- start_date+lubridate::days(1)
dt <- seq(from = start_date, by = r_freq, length.out = h+1)-lubridate::days(1)
}else{
dt <- seq(df_info$dates[i], by = r_freq, length.out = h+1)
dt <- format(dt, "%Y-%m-%d")
} else if (freq %in% c("ME", "M")) {
dt <- seq(from = start_date + lubridate::days(1), by = r_freq, length.out = h+1) - lubridate::days(1)
dt <- format(dt, "%Y-%m-%d")
} else if(freq %in% c("B")){
dt <- seq(from = start_date, by = "day", length.out = h+1+ceiling(h/5)*2+2) # ceiling(h/5)*2+2 ~ number of weeks*2 days (Saturday and Sunday) plus an extra weekend to be on the safe side
dt <- dt[!weekdays(dt) %in% c("Saturday", "Sunday")]
dt <- format(dt, "%Y-%m-%d")
} else {
dt <- seq(from = start_date, by = r_freq, length.out = h+1)
}

new_dates[[i]] <- dt[2:length(dt)]
}
dt[2:(h+1)]
})

dates_df <- data.frame(lapply(new_dates, as.POSIXct))

ids <- df_info$unique_id
if(inherits(df_info$unique_id, "numeric") | inherits(df_info$unique_id, "integer")){
if (inherits(df_info$unique_id, "numeric") | inherits(df_info$unique_id, "integer")) {
ids <- as.character(df_info$unique_id)
}
names(dates_df) <- ids
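
The month-end and business-day branches above can be sketched in isolation as follows. This is a minimal illustration, not the package's code: `month_end_dates()` and `business_day_dates()` are hypothetical names, the inputs are assumed to be base R `Date` objects, and the start date is assumed to be a month end or a business day respectively. The sketch also swaps `weekdays()` for `format(dt, "%u")`, since the English day names returned by `weekdays()` depend on the session locale.

```r
# Sketch only: `month_end_dates()` and `business_day_dates()` are hypothetical
# helper names, not functions exported by nixtlar.

# Month-end dates: step one day forward to the first of the next month,
# advance by whole months, then step one day back.
# Assumption: `start_date` is a Date that already falls on a month end.
month_end_dates <- function(start_date, h) {
  dt <- seq(from = start_date + 1, by = "month", length.out = h + 1) - 1
  dt[2:(h + 1)]
}

# Business days: generate a padded run of calendar days, drop weekends,
# and keep the first h dates after the start date.
# Assumption: `start_date` is a Date that falls on Monday through Friday.
business_day_dates <- function(start_date, h) {
  n_days <- h + 1 + ceiling(h / 5) * 2 + 2  # same padding as in the diff
  dt <- seq(from = start_date, by = "day", length.out = n_days)
  # "%u" is the ISO weekday number as text: "1" (Monday) to "7" (Sunday).
  dt <- dt[!format(dt, "%u") %in% c("6", "7")]
  dt[2:(h + 1)]
}

month_end_dates(as.Date("2024-01-31"), h = 3)
business_day_dates(as.Date("2024-10-11"), h = 5)
```

If the start date falls on a weekend, the filter removes it and the `dt[2:(h+1)]` slice skips the first business day, so callers would need to make sure the last observed date is itself a business day.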
8 changes: 3 additions & 5 deletions vignettes/special-topics.Rmd
@@ -77,8 +77,9 @@ The frequency parameter is crucial when working with time series data because it
| Hourly | h |
| Minute-level | min |
| Second-level | s |
| Business day | B |

In this table, QS and MS stand for quarter and month start, while QE and ME stand for quarter and month end. Hourly and subhourly frequencies can be preceded by an integer, such as "6h", "10min" or "30s". **Only the aliases "min" and "S" are allowed for minute and second-level frequencies**.
In this table, QS and MS stand for quarter and month start, while QE and ME stand for quarter and month end. Hourly and subhourly frequencies can be preceded by an integer, such as "6h", "10min" or "30s". **Only the aliases "min" and "S" are allowed for minute and second-level frequencies**.

The default value of the frequency parameter is `NULL`. When this parameter is not specified, `nixtlar` will attempt to determine the frequency of your data.

@@ -89,10 +90,7 @@ fcst <- nixtlar::nixtla_client_forecast(df, h = 8, level = c(80,95)) # freq = "h
# infer the frequency when `freq` is not specified
```

If you are dealing with irregular frequencies, such as business days or custom holiday calendars, you must specify them directly. For instance, for business days, you should set `freq="B"`, which corresponds to the pandas alias for business day frequency. Please refer to [pandas's offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases) for more information.

When dealing with weekly frequency (`W`), `nixtlar` assumes that the weeks start on Sunday. Consequently, it will return dates corresponding to weeks that begin on Sundays. If your weeks start on a different day, for example, Mondays, you should specify the frequency as `W-MON`. You can select any day of the week with the aliases `W-MON`, `W-TUE`, `W-WED`, `W-THU`, `W-FRI`, and `W-SAT`.

**Currently, `nixtlar` can't infer business day frequency, so you must set it directly using `freq="B"`.**
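
Because the frequency has to be set by hand in this case, a quick check on your own dates can help decide whether `freq = "B"` is appropriate. The sketch below is only a heuristic under stated assumptions: `has_only_weekdays()` is a hypothetical helper, not part of `nixtlar`, and a short daily series can contain only weekdays by chance.

```r
# Hypothetical helper (not part of nixtlar): TRUE when every date in the
# vector falls on Monday through Friday.
has_only_weekdays <- function(dates) {
  # "%u" is the ISO weekday number as text: "1" (Monday) to "7" (Sunday).
  all(format(as.Date(dates), "%u") %in% as.character(1:5))
}

dates <- as.Date(c("2024-10-10", "2024-10-11", "2024-10-14", "2024-10-15"))

# Use "B" when no weekend dates are present; otherwise leave `freq` unset.
freq <- if (has_only_weekdays(dates)) "B" else NULL

# The result could then be passed on, for example:
# fcst <- nixtlar::nixtla_client_forecast(df, h = 8, freq = freq)
```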

```{r, include=FALSE}
options(original_options)