-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cutoff parameter of gtfs_traveltimes #75
Comments
The second of the above commits replaces the non-transparent library (gtfsrouter)
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
transfers <- gtfsrouter::gtfs_transfer_table (gtfs, network_times = FALSE)
transfers <- gtfs$transfers %>%
select(.data$from_stop_id, .data$to_stop_id, .data$transfer_type, .data$min_transfer_time) %>%
rbind(transfers) %>%
group_by(.data$from_stop_id, .data$to_stop_id) %>%
summarise(across(everything(), first)) %>% # take min_transfer_time from gtfs$transfers if present
data.table::data.table()
gtfs$transfers <- transfers
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
p <- c (5:9 / 10, 0.95, 0.99, 1) # values for new prop_stops parameter
res <- lapply (p, function (i) {
st <- system.time (
x <- gtfs_traveltimes (gtfs, from = from, start_time = start_time, prop_stops = i)
)
c (nrow (x), max (x$duration), st [3]) })
res <- do.call (rbind, res)
dat <- data.frame (prop_stops = p,
n_stations = res [, 1] / 1000,
max_duration = res [, 2] / 3600,
time = res [, 3])
scl <- max (dat$time) / as.integer (max (dat$max_duration))
library (ggplot2)
ggplot(dat, aes (x = prop_stops)) +
geom_point (aes (y = max_duration, colour = "max_duration"), size = 2) +
geom_line (aes (y = max_duration, linetype = "max_duration", colour = "max_duration"), size=1.1) +
geom_line (aes (y = time / scl, linetype = "time", colour = "time"), size=1.1) +
geom_point (aes (y = time / scl, colour = "time"), size = 2) +
scale_y_continuous (sec.axis = sec_axis (~.*scl, name = "Calculation time (seconds)")) +
labs (y = "Max. duration (hours)") +
scale_linetype_manual ("colour", values = c ("time" = "solid", "max_duration" = "solid")) +
theme (legend.title = element_blank (),
legend.background = element_blank (),
legend.position = c (0.1, 0.9)) Created on 2021-02-01 by the reprex package (v1.0.0) We could easily expose a function like this that could be used to help determine an appropriate value for imax <- which.max (diff (max_duration)) - 1L
dat$prop_stops [imax]
#>[1] 0.9 With that value applied to the data above giving: system.time (
x <- gtfs_traveltimes (gtfs, from = from, start_time = start_time, prop_stops = 0.9)
)
#> user system elapsed
#> 3.684 0.016 3.632
format (nrow (x), big.mark = ",")
#> [1] "35,870" Created on 2021-02-01 by the reprex package (v1.0.0) So that takes quite a bit longer than the default value for the previous |
sorry, I totally missed this issue! is there an advantage of using the proportion of all stops instead of fixed times? That would then e.g. also allow to check how many stops I can reach in what time during the night. Otherwise the |
no worries
In my opinion, yes. It reflects exactly how the algorithm itself traverses the timetable, so the control is much more direct, and possible via a single, fairly intuitive parameter. It is of course even more intuitive to have parameters directly control times, but that would need at least 3:
So you'd need at least 3 parameters, instead of one. But I guess even more importantly: with a single, sufficiently large, value of That said, it's obviously advantageous to have some more intuitive way to filter the result using temporal parameters for what is after all a set of travel times. Ideas? |
I see your point.
start and endtime are not returned by the |
... good question! Those values are currently not returned from the C++ routines back into R, but that shouldn't be too difficult, and I can definitely see the use of that. I'll give that a go as soon as I have a chance. |
...just one more thought on that. If I had a connection from A to B at 8AM with duration of 15 min and at 3PM with 10 min (same transfers). |
@AlexandraKapp the above commit starts the implementation of a |
That commit implements most of it, but still leaves the problem of when to stop scanning. Currently still has the library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.192'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
start_time_limits <- 8 * 3600 + c (0, 60) * 60
p <- 5:10 / 10
res <- lapply (p, function (i) {
s <- system.time (
x <- gtfs_traveltimes (gtfs,
from,
start_time_limits,
prop_stops = i)
)
c (nrow (x), s [3]) })
res <- do.call (rbind, res)
res <- data.frame (prop = p,
stops_reached = res [, 1],
calc_time = res [, 2])
par (mar = c (5, 4, 2, 4))
plot (res$prop, res$stops_reached, "l", lwd = 2,
xlab = "prop_stops", ylab = "stops reached")
par (new = TRUE)
plot (res$prop, res$calc_time, "l", lwd = 2, col = "gray",
yaxt = "n", xlab = "", ylab = "")
axis (side = 4)
mtext ("calculation time (s)", side = 4, line = 2)
legend ("topleft", lwd = 2, col = c ("black", "grey"), bty = "n",
legend = c ("stops reached", "calcualtion time")) Created on 2021-03-04 by the reprex package (v1.0.0) Note that the calculation times are around half of what they were above, simply because only a restricted set of initial |
Here it is in action: library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.195'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
start_time_limits <- 8 * 3600 + c (0, 60) * 60
# confirm that default max_traveltime of 1 hour works:
x <- gtfs_traveltimes (gtfs, from, start_time_limits)
dim (x)
#> [1] 9117 7
hms::hms (as.integer (max (x$duration)))
#> 01:00:00
tlims <- 1:12 * 60 * 60
res <- lapply (tlims, function (i) {
s <- system.time (
x <- gtfs_traveltimes (gtfs,
from,
start_time_limits,
max_traveltime = i)
)
c (nrow (x), s [3]) })
res <- do.call (rbind, res)
res <- data.frame (max_traveltime = tlims / 3600,
stops_reached = res [, 1],
calc_time = res [, 2])
par (mar = c (5, 4, 2, 4))
plot (res$max_traveltime, res$stops_reached, "l", lwd = 2,
xlab = "max traveltime (hours)", ylab = "stops reached")
par (new = TRUE)
plot (res$max_traveltime, res$calc_time, "l", lwd = 2, col = "gray",
yaxt = "n", xlab = "", ylab = "")
axis (side = 4)
mtext ("calculation time (s)", side = 4, line = 2)
legend ("topleft", lwd = 2, col = c ("black", "grey"), bty = "n",
legend = c ("stops reached", "calculation time")) Created on 2021-03-04 by the reprex package (v1.0.0) For the record here: the whole reason I implemented the Any further comments @AlexandraKapp? |
looks good! |
Following on from #57 and comment by @AlexandraKapp, setting
cutoff = 0
now takes way too long, and moreover produces this result:Created on 2021-02-01 by the reprex package (v1.0.0)
So the full scan (
cutoff = 0
) is scanning way too many lines of the timetable. (The number reached can be more than the total number reached in the timetable because of transfers to stations that aren't otherwise reachable from the timetable itself.) First thing to do will be to implement a stop clause that stops scanning the timetable as soon as all possible stations have been reached.The text was updated successfully, but these errors were encountered: