Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deviation from otp benchmark #57

Closed
AlexandraKapp opened this issue Nov 25, 2020 · 17 comments
Closed

deviation from otp benchmark #57

AlexandraKapp opened this issue Nov 25, 2020 · 17 comments

Comments

@AlexandraKapp
Copy link
Contributor

comparing transfers computed with OTP and gtfsrouter shows, that they are generally pretty close
( diff is the difference: OTP - gtfsrouter)

grafik

grafik

Examples I checked so far, where the gtfsrouter returned more transfers match the transfers returned by gtfs_route. I'll investigate them a little more tomorrow.

One example where the gtfsrouter returns to few transfers:
Hbf - S Feuerbachstr.: should be 1 transfer, but is 0. gtfs_route returns one transfer correctly.

library(gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.100'
gtfs <- extract_gtfs(file.path("~/03_GitHub/NetworkAnalysis/inst/data_/VBB/gtfs.zip"))
#> > Unzipping GTFS archivev Unzipped GTFS archive  
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
ttable <- gtfs_timetable(gtfs, day = "tuesday")
iso <- gtfs_isochrone(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 10*3600)
#> Loading required namespace: geodist
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
dplyr::filter(iso$mid_points, grepl("Feuerbachstr.", stop_name))
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.33241 ymin: 52.46358 xmax: 13.33241 ymax: 52.46358
#> geographic CRS: WGS 84
#>                  stop_name      stop_id departure  arrival duration transfers
#> 1 S Feuerbachstr. (Berlin) 060063101841  08:01:12 08:22:36 00:21:24         0
#>                    geometry
#> 1 POINT (13.33241 52.46358)
dplyr::filter(iso$end_points, grepl("Feuerbachstr.", stop_name))
#> Simple feature collection with 0 features and 6 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> geographic CRS: WGS 84
#> [1] stop_name stop_id   departure arrival   duration  transfers geometry 
#> <0 rows> (or 0-length row.names)

gtfs_route(ttable, "Berlin Hauptbahnhof", "S Feuerbachstr.", start_time = 8*3600)
#>    route_name         trip_name                        stop_name arrival_time
#> 1          S5 S Strausberg Nord          S+U Berlin Hauptbahnhof     08:00:30
#> 2          S5 S Strausberg Nord   S+U Friedrichstr. Bhf (Berlin)     08:03:06
#> 3          S1     S Wannsee Bhf   S+U Friedrichstr. Bhf (Berlin)     08:05:30
#> 4          S1     S Wannsee Bhf   S+U Brandenburger Tor (Berlin)     08:07:24
#> 5          S1     S Wannsee Bhf S+U Potsdamer Platz Bhf (Berlin)     08:09:24
#> 6          S1     S Wannsee Bhf      S Anhalter Bahnhof (Berlin)     08:11:30
#> 7          S1     S Wannsee Bhf     S+U Yorckstr. S1 U7 (Berlin)     08:14:42
#> 8          S1     S Wannsee Bhf  S Julius-Leber-Brücke (Berlin)     08:16:24
#> 9          S1     S Wannsee Bhf           S Schöneberg (Berlin)     08:18:18
#> 10         S1     S Wannsee Bhf             S Friedenau (Berlin)     08:20:18
#> 11         S1     S Wannsee Bhf         S Feuerbachstr. (Berlin)     08:22:06
#>    departure_time
#> 1        08:01:12
#> 2        08:03:54
#> 3        08:06:12
#> 4        08:07:54
#> 5        08:09:54
#> 6        08:12:18
#> 7        08:15:12
#> 8        08:16:54
#> 9        08:18:48
#> 10       08:20:48
#> 11       08:22:36

Created on 2020-11-25 by the reprex package (v0.3.0)

@mpadge
Copy link
Member

mpadge commented Nov 26, 2020

Thanks. This issue can serve to guide a necessary discussion at a somewhat higher level. The isochrone function constructs an intermediate object which holds all of the actual trips. I suspect it might be more useful to return this actual object, rather than the current list of midpoints. In the particular context you raise above, note the following:

library (gtfsrouter)
gtfs <- extract_gtfs ("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable (gtfs, day = "tuesday")

from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
start_stns <- station_name_to_ids (from, gtfs, FALSE)
isotrips <- get_isotrips (gtfs, start_stns, start_time, end_time = start_time + 2 * 3600)

to <- "S Feuerbachstr"
index <- which (vapply (isotrips$isotrips, function (i)
                        any (grepl (to, i$stop_name)), logical (1)))
ntransfers <- vapply (isotrips$isotrips [index], function (i) {
                          i <- na.omit (i)
                          j <- match (i$trip_id, unique (i$trip_id)) - 1
                          i$ntransfers <- c (0L, as.integer (cumsum (diff (sort (j)))))
                          i$ntransfers [grep (to, i$stop_name) [1]]
                  }, integer (1))
message ("Num. transfers to ", to, " = [", paste0 (ntransfers, collapse = ", "), "]")
#> Num. transfers to S Feuerbachstr = [2, 1, 1, 2]

Created on 2020-11-26 by the reprex package (v0.3.0)

The best connections to isochrone end points sometimes require two transfers to "S Feuerbachstr", rather than the minimal number of 1. These end up needing to extend further using the the service to "Berlin, Kielingerstr" (#144742415), which is boarded at "U Walther-Schreiber-Platz," and just passes through "S Feuerbachstr" on its way elsewhere.

That demonstrates that we ought not think that each "midpoint" in current way of representing results can or does represent a fixed number of transfers. Rather, they can represent a variable number depending on where services beyond any given midpoint end up going on their way to an "endpoint". And that in turn suggests to me that it might be better / more useful / more honest / more realistic to return as a final result the current output of the non-exported function, get_isotrips(), which is simply a list of the full trips towards each isochrone endpoint. Mid points can then be extracted however one desires. Thoughts?

@mpadge
Copy link
Member

mpadge commented Nov 26, 2020

Update: In contrast to your reprex above, I get this:

library (gtfsrouter)
gtfs <- extract_gtfs ("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable (gtfs, day = "tuesday")

from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600

to <- "S Feuerbachstr"
x <- gtfs_isochrone (gtfs, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 10 * 3600)
#> Loading required namespace: geodist
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.2
#> Maximum distance is > 100km. The 'cheap' measure is inaccurate over such
#> large distances, you'd likely be better using a different 'measure'.
x$mid_points [grep (to, x$mid_points$stop_name), ]
#> Simple feature collection with 4 features and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.33241 ymin: 52.46358 xmax: 13.33241 ymax: 52.46358
#> geographic CRS: WGS 84
#>                    stop_name      stop_id departure  arrival duration transfers
#> 466 S Feuerbachstr. (Berlin) 070101001754  08:00:12 08:27:00 00:26:48         2
#> 559 S Feuerbachstr. (Berlin) 060063101841  08:01:12 08:22:36 00:21:24         1
#> 767 S Feuerbachstr. (Berlin) 060063101841  08:01:12 08:22:36 00:21:24         1
#> 960 S Feuerbachstr. (Berlin) 070101001754  08:00:12 08:27:00 00:26:48         2
#>                      geometry
#> 466 POINT (13.33241 52.46358)
#> 559 POINT (13.33241 52.46358)
#> 767 POINT (13.33241 52.46358)
#> 960 POINT (13.33241 52.46358)

Created on 2020-11-26 by the reprex package (v0.3.0)

And the midpoint stations do indeed have all of the possible ways of getting there, including the range of possible transfers. I still think it's likely better to have full trips, because in current form if you were to see the output in this reprex, you'd likely immediately want to know how and why you get different numbers of transfers, but there is no way of extracting that once the midpoints have been collated together in that single object.

@mpadge
Copy link
Member

mpadge commented Nov 26, 2020

Closed to move discussion over to #58

@mpadge mpadge closed this as completed Nov 26, 2020
@AlexandraKapp
Copy link
Contributor Author

AlexandraKapp commented Nov 26, 2020

FYI: update on VBB Feed to October feed resulted in the same four trips for S Feuerbachstr..
Update from version 0.0.4.100 to 0.0.4.105 resulted in the same amount of transfers as you got. 👍

@AlexandraKapp
Copy link
Contributor Author

some more examples, where the gtfs_route function returns connections with less transfers (and shorter traveltime) than gtfs_isochrone. Not sure if this is handled by #58 - but just maybe it helps for testing:

library(gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.105'

gtfs <- extract_gtfs(file.path("C:/Users/AlexandraKapp/OneDrive - Mobility Institute Berlin/02_playground/traveltime_index/data/vbb_202010.zip"))
#> > Unzipping GTFS archivev Unzipped GTFS archive  
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds

#gtfs <- extract_gtfs(file.path("~/03_GitHub/NetworkAnalysis/inst/data_/VBB/gtfs.zip"))
gtfs$calendar[1,]
#>    service_id monday tuesday wednesday thursday friday saturday sunday
#> 1:          1      0       0         0        0      0        0      0
#>    start_date end_date
#> 1:   20201023 20201212

# compute further transfer options
transfers <- gtfsrouter::gtfs_transfer_table (gtfs, network_times = FALSE)
#> > Finding neighbouring services for each stop
#> Loading required namespace: geodist
#> Loading required namespace: pbapply
#> v Found neighbouring services for each stop
#> > Expanding to include in-place transfers
#> v Expanded to include in-place transfers
transfers <- gtfs$transfers %>%
  select(.data$from_stop_id, .data$to_stop_id, .data$transfer_type, .data$min_transfer_time) %>%
  rbind(transfers) %>%
  group_by(.data$from_stop_id, .data$to_stop_id) %>%
  summarise(across(everything(), first)) %>% # take min_transfer_time from gtfs$transfers if present
  data.table::data.table()
#> Error in gtfs$transfers %>% select(.data$from_stop_id, .data$to_stop_id, : could not find function "%>%"
gtfs$transfers <- transfers
ttable <- gtfs_timetable(gtfs, day = "tuesday")

iso <- gtfs_isochrone(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 10*3600)
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

to <- "Berlin, Goslarer Platz"
print(iso$mid_points [grep (to, iso$mid_points$stop_name), ])
#> Simple feature collection with 0 features and 6 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> geographic CRS: WGS 84
#> [1] stop_name stop_id   departure arrival   duration  transfers geometry 
#> <0 rows> (or 0-length row.names)
print(iso$end_points [grep (to, iso$end_points$stop_name), ])
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.31426 ymin: 52.52579 xmax: 13.31426 ymax: 52.52579
#> geographic CRS: WGS 84
#>                 stop_name      stop_id departure  arrival duration transfers
#> 23 Berlin, Goslarer Platz 070101001337  08:00:00 08:32:00 00:32:00         2
#>                     geometry
#> 23 POINT (13.31426 52.52579)
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>    route_name               trip_name                            stop_name
#> 1         245 S+U Zoologischer Garten              S+U Berlin Hauptbahnhof
#> 2         245 S+U Zoologischer Garten   Berlin, Lehrter Str./Invalidenstr.
#> 3         245 S+U Zoologischer Garten               Berlin, Lesser-Ury-Weg
#> 4         245 S+U Zoologischer Garten   Berlin, Alt-Moabit/Rathenower Str.
#> 5         245 S+U Zoologischer Garten                   Berlin, Spenerstr.
#> 6         245 S+U Zoologischer Garten         Berlin, Kirchstr./Alt-Moabit
#> 7         245 S+U Zoologischer Garten           Berlin, Kleiner Tiergarten
#> 8         245 S+U Zoologischer Garten U Turmstr. (Berlin) [Bus Alt-Moabit]
#> 9         M27   S+U Jungfernheide Bhf   U Turmstr. (Berlin) [Bus Turmstr.]
#> 10        M27   S+U Jungfernheide Bhf           Berlin, Rathaus Tiergarten
#> 11        M27   S+U Jungfernheide Bhf         Berlin, Turmstr./Beusselstr.
#> 12        M27   S+U Jungfernheide Bhf                 Berlin, Reuchlinstr.
#> 13        M27   S+U Jungfernheide Bhf         Berlin, Wiebestr./Huttenstr.
#> 14        M27   S+U Jungfernheide Bhf                   Berlin, Neues Ufer
#> 15        M27   S+U Jungfernheide Bhf               Berlin, Goslarer Platz
#>    arrival_time departure_time
#> 1      08:05:00       08:05:00
#> 2      08:07:00       08:07:00
#> 3      08:08:00       08:08:00
#> 4      08:10:00       08:10:00
#> 5      08:11:00       08:11:00
#> 6      08:12:00       08:12:00
#> 7      08:13:00       08:13:00
#> 8      08:14:00       08:14:00
#> 9      08:19:00       08:19:00
#> 10     08:21:00       08:21:00
#> 11     08:24:00       08:24:00
#> 12     08:25:00       08:25:00
#> 13     08:26:00       08:26:00
#> 14     08:27:00       08:27:00
#> 15     08:29:00       08:29:00

to <- "Berlin, Luisenplatz/Schloss Charlottenburg"
print(iso$mid_points [grep (to, iso$mid_points$stop_name), ])
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.29963 ymin: 52.51957 xmax: 13.29963 ymax: 52.51957
#> geographic CRS: WGS 84
#>                                      stop_name      stop_id departure  arrival
#> 253 Berlin, Luisenplatz/Schloss Charlottenburg 070101001138     28800 08:35:00
#>     duration transfers                  geometry
#> 253 00:35:00         3 POINT (13.29963 52.51957)
print(iso$end_points [grep (to, iso$end_points$stop_name), ])
#> Simple feature collection with 0 features and 6 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> geographic CRS: WGS 84
#> [1] stop_name stop_id   departure arrival   duration  transfers geometry 
#> <0 rows> (or 0-length row.names)
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>    route_name             trip_name                                   stop_name
#> 1          S9         S Spandau Bhf                     S+U Berlin Hauptbahnhof
#> 2          S9         S Spandau Bhf                         S Bellevue (Berlin)
#> 3          S9         S Spandau Bhf                       S Tiergarten (Berlin)
#> 4          S9         S Spandau Bhf        S+U Zoologischer Garten Bhf (Berlin)
#> 5         M45 Berlin, Johannesstift S+U Zoologischer Garten/Jebensstr. (Berlin)
#> 6         M45 Berlin, Johannesstift                          Berlin, Steinplatz
#> 7         M45 Berlin, Johannesstift               U Ernst-Reuter-Platz (Berlin)
#> 8         M45 Berlin, Johannesstift                           Berlin, Marchstr.
#> 9         M45 Berlin, Johannesstift         Berlin, Otto-Suhr-Allee/Leibnizstr.
#> 10        M45 Berlin, Johannesstift                        Berlin, Warburgzeile
#> 11        M45 Berlin, Johannesstift             U Richard-Wagner-Platz (Berlin)
#> 12        M45 Berlin, Johannesstift                        Berlin, Eosanderstr.
#> 13        M45 Berlin, Johannesstift  Berlin, Luisenplatz/Schloss Charlottenburg
#>    arrival_time departure_time
#> 1      07:59:36       08:00:12
#> 2      08:02:18       08:02:48
#> 3      08:04:24       08:04:54
#> 4      08:06:18       08:06:54
#> 5      08:12:00       08:12:00
#> 6      08:13:00       08:13:00
#> 7      08:15:00       08:15:00
#> 8      08:16:00       08:16:00
#> 9      08:17:00       08:17:00
#> 10     08:18:00       08:18:00
#> 11     08:19:00       08:19:00
#> 12     08:20:00       08:20:00
#> 13     08:21:00       08:21:00

to <- "S Adlershof"
print(iso$mid_points [grep (to, iso$mid_points$stop_name), ])
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.54055 ymin: 52.4351 xmax: 13.54055 ymax: 52.4351
#> geographic CRS: WGS 84
#>                 stop_name      stop_id departure  arrival duration transfers
#> 1185 S Adlershof (Berlin) 070101006330     28872 08:37:00 00:35:48         2
#>                      geometry
#> 1185 POINT (13.54055 52.4351)
print(iso$end_points [grep (to, iso$end_points$stop_name), ])
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.54055 ymin: 52.4351 xmax: 13.54055 ymax: 52.4351
#> geographic CRS: WGS 84
#>               stop_name      stop_id departure  arrival duration transfers
#> 82 S Adlershof (Berlin) 070101006268  08:01:12 08:52:00 00:50:48         2
#>                    geometry
#> 82 POINT (13.54055 52.4351)
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>    route_name         trip_name                               stop_name
#> 1          S5 S Strausberg Nord                 S+U Berlin Hauptbahnhof
#> 2          S5 S Strausberg Nord          S+U Friedrichstr. Bhf (Berlin)
#> 3          S5 S Strausberg Nord             S Hackescher Markt (Berlin)
#> 4          S5 S Strausberg Nord         S+U Alexanderplatz Bhf (Berlin)
#> 5          S5 S Strausberg Nord           S+U Jannowitzbrücke (Berlin)
#> 6          S5 S Strausberg Nord                   S Ostbahnhof (Berlin)
#> 7          S5 S Strausberg Nord            S+U Warschauer Str. (Berlin)
#> 8          S5 S Strausberg Nord                 S Ostkreuz Bhf (Berlin)
#> 9         S85         S Grünau                 S Ostkreuz Bhf (Berlin)
#> 10        S85         S Grünau               S Treptower Park (Berlin)
#> 11        S85         S Grünau                 S Plänterwald (Berlin)
#> 12        S85         S Grünau               S Baumschulenweg (Berlin)
#> 13        S85         S Grünau             S Schöneweide Bhf (Berlin)
#> 14        S85         S Grünau S Betriebsbahnhof Schöneweide (Berlin)
#> 15        S85         S Grünau                    S Adlershof (Berlin)
#>    arrival_time departure_time
#> 1      08:00:30       08:01:12
#> 2      08:03:06       08:03:54
#> 3      08:05:24       08:05:54
#> 4      08:07:06       08:07:54
#> 5      08:09:24       08:09:54
#> 6      08:11:36       08:12:24
#> 7      08:14:06       08:14:42
#> 8      08:16:24       08:17:12
#> 9      08:18:00       08:18:36
#> 10     08:20:18       08:20:48
#> 11     08:23:12       08:23:36
#> 12     08:25:24       08:25:54
#> 13     08:28:24       08:28:54
#> 14     08:30:54       08:31:24
#> 15     08:33:24       08:33:54

to <- "Berlin, Habermannzeile"
print(iso$mid_points [grep (to, iso$mid_points$stop_name), ])
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.29128 ymin: 52.53967 xmax: 13.29128 ymax: 52.53967
#> geographic CRS: WGS 84
#>                  stop_name      stop_id departure  arrival duration transfers
#> 507 Berlin, Habermannzeile 070101003272     28800 08:38:00 00:38:00         3
#>                      geometry
#> 507 POINT (13.29128 52.53967)
print(iso$end_points [grep (to, iso$end_points$stop_name), ])
#> Simple feature collection with 0 features and 6 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> geographic CRS: WGS 84
#> [1] stop_name stop_id   departure arrival   duration  transfers geometry 
#> <0 rows> (or 0-length row.names)
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>   route_name                trip_name                      stop_name
#> 1       RB10           Nauen, Bahnhof S+U Berlin Hauptbahnhof (tief)
#> 2       RB10           Nauen, Bahnhof S+U Jungfernheide Bhf (Berlin)
#> 3         U7      S+U Rathaus Spandau S+U Jungfernheide Bhf (Berlin)
#> 4         U7      S+U Rathaus Spandau  U Jakob-Kaiser-Platz (Berlin)
#> 5        123 Berlin, Mäckeritzwiesen  U Jakob-Kaiser-Platz (Berlin)
#> 6        123 Berlin, Mäckeritzwiesen      Berlin, Weltlingerbrücke
#> 7        123 Berlin, Mäckeritzwiesen         Berlin, Habermannzeile
#>   arrival_time departure_time
#> 1     08:13:00       08:15:00
#> 2     08:20:00       08:21:00
#> 3     08:29:00       08:29:00
#> 4     08:30:30       08:30:30
#> 5     08:36:00       08:36:00
#> 6     08:37:00       08:37:00
#> 7     08:38:00       08:38:00

to <- "Berlin, Quellweg"
print(iso$mid_points [grep (to, iso$mid_points$stop_name), ])
#> Simple feature collection with 1 feature and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.26777 ymin: 52.53676 xmax: 13.26777 ymax: 52.53676
#> geographic CRS: WGS 84
#>            stop_name      stop_id departure  arrival duration transfers
#> 516 Berlin, Quellweg 070101001459     28800 08:48:00 00:48:00         3
#>                      geometry
#> 516 POINT (13.26777 52.53676)
print(iso$end_points [grep (to, iso$end_points$stop_name), ])
#> Simple feature collection with 0 features and 6 fields
#> bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
#> geographic CRS: WGS 84
#> [1] stop_name stop_id   departure arrival   duration  transfers geometry 
#> <0 rows> (or 0-length row.names)
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>   route_name               trip_name                      stop_name
#> 1       RB10          Nauen, Bahnhof S+U Berlin Hauptbahnhof (tief)
#> 2       RB10          Nauen, Bahnhof S+U Jungfernheide Bhf (Berlin)
#> 3         U7     S+U Rathaus Spandau S+U Jungfernheide Bhf (Berlin)
#> 4         U7     S+U Rathaus Spandau  U Jakob-Kaiser-Platz (Berlin)
#> 5         U7     S+U Rathaus Spandau            U Halemweg (Berlin)
#> 6         U7     S+U Rathaus Spandau         U Siemensdamm (Berlin)
#> 7         U7     S+U Rathaus Spandau            U Rohrdamm (Berlin)
#> 8        123 S+U Berlin Hauptbahnhof            U Rohrdamm (Berlin)
#> 9        123 S+U Berlin Hauptbahnhof               Berlin, Quellweg
#>   arrival_time departure_time
#> 1     08:13:00       08:15:00
#> 2     08:20:00       08:21:00
#> 3     08:24:00       08:24:00
#> 4     08:25:30       08:25:30
#> 5     08:27:00       08:27:00
#> 6     08:28:30       08:28:30
#> 7     08:29:30       08:29:30
#> 8     08:32:00       08:32:00
#> 9     08:34:00       08:34:00

Created on 2020-11-26 by the reprex package (v0.3.0)

comparison to OTP visually:

grafik

grafik

@mpadge
Copy link
Member

mpadge commented Nov 27, 2020

Yep, the first one of those ("Goslerer Platz") definitely gives the wrong isotrip, so I'll re-open the issue to dig deeper. (Haven't checked out the others yet, but will ensure they're all okay before closing again.) Thanks!


Update:

  • "Luisenplatz/Schloss Charlottenburg" is okay - i get 2 mid-points with 3 and 1 transfers, with the latter being the OTP benchmark.
  • "S Adlershof" is okay - the mid-point with only one transfer is too slow to be a viable direct route to that station, but ends up faster getting to the ultimately end point of that journey.
  • "Berlin, Habermannzeile" just does not appear in any results in get, either as end or mid-point, so not sure what's going on there?
  • "Berlin, Quellweg" is all okay for me, with 3 transfers in both isochrone and direct route, but i can't get the route you print above - leaving at 8:15, arriving at 8:34, and so taking 29 minutes. Every route I can get, either isochrone or direct, takes just under 38 minutes. Not sure what's happening there either?

@mpadge mpadge reopened this Nov 27, 2020
@mpadge
Copy link
Member

mpadge commented Jan 22, 2021

Update by way of asking whether it's okay to close this now? Everything seems okay. Note that the following presumes the new gtfs_traveltimes algorithm to give the definitively best (or equal best) results, and that changes learnt in developing that have to be propagated back to gtrs_router(). Until that time, gtfs_route() may give results that are either slower or have more transfers than the optimal routes found with gtfs_traveltimes(). The following code finds optimal traveltimes routes, and confirms in each case that these are at least as good as the best routes found with gtfs_route(). First the code to perform that analyses:

library (lubridate)
library(gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.154'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
transfers <- gtfsrouter::gtfs_transfer_table (gtfs, network_times = FALSE)
transfers <- gtfs$transfers %>%
  select(.data$from_stop_id, .data$to_stop_id, .data$transfer_type, .data$min_transfer_time) %>%
  rbind(transfers) %>%
  group_by(.data$from_stop_id, .data$to_stop_id) %>%
  summarise(across(everything(), first)) %>% # take min_transfer_time from gtfs$transfers if present
  data.table::data.table()
gtfs$transfers <- transfers
gtfs <- gtfs_timetable(gtfs, day = "tuesday")

from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
iso <- gtfs_traveltimes (gtfs, from, start_time)

route_stats <- function (gtfs, from, to, start_time) {
    r <- gtfs_route (gtfs, from, to, start_time)
    tstart <- hms (r$departure_time [1])
    tend <- hms (tail (r$arrival_time, 1))
    dur_sec <- as.duration (tend - tstart) / dseconds (1)
    dur <- seconds_to_period (dur_sec)
    ntr <- length (unique (r$trip_name)) - 1
    dur_fmt <- sprintf ('%02d:%02d:%02d', dur@hour, minute (dur), second (dur))
    message (ntr, " transfers; trip duration = ", dur_fmt)

    return (c (dur_sec, ntr))
}

to <- c ("S Feuerbachstr",
         "Berlin, Goslarer Platz",
         "Berlin, Luisenplatz/Schloss Charlottenburg",
         "S Adlershof",
         "Berlin, Habermannzeile",
         "Berlin, Quellweg")

# This function calculates the route statistics, and prints a message
# confirming that traveltimes routes are optimal compared with the
# results of gtfs_route():
compare_iso2route <- function (gtfs, iso, from, to, start_time) {

    message ("----TO: ", to, "----")
    this_iso <- iso [grep (to, iso$stop_name), ]
    print (this_iso)
    this_route <- route_stats (gtfs, from, to, start_time)
    # extract trip fastest trip with equal minimal transfers:
    this_iso <- this_iso [which.min (this_iso$ntransfers), ]
    this_iso <- this_iso [which.min (this_iso$duration), ]
    iso_time <- as.integer (seconds (this_iso$duration))
    if (iso_time <= this_route [1] & this_iso$ntransfers <= this_route  [2])
        message (" ---> Everything is okay there")
    else
        message (" ---> Nope, something is wrong with that one")
    message ()
}

And then, just for clarity, the results in their own chunk:

for (i in to)
    compare_iso2route (gtfs, iso, from, i, start_time)

#> ----TO: S Feuerbachstr----
#>       duration ntransfers      stop_id                stop_name stop_lon
#> 1129  00:22:54          2 900000063101 S Feuerbachstr. (Berlin) 13.33241
#> 16437 00:20:54          1 060063101841 S Feuerbachstr. (Berlin) 13.33241
#> 16438 00:20:54          2 060063101842 S Feuerbachstr. (Berlin) 13.33241
#> 25780 00:24:54          2 070101000151 S Feuerbachstr. (Berlin) 13.33241
#> 27222 00:24:54          2 070101001754 S Feuerbachstr. (Berlin) 13.33241
#>       stop_lat
#> 1129  52.46358
#> 16437 52.46358
#> 16438 52.46358
#> 25780 52.46358
#> 27222 52.46358
#> 1 transfers; trip duration = 00:21:36
#>  ---> Everything is okay there
#> 
#> ----TO: Berlin, Goslarer Platz----
#>       duration ntransfers      stop_id              stop_name stop_lon stop_lat
#> 288   00:26:00          2 900000001155 Berlin, Goslarer Platz 13.31426 52.52579
#> 26701 00:24:00          1 070101001168 Berlin, Goslarer Platz 13.31426 52.52579
#> 26851 00:17:00          1 070101001337 Berlin, Goslarer Platz 13.31426 52.52579
#> 1 transfers; trip duration = 00:24:00
#>  ---> Everything is okay there
#> 
#> ----TO: Berlin, Luisenplatz/Schloss Charlottenburg----
#>       duration ntransfers      stop_id
#> 521   00:22:48          2 900000022172
#> 26672 00:20:00          2 070101001138
#> 27319 00:22:48          2 070101001861
#> 27431 00:20:48          1 070101001985
#> 28880 00:22:48          2 070101003698
#>                                        stop_name stop_lon stop_lat
#> 521   Berlin, Luisenplatz/Schloss Charlottenburg 13.29963 52.51957
#> 26672 Berlin, Luisenplatz/Schloss Charlottenburg 13.29963 52.51957
#> 27319 Berlin, Luisenplatz/Schloss Charlottenburg 13.29963 52.51957
#> 27431 Berlin, Luisenplatz/Schloss Charlottenburg 13.29963 52.51957
#> 28880 Berlin, Luisenplatz/Schloss Charlottenburg 13.29963 52.51957
#> 1 transfers; trip duration = 00:20:48
#>  ---> Everything is okay there
#> 
#> ----TO: S Adlershof----
#>       duration ntransfers      stop_id
#> 2786  00:34:12          2 900000193002
#> 13859 00:32:12          1 060193002003
#> 13860 00:32:12          2 060193002004
#> 25527 00:44:48          3 900000193702
#> 25528 00:34:32          2 900000193703
#> 30633 00:35:12          2 070101006268
#> 30692 00:35:12          2 070101006330
#> 31088 00:35:12          2 070101006843
#> 31129 00:44:48          3 070101006892
#> 31163 00:34:32          2 070101006928
#> 32296 00:35:12          2 070301008867
#> 32303 00:35:12          2 070301008874
#> 40458 00:35:12          2 070101005663
#>                                              stop_name stop_lon stop_lat
#> 2786                              S Adlershof (Berlin) 13.54055 52.43510
#> 13859                             S Adlershof (Berlin) 13.54055 52.43510
#> 13860                             S Adlershof (Berlin) 13.54055 52.43510
#> 25527 S Adlershof (Bln) [Bus Dörpfeldstr. v. Köpenick] 13.54369 52.43622
#> 25528 S Adlershof (Bln) [Bus Dörpfeldstr. n. Köpenick] 13.54265 52.43567
#> 30633                             S Adlershof (Berlin) 13.54055 52.43510
#> 30692                             S Adlershof (Berlin) 13.54055 52.43510
#> 31088                             S Adlershof (Berlin) 13.54055 52.43510
#> 31129 S Adlershof (Bln) [Bus Dörpfeldstr. v. Köpenick] 13.54369 52.43622
#> 31163 S Adlershof (Bln) [Bus Dörpfeldstr. n. Köpenick] 13.54265 52.43567
#> 32296                             S Adlershof (Berlin) 13.54055 52.43510
#> 32303                             S Adlershof (Berlin) 13.54055 52.43510
#> 40458                             S Adlershof (Berlin) 13.54055 52.43510
#> 1 transfers; trip duration = 00:32:12
#>  ---> Everything is okay there
#> 
#> ----TO: Berlin, Habermannzeile----
#>       duration ntransfers      stop_id              stop_name stop_lon stop_lat
#> 477   00:25:00          3 900000018151 Berlin, Habermannzeile 13.29128 52.53967
#> 27462 00:25:00          3 070101002018 Berlin, Habermannzeile 13.29128 52.53967
#> 27480 00:25:00          3 070101002038 Berlin, Habermannzeile 13.29128 52.53967
#> 27488 00:25:00          3 070101002046 Berlin, Habermannzeile 13.29128 52.53967
#> 28527 00:23:00          2 070101003272 Berlin, Habermannzeile 13.29128 52.53967
#> 2 transfers; trip duration = 00:23:00
#>  ---> Everything is okay there
#> 
#> ----TO: Berlin, Quellweg----
#>       duration ntransfers      stop_id        stop_name stop_lon stop_lat
#> 737   00:25:00          3 900000035104 Berlin, Quellweg 13.26777 52.53676
#> 25918 00:25:00          3 070101000307 Berlin, Quellweg 13.26777 52.53676
#> 26964 00:23:00          2 070101001459 Berlin, Quellweg 13.26777 52.53676
#> 3 transfers; trip duration = 00:37:48
#>  ---> Everything is okay there

Created on 2021-01-22 by the reprex package (v0.3.0)

The biggest discrepancy there is with "Goslarer Platz", but the following lines suffice to show that the traveltimes result is indeed correct:

# ... same set-up as above ...

tr1 <- gtfs$timetable [trip_id == 14989]
tr1$dep <- gtfs$stops$stop_name [tr1$departure_station]
tr1$arr <- gtfs$stops$stop_name [tr1$arrival_station]

tr2 <- gtfs$timetable [trip_id == 36069]
tr2$dep <- gtfs$stops$stop_name [tr2$departure_station]
tr2$arr <- gtfs$stops$stop_name [tr2$arrival_station]

from <- "Berlin Hauptbahnhof"
change <- "Jungfernheide"
to <- "Goslarer Platz"
i1 <- grep (from, tr1$dep)
i2 <- grep (change, tr1$arr)
print (tr1 [i1:i2, ])
#>    departure_station arrival_station departure_time arrival_time trip_id
#> 1:             19371           22947          29700        30000   14989
#>                               dep                            arr
#> 1: S+U Berlin Hauptbahnhof (tief) S+U Jungfernheide Bhf (Berlin)
depart <- tr1$departure_time [i1 [1]]

i1 <- max (grep (change, tr2$dep))
i2 <- grep (to, tr2$arr)
print (tr2 [i1:i2, ])
#>    departure_station arrival_station departure_time arrival_time trip_id
#> 1:             26423           27430          30450        30600   36069
#> 2:             27430           28411          30600        30660   36069
#> 3:             28411           26851          30660        30720   36069
#>                               dep                         arr
#> 1: S+U Jungfernheide Bhf (Berlin) U Mierendorffplatz (Berlin)
#> 2:    U Mierendorffplatz (Berlin)    Berlin, Ilsenburger Str.
#> 3:       Berlin, Ilsenburger Str.      Berlin, Goslarer Platz
arrive <- tr2$arrival_time [i2]

message ("Travel time = ", hms::hms (arrive - depart))
#> Travel time = 00:17:00

Created on 2021-01-22 by the reprex package (v0.3.0)

@AlexandraKapp do you want to maybe repeat the above code, but using OTP as benchmark? As long as traveltimes results are no slower, then all should be good and we could close this

@AlexandraKapp
Copy link
Contributor Author

This looks great now!
For traveltimes we have a difference of only 1.26min Median (where gtfs router is faster than OTP), -2min (1. Quant), 4min (3. Quant)
grafik

Looking at the ratio (instead of differnce) this is also very close to 1.

For the transfers: also looking really good
~50% of transfers are the same, ~88% are within +/- 1 transfer
grafik

If minimise_transfers = T, then it's much less transfers for the gtfsrouter in comparison - so maybe it's more plausible to use minimise_transfers = F?
grafik

I checked a few deviations (for minimise_transfers = F) and there I found:

  • where less transfers for the gtfs_router: for some the amount of transfers of the gtfs_router makes more sense

  • where less transfers for OTP: transfer requires walking inbetween and is not listed in transfers.txt

  • a few routes I found, where I could not explain the gtfsrouter result. Could you maybe have a look at the exact trips there?:

library(gtfsrouter)
library(raster)
#> Loading required package: sp
library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

packageVersion("gtfsrouter")
#> [1] '0.0.4.156'

gtfs <- extract_gtfs(file.path("vbb_202010.zip"))
#> > Unzipping GTFS archive
#> v Unzipped GTFS archive
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
gtfs$calendar[1,]
#>    service_id monday tuesday wednesday thursday friday saturday sunday
#> 1:          1      0       0         0        0      0        0      0
#>    start_date end_date
#> 1:   20201023 20201212

ttable <- gtfs_timetable(gtfs, day = "tuesday")
iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, cutoff = 0)

# How can you get there with only 1 transfer in 1:04? 2 transfers and 1:07 is plausible 
iso[iso$stop_name == "Stahnsdorf, Annastr.", ]
#>       duration ntransfers      stop_id            stop_name stop_lon stop_lat
#> 34848 01:04:48          1 250000195901 Stahnsdorf, Annastr. 13.21124 52.38193
#> 34849 01:07:48          2 250000195902 Stahnsdorf, Annastr. 13.21124 52.38193
#> 34850 01:07:48          2 250000195903 Stahnsdorf, Annastr. 13.21124 52.38193
#> 39996 01:07:48          2 070101007421 Stahnsdorf, Annastr. 13.21124 52.38193

# How can you get there with only 1 transfer in 0:52? 2 transfers in 1:10 would be plausible 
iso[iso$stop_name == "Neuenhagen, Hauptmannstr.", ]
#>       duration ntransfers      stop_id                 stop_name stop_lon
#> 35624 00:52:48          1 820093201061 Neuenhagen, Hauptmannstr. 13.69818
#> 35625 00:57:48          1 820093201062 Neuenhagen, Hauptmannstr. 13.69818
#>       stop_lat
#> 35624 52.50886
#> 35625 52.50886

# How can you get there with only 1 transfer in 0:49? 2 transfers in 0:52 would be plausible 
iso[iso$stop_name == "Berlin, Homburgstr.", ]
#>       duration ntransfers      stop_id           stop_name stop_lon stop_lat
#> 27702 00:51:48          2 070101002304 Berlin, Homburgstr. 13.41082 52.39987
#> 28793 00:49:48          1 070101003592 Berlin, Homburgstr. 13.41082 52.39987

Created on 2021-01-25 by the reprex package (v0.3.0)

@mpadge
Copy link
Member

mpadge commented Jan 25, 2021

Thanks - I've got a few hours blocked out this afternoon to hope to do final bits. Those results look really encouraging. Note also that #71 will entirely remove the minimise_transfers parameter, and should give more reliable results from gtfs_route comparisons.

@AlexandraKapp
Copy link
Contributor Author

sorry just a small correction - this was google maps as benchmark and not OTP. OTP is slower on average than the gtfsrouter (9min Median) but Google Maps seems to be the better benchmark.

@mpadge
Copy link
Member

mpadge commented Jan 25, 2021

Oh, that's great that OTP is slower!! All routes should be feasible, and I would hope that gtfsrouter should now best at finding the truly fastest routes. Possibly not always against google, of course, because they've got highly flexible transfer rules informed by ML algorithms applied to what people actually do, so a "normal" algorithm could never expect to be anything but slightly slower than google. Anything within the vague distribution of equivalent google times would be a brilliant result!

@mpadge
Copy link
Member

mpadge commented Jan 26, 2021

Updated results after changes made in addressing #61. First the pre-processing code:

library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.158'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$transfers <- readRDS ("vbb-transfers.Rds")
gtfs <- gtfs_timetable(gtfs, day = "tuesday")

from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
iso <- gtfs_traveltimes (gtfs, from, start_time)

Then your first question:

How can you get there with only 1 transfer in 0:52? 2 transfers in 1:10 would be plausible

to <- "Neuenhagen, Hauptmannstr."
iso[iso$stop_name == to, ]
#>       duration ntransfers      stop_id                 stop_name stop_lon
#> 6824  00:54:48          2 900000320106 Neuenhagen, Hauptmannstr. 13.69818
#> 35624 00:52:48          1 820093201061 Neuenhagen, Hauptmannstr. 13.69818
#> 35625 00:57:48          1 820093201062 Neuenhagen, Hauptmannstr. 13.69818
#>       stop_lat
#> 6824  52.50886
#> 35624 52.50886
#> 35625 52.50886
gtfs_route (gtfs, from = from, to = to, start_time = start_time)
#>    route_name                              trip_name
#> 1          S5                      S Strausberg Nord
#> 2          S5                      S Strausberg Nord
#> 3          S5                      S Strausberg Nord
#> 4          S5                      S Strausberg Nord
#> 5          S5                      S Strausberg Nord
#> 6          S5                      S Strausberg Nord
#> 7          S5                      S Strausberg Nord
#> 8          S5                      S Strausberg Nord
#> 9          S5                      S Strausberg Nord
#> 10         S5                      S Strausberg Nord
#> 11         S5                      S Strausberg Nord
#> 12         S5                      S Strausberg Nord
#> 13         S5                      S Strausberg Nord
#> 14         S5                      S Strausberg Nord
#> 15         S5                      S Strausberg Nord
#> 16         S5                      S Strausberg Nord
#> 17         S5                      S Strausberg Nord
#> 18        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 19        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 20        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 21        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 22        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 23        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 24        940 Neuenhagen, Lindenstr./Einkaufszentrum
#> 25        940 Neuenhagen, Lindenstr./Einkaufszentrum
#>                          stop_name arrival_time departure_time
#> 1          S+U Berlin Hauptbahnhof     08:00:30       08:01:12
#> 2   S+U Friedrichstr. Bhf (Berlin)     08:03:06       08:03:54
#> 3      S Hackescher Markt (Berlin)     08:05:24       08:05:54
#> 4  S+U Alexanderplatz Bhf (Berlin)     08:07:06       08:07:54
#> 5     S+U Jannowitzbrücke (Berlin)     08:09:24       08:09:54
#> 6            S Ostbahnhof (Berlin)     08:11:36       08:12:24
#> 7     S+U Warschauer Str. (Berlin)     08:14:06       08:14:42
#> 8          S Ostkreuz Bhf (Berlin)     08:16:24       08:17:12
#> 9          S Nöldnerplatz (Berlin)     08:18:42       08:19:12
#> 10    S+U Lichtenberg Bhf (Berlin)     08:20:42       08:21:18
#> 11  S Friedrichsfelde Ost (Berlin)     08:23:12       08:23:42
#> 12             S Biesdorf (Berlin)     08:26:18       08:26:48
#> 13           S+U Wuhletal (Berlin)     08:28:24       08:29:24
#> 14            S Kaulsdorf (Berlin)     08:30:42       08:31:12
#> 15        S Mahlsdorf Bhf (Berlin)     08:32:54       08:33:54
#> 16                   S Birkenstein     08:36:42       08:37:12
#> 17                   S Hoppegarten     08:39:12       08:39:42
#> 18                   S Hoppegarten     08:46:00       08:46:00
#> 19  Neuenhagen, Hoppegartener Str.     08:48:00       08:48:00
#> 20            Neuenhagen, Grünstr.     08:49:00       08:49:00
#> 21          Neuenhagen, Buchenstr.     08:50:00       08:50:00
#> 22        Neuenhagen, Weimarerstr.     08:51:00       08:51:00
#> 23       Neuenhagen, Apoldaer Str.     08:52:00       08:52:00
#> 24     Neuenhagen, Eisenacher Str.     08:53:00       08:53:00
#> 25       Neuenhagen, Hauptmannstr.     08:54:00       08:54:00

The gtfs_route() result shows that it is indeed plausible (with a bus as the second leg).


Your second question:

How can you get there with only 1 transfer in 0:49? 2 transfers in 0:52 would be plausible

to <- "Berlin, Homburgstr."
iso[iso$stop_name == to, ]
#>       duration ntransfers      stop_id           stop_name stop_lon stop_lat
#> 1405  00:48:00          4 900000074159 Berlin, Homburgstr. 13.41082 52.39987
#> 27702 00:48:00          4 070101002304 Berlin, Homburgstr. 13.41082 52.39987
#> 28793 00:46:00          3 070101003592 Berlin, Homburgstr. 13.41082 52.39987
iso <- gtfs_traveltimes (gtfs, from, start_time, minimise_transfers = TRUE)
iso[iso$stop_name == to, ]
#>       duration ntransfers      stop_id           stop_name stop_lon stop_lat
#> 1405  00:48:00          4 900000074159 Berlin, Homburgstr. 13.41082 52.39987
#> 27702 00:48:00          4 070101002304 Berlin, Homburgstr. 13.41082 52.39987
#> 28793 00:46:00          3 070101003592 Berlin, Homburgstr. 13.41082 52.39987

Created on 2021-01-26 by the reprex package (v0.3.0)

So the previous result (49 minutes with 1 transfer) no longer appears, but it is poissible in 46:00 with 3 transfers. How does the connection with only 2 transfers work? That is not returned at all, either here or in gtfs_route(). Raptor also gives a traveltime of 48:00 with 3 transfers. I think - and hope 🤞 - that this is now all okay too? Let me know ...

@AlexandraKapp
Copy link
Contributor Author

Yes, examples looking all good now :) 🎉

Though setting cutoff=0 now takes much longer than it used to (46 sec vs. 1.3 sec). I know it's not unexpected that it takes longer than leaving it at default, as it searches the entire timetable, though it now takes significantly longer than it used to.
How would you recommend to best use the cutoff parameter?

library(gtfsrouter)
ttable <- readRDS('timetable.Rds')

t1 <- Sys.time()
cutoff_default <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600)
t2 <- Sys.time()
t2 - t1
#> Time difference of 1.381428 secs
nrow(cutoff_default)
#> [1] 13990

t1 <- Sys.time()
cutoff_0 <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, cutoff = 0)
t2 <- Sys.time()
t2 - t1
#> Time difference of 46.60025 secs
nrow(cutoff_0)
#> [1] 40776

Created on 2021-01-28 by the reprex package (v0.3.0)

I also tried again with the old example in Stuttgart where the issue came up the last time. Here we'd still have the same problem, that only 101 stations would be reached if the cutoff parameter is not changed. Setting it to e.g. 20 is faster than cutoff = 0 and gets the desired amount of results.
Though I suppose it is difficult to use a fixed value working for every feed and station?

library(gtfsrouter)

gtfs <- extract_gtfs(file.path("gtfs.zip"))
#> > Unzipping GTFS archivev Unzipped GTFS archive  
#> > Extracting GTFS feed
#> Warning in data.table::fread(flist[f], integer64 = "character", showProgress
#> = FALSE): Found and resolved improper quoting out-of-sample. First healed
#> line 109: <<"21-10-j21-1","","10","Marienplatz - Degerloch (Zahnradbahn
#> "Zacke")","1400","FFB300","004299">>. If the fields are not quoted (e.g. field
#> separator does not appear within any field), try quote="" to avoid this warning.
#> v Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
ttable <- gtfs_timetable(gtfs, day = "tuesday")


start_ids <- ttable$stops[grepl("Charlottenplatz", ttable$stops$stop_name)] 
start_ids <- start_ids[1:9, ]$stop_id # exclude two "Charlottenplatz in Esslingen)

t1 <- Sys.time()
pt_traveltimes <- gtfs_traveltimes(ttable, start_ids, from_is_id = T, start_time = 8 * 3600)
t2 <- Sys.time()
t2-t1
#> Time difference of 0.2840421 secs
nrow(pt_traveltimes)
#> [1] 101

t1 <- Sys.time()
pt_traveltimes <- gtfs_traveltimes(ttable, start_ids, from_is_id = T, start_time = 8 * 3600, cutoff = 0)
t2 <- Sys.time()
t2-t1
#> Time difference of 4.354886 secs
nrow(pt_traveltimes)
#> [1] 9006

t1 <- Sys.time()
pt_traveltimes <- gtfs_traveltimes(ttable, start_ids, from_is_id = T, start_time = 8 * 3600, cutoff = 20)
t2 <- Sys.time()
t2-t1
#> Time difference of 0.472074 secs
nrow(pt_traveltimes)
#> [1] 8135

Created on 2021-01-28 by the reprex package (v0.3.0)

@mpadge
Copy link
Member

mpadge commented Jan 28, 2021

I suspect it might be more useful to replace that cutoff with a parameter that actually means something, like the maximum_trip_duration parameter of r5r::travel_time_matrix(). That's still not entirely trivial, because it's unclear (including in r5r) whether that it best calculated from the nominated start time, or whether more of the timetable is scanned, but only trips out to that nominated duration are ultimately returned. I think that r5r uses the latter, but that doesn't really mean anything in the context of this algorithm, because it would just be throwing out useful data for no reason.

Maybe the simplest of all: specify a start_time and end_time, where end_time is absolute, so only trips which arrive at any station prior to the nominated time will be returned? Thoughts appreciated

@AlexandraKapp
Copy link
Contributor Author

yes I think that would be great!

I'm not sure what makes more sense for the algorithm, but I think additionally to the start_time a

  • latest_start_time
  • end_time
  • or a combination of all three
    would work.

@mpadge
Copy link
Member

mpadge commented Feb 1, 2021

@AlexandraKapp Can we close this now and move subsequent discussion of algorithm stopping control over to #75?

@AlexandraKapp
Copy link
Contributor Author

yes :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants