[Data] Time series plot of scheduled trips since 2019 #30
Comments
I do think that for historical versions you probably would want to pick up all successive versions, because if there's a new version released it likely contains corrections. So the best way to treat a schedule zipfile is that it should be considered "in effect" for the period where both the following are true:
Theoretically, if both those conditions are met but there are no active services in |
Thanks for the explanation. I can try to write something that would check for those two conditions. Would someone still want to download that feed even if |
If (Sidebar, this kind of thing is why I want to take a second look at the general schedule aggregation code in the RT vs. schedule script.) |
I've just submitted a pull request (#35) that makes a first attempt at generating schedule feeds. I wasn't sure how to handle the following case, though.
In [10]: create_schedule_list(10, 2020)
INFO:root: Searching page 1
INFO:root: Searching page 2
INFO:root: Searching page 3
INFO:root: Searching page 4
INFO:root: Searching page 5
INFO:root: Searching page 6
INFO:root: Found schedule for October 2020
INFO:root: Adding schedule for October 10, 2020
INFO:root: The duplicate schedule versions are {'1 September 2021'}. Check whether these were in-effect
Out[10]:
[{'schedule_version': '20201010',
'feed_start_date': '2020-10-11',
'feed_end_date': '2020-11-13'},
{'schedule_version': '20201114',
'feed_start_date': '2020-11-15',
'feed_end_date': '2020-11-19'},
{'schedule_version': '20201120',
'feed_start_date': '2020-11-21',
'feed_end_date': '2020-12-11'},
{'schedule_version': '20201212',
'feed_start_date': '2020-12-13',
'feed_end_date': '2021-01-03'},
{'schedule_version': '20210104',
'feed_start_date': '2021-01-05',
'feed_end_date': '2021-03-17'},
{'schedule_version': '20210318',
'feed_start_date': '2021-03-19',
'feed_end_date': '2021-03-25'},
{'schedule_version': '20210326',
'feed_start_date': '2021-03-27',
'feed_end_date': '2021-04-22'},
{'schedule_version': '20210423',
'feed_start_date': '2021-04-24',
'feed_end_date': '2021-04-26'},
{'schedule_version': '20210427',
'feed_start_date': '2021-04-28',
'feed_end_date': '2021-05-03'},
{'schedule_version': '20210504',
'feed_start_date': '2021-05-05',
'feed_end_date': '2021-05-12'},
{'schedule_version': '20210513',
'feed_start_date': '2021-05-14',
'feed_end_date': '2021-05-27'},
{'schedule_version': '20210528',
'feed_start_date': '2021-05-29',
'feed_end_date': '2021-06-09'},
{'schedule_version': '20210610',
'feed_start_date': '2021-06-11',
'feed_end_date': '2021-06-14'},
{'schedule_version': '20210615',
'feed_start_date': '2021-06-16',
'feed_end_date': '2021-08-01'},
{'schedule_version': '20210802',
'feed_start_date': '2021-08-03',
'feed_end_date': '2021-08-31'},
{'schedule_version': '20210901',
'feed_start_date': '2021-09-02',
'feed_end_date': '2021-09-06'},
{'schedule_version': '20210907',
'feed_start_date': '2021-09-08',
...
]
There are multiple versions of 1 September 2021 here. Would this violate condition 1 because there is no gap between successive schedule versions? I was planning to drop it, but it looks like |
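The date ranges in the output above appear to follow one pattern: each feed starts the day after its version date and ends the day before the next version's date. The sketch below reproduces that pattern for illustration only. `feed_date_ranges` and its `last_end` parameter are hypothetical names, not the code from the pull request.

```python
from datetime import date, datetime, timedelta


def feed_date_ranges(versions, last_end):
    """Derive in-effect date ranges from 'YYYYMMDD' version strings.

    Hypothetical helper: `last_end` is the end date to use for the newest
    version, since no later version exists to bound it.
    """
    # Collapse same-day duplicates and sort chronologically.
    days = sorted(datetime.strptime(v, "%Y%m%d").date() for v in set(versions))
    ranges = []
    for i, day in enumerate(days):
        # A feed starts the day after its version date.
        start = day + timedelta(days=1)
        # It ends the day before the next version's date, if there is one.
        end = days[i + 1] - timedelta(days=1) if i + 1 < len(days) else last_end
        ranges.append(
            {
                "schedule_version": day.strftime("%Y%m%d"),
                "feed_start_date": start.isoformat(),
                "feed_end_date": end.isoformat(),
            }
        )
    return ranges


ranges = feed_date_ranges(["20201010", "20201114", "20201120"], date(2020, 12, 11))
```

With these three versions, the first two entries match the first two entries in the output above. The third matches only because `last_end` is supplied by hand.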
Oof, that's wild (that they had 3 versions). In cases of multiples, I'd just keep the final version that was left up (because that was the one that was actually online for subsequent days). Because the way this works on the actual CTA website is just that there's a current version, and whatever is the final version uploaded on a given date is the one that was left on the actual website the longest (into subsequent dates). |
Okay thanks, I'll modify it to take the latest version then. |
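A minimal sketch of keeping only the final version per date, assuming the versions arrive as a list of dicts in upload order, oldest first. The `date` and `upload` keys and the function name are hypothetical, chosen only for illustration.

```python
def keep_latest_per_date(versions):
    """Keep the last-uploaded version for each date.

    Assumes `versions` is ordered oldest upload first; each item is a dict
    with a hypothetical 'date' key.
    """
    latest = {}
    for version in versions:
        # A later upload for the same date overwrites the earlier one.
        latest[version["date"]] = version
    return list(latest.values())


versions = [
    {"date": "20210901", "upload": 1},
    {"date": "20210901", "upload": 2},
    {"date": "20210901", "upload": 3},
    {"date": "20210907", "upload": 1},
]
deduped = keep_latest_per_date(versions)
```

If the source list is ordered newest first instead, reverse it before calling the function.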
It will be important to track the number of scheduled trips from before COVID to the present. This will help check whether the CTA lowers the number of scheduled trips to match the actual trips. A reduction in scheduled trips would improve the trip ratios, but bus service would still be below pre-COVID levels, a less-than-ideal scenario.
Data
To access older data, you will need to look at schedule versions from transitfeeds.com dating back to 2019. You probably do not need to choose every schedule version for a given year because there is some overlap of the date ranges between the versions. It is good to check, however, that the schedule versions you choose span the entire year. For 2019, for example, you could choose the versions "7 November 2018" (6 November 2018 - 31 January 2019), "31 January 2019" (30 January 2019 - 31 March 2019), "14 April 2019" (29 March 2019 - 31 May 2019), "16 May 2019" (13 May 2019 - 31 July 2019), "5 August 2019" (1 August 2019 - 31 October 2019), "4 October 2019" (4 October 2019 - 31 December 2019). You could then drop the 2018 dates and duplicates that may arise from the overlapping dates.
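One way to check that the chosen versions span the whole year, and to drop the 2018 dates and overlapping duplicates, is sketched below using the six 2019 versions listed above. The text does not say which version should win on an overlapping date; this sketch assumes the later one does. The function names are hypothetical.

```python
from datetime import date, timedelta

# (label, start, end) for each version, oldest first, taken from the text above.
VERSIONS = [
    ("7 November 2018", date(2018, 11, 6), date(2019, 1, 31)),
    ("31 January 2019", date(2019, 1, 30), date(2019, 3, 31)),
    ("14 April 2019", date(2019, 3, 29), date(2019, 5, 31)),
    ("16 May 2019", date(2019, 5, 13), date(2019, 7, 31)),
    ("5 August 2019", date(2019, 8, 1), date(2019, 10, 31)),
    ("4 October 2019", date(2019, 10, 4), date(2019, 12, 31)),
]


def assign_versions(versions, year):
    """Map each day of `year` to one version label, dropping other years."""
    by_day = {}
    for label, start, end in versions:
        day = start
        while day <= end:
            if day.year == year:
                # Assumption: on overlapping dates the later version wins.
                by_day[day] = label
            day += timedelta(days=1)
    return by_day


def missing_days(by_day, year):
    """List the days of `year` that no version covers."""
    day, gaps = date(year, 1, 1), []
    while day.year == year:
        if day not in by_day:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


by_day = assign_versions(VERSIONS, 2019)
gaps = missing_days(by_day, 2019)
```

An empty `gaps` list means the selected versions cover every day of the year.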
Set up your virtual environment with the required packages by following the instructions in the README, and activate it. Once you have the schedule feeds of interest, run the snippet from inside the data_analysis directory. If running from the project root, change static_gtfs_analysis to data_analysis.static_gtfs_analysis.
Access the schedule data in the list with
schedule_df should have enough information to generate plots of scheduled trip counts by day.
Example