-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ctrl+f vertical, replace with diagonal #83
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
WalkthroughThe overarching change across the SportsDataverse project involves switching the data concatenation method from "vertical" to "diagonal" in various loaders and roster management files across multiple sports modules (CFB, MBB, NBA, NFL, NHL, WBB, WNBA). This modification aims to optimize data handling and ensure consistency in data structure and processing across the board. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (17)
- sportsdataverse/cfb/cfb_game_rosters.py (2 hunks)
- sportsdataverse/cfb/cfb_loaders.py (4 hunks)
- sportsdataverse/cfb/cfb_schedule.py (1 hunks)
- sportsdataverse/mbb/mbb_game_rosters.py (2 hunks)
- sportsdataverse/mbb/mbb_loaders.py (4 hunks)
- sportsdataverse/nba/nba_game_rosters.py (2 hunks)
- sportsdataverse/nba/nba_loaders.py (4 hunks)
- sportsdataverse/nfl/nfl_game_rosters.py (2 hunks)
- sportsdataverse/nfl/nfl_loaders.py (12 hunks)
- sportsdataverse/nfl/nfl_schedule.py (1 hunks)
- sportsdataverse/nhl/nhl_api.py (1 hunks)
- sportsdataverse/nhl/nhl_game_rosters.py (1 hunks)
- sportsdataverse/nhl/nhl_loaders.py (4 hunks)
- sportsdataverse/wbb/wbb_game_rosters.py (2 hunks)
- sportsdataverse/wbb/wbb_loaders.py (4 hunks)
- sportsdataverse/wnba/wnba_game_rosters.py (2 hunks)
- sportsdataverse/wnba/wnba_loaders.py (4 hunks)
Additional comments: 51
sportsdataverse/nhl/nhl_api.py (1)
- 61-61: The change from vertical to diagonal concatenation in
nhl_api_schedule
is consistent with the PR's objective to better handle mismatches in column names across different years' tables. This approach allows for the creation of new columns as needed with appropriate backfilling of NaNs for previous entries, enhancing data integrity and consistency.sportsdataverse/nba/nba_loaders.py (4)
- 39-39: The change to diagonal concatenation in
load_nba_pbp
aligns with the PR's goal to improve data frame merging by handling column mismatches more effectively.- 67-67: The diagonal concatenation in
load_nba_team_boxscore
is a positive change for handling data from different seasons more robustly.- 95-95: Switching to diagonal concatenation in
load_nba_player_boxscore
enhances the method's ability to deal with varying column names across seasons.- 123-123: The update to diagonal concatenation in
load_nba_schedule
is consistent with the PR's objectives, facilitating better data integrity when merging frames.sportsdataverse/wnba/wnba_loaders.py (4)
- 39-39: The diagonal concatenation in
load_wnba_pbp
is a strategic improvement for handling data frames with mismatched columns across different seasons.- 67-67: Updating to diagonal concatenation in
load_wnba_team_boxscore
supports the PR's goal of enhancing data merging capabilities.- 95-95: The change to diagonal concatenation in
load_wnba_player_boxscore
is beneficial for managing varying column names in data from different seasons.- 123-123: Switching to diagonal concatenation in
load_wnba_schedule
aligns with the PR's objectives to improve data integrity during the merging process.sportsdataverse/wbb/wbb_loaders.py (4)
- 39-39: The update to diagonal concatenation in
load_wbb_pbp
is a positive change for handling data frames with mismatched columns more effectively.- 67-67: Switching to diagonal concatenation in
load_wbb_team_boxscore
supports the PR's goal of improving data merging capabilities.- 95-95: The change to diagonal concatenation in
load_wbb_player_boxscore
enhances the ability to manage varying column names across different seasons.- 123-123: Updating to diagonal concatenation in
load_wbb_schedule
aligns with the PR's objectives to ensure better data integrity during the merging process.sportsdataverse/mbb/mbb_loaders.py (4)
- 39-39: The diagonal concatenation in
load_mbb_pbp
is a strategic improvement for handling data frames with mismatched columns across different seasons.- 67-67: Updating to diagonal concatenation in
load_mbb_team_boxscore
supports the PR's goal of enhancing data merging capabilities.- 95-95: The change to diagonal concatenation in
load_mbb_player_boxscore
is beneficial for managing varying column names in data from different seasons.- 123-123: Switching to diagonal concatenation in
load_mbb_schedule
aligns with the PR's objectives to improve data integrity during the merging process.sportsdataverse/nhl/nhl_loaders.py (4)
- 39-39: The update to diagonal concatenation in
load_nhl_pbp
is a positive change for handling data frames with mismatched columns more effectively.- 66-66: Switching to diagonal concatenation in
load_nhl_schedule
supports the PR's goal of improving data merging capabilities.- 94-94: The change to diagonal concatenation in
load_nhl_team_boxscore
enhances the ability to manage varying column names across different seasons.- 122-122: Updating to diagonal concatenation in
load_nhl_player_boxscore
aligns with the PR's objectives to ensure better data integrity during the merging process.sportsdataverse/cfb/cfb_loaders.py (3)
- 40-40: The update to diagonal concatenation in
load_cfb_pbp
is a strategic improvement for handling data frames with mismatched columns across different seasons.- 68-68: Switching to diagonal concatenation in
load_cfb_schedule
supports the PR's goal of improving data merging capabilities.- 95-95: The change to diagonal concatenation in
load_cfb_rosters
enhances the ability to manage varying column names in data from different seasons.sportsdataverse/nba/nba_game_rosters.py (2)
- 93-93: Switching to diagonal concatenation in
helper_nba_team_items
is consistent with the PR's objectives, facilitating better handling of data frames with mismatched columns.- 132-132: The update to diagonal concatenation in
helper_nba_roster_items
supports the goal of improving data merging capabilities by effectively managing varying column names.sportsdataverse/mbb/mbb_game_rosters.py (2)
- 88-88: The change from "vertical" to "diagonal" concatenation in
helper_mbb_team_items
is aligned with the PR's objective to better handle mismatches in column names across different years' tables. This approach ensures that new columns are created as needed with appropriate backfilling of NaNs for previous entries.- 130-130: The change from "vertical" to "diagonal" concatenation in
helper_mbb_roster_items
is consistent with the PR's goal and should help in managing column mismatches more effectively.sportsdataverse/wbb/wbb_game_rosters.py (2)
- 88-88: Switching to "diagonal" concatenation in
helper_wbb_team_items
is a strategic move to address column mismatches, ensuring data integrity across years.- 130-130: The adoption of "diagonal" concatenation in
helper_wbb_roster_items
supports the PR's objective to improve data handling and consistency.sportsdataverse/nhl/nhl_game_rosters.py (1)
- 97-97: Modifying the concatenation method to "diagonal" in
helper_nhl_team_items
is a thoughtful approach to enhance data frame merging, especially for handling column mismatches.sportsdataverse/wnba/wnba_game_rosters.py (2)
- 93-93: The switch to "diagonal" concatenation in
helper_wnba_team_items
aligns with the PR's objectives, facilitating better data frame merging and column handling.- 133-133: Implementing "diagonal" concatenation in
helper_wnba_roster_items
is a positive change for managing data consistency and integrity.sportsdataverse/cfb/cfb_game_rosters.py (2)
- 92-92: Switching to "diagonal" concatenation in
helper_cfb_team_items
is a strategic improvement for handling data frames, especially for column mismatches.- 134-134: The adoption of "diagonal" concatenation in
helper_cfb_roster_items
supports the PR's goal of improving data handling and consistency across different years' tables.sportsdataverse/nfl/nfl_game_rosters.py (2)
- 95-95: Switching the concatenation method to "diagonal" in
helper_nfl_team_items
is consistent with the PR's objective to better handle mismatches in column names across different years' tables. This change should ensure that new columns are created as needed with appropriate backfilling of NaNs for previous entries, enhancing data integrity and consistency.- 137-137: The change to "diagonal" concatenation in
helper_nfl_roster_items
aligns with the PR's goal of improving data concatenation methods. This should facilitate the creation of new columns where necessary and backfill NaNs for previous concatenation entries, addressing issues with mismatches in column names.sportsdataverse/cfb/cfb_schedule.py (1)
- 224-224: The modification to use "diagonal" concatenation in
espn_cfb_calendar
is in line with the PR's objectives to address and rectify issues with data frame concatenation. This approach should help in handling mismatches in column names more effectively, ensuring data consistency across different years' tables.sportsdataverse/nfl/nfl_schedule.py (1)
- 172-172: Changing the concatenation method to "diagonal" in
espn_nfl_calendar
supports the PR's aim of improving data frame concatenation methods. This should enhance the library's ability to handle column name mismatches and facilitate the creation of new columns with backfilled NaNs, thereby improving data integrity.sportsdataverse/nfl/nfl_loaders.py (12)
- 62-62: The change to "diagonal" concatenation in
load_nfl_pbp
aligns with the PR objectives. However, it's important to monitor the performance and memory usage, as diagonal concatenation can be more resource-intensive.- 92-92: The change to "diagonal" concatenation in
load_nfl_schedule
is appropriate for handling varying columns across seasons. Keep an eye on performance and memory usage.- 228-228: The update to "diagonal" concatenation in
load_nfl_pfr_weekly_pass
is correctly implemented. Continue to monitor performance and memory usage.- 276-276: Adopting "diagonal" concatenation in
load_nfl_pfr_weekly_rush
is suitable for the intended data handling improvements. Be mindful of the potential impact on performance and memory.- 324-324: The "diagonal" concatenation method in
load_nfl_pfr_weekly_rec
is correctly applied. It's advisable to monitor its impact on performance and memory usage.- 372-372: The implementation of "diagonal" concatenation in
load_nfl_pfr_weekly_def
is appropriate. Keep monitoring performance and memory usage.- 396-396: Changing to "diagonal" concatenation in
load_nfl_rosters
is well-implemented. Continue to assess the impact on performance and memory usage.- 420-420: The update to "diagonal" concatenation in
load_nfl_weekly_rosters
is correctly implemented. It's important to monitor its impact on performance and memory usage.- 482-482: Implementing "diagonal" concatenation in
load_nfl_snap_counts
is suitable for the intended improvements. Be mindful of the potential impact on performance and memory.- 506-506: The "diagonal" concatenation method in
load_nfl_pbp_participation
is correctly applied. It's advisable to monitor its impact on performance and memory usage.- 530-530: Adopting "diagonal" concatenation in
load_nfl_injuries
is well-implemented. Continue to assess the impact on performance and memory usage.- 554-554: The update to "diagonal" concatenation in
load_nfl_depth_charts
is correctly implemented. It's important to monitor its impact on performance and memory usage.
@@ -122,7 +122,7 @@ def load_cfb_team_info(seasons: List[int], return_as_pandas=False) -> pl.DataFra | |||
i_data = pl.read_parquet(CFB_TEAM_INFO_URL.format(season=i), use_pyarrow=True, columns=None) | |||
except Exception: | |||
print(f"We don't seem to have data for the {i} season.") | |||
data = pl.concat([data, i_data], how="vertical") | |||
data = pl.concat([data, i_data], how="diagonal") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
The expanded context confirms the potential issue with the exception handling block not accounting for cases where i_data
is not defined due to an exception. This oversight could lead to a runtime error when attempting to concatenate data
with an undefined i_data
. The review comment is validated by the provided context.
Analysis chain
Updating to diagonal concatenation in load_cfb_team_info
aligns with the PR's objectives to ensure better data integrity during the merging process. However, there's a potential issue with the exception handling block that might not account for cases where i_data
is not defined due to an exception.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that i_data is always defined before being used in concatenation
grep -C3 "try:" sportsdataverse/cfb/cfb_loaders.py | grep -C3 "except Exception:"
Length of output: 367
Script:
#!/bin/bash
# Expand the search to include more context around the try-except block and the concatenation operation
grep -C10 "try:" sportsdataverse/cfb/cfb_loaders.py | grep -C10 "except Exception:"
Length of output: 838
when doing broad sweeps, pl.concat can sometimes fail due to mismatched columns between different years' tables. Changing 'vertical' to 'diagonal' fixes that; it causes column name matching, and creation of new columns (with backfilled nans for previous concat entries) when needed.
NB: when running pytest locally, am getting a lot of errors in test_dl_utils. I suspect some changes to urllib/requests in a later version of python may cause issues with how the exception block is being handled.
in any event, the exception handling logic for
download
likely needs rewritten. For example, the logger expectsresponse
to exist, when in reality, an exception raising inside session.get will cause response to never be written. A mock response object thus has to be instantiated prior to logging the status codes and so on.also of note: consider refactoring the manual retry handling using adapters, eg here
Summary by CodeRabbit