Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctrl+f vertical, replace with diagonal #83

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

MirandaRosalise
Copy link

@MirandaRosalise MirandaRosalise commented Mar 19, 2024

when doing broad sweeps, pl.concat can sometimes fail due to mismatched columns between different years' tables. Changing 'vertical' to 'diagonal' fixes that; it causes column name matching, and creation of new columns (with backfilled nans for previous concat entries) when needed.

NB: when running pytest locally, am getting a lot of errors in test_dl_utils. I suspect some changes to urllib/requests in a later version of python may cause issues with how the exception block is being handled.

in any event, the exception handling logic for download likely needs rewritten. For example, the logger expects response to exist, when in reality, an exception raising inside session.get will cause response to never be written. A mock response object thus has to be instantiated prior to logging the status codes and so on.

also of note: consider refactoring the manual retry handling using adapters, eg here

Summary by CodeRabbit

  • Refactor
    • Updated the data concatenation method from "vertical" to "diagonal" across various loaders and roster files for improved data structure and consistency. This change impacts College Football (CFB), Men's Basketball (MBB), NBA, NFL, NHL, Women's Basketball (WBB), and WNBA data processing modules.
    • Made column modifications in certain data frames for better alignment and consistency across the board.

Copy link

vercel bot commented Mar 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
sportsdataverse-py ✅ Ready (Inspect) Visit Preview Mar 19, 2024 3:17am

Copy link

coderabbitai bot commented Mar 19, 2024

Walkthrough

The overarching change across the SportsDataverse project involves switching the data concatenation method from "vertical" to "diagonal" in various loaders and roster management files across multiple sports modules (CFB, MBB, NBA, NFL, NHL, WBB, WNBA). This modification aims to optimize data handling and ensure consistency in data structure and processing across the board.

Changes

Files Change Summary
.../cfb/..., .../mbb/..., .../nba/..., .../nfl/..., .../nhl/..., .../wbb/..., .../wnba/... (Multiple files across sports) Switched data concatenation method from "vertical" to "diagonal" in various functions. Additionally, made column modifications for improved data alignment and processing.

🐰✨
In the realm of data, both wide and tall,
A rabbit hopped, making sense of it all.
From vertical to diagonal, a magical switch,
Aligning the columns, without a hitch.
Through sports and seasons, the changes unfurl,
A leap towards consistency, in the data world.
🌟📊

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 8f757d3 and 846a387.
Files selected for processing (17)
  • sportsdataverse/cfb/cfb_game_rosters.py (2 hunks)
  • sportsdataverse/cfb/cfb_loaders.py (4 hunks)
  • sportsdataverse/cfb/cfb_schedule.py (1 hunks)
  • sportsdataverse/mbb/mbb_game_rosters.py (2 hunks)
  • sportsdataverse/mbb/mbb_loaders.py (4 hunks)
  • sportsdataverse/nba/nba_game_rosters.py (2 hunks)
  • sportsdataverse/nba/nba_loaders.py (4 hunks)
  • sportsdataverse/nfl/nfl_game_rosters.py (2 hunks)
  • sportsdataverse/nfl/nfl_loaders.py (12 hunks)
  • sportsdataverse/nfl/nfl_schedule.py (1 hunks)
  • sportsdataverse/nhl/nhl_api.py (1 hunks)
  • sportsdataverse/nhl/nhl_game_rosters.py (1 hunks)
  • sportsdataverse/nhl/nhl_loaders.py (4 hunks)
  • sportsdataverse/wbb/wbb_game_rosters.py (2 hunks)
  • sportsdataverse/wbb/wbb_loaders.py (4 hunks)
  • sportsdataverse/wnba/wnba_game_rosters.py (2 hunks)
  • sportsdataverse/wnba/wnba_loaders.py (4 hunks)
Additional comments: 51
sportsdataverse/nhl/nhl_api.py (1)
  • 61-61: The change from vertical to diagonal concatenation in nhl_api_schedule is consistent with the PR's objective to better handle mismatches in column names across different years' tables. This approach allows for the creation of new columns as needed with appropriate backfilling of NaNs for previous entries, enhancing data integrity and consistency.
sportsdataverse/nba/nba_loaders.py (4)
  • 39-39: The change to diagonal concatenation in load_nba_pbp aligns with the PR's goal to improve data frame merging by handling column mismatches more effectively.
  • 67-67: The diagonal concatenation in load_nba_team_boxscore is a positive change for handling data from different seasons more robustly.
  • 95-95: Switching to diagonal concatenation in load_nba_player_boxscore enhances the method's ability to deal with varying column names across seasons.
  • 123-123: The update to diagonal concatenation in load_nba_schedule is consistent with the PR's objectives, facilitating better data integrity when merging frames.
sportsdataverse/wnba/wnba_loaders.py (4)
  • 39-39: The diagonal concatenation in load_wnba_pbp is a strategic improvement for handling data frames with mismatched columns across different seasons.
  • 67-67: Updating to diagonal concatenation in load_wnba_team_boxscore supports the PR's goal of enhancing data merging capabilities.
  • 95-95: The change to diagonal concatenation in load_wnba_player_boxscore is beneficial for managing varying column names in data from different seasons.
  • 123-123: Switching to diagonal concatenation in load_wnba_schedule aligns with the PR's objectives to improve data integrity during the merging process.
sportsdataverse/wbb/wbb_loaders.py (4)
  • 39-39: The update to diagonal concatenation in load_wbb_pbp is a positive change for handling data frames with mismatched columns more effectively.
  • 67-67: Switching to diagonal concatenation in load_wbb_team_boxscore supports the PR's goal of improving data merging capabilities.
  • 95-95: The change to diagonal concatenation in load_wbb_player_boxscore enhances the ability to manage varying column names across different seasons.
  • 123-123: Updating to diagonal concatenation in load_wbb_schedule aligns with the PR's objectives to ensure better data integrity during the merging process.
sportsdataverse/mbb/mbb_loaders.py (4)
  • 39-39: The diagonal concatenation in load_mbb_pbp is a strategic improvement for handling data frames with mismatched columns across different seasons.
  • 67-67: Updating to diagonal concatenation in load_mbb_team_boxscore supports the PR's goal of enhancing data merging capabilities.
  • 95-95: The change to diagonal concatenation in load_mbb_player_boxscore is beneficial for managing varying column names in data from different seasons.
  • 123-123: Switching to diagonal concatenation in load_mbb_schedule aligns with the PR's objectives to improve data integrity during the merging process.
sportsdataverse/nhl/nhl_loaders.py (4)
  • 39-39: The update to diagonal concatenation in load_nhl_pbp is a positive change for handling data frames with mismatched columns more effectively.
  • 66-66: Switching to diagonal concatenation in load_nhl_schedule supports the PR's goal of improving data merging capabilities.
  • 94-94: The change to diagonal concatenation in load_nhl_team_boxscore enhances the ability to manage varying column names across different seasons.
  • 122-122: Updating to diagonal concatenation in load_nhl_player_boxscore aligns with the PR's objectives to ensure better data integrity during the merging process.
sportsdataverse/cfb/cfb_loaders.py (3)
  • 40-40: The update to diagonal concatenation in load_cfb_pbp is a strategic improvement for handling data frames with mismatched columns across different seasons.
  • 68-68: Switching to diagonal concatenation in load_cfb_schedule supports the PR's goal of improving data merging capabilities.
  • 95-95: The change to diagonal concatenation in load_cfb_rosters enhances the ability to manage varying column names in data from different seasons.
sportsdataverse/nba/nba_game_rosters.py (2)
  • 93-93: Switching to diagonal concatenation in helper_nba_team_items is consistent with the PR's objectives, facilitating better handling of data frames with mismatched columns.
  • 132-132: The update to diagonal concatenation in helper_nba_roster_items supports the goal of improving data merging capabilities by effectively managing varying column names.
sportsdataverse/mbb/mbb_game_rosters.py (2)
  • 88-88: The change from "vertical" to "diagonal" concatenation in helper_mbb_team_items is aligned with the PR's objective to better handle mismatches in column names across different years' tables. This approach ensures that new columns are created as needed with appropriate backfilling of NaNs for previous entries.
  • 130-130: The change from "vertical" to "diagonal" concatenation in helper_mbb_roster_items is consistent with the PR's goal and should help in managing column mismatches more effectively.
sportsdataverse/wbb/wbb_game_rosters.py (2)
  • 88-88: Switching to "diagonal" concatenation in helper_wbb_team_items is a strategic move to address column mismatches, ensuring data integrity across years.
  • 130-130: The adoption of "diagonal" concatenation in helper_wbb_roster_items supports the PR's objective to improve data handling and consistency.
sportsdataverse/nhl/nhl_game_rosters.py (1)
  • 97-97: Modifying the concatenation method to "diagonal" in helper_nhl_team_items is a thoughtful approach to enhance data frame merging, especially for handling column mismatches.
sportsdataverse/wnba/wnba_game_rosters.py (2)
  • 93-93: The switch to "diagonal" concatenation in helper_wnba_team_items aligns with the PR's objectives, facilitating better data frame merging and column handling.
  • 133-133: Implementing "diagonal" concatenation in helper_wnba_roster_items is a positive change for managing data consistency and integrity.
sportsdataverse/cfb/cfb_game_rosters.py (2)
  • 92-92: Switching to "diagonal" concatenation in helper_cfb_team_items is a strategic improvement for handling data frames, especially for column mismatches.
  • 134-134: The adoption of "diagonal" concatenation in helper_cfb_roster_items supports the PR's goal of improving data handling and consistency across different years' tables.
sportsdataverse/nfl/nfl_game_rosters.py (2)
  • 95-95: Switching the concatenation method to "diagonal" in helper_nfl_team_items is consistent with the PR's objective to better handle mismatches in column names across different years' tables. This change should ensure that new columns are created as needed with appropriate backfilling of NaNs for previous entries, enhancing data integrity and consistency.
  • 137-137: The change to "diagonal" concatenation in helper_nfl_roster_items aligns with the PR's goal of improving data concatenation methods. This should facilitate the creation of new columns where necessary and backfill NaNs for previous concatenation entries, addressing issues with mismatches in column names.
sportsdataverse/cfb/cfb_schedule.py (1)
  • 224-224: The modification to use "diagonal" concatenation in espn_cfb_calendar is in line with the PR's objectives to address and rectify issues with data frame concatenation. This approach should help in handling mismatches in column names more effectively, ensuring data consistency across different years' tables.
sportsdataverse/nfl/nfl_schedule.py (1)
  • 172-172: Changing the concatenation method to "diagonal" in espn_nfl_calendar supports the PR's aim of improving data frame concatenation methods. This should enhance the library's ability to handle column name mismatches and facilitate the creation of new columns with backfilled NaNs, thereby improving data integrity.
sportsdataverse/nfl/nfl_loaders.py (12)
  • 62-62: The change to "diagonal" concatenation in load_nfl_pbp aligns with the PR objectives. However, it's important to monitor the performance and memory usage, as diagonal concatenation can be more resource-intensive.
  • 92-92: The change to "diagonal" concatenation in load_nfl_schedule is appropriate for handling varying columns across seasons. Keep an eye on performance and memory usage.
  • 228-228: The update to "diagonal" concatenation in load_nfl_pfr_weekly_pass is correctly implemented. Continue to monitor performance and memory usage.
  • 276-276: Adopting "diagonal" concatenation in load_nfl_pfr_weekly_rush is suitable for the intended data handling improvements. Be mindful of the potential impact on performance and memory.
  • 324-324: The "diagonal" concatenation method in load_nfl_pfr_weekly_rec is correctly applied. It's advisable to monitor its impact on performance and memory usage.
  • 372-372: The implementation of "diagonal" concatenation in load_nfl_pfr_weekly_def is appropriate. Keep monitoring performance and memory usage.
  • 396-396: Changing to "diagonal" concatenation in load_nfl_rosters is well-implemented. Continue to assess the impact on performance and memory usage.
  • 420-420: The update to "diagonal" concatenation in load_nfl_weekly_rosters is correctly implemented. It's important to monitor its impact on performance and memory usage.
  • 482-482: Implementing "diagonal" concatenation in load_nfl_snap_counts is suitable for the intended improvements. Be mindful of the potential impact on performance and memory.
  • 506-506: The "diagonal" concatenation method in load_nfl_pbp_participation is correctly applied. It's advisable to monitor its impact on performance and memory usage.
  • 530-530: Adopting "diagonal" concatenation in load_nfl_injuries is well-implemented. Continue to assess the impact on performance and memory usage.
  • 554-554: The update to "diagonal" concatenation in load_nfl_depth_charts is correctly implemented. It's important to monitor its impact on performance and memory usage.

@@ -122,7 +122,7 @@ def load_cfb_team_info(seasons: List[int], return_as_pandas=False) -> pl.DataFra
i_data = pl.read_parquet(CFB_TEAM_INFO_URL.format(season=i), use_pyarrow=True, columns=None)
except Exception:
print(f"We don't seem to have data for the {i} season.")
data = pl.concat([data, i_data], how="vertical")
data = pl.concat([data, i_data], how="diagonal")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tip

Codebase Verification

The expanded context confirms the potential issue with the exception handling block not accounting for cases where i_data is not defined due to an exception. This oversight could lead to a runtime error when attempting to concatenate data with an undefined i_data. The review comment is validated by the provided context.

Analysis chain

Updating to diagonal concatenation in load_cfb_team_info aligns with the PR's objectives to ensure better data integrity during the merging process. However, there's a potential issue with the exception handling block that might not account for cases where i_data is not defined due to an exception.


Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that i_data is always defined before being used in concatenation
grep -C3 "try:" sportsdataverse/cfb/cfb_loaders.py | grep -C3 "except Exception:"

Length of output: 367



Script:

#!/bin/bash
# Expand the search to include more context around the try-except block and the concatenation operation
grep -C10 "try:" sportsdataverse/cfb/cfb_loaders.py | grep -C10 "except Exception:"

Length of output: 838

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant