-
Notifications
You must be signed in to change notification settings - Fork 23
Querying Play By Play Data
Play by play data is queryable, however it is a but cumbersome at the moment. A single row in the play_by_play
table contains up to three player's participation. There is a event_message_type
enum foreign key to a table that acts as an enum for high level descriptions. There is also the eent_msg_action_type
field, which is a more indepth description for each event_message_type
, and it has 88 possible values for each event_message_type
. However, there are no good available resources for the possible values and we have an open issue here to address this: https://github.com/mpope9/nba-sql/issues/12. Storing these id -> description would greatly help with querying specific plays.
Using regex in SQL is currently the easiest way of getting data. Each row has three fields that can be queried using regex, home_description
, away_description
, and neutral_description
. For example, what if we were interested in querying for the player with the most technical fouls? In this case, we can focus on the home_description
, and we can look for the t.foul
value (that is the abbreviation in the column). We're going to use ILIKE
, which ignores letter casing. We're going to refine the search to % t.foul%
. %
means match any characters on either side. We have the space so we don't match on FLAGRANT FOUL
.
SELECT
player1.player_name AS player_name,
count(player1.player_name) AS technical_fouls
FROM play_by_play
LEFT join player as player1 on player1.player_id = play_by_play.player1_id
WHERE
home_description ilike '% t.foul%' OR
visitor_description ilike '% t.foul%'
GROUP BY player_name
ORDER BY technical_fouls DESC;
Running this correctly identifies Dwight Howard as the leader of technical fouls in the league.
Here are the possible play classifications:
id | string |
---|---|
1 | FIELD_GOAL_MADE |
2 | FIELD_GOAL_MISSED |
3 | FREE_THROW |
4 | REBOUND |
5 | TURNOVER |
6 | FOUL |
7 | VIOLATION |
8 | SUBSTITUTION |
9 | TIMEOUT |
10 | JUMP_BALL |
11 | EJECTION |
12 | PERIOD_BEGIN |
13 | PERIOD_END |
18 | UNKNOWN |
The above table can be used to query for specific plays. The following example is for querying for player_ids that were involved in rebound plays:
SELECT player_id1, player_id2, player_id3 FROM play_by_play
LEFT JOIN event_message_type ON play_by_play.event_msg_type = event_message_type.id
WHERE event_message_type.string = 'REBOUND'
The play_by_play
table has three description fields, home_description
, neutral_description
, visitor_description
. These can be queried using regex to get more specific information about matched plays. The nba_api documentation has a great write-up on how to do that.