Skip to content

Querying Play By Play Data

Matthew Pope edited this page Apr 7, 2021 · 5 revisions

Play By Play

Table of Contents

Play by play data is queryable, however it is a but cumbersome at the moment. A single row in the play_by_play table contains up to three player's participation. There is a event_message_type enum foreign key to a table that acts as an enum for high level descriptions. There is also the eent_msg_action_type field, which is a more indepth description for each event_message_type, and it has 88 possible values for each event_message_type. However, there are no good available resources for the possible values and we have an open issue here to address this: https://github.com/mpope9/nba-sql/issues/12. Storing these id -> description would greatly help with querying specific plays.


Regex

Using regex in SQL is currently the easiest way of getting data. Each row has three fields that can be queried using regex, home_description, away_description, and neutral_description. For example, what if we were interested in querying for the player with the most technical fouls? In this case, we can focus on the home_description, and we can look for the t.foul value (that is the abbreviation in the column). We're going to use ILIKE, which ignores letter casing. We're going to refine the search to % t.foul%. % means match any characters on either side. We have the space so we don't match on FLAGRANT FOUL.

SELECT 
  player1.player_name AS player_name,
  count(player1.player_name) AS technical_fouls
FROM play_by_play
LEFT join player as player1 on player1.player_id = play_by_play.player1_id
WHERE
  home_description ilike '% t.foul%' OR
  visitor_description ilike '% t.foul%'
GROUP BY player_name
ORDER BY technical_fouls DESC;

Running this correctly identifies Dwight Howard as the leader of technical fouls in the league.


event_msg_action_type

Here are the possible play classifications:

id string
1 FIELD_GOAL_MADE
2 FIELD_GOAL_MISSED
3 FREE_THROW
4 REBOUND
5 TURNOVER
6 FOUL
7 VIOLATION
8 SUBSTITUTION
9 TIMEOUT
10 JUMP_BALL
11 EJECTION
12 PERIOD_BEGIN
13 PERIOD_END
18 UNKNOWN

Example Query

The above table can be used to query for specific plays. The following example is for querying for player_ids that were involved in rebound plays:

SELECT player_id1, player_id2, player_id3 FROM play_by_play
LEFT JOIN event_message_type ON play_by_play.event_msg_type = event_message_type.id
WHERE event_message_type.string = 'REBOUND'

The play_by_play table has three description fields, home_description, neutral_description, visitor_description. These can be queried using regex to get more specific information about matched plays. The nba_api documentation has a great write-up on how to do that.

Clone this wiki locally