Using Aster to Uncover Super Bowl Probabilities




The NFL playoffs are coming up, so what better time to combine Aster and football? We can use Aster's graph capabilities to analyze the 2015 football season and see who has the best chance of winning the Super Bowl. The finished graphic above shows the probability that each remaining playoff team advances to and wins the Super Bowl. For example, Carolina has a 53.6% chance to win its game against Seattle, a 27.2% chance to win both that game and the NFC Championship against whoever it faces there, and a 14.9% chance to win the Super Bowl.


The data used for this analysis was gathered using the NFLDB Python package. NFLDB is a great way to look at NFL statistics and works with PostgreSQL. All 2015 regular season and wildcard playoff games were used to calculate the probabilities. Dummy games have been inserted to add parity for the playoff teams that earned byes.

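--load NFL 2015 games, including dummy games for the bye teams
CREATE TABLE al186065.games_2015_dummyinc (
    gsis_id varchar,
    home_team varchar,
    home_score integer,
    away_team varchar,
    away_score integer,
    season_type varchar
) DISTRIBUTE BY HASH(gsis_id);  --distribution clause reconstructed; the key is an assumption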

--load CSV file with ncluster_loader (the host name after -h is missing in the original)
ncluster_loader -h -U db_superuser -w db_superuser -d beehive \
     --skip-rows 1 -c al186065.games_2015_dummyinc /tmp/games_2015_dummyinc.csv


Eigenvector Centrality

The method used to calculate an initial vector of probabilities for every NFL team was eigenvector centrality. We build a graph in which each team is a vertex and each game is a directed edge pointing from the losing team to the winning team. The edges are weighted by margin of victory, adjusted for home field advantage (3 points in this case, subtracted from the margin of a home win). Edge weighting is perhaps the most critical part of the probability formulation. In this formulation the winning edges fall into three categories: a win by more than 7 points after the home field adjustment gets the full weight of 1, a win by more than 3 points (a field goal) gets a weight of 0.8, and any narrower win gets a weight of 0.5. The Sigma graph below visualizes the graph used to set up eigenvector centrality.

In addition to weighting edges, dummy games against a dummy node were added to help boost the ratings of the playoff teams that had bye weeks. The dummy node was also given a victory so that all nodes remain in one communicating class.
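The weighting rule described above can be sketched outside Aster as a small Python function. This is only an illustration of the three weight categories, not the code used for the analysis; the names `HOME_FIELD_POINTS` and `edge_weight` are made up for this sketch.

```python
HOME_FIELD_POINTS = 3  # home field adjustment stated in the article


def edge_weight(home_score, away_score):
    """Weight of the loser -> winner edge for a single game."""
    margin = abs(home_score - away_score)
    if home_score > away_score:
        # A home win is discounted by the home field advantage.
        margin -= HOME_FIELD_POINTS
    if margin > 7:
        return 1.0  # comfortable win
    if margin > 3:
        return 0.8  # win by more than a field goal
    return 0.5      # narrow win


# A 24-17 home win is worth less than a 17-31 road win.
home_win = edge_weight(24, 17)
road_win = edge_weight(17, 31)
```

Road wins keep their full margin, so the same score line counts for more when the visiting team wins.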

--nodes (teams)
CREATE TABLE al186065.nfl_teams
DISTRIBUTE BY HASH(teams)  --distribution clause reconstructed; the key is an assumption
AS (
    SELECT DISTINCT home_team AS teams
    FROM al186065.games_2015_dummyinc
);


--edges (games), directed from the losing team to the winning team
CREATE TABLE al186065.nfl_games
DISTRIBUTE BY HASH(losing_team)  --distribution clause reconstructed; the key is an assumption
AS (
    SELECT *,
        (CASE
            WHEN home_score > away_score
                THEN away_team
            ELSE home_team
        END) AS losing_team,
        (CASE
            WHEN home_score > away_score
                THEN home_team
            ELSE away_team
        END) AS winning_team,
        --weight by margin of victory; home wins are discounted by 3 points for home field
        (CASE
            WHEN abs(home_score - away_score) - 3 > 7 AND home_score > away_score
                THEN 1
            WHEN abs(away_score - home_score) > 7 AND away_score > home_score
                THEN 1
            WHEN abs(home_score - away_score) - 3 > 3 AND home_score > away_score
                THEN .8
            WHEN abs(away_score - home_score) > 3 AND away_score > home_score
                THEN .8
            ELSE .5
        END) AS weight
    FROM al186065.games_2015_dummyinc
);


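--calculate eigenvector centrality
--(function call and argument clauses reconstructed; only the ON clauses and ORDER BY survive in the original)
SELECT * FROM EigenvectorCentrality (
    ON al186065.nfl_teams AS vertices PARTITION BY teams
    ON al186065.nfl_games AS edges PARTITION BY losing_team
    TARGETKEY('winning_team')
    EDGEWEIGHT('weight')
) ORDER BY teams;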
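To make the idea concrete outside Aster, here is a minimal pure-Python sketch of eigenvector centrality by power iteration on a directed, weighted loser-to-winner graph. It is not the Aster implementation: the three-team `toy_games` list is invented for illustration, and the iteration adds each node's previous score to its incoming total (a shift by the identity matrix), which leaves the dominant eigenvector unchanged and helps the iteration converge.

```python
def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Power iteration on (loser, winner, weight) edges.

    Returns a dict of scores normalized to sum to 1.
    """
    nodes = sorted({n for loser, winner, _ in edges for n in (loser, winner)})
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Start from the previous scores (identity shift), then let every
        # loser pass a weighted share of its score to the team that beat it.
        new = dict(score)
        for loser, winner, weight in edges:
            new[winner] += weight * score[loser]
        total = sum(new.values())
        new = {n: v / total for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical results: (losing team, winning team, edge weight)
toy_games = [
    ("B", "A", 1.0),
    ("C", "A", 0.8),
    ("C", "B", 1.0),
    ("A", "C", 0.5),
]
centrality = eigenvector_centrality(toy_games)
```

A team scores highly when it beats teams that themselves score highly, which is why a single win over a strong opponent can matter more than several wins over weak ones.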


[Image: nfl network.PNG]

This Sigma chart was made by treating the directed edges of the graph as "paths" and the weights as the "cnt" from the season's games. From the chart we can clearly see the clustering of the AFC and NFC divisions that played each other according to this season's schedule. The color of an edge represents its weight (the darker the edge, the higher the weight). The dummy node used to give more edges to the playoff teams receiving byes can be seen in the middle as well.

--sigma chart nfl season
--create mock npath output with path and cnt
CREATE TABLE al186065.sigma_vis
DISTRIBUTE BY HASH(path)  --distribution clause reconstructed; the key is an assumption
AS (
    SELECT *,
        ('[' || losing_team || ', ' || winning_team || ']') AS path,
        weight AS cnt
    FROM al186065.nfl_games
);


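--appcenter visualization
--(argument clauses reconstructed from the surrounding text; the title is a placeholder)
INSERT INTO app_center_visualizations (json)
SELECT json FROM Visualizer (
    ON al186065.sigma_vis PARTITION BY 1
    AsterFunction('npath')
    Title('NFL 2015 Season')
    VizType('sigma')
);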

Calculating Probabilities

Eigenvector centrality produces the dominant eigenvector of the graph. Since the graph can be expressed as a matrix with all nodes in one communicating class, we can normalize the dominant eigenvector and treat it as a vector of probabilities. We then compare the relative weights of these probabilities to form head-to-head probabilities. For example, the probability that Carolina wins the NFC Championship game is its influence on the graph divided by the total influence of Carolina and its prospective opponents (Green Bay and Arizona). The influence of the opponents is their influence on the graph weighted by the probability that each of them advances to the current round. The probability of winning multiple games is the product of the individual game probabilities, assuming the games are independent. Attached is an Excel sheet used to calculate the win probabilities after obtaining the eigenvector.
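The calculation described above can be sketched in Python as follows. This is one reading of the method, not a copy of the Excel sheet, and the centrality values are hypothetical placeholders rather than the figures from the analysis.

```python
def head_to_head(team_score, opponent_score):
    """Probability that a team beats an opponent, from relative influence."""
    return team_score / (team_score + opponent_score)


def expected_opponent_score(opponents):
    """Blend prospective opponents by their chance of reaching the round.

    opponents: list of (centrality, probability of advancing) pairs.
    """
    return sum(score * p_advance for score, p_advance in opponents)


# Hypothetical normalized centralities (NOT the values from the article).
carolina, seattle, arizona, green_bay = 0.06, 0.05, 0.06, 0.04

# Divisional round: Carolina vs Seattle, Arizona vs Green Bay.
p_divisional = head_to_head(carolina, seattle)
p_arizona_advances = head_to_head(arizona, green_bay)

# Conference championship: Carolina vs whichever opponent advances.
opponent = expected_opponent_score([
    (arizona, p_arizona_advances),
    (green_bay, 1.0 - p_arizona_advances),
])
p_conference = head_to_head(carolina, opponent)

# Winning both games, assuming the games are independent.
p_both = p_divisional * p_conference
```

The same pattern extends one more round to the Super Bowl by blending the prospective AFC opponents in the same way.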


The probabilities suggest that the two teams with the best chance to win the Super Bowl are the Cardinals and Panthers. The NFC is slightly favored to win the Super Bowl with a 53.7% chance. As a sanity check, Vegas has Carolina at +500 odds to win the Super Bowl, which converts to 16.67%; we predicted Carolina with a 14.9% chance to win. In reality we would expect more variance in the probabilities. The system used to weight the edges does not produce widely spread probabilities, but it still does a good job of determining relative strength.
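For reference, the conversion from American odds to an implied probability used in the sanity check can be written as a short Python function. The function name is made up for this sketch, and implied probabilities from a sportsbook also include the bookmaker's margin, so they are only a rough benchmark.

```python
def american_odds_to_probability(odds):
    """Convert American (moneyline) odds to an implied probability."""
    if odds > 0:
        # Underdog: +500 means a 100 stake wins 500.
        return 100.0 / (odds + 100.0)
    # Favorite: -200 means a 200 stake wins 100.
    return -odds / (-odds + 100.0)


carolina_implied = american_odds_to_probability(500)
```

With +500 this gives 100 / 600, which is the 16.67% quoted above.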