Feb. 2, 2021, 3:14 p.m.

Making it to the Super Bowl is not easy, yet Tom Brady has now done it 10 times. As the game approaches, many fans want to predict its outcome.

Here is a list of a few trends I noticed and will discuss in this article:

- Tom Brady ALWAYS wins the Super Bowl when it is held in a Southern State
- The difference between scores of each Super Bowl Tom Brady has played in shows a steady increase
- Tom Brady has an 84.03% regular-season home winning record
- Tom Brady has a 73.08% post-season home winning record

To graph and analyze the data behind these trends, I used RStudio, which you can read more about in this article.

Tom Brady has competed in Super Bowls located in Northern, Western, and Southern states. He played two Super Bowls in the North (Indiana and Minnesota) and lost them both. He played two in the West (both in Arizona), losing one and winning the other. He played five in the South (Georgia, Louisiana, Florida, and twice in Texas) and won every single game.

Below is a chart showcasing each state the Super Bowl was held in and whether or not Tom Brady won (represented by 1) or lost (represented by 0).

```
library(ggplot2)  # provides ggplot(), geom_bar(), and labs()
tom <- read.csv("~path_to_my_file/tb_sb_data.csv")
ggplot(tom, aes(x = Location, y = Win.)) + geom_bar(stat = "identity", width = .5, fill = "tomato3") +
  labs(title = "Tom Brady Always Wins in Southern States", x = "Super Bowl Location", y = "Win(1) or Loss(0)")
```

Understanding the code: I load a .csv file of Tom Brady's statistics and other information and name it *tom*. I use *ggplot()* to call on *tom* and assign *Location* as the x-value and *Win.* as the y-value. I then add *geom_bar()* to customize and color my bar chart, and finally I use *labs()* to give my graph and its x- and y-axes titles.

Since the Super Bowl is being held in Tampa, Florida this year, does that mean Tom Brady is destined to win? I performed a correlation test to see how strongly the Super Bowl's location is associated with Tom Brady winning it. I got a correlation of **.7756**, which means there is a relatively strong, positive association between the Super Bowl's location and whether or not Tom Brady wins. Here are my results:

`cor(tom$location.in.numbers, tom$Win.)`

Understanding the code: I use *cor()* which tells R to run a correlation test between *Win.* and *location.in.numbers* from my dataset *tom*.

Finally, to check whether the correlation could simply be due to chance, I ran a linear regression and looked at its p-value, which measures how likely it is to see an association this strong if there were really no relationship. A p-value greater than .05 means the result is not statistically significant, so the data do not give good evidence of a relationship. A p-value below .05 means the result is statistically significant and a relationship may exist. The information in the red box below is my p-value of **.01425**.

```
regression_tom <- lm(Win. ~ location.in.numbers, data=tom)
summary(regression_tom)
```

Understanding the code: My code above tells R to create a linear regression model between *Win.* and *location.in.numbers* from my data set tom. *summary()* prints the information seen above.
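The p-value can also be pulled out of the model object instead of being read off the printed summary. The sketch below is only an illustration: the column names and the coding of `location_code` (1 = North, 2 = West, 3 = South) are assumptions made for this example, not necessarily the coding behind *location.in.numbers*, so its numbers need not match the ones above.

```r
# Hypothetical coding of the nine Super Bowl locations: 1 = North, 2 = West, 3 = South.
# This coding is an assumption for illustration; the real file may use different numbers.
example <- data.frame(
  location_code = c(3, 3, 3, 2, 1, 2, 3, 1, 3),
  win           = c(1, 1, 1, 0, 0, 1, 1, 0, 1)
)

model <- lm(win ~ location_code, data = example)

# Row 2 of the coefficient table is the slope; column 4 is its p-value
slope_p <- summary(model)$coefficients[2, 4]

# cor.test() reports the correlation and a p-value in one call
ct <- cor.test(example$location_code, example$win)

print(unname(ct$estimate))  # correlation coefficient
print(slope_p)              # p-value of the regression slope
print(ct$p.value)           # p-value from the correlation test
```

With a single predictor, the slope's p-value and the correlation test's p-value are the same number, which is why running the regression is a reasonable way to check the correlation.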

Another interesting trend I noticed was the gradual increase in the difference between the final scores of the Super Bowl as the number of Super Bowls Tom Brady appeared in increases. Here is the data visualized as a chart and a graph:

```
print(tom)
ggplot(tom, aes(x = number.of.super.bowls, y = Difference.between.Scores)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue", se = FALSE) +
  labs(title = "Number of Super Bowls Tom Brady Appeared in and Difference Between Final Scores", x = "Number of Super Bowls", y = "Difference between final score") +
  theme(plot.title = element_text(hjust = 0.5))
```

Understanding the code: *print()* shows me all the information in my dataset. Once again I use *ggplot()* to call on *tom* and assign *number.of.super.bowls* as the x-value and *Difference.between.Scores* as the y-value. Then I add *geom_point()* to create the points and *geom_smooth()* to create a linear regression line, as indicated by *method = "lm"*, along with setting *color = "blue"*. *se = FALSE* hides the shaded confidence band around the line. Finally, I use *labs()* to label my graph and its x- and y-axes.

To get a rough estimate of the point difference for Super Bowl 2021, I first performed another correlation test to see whether the number of Tom Brady's Super Bowl appearances is associated with the difference between the scores at the end of the game. I got a correlation of **.8875895**, which indicates a strong, positive correlation.

`cor(tom$number.of.super.bowls, tom$Difference.between.Scores)`

Understanding the code: This line of code uses *cor()* to call on *tom* and perform a correlation test between *number.of.super.bowls* and *Difference.between.Scores*.

To double-check whether I could rely on this correlation, I ran a linear regression model and got a p-value of **.001**, meaning the relationship is statistically significant and my correlation might be right. The two values boxed in red below, the intercept and the slope, are plugged into the standard linear equation, y = mx + b. Once the numbers are plugged in, the equation becomes **y = .7222 + .8333x**, where x is the number of Super Bowls Tom Brady has appeared in and y is the difference between the final scores of the Super Bowl. Plugging in **10** (this will be Tom Brady's 10th Super Bowl appearance) as the x-value gives a y-value of **9.0552**.

If it is fair to assume my correlation and linear regression model are accurate, I estimate the difference between the final scores of the Super Bowl to be about 9 points.

```
regression_tom <- lm(Difference.between.Scores ~ number.of.super.bowls, data=tom)
summary(regression_tom)
```

Understanding the code: The first line of code creates the linear regression model by calling on *tom* and using *number.of.super.bowls* and *Difference.between.Scores*. *summary()* prints the information seen above.
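The prediction step can also be done in R rather than by hand. This is a sketch under one loud assumption: the `margins` data frame below is rebuilt from the published final scores of the nine earlier Super Bowls, not read from *tb_sb_data.csv*, so treat its values as a stand-in for the real file.

```r
# Final-score margins of the nine earlier Super Bowls, in order of appearance.
# Assumed values, rebuilt from public box scores rather than read from tb_sb_data.csv.
margins <- data.frame(
  number.of.super.bowls     = 1:9,
  Difference.between.Scores = c(3, 3, 3, 3, 4, 4, 6, 8, 10)
)

fit <- lm(Difference.between.Scores ~ number.of.super.bowls, data = margins)
print(coef(fit))  # intercept (b) and slope (m) of y = mx + b

# Strength of the linear association between the two columns
r <- cor(margins$number.of.super.bowls, margins$Difference.between.Scores)
print(r)

# Plug in x = 10 for the tenth appearance
prediction <- predict(fit, newdata = data.frame(number.of.super.bowls = 10))
print(prediction)
```

Because *predict()* uses the unrounded coefficients, its answer can differ in the last decimal places from plugging the rounded .7222 and .8333 into the equation by hand.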

Whenever a team has home-field advantage, fans are typically optimistic about their team's chances. Tom Brady's winning record across all the regular-season home games he played from 2003 to 2020 might be a useful statistic in determining who is more likely to win Super Bowl 2021.

While playing for the New England Patriots, Tom Brady won **116 out of 136 home games** which is an **85.3% success rate**. Of the 22 home games the Patriots lost, quarterbacks Matt Cassel and Jacoby Brissett were reported as the leading passers with Cassel losing 4 games, Brissett losing 1, and Tom Brady losing 17 games.

Since Tom Brady signed his contract with the Tampa Bay Buccaneers in 2020, he has won 5 out of 8 home games; the last time he won only 5 of 8 home games was in 2006. Today, Tom Brady's overall regular-season home winning record is **121 out of 144 home games** for an **84.03% success rate**.
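The combined record is simple arithmetic on the figures quoted above. As a quick check, here is a small sketch; the variable names are made up for this example.

```r
# Figures quoted in the text; the variable names are hypothetical
patriots_wins  <- 116
patriots_games <- 136
bucs_wins      <- 5
bucs_games     <- 8

# Success rate with the Patriots, as a percentage
patriots_rate <- 100 * patriots_wins / patriots_games

# Combined regular-season home record across both teams
career_wins  <- patriots_wins + bucs_wins
career_games <- patriots_games + bucs_games
career_rate  <- 100 * career_wins / career_games

print(round(patriots_rate, 1))
print(round(career_rate, 2))
```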

While Tom Brady's regular-season home winning record is impressive, looking at his post-season home winning record provides better statistics in determining his chance at winning this year's Super Bowl.

Of the 26 post-season games held at home, Tom Brady won **19 games** which is a **73.08% success rate**. Given these numbers, calculating the probability of him winning his 20th game out of 27 post-season games held at home is simple and shown below:

This means that, if every game is treated as an independent trial with a 73.08% chance of a win, Tom Brady has a **17.18%** chance of winning exactly 20 out of 27 post-season home games, no more and no fewer. In other words, Tom Brady has a 17.18% chance of holding a 20 out of 27 post-season home winning record, which he can only reach by winning this upcoming Super Bowl.

```
probability <- .7308
wins <- 20
totalgames <- 27
dbinom(wins, totalgames, probability) # probability of winning exactly 20/27 home post-season games
```

Understanding the code: First, I set Tom Brady's *probability* to .7308. Next, I set *wins* equal to 20 because I want to know the chance he will win exactly 20 out of 27 post-season home games. Then I set *totalgames* to 27 because Super Bowl 2021 will be the 27th post-season home game Tom Brady competes in. Finally, I use *dbinom()*, which gives probabilities from a binomial distribution. A binomial distribution describes the probability of a given number of successes in a fixed number of independent trials, or games in my situation. After running *dbinom(wins, totalgames, probability)*, I get .1717607.
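For anyone curious what *dbinom()* does under the hood, the binomial probability of exactly k wins in n games is "n choose k" times p^k times (1 - p)^(n - k). The sketch below writes that formula out by hand and compares it with the built-in function; the short variable names are only for this example.

```r
p <- 0.7308  # assumed chance of winning any single post-season home game
k <- 20      # number of wins we are asking about
n <- 27      # total number of games

# Binomial formula written out: choose(n, k) counts the ways to arrange k wins in n games
manual  <- choose(n, k) * p^k * (1 - p)^(n - k)

# Built-in equivalent
builtin <- dbinom(k, n, p)

print(manual)
print(builtin)
```

Both lines print the same probability, so *dbinom()* is just a shortcut for the formula.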

Using the probabilities I calculated above, I am going to calculate the probabilities of him losing and winning the Super Bowl.

**Tom Brady's Chance of Losing Super Bowl 2021:**

I want to calculate the probability that he will win fewer than 20 post-season home games, meaning he will lose this upcoming Super Bowl and never win another post-season home game during his career. Here are my results:

This number means Tom Brady has a **44.68%** chance of never winning his 20th post-season home game in 27 tries. In other words, under this model he has a 44.68% chance of finishing with 19 or fewer wins out of 27 post-season home games.

`sum(dbinom(0:19, 27, 0.7308)) # P of winning fewer than 20 times in 27 tries`

Understanding the code: I use the *dbinom* function again, but this time I want to add up the probabilities of each possible number of wins. R calculates the probability of winning 0 games out of 27, 1 game out of 27, and so forth up to 19 games, then adds all of those probabilities together to produce the single number seen above.

**Tom Brady's Chance of Winning Super Bowl 2021:**

Now, I want to know Tom Brady's chance of winning more than 19 times in 27 games, meaning he wins this upcoming Super Bowl.

Tom Brady has a **55.32%** chance of winning this upcoming Super Bowl, as it will be his 27th post-season home game. Put another way, he has a 55.32% chance of winning at least one more post-season home game.

`1-sum(dbinom(0:19,27,0.7308)) # P of winning more than 19 times in 27 tries.`

Understanding the code: I run a very similar line of code to the one I used to calculate Tom Brady's chance of losing his 20th post-season home game, except I subtract the result from 1. This gives me Tom Brady's probability of winning more than 19 times in 27 games.
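The same two numbers can be obtained without summing by hand, because R's *pbinom()* returns the cumulative probability directly. A short sketch, reusing the same .7308 and 27 (the variable names are just for this example):

```r
p <- 0.7308  # assumed chance of winning any single post-season home game
n <- 27      # total number of games

# P(19 or fewer wins): same value as sum(dbinom(0:19, n, p))
chance_of_losing <- pbinom(19, n, p)

# P(more than 19 wins): same value as 1 - sum(dbinom(0:19, n, p))
chance_of_winning <- pbinom(19, n, p, lower.tail = FALSE)

print(chance_of_losing)
print(chance_of_winning)
```

The two probabilities always add up to 1, since he either reaches 20 wins or he does not.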

To help visualize Tom Brady's statistics and the probability of winning, here is a graph I created:

This graph shows the probability of Tom Brady winning X or more of his 27 post-season home games. The probability begins at 1 because winning zero or more games is certain, and it stays close to 1 for small values of X because, given his success rate, he is bound to win at least a handful of games. As the number of wins required increases, the probability of reaching it begins to decrease.

```
tbwins <- 0:20
tbtotalgames <- 27
probability <- .7308
tb <- data.frame(tbwins = tbwins, probabilityXOrMoreWins = 1 - pbinom(tbwins - 1, tbtotalgames, probability))
ggplot(tb, aes(tbwins, probabilityXOrMoreWins)) +
  geom_point() +
  geom_line(color = "red") +
  labs(title = "Probability Of Tom Brady Winning X Or More Home Playoff Games Of 27 Total", x = "Number of Wins", y = "Probability")
```

Understanding the code: I begin creating my graph by making a vector with the values 0 through 20, which represents the number of post-season home games Tom Brady might win. Once again I set the total number of games (*tbtotalgames*) to 27 and *probability* to .7308. Next, I create a new data frame that pairs each value of *tbwins* with its *probabilityXOrMoreWins*, calculated as 1 minus the cumulative probability from *pbinom()*. Finally, I use *ggplot()* to create a line graph with points that shows Tom Brady's probability of winning X or more games.

While this information may sound confusing at first, try to think of it in terms of how teams actually perform. A team winning 25 out of 27 post-season home games would be an insane record and very hard to achieve given the amount of talent there is in the NFL. As the number of wins we ask for gets larger, the probability of Tom Brady reaching it naturally decreases.

The NFL's regular season works in a similar way. If an NFL team wins the first 5 games on its schedule, that is not commonly seen, and the odds of the streak surviving the 6th, 7th, and 8th games shrink with every game added. However, it is always important to realize that the team can still win all, two, one, or none of those 6th, 7th, or 8th games.
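One way to see the difference between a streak and a single game is with a made-up example. The sketch below assumes a hypothetical team that wins each game with probability .5 and that games are independent of each other; neither number comes from real NFL data.

```r
p <- 0.5      # hypothetical chance of winning any single game
games <- 1:8  # length of the winning streak, in games

# Chance of winning the first k games in a row, for k = 1 to 8
streak_prob <- p^games
print(data.frame(games = games, streak_prob = streak_prob))

# Chance of winning game 6 given the first 5 were already won
next_game <- streak_prob[6] / streak_prob[5]
print(next_game)
```

Under these assumptions the chance of a long streak keeps shrinking, but the chance of winning the next game on its own stays at *p*.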

These are all probabilities, so nothing is certain, and the seemingly impossible can definitely happen, as we have already seen with Tom Brady going to the Super Bowl 10 times or posting a 16-0 record in 2007. Regardless, trends can still be noticed and accounted for.

My work is not an accurate prediction of the Super Bowl's outcome because I only used statistics and trends revolving around Tom Brady, so I would not place a monetary bet based on my results as this is just practice for fun!

If I gathered data on both the Tampa Bay Buccaneers and the Kansas City Chiefs regarding their chances of winning each regular-season and post-season game based on the skill levels of each team, my probabilities would likely be more reflective of the actual outcome of the Super Bowl. If every game were controlled by Tom Brady alone, these results would be more accurate, but that's not how football works.

Regardless, using Tom Brady's stats and other information is great practice in learning how to calculate probabilities, correlations, and linear regression models which are useful skills applicable to many other situations.

If I were to make a prediction, **TB12 will win by 12**. Why? Well, Tom Brady doesn't lose in the South, and there seems to be a trend of the difference between the final scores growing by 2 points with each of his recent Super Bowls, but probabilities are not certain and the game can always go either way.

The Super Bowl takes place on **February 7th, 2021 at 3:30 p.m. PST**, so you have until then to gather as much data and as many insights as you can to make your best prediction as to who will win the Super Bowl!
