Making it to the Super Bowl is not easy, yet Tom Brady has now done it 10 times. As the game approaches, many fans want to predict its outcome.
Here is a list of a few trends I noticed and will discuss in this article:
To graph and analyze the data behind these trends, I used RStudio, which you can read more about in this article.
Tom Brady has competed in Super Bowls held in Northern, Western, and Southern states. He played two Super Bowls in the North (Indiana and Minnesota) and lost them both, two in the West (both in Arizona), losing one and winning the other, and five in the South (Georgia, Louisiana, Florida, and twice in Texas), winning every single game.
Below is a chart showcasing each state the Super Bowl was held in and whether or not Tom Brady won (represented by 1) or lost (represented by 0).
library(ggplot2) # provides ggplot(), geom_bar(), and labs()
tom <- read.csv("~path_to_my_file/tb_sb_data.csv")
ggplot(tom, aes(x=Location, y=Win.)) + geom_bar(stat="identity", width=.5, fill="tomato3") + labs(title="Tom Brady Always Wins in Southern States", x = "Super Bowl Location", y = "Win(1) or Loss(0)")
Understanding the code: I load a .csv file of Tom Brady's statistics and other information and name it tom. I use ggplot() to call on tom and assign the x-value as Location and the y-value as Win.. I then add geom_bar() to customize and color my bar chart and finally, I use labs() to give my graph and its x- and y-axes titles.
Since the Super Bowl is held in Tampa, Florida this year, does that mean Tom Brady is destined to win? I calculated a correlation coefficient to see how strongly the Super Bowl's location is associated with Tom Brady winning the Super Bowl. I got a correlation of .7756, which indicates a relatively strong, positive association between the Super Bowl's location and whether or not Tom Brady wins. Here are my results:
Understanding the code: I use cor(), which tells R to compute the correlation between Win. and location.in.numbers from my dataset tom.
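As a minimal sketch of what such a call might look like: the data frame below is a hypothetical stand-in for tb_sb_data.csv, rebuilt from the wins and losses by region described earlier. The 1/2/3 coding for North/West/South is an assumption (the article does not say how location.in.numbers was coded), so the result is not expected to match .7756.

```r
# Hypothetical stand-in for the author's CSV: one row per Super Bowl.
# Assumed coding (not from the article): 1 = North, 2 = West, 3 = South.
tom_sketch <- data.frame(
  location.in.numbers = c(1, 1, 2, 2, 3, 3, 3, 3, 3),
  Win.                = c(0, 0, 0, 1, 1, 1, 1, 1, 1)
)

# Pearson correlation between winning and the coded location.
r <- cor(tom_sketch$Win., tom_sketch$location.in.numbers)
print(r)
```

With this assumed coding the coefficient comes out at about 0.87. A different numeric coding of the locations would give a different value, which is worth keeping in mind when reading a correlation on a coded category.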
Finally, to check whether the correlation holds up, I ran a linear regression to see the p-value, which indicates how statistically significant the result is given the data gathered. A p-value greater than .05 means the result is not significant, so the data do not give good evidence of an association; a p-value below .05 means the result is significant and an association may exist. The information in the red box below is my p-value of .01425.
regression_tom <- lm(Win. ~ location.in.numbers, data=tom)
summary(regression_tom) # prints the coefficient table, including the p-value
Understanding the code: My code above tells R to create a linear regression model between Win. and location.in.numbers from my data set tom. summary() prints the information seen above.
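Rather than reading the p-value off the printed summary, it can also be pulled out programmatically. The sketch below again uses a hypothetical stand-in data frame with an assumed 1/2/3 region coding, so its p-value is illustrative only and will differ from the .01425 reported here.

```r
# Hypothetical stand-in data; the 1/2/3 region coding is an assumption.
tom_sketch <- data.frame(
  location.in.numbers = c(1, 1, 2, 2, 3, 3, 3, 3, 3),
  Win.                = c(0, 0, 0, 1, 1, 1, 1, 1, 1)
)

fit <- lm(Win. ~ location.in.numbers, data = tom_sketch)

# Pull the slope's p-value out of the coefficient table that summary() prints.
p_slope <- unname(summary(fit)$coefficients["location.in.numbers", "Pr(>|t|)"])

# With a single predictor, this matches the p-value reported by cor.test().
p_cor <- cor.test(tom_sketch$Win., tom_sketch$location.in.numbers)$p.value

print(c(p_slope = p_slope, p_cor = p_cor))
```

The agreement between the two values is why a one-predictor regression can serve as a significance check on a correlation.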
Another interesting trend I noticed was the gradual increase in the difference between the final scores of the Super Bowl as the number of Super Bowls Tom Brady appeared in increases. Here is the data visualized as a chart and a graph:
print(tom)
ggplot(tom, aes(x = number.of.super.bowls, y = Difference.between.Scores)) +
  geom_point() +
  geom_smooth(method = "lm", color= "blue", se = FALSE) +
  labs( title = "Number of Super Bowls Tom Brady Appeared in and Difference Between Final Scores", x = "Number of Super Bowls", y = "Difference between final score") + theme(plot.title = element_text(hjust = 0.5))
Understanding the code: print() shows me all the information in my dataset. Once again I use ggplot() to call on tom and assign the x-value to number.of.super.bowls and the y-value to Difference.between.Scores. Then, I add geom_point() to create the points and geom_smooth() to create a linear regression line, as indicated by method = "lm", along with setting color = "blue". se = FALSE means I do not want the standard error band drawn. Finally, I use labs() to label my graph and its x- and y-axes.
I was able to calculate an equation to give me a rough estimate of the point difference for Super Bowl 2021, first by calculating another correlation to see if the number of Tom Brady's Super Bowl appearances is associated with the difference between the scores at the end of the game. I got a correlation of .8875895, which indicates a strong, positive correlation.
Understanding the code: This line of code uses cor() on tom to compute the correlation between number.of.super.bowls and Difference.between.Scores.
To double-check whether I could draw a correlation between these two variables, I ran a linear regression model and got a p-value of .001, meaning my data is sufficient to suggest my correlation might be right. The two values boxed in red below are used to plug into the standard linear equation, y=mx + b. Once the numbers are plugged in, the equation becomes y=.7222 + .8333x where x equals the number of Super Bowls Tom Brady appeared in and y represents the difference between the final scores of the Super Bowl. When plugging in 10 (it will be Tom Brady's 10th Super Bowl appearance) as the x-value of the linear regression equation, the y-value becomes 9.0552.
If it is fair to assume my correlation and linear regression model are accurate, I estimate the difference between the final scores of the Super Bowl to be about 9 points.
regression_tom <- lm(Difference.between.Scores ~ number.of.super.bowls, data=tom)
summary(regression_tom) # prints the intercept, slope, and p-value
Understanding the code: The first line of code creates the linear regression information by calling on tom and using number.of.super.bowls and Difference.between.Scores. summary() prints the information seen above.
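The plug-in arithmetic for the 10th appearance can be checked in a couple of lines. This is only a sketch: the intercept and slope are typed in from the values quoted in this article, and estimate_difference() is a helper name introduced here for illustration.

```r
# Coefficients as quoted in the article's regression output.
intercept <- 0.7222
slope     <- 0.8333

# Hypothetical helper: estimated score difference for a given
# number of Super Bowl appearances, y = intercept + slope * x.
estimate_difference <- function(appearances) {
  intercept + slope * appearances
}

estimate_difference(10)  # 0.7222 + 0.8333 * 10 = 9.0552
```

With the fitted model object available, predict(regression_tom, newdata = data.frame(number.of.super.bowls = 10)) should give the same estimate without retyping rounded coefficients.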
Whenever a team has the home-field advantage, fans are typically optimistic about their team's chances. Tom Brady's record across all the regular-season home games he played from 2003 to 2020 may be a useful statistic in determining who is more likely to win Super Bowl 2021.
While playing for the New England Patriots, Tom Brady won 116 out of 136 home games which is an 85.3% success rate. Of the 22 home games the Patriots lost, quarterbacks Matt Cassel and Jacoby Brissett were reported as the leading passers with Cassel losing 4 games, Brissett losing 1, and Tom Brady losing 17 games.
Since Tom Brady signed his contract with the Tampa Bay Buccaneers in 2020, he's won 5 out of 8 home games; the last time he won only 5 of 8 home games was in 2006. Today, Tom Brady's overall regular-season home winning record is 121 out of 144 home games for an 84.03% success rate.
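The regular-season percentages follow directly from the win and game counts quoted in the text. A quick sketch of that arithmetic (the variable names are illustrative):

```r
# Win counts and game counts quoted in the article.
patriots_rate <- 116 / 136              # home games with New England
overall_rate  <- (116 + 5) / (136 + 8)  # adding the 5 of 8 with Tampa Bay

# Express both as percentages rounded to two decimals.
rates <- round(100 * c(patriots = patriots_rate, overall = overall_rate), 2)
print(rates)
```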
While Tom Brady's regular-season home winning record is impressive, looking at his post-season home winning record provides better statistics in determining his chance at winning this year's Super Bowl.
Of the 26 post-season games held at home, Tom Brady won 19 games which is a 73.08% success rate. Given these numbers, calculating the probability of him winning his 20th game out of 27 post-season games held at home is simple and shown below:
This means Tom Brady has a 17.18% chance of winning exactly 20 out of 27 post-season home games, no more and no fewer, over his entire career. In other words, there is a 17.18% chance he finishes with a 20-out-of-27 post-season home record, which is only possible if he wins this upcoming Super Bowl.
probability <- .7308
wins <- 20
totalgames <- 27
dbinom(wins, totalgames, probability) # probability of winning exactly 20/27 home post-season games
Understanding the code: First, I set Tom Brady's probability to .7308. Next, I set wins equal to 20 because I want to know the chance he will win exactly 20 out of 27 post-season home games. Then I set totalgames to 27 because Super Bowl 2021 will be the 27th post-season home game Tom Brady will compete in. Finally, I use dbinom(), which gives the probability of an exact number of successes under a binomial distribution. A binomial distribution describes the number of successes in a fixed number of trials, or games in my situation. After running dbinom(wins, totalgames, probability), I get .1717607.
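To see what dbinom() is doing under the hood, the same number can be built from the binomial formula, C(n, k) * p^k * (1 - p)^(n - k). A small sketch using the article's inputs (the short variable names are just for illustration):

```r
p <- 0.7308   # post-season home win rate from the article (19 of 26)
n <- 27       # total post-season home games, counting this Super Bowl
k <- 20       # exact number of wins

# Binomial formula: ways to pick which k games are wins,
# times the probability of k wins and n - k losses.
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)

# dbinom() evaluates the same formula.
built_in <- dbinom(k, n, p)

print(c(by_hand = by_hand, built_in = built_in))
```

Both values should agree with the .1717607 reported above.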
Using the probabilities I calculated above, I am going to calculate the probabilities of him losing and winning the Super Bowl.
Tom Brady's Chance of Losing Super Bowl 2021:
I want to calculate the probability he will win fewer than 20 post-season home games, meaning he will lose this upcoming Super Bowl and never win another post-season home game during his career. Here are my results:
This number means Tom Brady has a 44.68% chance of never winning his 20th post-season home football game out of 27 tries. In other words, he has a 44.68% chance of finishing with a 19-out-of-27 home post-season record.
sum(dbinom(0:19,27,0.7308)) # P of winning less than 20 times in 27 tries.
Understanding the code: I used the dbinom function again, but this time I add up the probability of each individual win total. R calculates the probability of winning 0 games out of 27, 1 game out of 27, 2 games out of 27, and so forth up to 19 games, then adds those probabilities together to generate the single probability seen above.
Tom Brady's Chance of Winning Super Bowl 2021:
Now, I want to know Tom Brady's chance of winning more than 19 times in 27 games, meaning he wins this upcoming Super Bowl.
Tom Brady has a 55.32% chance of winning this upcoming Super Bowl, as it will be his 27th post-season home game; put another way, he has a 55.32% chance of winning at least one more game.
1-sum(dbinom(0:19,27,0.7308)) # P of winning more than 19 times in 27 tries.
Understanding the code: I run a very similar line of code to the one I used to calculate Tom Brady's chance of losing his 20th post-season home game, except I subtract the result from 1. This gives me Tom Brady's probability of winning more than 19 times in 27 games.
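Base R also offers pbinom(), the cumulative version of dbinom(), which gives both tails without summing by hand. The sketch below is an alternative to the sum() approach used here, not the code that produced the figures; the variable names are illustrative.

```r
p <- 0.7308   # post-season home win rate from the article
n <- 27       # total post-season home games

# P(X <= 19): the lower tail, equivalent to sum(dbinom(0:19, n, p)).
p_at_most_19 <- pbinom(19, n, p)

# P(X >= 20): the upper tail, equivalent to 1 - sum(dbinom(0:19, n, p)).
p_at_least_20 <- pbinom(19, n, p, lower.tail = FALSE)

print(c(at_most_19 = p_at_most_19, at_least_20 = p_at_least_20))
```

The two tails add up to 1, which is why the winning and losing percentages above sum to 100%.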
To help visualize Tom Brady's statistics and the probability of winning, here is a graph I created:
This graph shows the probability of Tom Brady winning X or more of his 27 post-season home games. The probability begins at 1 because winning zero or more games is certain, and given his success rate he is all but certain to win at least a handful. As the number of wins increases, the probability of reaching that number begins to decrease.
tbwins <- c(0:20)
tbtotalgames <- 27
probability <- .7308
tb<- data.frame(tbwins=tbwins,probabilityXOrMoreWins = 1-pbinom(tbwins-1,tbtotalgames, probability))
ggplot(tb, aes(tbwins, probabilityXOrMoreWins)) + geom_point() + geom_line(color="red") + labs(title = "Probability Of Tom Brady Winning X Or More Home Playoff Games Of 27 Total", x = "Number of Wins", y = "Probability")
Understanding the code: I begin creating my graph by making a vector with the values 0 through 20, which represents each number of post-season home games Tom Brady could win. Once again I set tbtotalgames to 27 and probability to .7308. Next, I create a new data frame that stores tbwins alongside probabilityXOrMoreWins, the probability of winning that many games or more. Finally, I use ggplot() to create a line graph with points that shows how Tom Brady's probability of winning X or more games changes as X increases.
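The shape of the curve can be checked without drawing it. This sketch recomputes the plotted values in base R, using the same inputs as the graph code, and inspects the two ends; prob_x_or_more is an illustrative name.

```r
tbwins <- 0:20
tbtotalgames <- 27
probability <- 0.7308

# P(X >= x) for each x: subtract P(X <= x - 1) from 1.
prob_x_or_more <- 1 - pbinom(tbwins - 1, tbtotalgames, probability)

prob_x_or_more[1]               # x = 0: winning zero or more games is certain
all(diff(prob_x_or_more) <= 0)  # TRUE if the curve never rises as x increases
tail(prob_x_or_more, 1)         # x = 20: the 20-or-more probability
```

The last value is the same 20-or-more probability calculated earlier with 1-sum(dbinom(0:19,27,0.7308)).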
While this information may sound confusing at first, try to think of it in terms of how teams actually perform. A team winning 25 out of 27 post-season home games is an insane record and relatively hard to achieve given the amount of talent in the NFL. The more games Tom Brady needs to win out of the same total, the lower his probability of reaching that number.
The same idea applies to the NFL's regular season. A team that wins the first 5 games on its schedule has done something uncommon, and extending that streak through the 6th, 7th, and 8th games becomes less likely with every game added. However, it is always important to realize that each of those games can still go either way, so the team could win all, two, one, or none of them.
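One way to make the streak idea concrete is a small sketch under two loud assumptions that are not from the article: every game is independent, and the per-game win probability is the same each time. The value of p below simply reuses the post-season home rate for illustration.

```r
p <- 0.7308  # assumed per-game win probability, reused for illustration only

# Probability of winning the first k games in a row, for k = 5 to 8.
streak <- p^(5:8)
names(streak) <- paste0("first_", 5:8)

# Distribution of wins across games 6, 7, and 8 (0, 1, 2, or 3 wins).
next_three <- dbinom(0:3, size = 3, prob = p)
names(next_three) <- paste0(0:3, "_wins")

print(round(streak, 4))
print(round(next_three, 4))
```

Under these assumptions the chance of winning any single game never changes; what shrinks is the chance of the whole streak continuing.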
These are all probabilities, so nothing is certain, and the improbable can definitely happen, as we have already seen with Tom Brady going to the Super Bowl 10 times or having a 16-0 winning record in 2007. Regardless, trends can still be spotted and taken into account when making predictions.
My work is not an accurate prediction of the Super Bowl's outcome because I only used statistics and trends revolving around Tom Brady, so I would not place a monetary bet based on my results as this is just practice for fun!
If I gathered data on both the Tampa Bay Buccaneers and the Kansas City Chiefs, covering each team's chances of winning regular-season and post-season games based on its skill level, my probabilities would likely better reflect the actual outcome of the Super Bowl. If every game were controlled by Tom Brady alone, these results would be more accurate, but that's not how football works.
Regardless, using Tom Brady's stats and other information is great practice in learning how to calculate probabilities, correlations, and linear regression models which are useful skills applicable to many other situations.
If I were to make a prediction, TB12 will win by 12. Why? Well, Tom Brady doesn't lose in the South and there seems to be a trend of 2-point differences between the final scores of the Super Bowl, but probabilities are not certain and the game can always go either way.
The Super Bowl takes place on February 7th, 2021 at 3:30 p.m. PST, so you'll have until then to gather as much data and as many insights as you can to make your best prediction as to who will win the Super Bowl!