Is There A “Magic Number” For Arsenal To Win The Title?
In our day and age, the quest to find meaningful numbers in the beautiful game is more intense than it has ever been. In many ways the most popular sport on the planet is still woefully behind the likes of baseball, (American) football, basketball, and other sports when it comes to the use of statistical analysis to glean meaning and better predictive ability of success on the pitch. Each year it seems we are finding new stats to judge players and teams on as well as new models to better ascertain or predict who the best (whether it be a player or a team as a whole) is.
This article comes about after musing (not for the first time) over just how “far off” a team like Arsenal was from being a serious title threat purely from a goals forced/allowed perspective.
Note: I use the terms “forced” as synonymous with goals scored, goals for, etc., and “allowed” synonymously with conceded. I shouldn’t have to make a note but I’m only recently realizing my terms are more a personal habit and not widely-accepted terminology.
When you examine Arsenal’s past season performance purely in terms of goals forced and allowed, you see there isn’t a whole lot of difference between how Arsene Wenger’s men performed and how those who finished ahead of them did. The Gunners scored only two goals fewer than the champions (while scoring twelve fewer than Manchester City) while allowing four more than Chelsea (and two fewer than City). Based on those numbers you could make the argument that we’re not far off the title pace at all.
But surely we can do better than just guessing how many more we need to score or how many fewer we need to concede, right? After this discussion Soham Samaddar (@SohamSamaddar) came to me curious about whether there is a particular forced-to-allowed (F/A) ratio that any team could or should aspire to in hopes of capturing a Premier League title. Before we get into that, let’s take a look at some other possible metrics and see if and how they might stack up as predictors of championship success.
Note: Premier League data goes back to the 1992/93 season but we will only go back as far as the 1995/96 season which was the first year the Premier League changed to its current 20-team 38-round format.
Goals Forced, Allowed, and Goal Difference as Predictors of Success
Goals Forced (Scored): I think everyone understands this one. If you score the most goals you should therefore be the best team and lift the crown more often than not, right? Out of the past 20 years of Premier League results (again, going back to 95/96 and not 92/93), the top scorers in the league have lifted the trophy 13 times. Below are the instances where the top scoring team did not:
|Year||Winning Club||Goals||Highest Scoring||Goals|
1 Chelsea and Manchester United tied for the most goals scored on the season but Chelsea won the title with an 8 point gap.
2 Manchester United were not only outscored by Liverpool, they also tied 3rd and 4th place Chelsea and Arsenal with that 68 goal total.
13 times out of 20 isn’t a terrible probability but there are two years definitely worth pointing out where this becomes troublesome. In the 2009/10 season, Chelsea set the single season goals record with 103. Based off that you would not have been faulted for thinking that surely this was a historic season and that Chelsea absolutely obliterated the league table, but you’d be wrong. The gap between the champions and 2nd place Manchester United? One point.
Even worse, let’s look at Manchester City from two seasons ago (13/14). They came to the brink of Chelsea’s record with their own 102 goal tally and yet they only won the league by two points. Imagine how Liverpool must feel after scoring 101 goals (3rd most all time in the Premier League) and losing the title by that slim a margin.
Looking at goals forced alone isn’t a great way to pick a champion no matter how you cut it. The difference between the highest scoring champs and the lowest scoring champs is 35 (09/10 Chelsea’s 103 versus the 68 goals produced by both 97/98 Arsenal and 08/09 Manchester United). That’s quite a spread when trying to pick out a magic number. Take the average goals forced for a champ and you come up with 80.45.
If you scored 80 goals in a season, would that be a good predictor of success in the league? Based on the twenty years of BPL history we’re looking at here you’d have about a 75% chance (15 out of 20) of winning the league if you put up that goal total. In history though we’ve seen six times where a goal tally of 80 was not enough for a 2nd place team to lift the crown.
The reason for this is pretty simple. It’s not just a game of scoring. While putting up the most goals in the league is always going to be a huge advantage, if you don’t have a great defense your odds of lifting the cup can be adversely affected.
Goals Allowed (Against): In a way I like this metric as a predictor even better than the offensive side but there are some distinct areas of worry. For one, the gap between the most goals allowed for a champion and fewest goals allowed for a champion is 30 (99/00 Manchester United’s 45 and 04/05 Chelsea’s 15). While this gap is smaller than the goals forced gap noted earlier, the magnitude of it is even greater when we take into account just how valuable each goal let in is compared to each goal you put past an opponent.
On top of that, if you look at the average number of goals allowed for a champion you get right at 32 (32.05 to be exact). Based on the twenty years of results, letting in 32 goals in a season would only have been good enough to win you the title 12 times (twice the champ allowed 32 goals). Eight times have a runner up allowed 32 or fewer (in some cases drastically fewer) and failed to win the title. That doesn’t sound like a great indicator of success either.
Your confidence in that 32 goal benchmark is even further reduced when you look at the average number of goals allowed of all the 2nd place finishers. That number is only 33. You’re looking at a single goal on average being the difference between a champion and second place. That doesn’t inspire a lot of confidence either when you consider that 33 goals (the average goals allowed of runners up) could have won the title 10 out of 20 times.
We’ve seen some teams that had all-time great defenses fail to lift the trophy, most notably the 98/99 Arsenal team which allowed only 17 goals that season (2nd fewest all time in Premier League history) only to lose the title by one solitary point to a Manchester United squad that let in a full 20 goals more.
Note: I’ll be making a point to return to the 98/99 Arsenal side later in the article because out of the twenty years of Premier League results, that was one of a very few utterly mind-blowing historic sides numbers wise.
Goal Difference: As we move along we get closer and closer to where Soham and I were initially wanting to go. Goal difference (GD) is another one of those metrics that you would think of as an obvious indicator of championship success, and while this will generally be the case, we come across the same issues we had when relying solely on goals forced or goals allowed.
Will the best GD win you the title? Out of 20 seasons, the highest GD won the title 15 times (75%). This is pretty much the same as the highest scoring team winning the title and better than the best defensive team (which only won the title 11 out of the 20 times, or 55%).
The average championship winning side’s GD is 48.4. This number would have won the league 12 out of the 20 seasons we’re measuring. We’ve also seen five instances in runners up where this number wasn’t enough to surpass their opponent and one time (08/09 Liverpool) where their GD was both better than the average (50) and the champion (United with 44).
The average GD gap between the champions and the runners up is 7.45 which is a better gap than what we’ve seen between champions and runners up for goals forced and goals allowed, so perhaps that magic 48 GD is something to aspire to.
Examining the ratio of Goals Forced to Goals Allowed (F/A)
All three of the previous measurements we’ve looked at have varying degrees of success at figuring out a team’s title winning chances depending largely upon what angle you look at it from. To me that doesn’t work all that well. And I think we can all agree that the best teams are usually ones that can combine offense and defense together over the course of the season.
This brings us all the way back to our original question. Is there any type of correlation between a forced-to-allowed ratio (hereafter F/A) and title winning success? To start with we took a look at only the title winning sides of the past twenty years, plotting out all the data for goals forced, allowed, goal difference, and eventual point haul at the end of the season. We then calculated the F/A ratio by simply dividing the number of goals forced by the number of goals allowed.
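If you would rather follow along in code than in a spreadsheet, here is a minimal Python sketch of that division. The function name `fa_ratio` is my own label for illustration and does not come from the spreadsheets; the goal totals are the ones quoted elsewhere in this article for the 98/99 and 14/15 Arsenal sides.

```python
def fa_ratio(goals_forced: int, goals_allowed: int) -> float:
    """Return goals forced divided by goals allowed (the F/A ratio)."""
    if goals_allowed <= 0:
        # A side that concedes nothing has no finite ratio; flag it instead.
        raise ValueError("goals_allowed must be a positive number")
    return goals_forced / goals_allowed


# Goal totals quoted in this article (hypothetical variable names).
arsenal_9899 = fa_ratio(59, 17)  # 98/99 Arsenal: 59 forced, 17 allowed
arsenal_1415 = fa_ratio(71, 36)  # 14/15 Arsenal: 71 forced, 36 allowed

print(round(arsenal_9899, 2), round(arsenal_1415, 2))
```

Because the denominator is so much smaller than the numerator, one goal conceded moves the ratio far more than one goal scored, which is worth keeping in mind for everything that follows.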
One thing that immediately stands out is that we’re looking at a much smaller range of values than we’ve been seeing elsewhere. To me this says that F/A ratio might be a much better commonality between successful teams than other metrics.
The next thing that immediately stands out to me (and not just because I have it highlighted in yellow) is the 04/05 Chelsea side for the sheer holy shit hot damn look at that F/A ratio of it all. Just looking at that number right there in comparison to all of the other champions in the BPL and it is no wonder why a lot of people consider the 04/05 Chelsea superior to the 03/04 Invincibles of Arsenal. Not only did they record the highest point tally in league history but they had the best defensive record in league history and combined it with a decent enough offensive side that allowed them to score nearly 5 goals for every goal they let in. That is remarkable. So remarkable that I wound up doing extra calculations which discounted that year because of how much of an outlier it was.
Now that I’ve properly enthused over that 04/05 Chelsea number, we can start to drill down a little bit more. The average F/A ratio of a title winning side was ~2.66, or ~2.54 sans the 04/05 Chelsea side. Taking that second value, you get pretty much a 50/50 split of champions above and below it.
How does that compare to the runners up?
It looks better when you take a look at that 2.54 and compare it to the average F/A of runners up (right around 2.31 either way you cut it). That 2.54 is also good enough to top all but six of those who came in 2nd place over the past 20 years.
There have also been five times in the past twenty years where the champion had a worse F/A than the runners up.
|Season||Champion||F/A||Runner Up||F/A||Difference|
|96/97||Manchester United||1.727273||Newcastle United||1.825||-0.09773|
Out of those five listed we see only one example where we could say a great team (at least as measured by F/A ratio) was outdone by a runner up. The 08/09 United side was clearly better than the 2.54 (or even the Chelsea-weighted 2.66) and their 90 point haul is indicative of a very strong team. The Liverpool of that year had a better GD by way of scoring and you couldn’t say they were any bit of a worse defense. Based purely on F/A this is the closest two teams have come to one another in a title race (though United wound up winning the title by 4 points).
But what really stands out in this table (as highlighted) is the 98/99 season. Based on F/A ratio, the Arsenal side that came up only a single solitary point short that year ranks as the 3rd best F/A team in Premier League history. The Gunners allowed only 17 goals that season (at the time a Premier League record) which was 20 fewer than the champions, Manchester United. That -1.30843 is the largest gap in F/A between champion and runner up in league history other than the 04/05 Chelsea side (which had a crushing 2.38 margin) and is far and away the biggest “upset” in terms of a runner up being numerically better than the champion.
The difference between that Arsenal side and the truly historic 04/05 Chelsea side was a complete inability to put the goal in the net. The 59 goals forced is the lowest for a runner up in 20 years and the second lowest all time in the PL (going back even to the 92/93 beginning). That one point difference between Arsenal and United in 98/99 surely must go down as one of the luckiest breaks in league title history (or bad beats, depending on what side you’re looking from).
F/A Ratios over the Years
When I first started calculating all of the F/A values for the champions and for the runners up, the first thing that occurred to me (aside from my initial shock at the 04/05 Chelsea side) was that there appeared to be an upward trend. In other words, the champions’ F/A values appeared to be climbing from year to year. Similarly, it seemed the runners up were also improving, but not by nearly the same degree.
Based on this eyeballing assumption I went and put all the values in a scatter plot for both Champions and Runners Up as two different groups. Once those were plotted it was only a matter of adding trendlines for both groups. As you can see in the chart above, there is in fact an upward trend for both the champions and runners up, albeit the one for runners up is far less significant than the champions.
What does this tell us? At first glance one could guess that the teams that win the title each season are stronger over time than their predecessors. And if the title winners F/A are trending up at a faster rate than runners up, the point differential between first and second place should be growing.
But based on the chart above that simply hasn’t been the case and in fact we’re seeing a very slight trend downwards which would indicate title races are tightening up over the past 20 years. So what other trends can we look at to try and dissect these F/A values over time?
One thing that is happening over time (and something I did notice when first putting the data together) was that the defense of the champions has been improving in terms of the number of goals they are conceding in a season. Strangely, the runners up are trending in the opposite direction. This right here might be enough to explain the growing disparities in F/A ratios because allowed goals are so much more valuable because of their rarity compared to ones scored by a team.
This can also be seen when you plot out the goals forced. Champions and runners up are both trending upwards in almost identical trajectories, which indicates a growing firepower among the top teams in the league. Combine that with the opposing trendlines for allowed goals and it becomes clear why champions are gaining ground F/A wise.
None of this properly explains why the champions are getting better faster but the title races are getting tighter. If both champions and runners up are improving offensively at the same rate and going in opposite directions defensively, that gap should be growing not shrinking. Why is it the way it is?
My personal theory on this is that for all of the talk about the Premier League being the strongest league top to bottom what we are actually seeing is an overall greater concentration of talent in the top half of the table. The league title used to almost always be a two horse race but now you arguably have three and four teams that go into each season with realistic title hopes. Add in to that a growing collection of teams competing for the Europa League spots and suddenly the Premier League is starting to be anywhere from 5 to 10 teams deep that can take points off you any week.
With more teams in the two or three contender “zones” (Title, Champions League, Europa League), there are more chances of getting scored on and it is the champions of the league each season who ride better defenses that limit those conceded goals compared to their competitors.
This also likely coincides with a bit of a weakening at the lower rungs of the league, allowing the better teams to score more goals against the bottom-feeders and keep that forced goals rating climbing. If the better teams are making their gains more against the poor teams it would also explain in some small way the decline of English clubs in the UEFA Champions League. More teams are ascending to that top level of the league but that level isn’t always necessarily growing in quality at the same rate as top teams from other leagues around the continent.
Digging Even Deeper: Predictive Power of F/A and Title Probability
I am more than willing to admit that my work is rudimentary at best. I’ve only been comparing champions and runners up. A bit of a small sample and I am hardly one qualified to be doing anything more complex. While I have heard of things like confidence intervals, chi squares, and regressions, the last time I did anything even remotely resembling actual statistical analysis was 12 years ago as a first year university student.
And this is why Soham suggested we take what we’ve done so far and give it to a friend. Arunabha Sengupta (@Senantix) is a cricket historian and Chief Cricket Writer at Cricketcountry, and while we’re not too interested in the cricket, we are definitely interested in that post-graduate degree in statistics. Arunabha was kind enough to take our work and go a few steps further.
Arunabha went a step beyond the top two and gathered the data for the top five of each season to give us a full 100 data points to work with instead of our 40. After that he did his magic to give statistical probabilities of league title success versus F/A ratio.
In his own words, explaining the use of binary logistic regression:
“Many a times in life we encounter problems of inferring how one variable depends on others. Think of a simple situation where you are not quite sure how the salary is structured in a company. However, you do know that three of your colleagues earn basic pays of 15,000, 17,500 and 20,000 and the corresponding take-home salaries they carry back are 28,887, 34,012 and 39,005 respectively. (You can fill in whichever currency symbol that suits your dreams).
It should be more or less reasonable to conclude from the available information that at the end of the month, the money deposited in the bank is approximately [(2 x the basic pay) – 1000]. Having derived this equation, you know that if your basic pay is 25,000, you can expect around 49,000 as your take home salary.
The statistical technique most often used to find out the relation of the resultant with the influencing variable(s) is known as regression analysis. (In this example, take-home pay is the resultant variable and basic-pay is the solitary influencing variable.)
However, the problem we are looking at is slightly different. Here we do have a numerical influencing (x) variable representing goals for vs against ratio, but the resultant (y) variable can take only two values, winning championship and not-winning. In other words, the y variable is binary – where a win may be denoted as ‘success’ and not-win as ‘failure’.
In such situations, the statistical technique used to look at the data and decipher how a numerical x influences a binary y is called Binary Logistic Regression. This particular method takes all the available past data into consideration, and predicts the chance (probability) of the y variable being a success given the numerical value of x.
In other words, the Binary Logistic Regression equation, fitted on the 100 data points denoting gf/ga along with the team results in championship, provides us with the estimated probability of winning the championship given a certain ratio of gf/ga.”
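To make the salary example in that quote concrete, here is a rough Python sketch that fits a straight line to the three quoted pay pairs using ordinary least squares. This is my own illustration, not Arunabha's code: the helper name `fit_line` is hypothetical, and least squares is simply the standard way of fitting such a line, which the quote itself does not spell out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# The three colleagues from the quoted example.
basic_pay = [15000, 17500, 20000]
take_home = [28887, 34012, 39005]

slope, intercept = fit_line(basic_pay, take_home)
predicted = slope * 25000 + intercept  # expected take-home on a 25,000 basic

print(round(slope, 4), round(intercept), round(predicted))
```

The fitted slope comes out a shade over 2 and the prediction for a 25,000 basic pay lands a little above 49,000, which agrees with the rule of thumb in the quote. The logistic version used for the title data works on the same idea, except that the thing being predicted is a probability between 0 and 1 rather than a salary.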
Note: I will be providing the full Microsoft Excel (.xlsx) files for both our original work and Arunabha’s further analysis if you would like to dig in and see everything for yourself.
That’s a pretty graph. Once you factor in all 100 top five finishes over the past 20 years you see the inflection point squarely at the 50% probability which coincides with an F/A ratio of 2.8. So based upon this model, a team that can manage to score 2.8 goals for every goal they allow has a 50% chance of lifting the Premier League title at the end of the season.
For reference sake, going back to my original spreadsheet, a 2.8 F/A would have been better than the title winner 12 out of the 20 years and been the top F/A ratio 10 of those 20 years. So it looks like that’s pretty spot on, wouldn’t you say?
More accurately, 2.8’s probability value rounds up to .51 which some might consider the true magic number, where you now have a technically greater than 50/50 chance of winning the title. I wouldn’t necessarily agree with a 51% probability being the “magic number.” When I think of such a mythical analytical device I picture something which is something closer to a lock than a coin flip.
Want to have better chances of winning the league? Crank that F/A ratio up to 3 and you now have a 62% probability. Start going into higher values of F/A and you really start approaching the level of elite teams. A ratio of 3.2 has only been done by 5 league winners and 1 league runner up (damn you 98/99 Arsenal!) so if you hit that level (a 72% probability of success) you are far and away favorites for a title win.
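As a sanity check on those percentages, here is a rough Python sketch that backs a logistic curve out of two of the probabilities quoted above (51% at an F/A of 2.8 and 72% at 3.2) and then reads off the other ratios mentioned in this article. To be clear, these are not Arunabha's fitted coefficients, which live in his spreadsheet. The slope and intercept below are my own reconstruction from two rounded figures, and the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def title_probability(fa: float, intercept: float, slope: float) -> float:
    """Logistic curve: probability of the title for a given F/A ratio."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * fa)))


# Two (F/A, probability) pairs quoted in the article.
x1, p1 = 2.8, 0.51
x2, p2 = 3.2, 0.72

# On the log-odds scale a logistic curve is a straight line,
# so two points are enough to pin down an approximate line.
slope = (logit(p2) - logit(p1)) / (x2 - x1)
intercept = logit(p1) - slope * x1

for fa in (2.16, 2.8, 3.0, 3.2, 3.47):
    print(fa, round(title_probability(fa, intercept, slope), 2))
```

The reconstructed curve puts a 3.0 ratio at roughly 62%, which matches the figure above, so the quoted numbers are at least consistent with a single logistic curve. Since it is built from rounded inputs, treat anything it says beyond the second decimal place as noise.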
How Does Arsenal Stack Up?
After looking at the probabilities assigned to each F/A ratio I started wondering how Arsenal stacks up and whether it’s even reasonable to shoot for that “magic” 2.8 where you now have a better than 50% chance of lifting the trophy at season’s end. How much improvement does there need to be from Arsenal before the club is once more a “true contender?”
To do this I took Arsenal’s F/A data for the past twenty years and compared each year versus that of the champion (just as before we compared champions to runners up). Champions are in blue and Arsenal is in red. In cases where Arsenal were the champions that year, they have been marked in gold.
The first thing that hits me when this chart comes up is the worrying downward trendline for Arsenal compared to the champions. If you just focus on that line you’d have reason to worry, but when you actually examine where we are falling in relation to the champions in recent times, that gap really does look surmountable. In three of the past five seasons Arsenal’s F/A ratio has been very close to the champions’. 2011 saw the gap at 0.43, 2013 at a barely-there 0.05, and this past season at 0.31.
You could, however, point to the gaps in 2012 and 2014 (1.70 and 1.10 respectively) as problematic, but the bright side there is that the gap is largely due to champions who performed above average in terms of F/A. The 2011/12 Manchester City champions finished with a 3.21 F/A which is far above average (in fact only three champions and four clubs all time have bested that mark) and their 13/14 title also had them finishing above average with a 2.76, which in itself is close to that 2.8 magic mark we’ve highlighted.
When you look at Arsenal during that same time frame (2011-2015) there have been ups and downs but everything has fit within the 1.5-2.0 range and we’re ending that five-year span at a high mark. And with the club currently set up as it is, it is not hard to imagine that F/A value growing in the coming years as The Gunners continue to tighten up their defense (the 36 goals allowed in 14/15 is the lowest since 31 in 2008) and become healthier on the goal-scoring front. The 71 goals scored in 14/15 is the lowest number of goals Arsenal have scored in the league since 2009 and it isn’t surprising given the long layoffs of players like Olivier Giroud, Aaron Ramsey, and Theo Walcott.
Focusing on that 2.5 average for champions, if we go back and look at the past twenty years of Arsenal finishes, we’ll find surprisingly that Arsenal have only eclipsed that mark twice.
The 03/04 Invincibles with the 2.8 right square in line with our magic number and then the ridiculous 98/99 side with that absurd 3.4. Interestingly, our title-winning 97/98 side didn’t come anywhere near the average mark for a league champion.
If you were to go by this historical record, a 2.5 looks far away but these numbers can swing in a more positive fashion by saving just a few more goals here and scoring a few more goals there.
Achieving a 2.5 F/A ratio next season (let alone a 2.8 for the 51% probability of title success) will not be easy but it is certainly achievable if progress is made both offensively and defensively.
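To put some numbers on "a few more goals here and a few fewer there," here is a small Python sketch that starts from Arsenal's 14/15 totals quoted earlier (71 forced, 36 allowed) and works out what it would take to reach a 2.5 or a 2.8. The helper names are hypothetical and the tiny tolerance is only there to guard against floating-point rounding; this is back-of-the-envelope arithmetic, not a forecast.

```python
import math

EPS = 1e-9  # tolerance so that e.g. 90.0000000001 is not rounded up to 91


def goals_needed(target_ratio: float, goals_allowed: int) -> int:
    """Fewest goals forced that reach target_ratio at this goals-allowed total."""
    return math.ceil(target_ratio * goals_allowed - EPS)


def max_allowed(target_ratio: float, goals_forced: int) -> int:
    """Most goals a side can allow and still reach target_ratio."""
    return math.floor(goals_forced / target_ratio + EPS)


# Arsenal's 14/15 league totals as quoted in this article.
forced, allowed = 71, 36

for target in (2.5, 2.8):
    print(
        target,
        "score", goals_needed(target, allowed), "while still allowing", allowed,
        "or allow", max_allowed(target, forced), "while still scoring", forced,
    )
```

Fixing only one end of the pitch asks for a lot: roughly 90 to 100 goals scored at the same defensive record, or a goals-allowed total in the mid-to-high 20s at the same scoring rate. That is why the realistic route is a bit of both.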
Context, Conclusions (Mostly Duh), and Other Interesting Bits
Overall this has been an enlightening look into some Premier League history, using math (some basic, some not so much) to understand what relationships there are between scoring, defense, combinations of both, and league success. We’ve always heard discussions about “magic numbers,” whether it is goals scored or points achieved in a season, and this was our little attempt to find something similar ourselves.
Instead of finding a value which guarantees success we’ve perhaps hit on some common sense truths we’ve always known about the need for a balanced team that is capable both of putting the ball in the net and of stopping their opponent from doing the same.
You can clearly tell from everything up to this point I’ve developed an interest in viewing prior league results through this new prism we’ve developed. You can start to draw conclusions about the clubs themselves and how a league played out and sometimes you wind up being right and other times you miss out on truly historic teams.
As we’ve seen in even recent times, an elite offense (13/14 Liverpool) can carry you a great distance but not always over the hump. And based on our spreadsheet we see Liverpool were historically bad defensively (for a runner up). Compare their F/A ratio to that of the champions, Manchester City, and you wouldn’t think the league title would have been so closely decided. And yet…football.
In 2012/13 season, Manchester United wound up walking to the title on the back of Robin Van Persie’s golden boot winning season a full 11 points ahead of their sky blue local rivals. Look at both teams’ F/A and you see a picture developing of two fairly mediocre clubs (relative to other historical top twos) separated by a hair in terms of quality. And yet…football.
Look at Arsenal. Every one of us points to the 03/04 side as our best ever (and the PL’s best ever) because we see the W/D/L record and the gold trophy, but when you look at the numbers, it was our losing 98/99 side which appears so dominant and should have taken the league. Arsenal’s 3.47 F/A would have put its probability of league title success at 80-84%, while United’s 2.16 put their title probability all the way down at 17-21%. And yet…football.
As much as it pains me to say it, the 04/05 Chelsea squad truly does have a strong case for being the greatest Premier League side ever from a numerical standpoint. Record-breaking defense. 2nd highest point gap in league history. But it’s not that side that holds the golden trophy. It was Arsenal who, with a probability percentage right at 50%, were the ones who did just enough each and every game to ensure they finished a season with a 0 in the loss column. From this bird’s-eye level the 04/05 Chelsea must have been better or more special than the Invincibles. And yet…football.
In a way I like that you can’t just point to the numbers such as this and draw a 100% accurate conclusion about a club or a season. Even with all of our new statistics and analytical models the beautiful game still largely transcends numbers in a way that most other sports can’t. The fact that we can’t draw sweeping conclusions from the numbers in a box score means every game remains a must-watch because it is that artistry and passion that often times can be the deciding factor between who wins and loses more than sheer offensive and/or defensive output.
Download Data Files
- Premier League Champion, Runner Up, and Arsenal Data (.xlsx)
- Arunabha Sengupta’s F/A Ratio Probability Data (.xlsx)