Premier League: Has the video assistant referee impacted the rankings?

There was much debate about the video assistant referee (VAR) when it was introduced last year (in 2019). Its goal is to make refereeing fairer, but concerns remain: will that really turn out to be the case, and won’t it break the rhythm of the game?

We will let football analysts – or soccer analysts depending on where you read this article from – answer that question. But one thing we can look at is how VAR has impacted the league so far.

We gathered data containing all the Premier League goals of the 2019/2020 season and computed the standings from it. Then, using atoti’s simulation capabilities, we evaluated a number of scenarios:

  • What would happen if VAR-cancelled goals were still counted?
  • Would the rankings change if we used the scoring system that was common before the 1990s, where only 2 points were awarded to the winner?
  • Would Leicester reach the bottom of the standings if Vardy had not scored?

The Data:

The data we used is composed of events. An event can be anything that happens in a game: kick-off, goal, foul, etc. 

For the purpose of this example, we only kept events with EventType Goal or Kick-Off:

Image 1: First lines of the data set, showing Liverpool’s 4-1 victory on day 1.

We then just have to import the data and create a cube:

events_store = session.read_csv("events.csv", sep=";")
cube = session.create_cube(events_store)

And we are ready to start creating our model!

Computing the score of each match:

We first computed the score of each team for each match of the season. From there, we could calculate how many points each team won on each day of the season, and then compute the rankings.

For that, we just had to create a measure that counts the number of events of EventType Goal. We can then evaluate it for a particular match and team.

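m["Team Goals (including Own Goals)"] = tt.agg.sum(
    tt.where(lvl["EventType"] == "Goal", tt.agg.count_distinct(events_store["EventId"]), 0.0),
    scope=tt.scope.origin("EventType"),  # scope assumed by analogy with the other measures
)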

As its name suggests, this measure also counts own goals, i.e. goals that players scored against their own team. Those goal events are flagged with IsOwnGoal = True, so we can create a measure that isolates them:

m["Team Own Goals"] = tt.agg.sum(
    tt.where(lvl["IsOwnGoal"] == True, m["Team Goals (including Own Goals)"], 0.0),
    scope=tt.scope.origin("IsOwnGoal"),  # scope assumed by analogy with the other measures
)

We then compute the goals actually scored for the team as the difference between the last two measures:

m["Team Goals"] = m["Team Goals (including Own Goals)"] - m["Team Own Goals"]
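As a rough illustration of what these three measures compute, here is a minimal plain-Python sketch, independent of atoti. The sample events, the team names "A" and "B" and the helper `team_goal_counts` are hypothetical; only the EventType and IsOwnGoal fields come from the data set. It assumes, as the measures above suggest, that an own goal is recorded against the team of the player who scored it.

```python
# Hypothetical sample events; only the field names come from the article.
events = [
    {"Team": "A", "EventType": "Kick-Off", "IsOwnGoal": False},
    {"Team": "A", "EventType": "Goal", "IsOwnGoal": False},
    {"Team": "B", "EventType": "Goal", "IsOwnGoal": True},
    {"Team": "A", "EventType": "Goal", "IsOwnGoal": False},
    {"Team": "B", "EventType": "Goal", "IsOwnGoal": False},
]


def team_goal_counts(events, team):
    """Return (goals including own goals, own goals, goals) for one team."""
    goals = [e for e in events if e["Team"] == team and e["EventType"] == "Goal"]
    including_own = len(goals)
    own = sum(1 for e in goals if e["IsOwnGoal"])
    return including_own, own, including_own - own


counts_a = team_goal_counts(events, "A")
counts_b = team_goal_counts(events, "B")
print(counts_a)  # (2, 0, 2)
print(counts_b)  # (2, 1, 1)
```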

At this point we can already look at the total goals scored by each team over the season, using cube.visualize() in the notebook:

Or, if we drill down on days, we can see how many goals a team scored on each day of the season:

To compute team scores we also have to calculate the opponent’s goals and the opponent’s own goals for each team. In order to do that, we will simply re-use the measures we previously created and define a new one that retrieves the Team Goals of the opponent:

m["Opponent Goals"] = tt.agg.sum(
    # Read Team Goals at the coordinates where Team and Opponent are swapped
    tt.at(m["Team Goals"], {lvl["Team"]: lvl["Opponent"], lvl["Opponent"]: lvl["Team"]}),
    scope=tt.scope.origin("Team", "Opponent"),
)

We do the same for the opponent’s own goals to obtain the Opponent Own Goals measure.

The score of a team is then equal to the Team goals plus the opponent own goals:

m["Team Score"] = m["Team Goals"] + m["Opponent Own Goals"]

And the opponent score is the opponent goals plus the team own goals:

m["Opponent Score"] = m["Opponent Goals"] + m["Team Own Goals"]
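To make the arithmetic concrete, here is a minimal plain-Python sketch of the two formulas above. The function name `match_score` and the goal counts are hypothetical and only serve as an illustration.

```python
def match_score(team_goals, team_own_goals, opponent_goals, opponent_own_goals):
    """Return (team score, opponent score) for a single match.

    An own goal counts for the other side, so each score is a side's
    regular goals plus the own goals conceded by its adversary.
    """
    team_score = team_goals + opponent_own_goals
    opponent_score = opponent_goals + team_own_goals
    return team_score, opponent_score


# Hypothetical match: the team scores 3 regular goals, the opponent scores
# 1 regular goal and 1 own goal.
score = match_score(team_goals=3, team_own_goals=0, opponent_goals=1, opponent_own_goals=1)
print(score)  # (4, 1)
```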

We can now have a look at the results of each game with another cube.visualize():

If you compare these results with the actual results, however, some of them differ.

This is because the events also include goals that were later cancelled by VAR. To get the actual results, we simply filtered on IsCancelledAfterVAR = False.
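The effect of that filter can be sketched in plain Python as follows. The sample events and the helper `count_goals` are hypothetical; only the IsCancelledAfterVAR flag comes from the data set.

```python
# Hypothetical goal events for one match between teams "A" and "B".
goal_events = [
    {"Team": "A", "IsCancelledAfterVAR": False},
    {"Team": "B", "IsCancelledAfterVAR": False},
    {"Team": "A", "IsCancelledAfterVAR": True},  # ruled out after review
]


def count_goals(events, team, include_cancelled):
    """Count a team's goals, optionally keeping VAR-cancelled ones."""
    return sum(
        1
        for e in events
        if e["Team"] == team and (include_cancelled or not e["IsCancelledAfterVAR"])
    )


raw_a = count_goals(goal_events, "A", include_cancelled=True)
actual_a = count_goals(goal_events, "A", include_cancelled=False)
print(raw_a, actual_a)  # 2 1
```

In this made-up match, team A wins 2-1 on the raw events but only draws 1-1 once the cancelled goal is filtered out.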

Computing the rankings:

Now that we have the score of each game, we can compute the rankings. 

Following the FIFA World Cup points system, three points are awarded for a win, one for a draw and none for a loss (before the 1990s, winners received two points):

m["Points for victory"] = 3.0
m["Points for tie"] = 1.0
m["Points for loss"] = 0.0

Then we simply have to compare the team score against the opponent score at match level (League / Day / Team is the key of a particular match) to compute the points won by the team:

m["Points"] = tt.agg.sum(
    tt.where(
        m["Team Score"] > m["Opponent Score"],
        m["Points for victory"],
        # Not a win: either a tie or a loss
        tt.where(
            m["Team Score"] == m["Opponent Score"],
            m["Points for tie"],
            m["Points for loss"],
        ),
    ),
    scope=tt.scope.origin("League", "Day", "Team"),
)
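
The same rule can be written as a plain-Python sketch for a single match. The function name `match_points` and its keyword defaults are hypothetical; the values 3, 1 and 0 are those defined above.

```python
def match_points(team_score, opponent_score, victory=3.0, tie=1.0, loss=0.0):
    """Return the points a team earns from one match."""
    if team_score > opponent_score:
        return victory
    if team_score == opponent_score:
        return tie
    return loss


# Hypothetical (team score, opponent score) pairs for three match days.
results = [(4, 1), (2, 2), (0, 1)]
total = sum(match_points(team, opponent) for team, opponent in results)
print(total)  # 4.0
```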

Before looking at the results, since those points are computed without filtering the VAR-cancelled goals, we defined another measure to remove them and retrieve the actual points:

m["Actual Points"] = tt.filter(m["Points"], lvl["IsCancelledAfterVAR"] == False)

We can now look at how VAR impacted the Premier League rankings; in particular, the teams highlighted in red lost out because of the cancelled goals:

More than half of the teams have had their points total impacted by VAR. 

Although VAR does not affect the top teams, it definitely affects the ranking of many others. If VAR had not cancelled any goals, Manchester United would have lost 2 positions and Tottenham 4!

If we look at the evolution of the points throughout the season, it is easy to understand why Liverpool was not impacted by VAR:

m["Points cumulative sum"] = tt.agg.sum(
    m["Actual Points"], scope=tt.scope.cumulative(lvl["Day"])
)

They are simply running away with the championship.
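
For readers who want to see what the cumulative scope computes, it corresponds to a running total over the days. Here is a minimal sketch with the standard library, using hypothetical per-day points:

```python
from itertools import accumulate

# Hypothetical points won by one team on consecutive match days.
points_per_day = [3.0, 3.0, 1.0, 0.0, 3.0]

# Running total: the value for each day is the sum of all days up to it.
cumulative_points = list(accumulate(points_per_day))
print(cumulative_points)  # [3.0, 6.0, 7.0, 7.0, 10.0]
```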

Simulation of a different scoring system:

Although we are all used to a scoring system that gives 3 points for a victory, 1 for a tie and 0 for a loss, this was not always the case. Before the 1990s, many European leagues only gave 2 points per victory. The change was made to encourage teams to score more goals.

The Premier League treats us to plenty of goals (take it from someone watching the French Ligue 1), but how different would the results be with the old scoring system?

atoti enables us to simulate this very easily. We simply have to create a new scenario where we can replace the number of points given for a victory.

We first set up a simulation on that measure.

scoring_system_simulation = cube.setup_simulation(
    "Scoring system simulations",
    replace=[m["Points for victory"]],
    base_scenario="Current System",
)

We then create a new scenario in which a victory is worth 2 points instead of 3.

And that’s it: there is no need to define anything else, as all the measures are re-computed on demand with the new value in the new scenario.
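
Conceptually, the scenario re-runs the same computation with a different value for a win. The plain-Python sketch below imitates that idea outside atoti; the teams, their records and the helper `standings` are hypothetical.

```python
# Hypothetical season summary: (wins, draws, losses) per team.
records = {"A": (10, 2, 8), "B": (8, 7, 5), "C": (6, 6, 8)}


def standings(records, points_for_victory, points_for_tie=1):
    """Rank teams by total points, highest first."""
    totals = {
        team: wins * points_for_victory + draws * points_for_tie
        for team, (wins, draws, _losses) in records.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


current_system = standings(records, points_for_victory=3)
old_system = standings(records, points_for_victory=2)
print(current_system)  # [('A', 32), ('B', 31), ('C', 24)]
print(old_system)      # [('B', 23), ('A', 22), ('C', 18)]
```

In this made-up example, the team with more draws overtakes the team with more wins once a victory is worth only 2 points, which is the kind of shift the simulation is meant to reveal.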

Surprisingly, awarding only 2 points for a win would only cost Burnley and West Ham 2 ranks each, with no other real impact on the standings. This can be seen more clearly in the chart below:

If you wish to explore the data used in this article or see the simulations live, you can have a look at our notebook on this topic.