Let’s learn to analyze football matches using a bit of math, data science, and Python.
To understand why I am writing this post, here is a bit about me. I have been in the software industry for just over a decade and have worked across various domains. I have been a Manchester United fan since 2004. I play football regularly (2-3 times a week), and I wrote an open source side project called TrackFootball.app to track casual football games (you do the tracking with your watch) and get heatmaps and sprints.
While I am in neither the football industry nor the analytics industry professionally, I love the game of football and have learned a lot about life from the sport. It has been very kind to me and this is my way of giving back to the beautiful game.
Besides, I have ulterior motives. I want Manchester United to win the Premier League again and I have taken the matter into my own hands. Just kidding. I recently re-learned math up to high school level (plus some parts of university level), and I was looking for some side projects to stay in touch with math. I also want to explore what working at a club as a data analyst might look like. So I will not only cover the basics but also briefly touch on what it takes to do cutting-edge work in this space, as that is my aim. I want to spend the whole of 2026 learning this to a level where I can start testing my ideas about the game holistically with data.
Overall, I just want to have fun with math and football.
What will you learn?
There are several different roles and skill sets in a data analytics team at a football club. It could be anything from a one-person “team” just setting things up (analytics is still relatively new to football clubs, though the Premier League has woken up to data) to a full squad of data engineers and data scientists running advanced techniques like ghosting (more on that later) or supporting training drills that target your opposition’s weaknesses.
With roles this broad, I intend to touch on the building blocks needed to answer several questions with data. My aim is to motivate you to dive into the space with curiosity. By the end of this blog post, you should be comfortable exploring the space on your own, and I will link several excellent resources for you to continue with.
As I have already mentioned, I am not in the football analytics industry (not yet, anyway). This has pros and cons. The biggest pro is that I will dive deep into a lot of things with fresh eyes and first principles (and a lot of experience in software engineering and databases in general). The biggest con is that I will have to “guess” a lot of real data analysis requirements. My guesses will be based on talking to people in the industry, watching videos, and reading papers from those who are established in it.
Prior experience: if you have written any software at all, this blog post will suit you. I will try to cover the rest of the skills up to a basic level within the post. If you have never written any software, I suggest bookmarking this and playing around with Automate the Boring Stuff with Python. While this blog post is targeted at those who speak Python, Python is not a necessary skill for joining a data analytics team at a football club.
Broadly, we will use the following tools / skills:
- Python + Jupyter. Python is a programming language and Jupyter is a common tool in the data science industry. It can be used with minimal setup. To follow this blog post, executing snippets in Google Colab (free) should do.
- We will work with “event data”, which means a big file recording the passes, shots, goals, etc. from a match. For this post, we will work with the Manchester United 3 v 2 Arsenal game in 2016. This was Marcus Rashford’s Premier League debut. Here is the relevant event file. Credits to StatsBomb.
- Data visualization. Showing technical information to coaches and other experts in their language is part of the job. This means expressing ideas as pictures on a football field or as a small video showing an insight, which means loads of plotting.
- Statistics. I will show you some cool things that you can do with statistics.
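To make “event data” a little more concrete, here is a minimal sketch of what reading such a file might look like. The records below are invented for illustration and are not taken from the real match file. The nested "type" / "team" / "player" layout is my assumption, modelled on how StatsBomb’s open event JSON is commonly laid out, and count_event_types is a hypothetical helper name.

```python
import json
from collections import Counter

# Toy records, invented for illustration; NOT from the real match file.
# The nested layout is an assumption modelled on StatsBomb-style event JSON.
SAMPLE_EVENTS_JSON = """
[
  {"minute": 1, "type": {"name": "Pass"}, "team": {"name": "Home"},
   "player": {"name": "Player A"}, "location": [40.0, 30.0]},
  {"minute": 2, "type": {"name": "Pass"}, "team": {"name": "Home"},
   "player": {"name": "Player B"}, "location": [55.0, 42.0]},
  {"minute": 2, "type": {"name": "Shot"}, "team": {"name": "Home"},
   "player": {"name": "Player C"}, "location": [105.0, 38.0]},
  {"minute": 5, "type": {"name": "Pass"}, "team": {"name": "Away"},
   "player": {"name": "Player X"}, "location": [30.0, 60.0]},
  {"minute": 7, "type": {"name": "Shot"}, "team": {"name": "Away"},
   "player": {"name": "Player Y"}, "location": [100.0, 45.0]}
]
"""


def count_event_types(events):
    """Return a Counter keyed by (team name, event type name)."""
    counts = Counter()
    for event in events:
        # .get() with defaults keeps the loop safe if a record lacks a field.
        team = event.get("team", {}).get("name", "Unknown")
        kind = event.get("type", {}).get("name", "Unknown")
        counts[(team, kind)] += 1
    return counts


events = json.loads(SAMPLE_EVENTS_JSON)
event_counts = count_event_types(events)

# Print one line per (team, event type) pair.
for (team, kind), n in sorted(event_counts.items()):
    print(f"{team:5} {kind:5} {n}")
```

With a real file you would swap the inline string for json.load on the downloaded file. The counting logic should stay the same as long as the field names match.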
The following questions might be common at a football club and could be assisted by a data analytics team. We won’t answer all of these in this blog post, but I want to give you an idea of how wide the use might be:
- Visualizing data nicely for staff and even for fan engagement
- What is a player’s load? Can you use math and statistics to prevent injury?
- How do two players compare on a specific stat, like goals as a substitute? Which player is the better super sub?
- Explore pass networks of opponents to find (and mark) key player(s).
- Support recruitment by finding players with profiles similar to what the club is looking for. This might involve advanced techniques like ghosting (imitation learning) to slot a prospect into your team and see how they might influence the game.
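As a taste of the pass network idea from the list above, here is a rough sketch. It assumes you have already reduced the event data to (passer, recipient) pairs. The passes below are invented, and build_pass_network and involvement are hypothetical helper names. “Key player” here simply means the player involved in the most passes, which is only one of many possible definitions.

```python
from collections import Counter

# Invented passes for illustration only: (passer, recipient) pairs.
TOY_PASSES = [
    ("Player A", "Player B"),
    ("Player B", "Player C"),
    ("Player C", "Player B"),
    ("Player B", "Player D"),
    ("Player A", "Player B"),
    ("Player D", "Player B"),
]


def build_pass_network(passes):
    """Count passes for each directed passer -> recipient pair (the edges)."""
    return Counter(passes)


def involvement(edges):
    """Sum passes made plus passes received for each player."""
    totals = Counter()
    for (passer, recipient), n in edges.items():
        totals[passer] += n
        totals[recipient] += n
    return totals


pass_edges = build_pass_network(TOY_PASSES)
player_involvement = involvement(pass_edges)

# The most involved player is a simple first guess at who to mark.
key_player, key_total = player_involvement.most_common(1)[0]
print(key_player, key_total)
```

Counting involvement is the crudest way to pick a key player. Graph measures such as centrality are a natural next step once the edges exist.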
print('hello')
Basics of Visualization
Let’s start with basics of visualization.
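Before reaching for a plotting library, it helps to see that a heatmap is just a count of events per grid cell. The sketch below bins (x, y) locations into a coarse grid and renders it as text. The 120 x 80 pitch dimensions are an assumption based on StatsBomb-style coordinates, the locations are invented, and bin_locations and render are hypothetical helper names.

```python
PITCH_LENGTH = 120.0  # assumed x-range (StatsBomb-style coordinates)
PITCH_WIDTH = 80.0    # assumed y-range


def bin_locations(locations, cols=6, rows=4):
    """Count how many (x, y) points fall into each cell of a rows x cols grid."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in locations:
        # Clamp so points on or beyond the boundary land in an edge cell.
        col = max(0, min(int(x / PITCH_LENGTH * cols), cols - 1))
        row = max(0, min(int(y / PITCH_WIDTH * rows), rows - 1))
        grid[row][col] += 1
    return grid


def render(grid, shades=" .:*#"):
    """Turn counts into a text picture; denser cells get darker characters."""
    peak = max(max(row) for row in grid) or 1  # avoid dividing by zero
    top = len(shades) - 1
    return "\n".join(
        "".join(shades[(count * top) // peak] for count in row) for row in grid
    )


# Invented locations for illustration only.
toy_locations = [
    (10, 10), (15, 12), (60, 40), (61, 41), (62, 39), (119, 79), (120, 80),
]

heat_grid = bin_locations(toy_locations)
print(render(heat_grid))
```

A plotting library would replace render with a proper image drawn over a pitch, but the binning step stays the same.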
Challenges
- data is not easily available