Real-time soccer video analysis

Marco Leo, Nicola Mosca, Paolo Spagnolo, Pierluigi Mazzeo, Tiziana D'Orazio, and Arcangelo Distante, "Real-time multiview analysis of soccer matches for understanding interactions between ball and players", presented at the 2008 International Conference on Content-based Image and Video Retrieval, Niagara Falls, Ontario, Canada, pp. 525-534.

For my review, read on.  Comments welcome.

In my post titled "Moneyball and soccer", I mentioned a technical paper that presented a system to analyze ball/player interactions on a soccer field in real time. The authors are members of an Italian research group at the Institute of Intelligent Systems for Automation, a division of the Italian National Research Council (CNR) based in Bari. The Institute has signed an operating agreement with the Italian soccer federation (FIGC) under which it will develop prototype systems to analyze and detect key moments during a match. In particular, these systems will look for offside positions, goal-line crossings, and events within the penalty area.

This paper presents a multi-camera system that processes image data in real time in order to locate and track the soccer ball during play, identify the player in possession of the ball, and record interactions between the ball and the players. Such a system would enable a number of applications, such as identifying key events for a highlights package, referee decision support (offside and goal-line decisions), tactical analysis, and developing team and player performance metrics.

A suitable image analysis tool for football has to handle both low-level and high-level tasks:

  • Low-level – Segmentation of each image and tracking of objects over time. The analysis tool has to identify the ball or a player (modeled as a 'blob' in image processing terminology) and track that blob in the presence of background noise and other blobs. These tasks fall under estimation and tracking, and there is a lot of research activity on filters and algorithms that accurately estimate and track objects in the presence of various disturbances. The paper cites at least ten previous works that focus on low-level tasks.
  • High-level – Identification of semantic events. The analysis tool has to identify who has the ball and determine what he is doing with it: is he dribbling? passing? shooting? This is the information we will need to develop better statistics for soccer. However, the low-level tasks are still open research questions, so most of the research activity is focused there. The paper cites only two works that address high-level event detection, and only in a very basic way (Beetz et al. 2006, Zhu et al. 2007). Beetz et al. used image data to identify the player in possession of the ball and to detect passes and shots on goal. Zhu et al. attempted to identify dribbling and shooting.
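To make the 'blob' idea concrete, here is a minimal Python sketch of one classic low-level step, assuming a static camera and a fixed background image: subtract the background, threshold the difference, and group the remaining foreground pixels into 4-connected blobs with a flood fill. The paper's own segmentation is more sophisticated; the list-of-lists grayscale frames and the `find_blobs`, `threshold`, and `min_area` names are my own illustrative assumptions.

```python
from collections import deque


def find_blobs(frame, background, threshold=30, min_area=2):
    """Return blobs (lists of (row, col) pixels) where frame differs from background.

    Hypothetical helper: frame and background are same-sized grayscale images
    stored as lists of lists of intensities.
    """
    rows, cols = len(frame), len(frame[0])
    # Foreground mask: pixels whose intensity differs enough from the background model.
    mask = [[abs(frame[r][c] - background[r][c]) > threshold for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected foreground pixels.
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_area:  # discard isolated noise pixels
                blobs.append(blob)
    return blobs


def centroid(blob):
    """Mean (row, col) position of a blob's pixels."""
    ys, xs = zip(*blob)
    return sum(ys) / len(ys), sum(xs) / len(xs)
```

A tracker would then follow each blob's centroid from frame to frame.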

As you can imagine, this image analysis problem is very hard. (My PhD advisor's lab worked on similar problems in other contexts, so I know just how difficult it is!) The main sources of difficulty are:

  • Object estimation/localization is not sufficiently robust
  • Required estimation bandwidth – how quickly do you need to identify objects and events?
  • Lots of occlusions (players or the ball out of view of the camera or hidden behind other players)
  • Identifying and classifying events is hard; identifying individual players is even harder
  • Processing time – how much time is required, and can it be done in real-time, or even quasi-real-time?  Some applications may not require an answer right away (e.g. stats, tactical analysis, or highlights), but for others you need an answer immediately (e.g. referee decisions)
  • Processing space/power required – the data volume is going to be measured in gigabytes. Memory is cheap, but at this scale storage will still cost a non-trivial amount.
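A quick back-of-the-envelope calculation shows why storage matters. The camera specs below are assumptions for illustration (Full-HD 1920×1080 frames at 25 frames per second, uncompressed, one byte per pixel), not figures from the paper:

```python
def raw_data_rate(cameras, width, height, fps, bytes_per_pixel):
    """Uncompressed bytes per second produced by a synchronized camera array."""
    return cameras * width * height * fps * bytes_per_pixel


# Assumed (hypothetical) setup: six Full-HD cameras, 25 fps, 8-bit grayscale.
rate = raw_data_rate(cameras=6, width=1920, height=1080, fps=25, bytes_per_pixel=1)
match_seconds = 90 * 60
print(f"{rate / 1e6:.1f} MB/s")                          # 311.0 MB/s
print(f"{rate * match_seconds / 1e9:.1f} GB per match")  # 1679.6 GB per match
```

Under those assumptions the six cameras produce about 311 MB of raw data every second, or roughly 1.7 TB over 90 minutes, before any compression.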

This paper describes the multi-view system developed by the group at ISSIA. The system uses six high-resolution video cameras mounted above the field, three along each touchline. The cameras are triggered simultaneously, and each sends its images to a processing node that handles the low-level tasks: image segmentation, blob classification, tracking, and identification. The processed data are then sent to a central processing unit that carries out the high-level tasks, fusing the data from the six cameras to reconstruct the three-dimensional ball trajectory and infer player actions. Most of the paper explains the algorithms used to estimate the 3D ball trajectory from the image data using homography (a mathematical map, expressed as a 3×3 matrix, between the camera image and a 'virtual' field plane), and the heuristic used to identify player-ball interactions. Interestingly, the ball trajectory is modeled as piecewise straight lines between identified ball locations. This works fairly well, except in the situations described below.
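The review doesn't reproduce the paper's equations, so what follows is only a sketch of one standard way to combine homographies with camera geometry, under my own assumptions. Each camera's homography maps the ball's pixel position onto the field plane (z = 0); that point is where the camera's viewing ray through the ball hits the ground. If each camera centre's 3D position is known from calibration, rays from two different cameras cross (approximately) at the ball, and the midpoint of their closest approach gives a 3D estimate. The function names, the field coordinate frame, and any camera positions you plug in are hypothetical.

```python
def apply_homography(H, u, v):
    """Map image pixel (u, v) to field-plane coordinates with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w  # divide out the homogeneous coordinate


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def triangulate_ball(cam1, ground1, cam2, ground2):
    """Estimate the 3D ball position from two cameras (hypothetical helper).

    cam1, cam2: 3D camera centres (x, y, z) in field coordinates.
    ground1, ground2: where each camera's viewing ray through the ball meets
        the field plane z = 0, e.g. (*apply_homography(H, u, v), 0.0).
    Returns the midpoint of the shortest segment joining the two rays.
    """
    d1 = _sub(ground1, cam1)  # direction of ray 1
    d2 = _sub(ground2, cam2)  # direction of ray 2
    w0 = _sub(cam1, cam2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel")
    # Ray parameters of the closest points (least-squares solution).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(p + s * q for p, q in zip(cam1, d1))
    p2 = tuple(p + t * q for p, q in zip(cam2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

Because measurement noise means two rays rarely intersect exactly, taking the midpoint of their shortest connecting segment is a simple compromise; with more than two cameras one would typically solve a least-squares version instead.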

The group tested their approach during Udinese's home league matches in the 2006-07 Serie A season. They recorded images from four matches but could only analyze those qualitatively; the quantitative evaluation is based on 11 minutes of match footage. For those 11 minutes, a human operator identified the players in possession of the ball, ball events, and offside situations, providing a ground truth against which to compare the output of the image analysis system. The system correctly identified shots and the player in possession of the ball more than 90% of the time, with very few false positives (under 5%). The system also localizes events accurately in time: 70% of the events were placed within 0.20 seconds of the correct time, and 83% within 0.32 seconds. That may not be precise enough for quick passes near the penalty area, but it appears to be good enough for long passes from the defensive half. The system is susceptible to balls hit with spin: most of the false positives and inaccurate identifications occurred because it could not estimate the trajectory of a spinning ball.
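For readers curious how timing figures like "70% within 0.20 seconds" are computed, here is a small sketch. It assumes detected events have already been matched one-to-one to the operator's ground-truth events; the function name and the sample times are made up for illustration.

```python
def timing_accuracy(pairs, tolerances):
    """Fraction of matched events whose timing error is within each tolerance.

    pairs: (detected_time, true_time) tuples in seconds, already matched.
    tolerances: iterable of tolerances in seconds.
    """
    if not pairs:
        raise ValueError("need at least one matched event")
    errors = [abs(detected - true) for detected, true in pairs]
    return {tol: sum(err <= tol for err in errors) / len(errors) for tol in tolerances}


# Hypothetical matched events (detected, ground truth), in seconds.
events = [(1.0, 1.1), (2.0, 2.3), (3.0, 3.5), (4.0, 4.0)]
print(timing_accuracy(events, [0.20, 0.32]))  # {0.2: 0.5, 0.32: 0.75}
```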

There are a few areas that, in my opinion, could be improved or studied further. The 3D ball trajectory is modeled using straight lines between estimates, and while this works well in most cases, it clearly breaks down for curved trajectories. That estimation problem is much more difficult, and I can easily see a trade-off having to be made between trajectory accuracy and computation time. I did not see any mention of the system's performance during corner kicks, where the ball trajectory curves significantly. The estimation problem could also be mitigated by obtaining more estimates of the ball location, though I see another accuracy vs. time trade-off there. The inference algorithm is basic, but it works well, so I see no pressing need to refine it. It would be nice to see how this system performs over a full 90-minute match, which would offer more opportunities for false positives and occlusions. It will be interesting to see whether the error rate goes up significantly over a complete match.
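To illustrate why straight-line segments struggle with curved flight, the sketch below compares linear interpolation with a quadratic (Lagrange) fit through three timed ball estimates. The swerving trajectory (x = 20t, y = 5t² in metres) is a made-up example, and the quadratic model is my own illustration of a more flexible alternative, not something the paper proposes. A quadratic only approximates spin-induced curvature over a short window, and it needs an extra estimate per segment, which is the accuracy-versus-computation trade-off in miniature.

```python
def lerp(p0, p1, t0, t1, t):
    """Straight-line interpolation between two timed position estimates."""
    a = (t - t0) / (t1 - t0)
    return tuple(x0 + a * (x1 - x0) for x0, x1 in zip(p0, p1))


def quad_interp(samples, t):
    """Quadratic (Lagrange) interpolation through three (time, position) samples."""
    (ta, pa), (tb, pb), (tc, pc) = samples
    la = (t - tb) * (t - tc) / ((ta - tb) * (ta - tc))
    lb = (t - ta) * (t - tc) / ((tb - ta) * (tb - tc))
    lc = (t - ta) * (t - tb) / ((tc - ta) * (tc - tb))
    return tuple(la * a + lb * b + lc * c for a, b, c in zip(pa, pb, pc))


# Hypothetical ball estimates along the curve x = 20t, y = 5t^2 (metres, seconds).
samples = [(0.0, (0.0, 0.0)), (0.4, (8.0, 0.8)), (0.8, (16.0, 3.2))]
true_mid = (4.0, 0.2)  # actual position at t = 0.2 s
linear = lerp(samples[0][1], samples[1][1], 0.0, 0.4, 0.2)
quadratic = quad_interp(samples, 0.2)
print("linear:", linear)        # (4.0, 0.4): 0.2 m lateral error
print("quadratic:", quadratic)  # approximately (4.0, 0.2)
```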

In the end, I think this work makes an important contribution to a very difficult problem, and can enable a number of very high-level tasks in the game of soccer, from referee decisions to performance analysis.


[Beetz et al. 2006] M. Beetz, N. v. Hoyningen-Huene, J. Bandouch, B. Kirchlechner, S. Gedikli, and A. Maldonado, "Camera-based observation of football games for analyzing multi-agent activities", In Proceedings of AAMAS '06, pp. 42-49.

[Zhu et al. 2007] G. Zhu, Q. Huang, C. Xu, Y. Rui, S. Jiang, W. Gao, and H. Yao, "Trajectory based event tactics analysis in broadcast sports video", In Proceedings of MULTIMEDIA '07, pp. 58-67.