Scaling mountains starts with a single step
Predicting athlete injury risk has long been a holy grail of sports medicine, yet progress to date has been limited by small sample sizes, heavily imbalanced data, and inadequate statistical approaches. We also noticed a wealth of publicly available data ripe for analysis, a foundation that can be built upon with unique team and player data.
Realities of not-that-big data(sets) have limited machine learning in sports
Modeling approaches that fail to account for interactions among multiple factors, or for limited sample sizes, can mislead and produce exaggerated results. We addressed this by collecting longitudinal data on NBA player injuries over a ten-year period and applying “feature learning” to surface factors that are non-significant on their own but become meaningful in combination with one another. Using 32,000 parameters including past injuries, game activity, and player statistics, we developed a state-of-the-art deep learning model to forecast future injuries.
With limited and messy data, the model still outperformed all competitors
Evaluated with metrics appropriate for imbalanced data, our model performs twelve times better than traditional machine learning approaches and more than twenty times better than the most advanced injury-forecasting model published to date.
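To see why the choice of metric matters when injuries are rare, consider a minimal Python sketch. The labels and predictions are invented for illustration (100 hypothetical player-games with 5 injuries) and are not drawn from our data or our model. The function names are our own placeholders. The sketch compares plain accuracy with precision, recall, and F1, which are common choices for imbalanced data.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = injury)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 5 injuries in 100 player-games.
y_true = [1] * 5 + [0] * 95

# Baseline that never predicts an injury.
always_healthy = [0] * 100

# Hypothetical model: flags 4 of the 5 injuries and raises 6 false alarms.
model_pred = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

baseline = summarize(y_true, always_healthy)
model = summarize(y_true, model_pred)

print("baseline:", baseline)
print("model:   ", model)
```

In this toy case the baseline that never predicts an injury scores higher on accuracy than the model that catches four of the five injuries, while its recall and F1 are zero. That is the sense in which accuracy alone can mislead on imbalanced data.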
We’re just getting started
This is merely our validated proof of concept, demonstrating the approach in one league. Unique team and player biometric data will significantly enhance both the accuracy and utility of our models. Practitioners and front offices can use these models to improve athlete management and reduce injury incidence, potentially saving sports teams millions through increased player availability.