002 — PROJECTS
SoccerML
Tech Stack
Python, Pandas, NumPy, XGBoost, PyTorch, Scikit-learn, SQLAlchemy, PostgreSQL, Jupyter Notebook, Boruta
Role
Machine Learning Engineer
Team Size
2
Duration
September 2021 - June 2022
Project Overview
SoccerML is a football match result prediction system. It integrates multi-source data (odds, historical records, real-time indicators, etc.) and trains several machine learning models to predict one of three outcomes per match: home win, draw, or away win. The models include SVM, XGBoost, and neural networks, complemented by a Poisson distribution goal model and statistical analysis methods to improve prediction accuracy.
Highlights
- Multi-model ensemble prediction system (SVM, XGBoost, Neural Networks)
- Poisson distribution-based goal prediction model
- Feature engineering and automatic feature selection (Boruta algorithm)
- Real-time odds data processing and analysis
- Model architecture refined iteratively across multiple versions
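
As an illustration of the Poisson distribution-based goal prediction listed above, the following is a minimal sketch, not the project's actual implementation. It assumes home and away goals are independent Poisson variables with known expected goal rates; the function names and the example rates (1.6 and 1.1) are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when the expected goal count is lam."""
    return lam ** k * exp(-lam) / factorial(k)


def outcome_probabilities(home_rate: float, away_rate: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities.

    Assumes independent Poisson goal counts and truncates the score grid
    at max_goals goals per team.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    # The truncated grid sums to slightly less than 1, so renormalize.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Hypothetical expected goals: 1.6 for the home side, 1.1 for the away side.
p_home, p_draw, p_away = outcome_probabilities(1.6, 1.1)
print(f"home={p_home:.3f} draw={p_draw:.3f} away={p_away:.3f}")
```

In practice the two rates would themselves be estimated from data, for example from team attack and defence strengths; that estimation step is outside the scope of this sketch.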
Challenges
- Handling high uncertainty in football match results
- Integrating multi-source heterogeneous data and extracting effective features
- Balancing model complexity and prediction accuracy
- Responding to real-time changes in odds data
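
A common first step when working with odds data like that mentioned above is converting decimal odds into implied probabilities and removing the bookmaker margin. The sketch below uses simple proportional normalization, which is only one of several possible methods and is not confirmed as the one used in SoccerML; the function name and the example odds are hypothetical.

```python
def implied_probabilities(home_odds: float, draw_odds: float, away_odds: float):
    """Convert decimal odds to normalized probabilities.

    Returns (probabilities, margin), where margin is the bookmaker's
    overround: the amount by which the raw inverse odds exceed 1.
    """
    raw = [1.0 / home_odds, 1.0 / draw_odds, 1.0 / away_odds]
    booksum = sum(raw)
    # Proportional normalization: scale so the three probabilities sum to 1.
    probabilities = [p / booksum for p in raw]
    return probabilities, booksum - 1.0


# Hypothetical decimal odds for home win, draw, and away win.
probs, margin = implied_probabilities(2.10, 3.40, 3.60)
print([round(p, 3) for p in probs], round(margin, 4))
```

Recomputing these probabilities each time a new odds snapshot arrives gives a feature that tracks market movement over time.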
Solutions
- Designed a modular data processing workflow
- Implemented a multi-model voting system to improve prediction robustness
- Used the Boruta and RFE (recursive feature elimination) algorithms for feature selection
- Developed an automated data collection and model training workflow
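
The multi-model voting idea listed above can be sketched as soft voting: each model outputs a probability per outcome, and the averaged probabilities decide the prediction. Whether SoccerML uses soft or hard voting is not stated, so this is an assumption, and the function name, weights, and example probabilities are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

LABELS = ("home", "draw", "away")


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[str, List[float]]:
    """Average per-model outcome probabilities and pick the most likely label.

    model_probs holds one (home, draw, away) probability triple per model.
    weights optionally gives each model a different influence.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total_weight = sum(weights)
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total_weight
        for i in range(len(LABELS))
    ]
    best = max(range(len(LABELS)), key=lambda i: averaged[i])
    return LABELS[best], averaged


# Hypothetical probability outputs from three models for one match.
svm_probs = [0.45, 0.30, 0.25]
xgb_probs = [0.50, 0.25, 0.25]
nn_probs = [0.35, 0.40, 0.25]

label, averaged = soft_vote([svm_probs, xgb_probs, nn_probs])
print(label, [round(p, 3) for p in averaged])
```

Averaging probabilities rather than counting votes lets a confident model outweigh two uncertain ones, which is one reason soft voting is often preferred when the base models are well calibrated.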
References
- Deep Generative Multi-Agent Imitation Model as a Computational Benchmark for Evaluating Human Performance in Complex Interactive Tasks: A Soccer Case Study
- Prediction of football match results with Machine Learning (arXiv)
- Machine Learning in Football Betting Prediction (IEEE)
- Investigating the efficiency of the Asian handicap (Journal of Sports Economics)
- Bayesian modelling of football outcomes: Using the Skellam's distribution (International Journal of Forecasting)
