Building NBA Insights AI — Weeks 3 to 5 Recap
- Aykut Onat
- Jul 6, 2025
Updated: Jul 10, 2025
When I started this project, I set out to answer one question: can machine learning reliably predict NBA games using only past stats and game context? Weeks 3 through 5 were when that vision started turning into a working system.
Week 3 — Teaching the Model to Think Like a Coach
By Week 3, I had already collected and cleaned data from the 2023–2024 NBA season. It was time to train the model. But I didn’t want to just throw raw numbers at it. I engineered rolling averages — points, assists, rebounds, turnovers, and field goal percentages — to simulate how a coach might evaluate a team’s current form rather than season-long stats.
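Here is a minimal sketch of the rolling-average idea in plain Python. It assumes each team's games are already sorted by date. The `rolling_form` name and the window size are hypothetical, since the post does not state the window. A pandas pipeline would more likely chain `shift(1)` and `rolling(window).mean()` per team. The key detail is that each game's feature uses only earlier games, so the model never sees the result it is trying to predict.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


def rolling_form(values: list[float], window: int = 5) -> list[Optional[float]]:
    """For each game, average the stat over the previous `window` games (excluding this one).

    Returns None for the first game, because there is no prior form yet.
    The name and default window are hypothetical, chosen for illustration.
    """
    history: deque[float] = deque(maxlen=window)  # keeps only the most recent `window` games
    out: list[Optional[float]] = []
    for value in values:
        # Read form *before* adding this game, so the feature cannot leak the outcome.
        out.append(sum(history) / len(history) if history else None)
        history.append(value)
    return out


# Toy example: one team's points over six consecutive games.
points = [110, 102, 121, 98, 115, 107]
print(rolling_form(points, window=3))
```

The same function can be applied to assists, rebounds, turnovers, and field goal percentage to build the "current form" columns.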
We experimented with two modeling approaches: Logistic Regression for interpretability and XGBoost for performance. Logistic Regression gave us a baseline, but XGBoost, with its ability to capture non-linear relationships, quickly outperformed it.
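To show what the interpretable baseline actually learns, here is a from-scratch logistic regression trained by batch gradient descent. In practice the project would more likely call a library such as scikit-learn. This stdlib version is only a sketch, and the toy feature (a difference in rolling point margin) is a hypothetical example. The learned weight is what makes the model interpretable: its sign and size show how strongly a feature pushes the win probability up or down.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability of a win for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(
    X: List[List[float]], y: List[int], lr: float = 0.1, epochs: int = 500
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data: hypothetical difference in rolling point margin vs. win (1) / loss (0).
X = [[-5.0], [-3.0], [-1.0], [1.0], [3.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(w, b)  # a positive weight: better recent margin -> higher win probability
```

A linear model like this can only draw one straight decision boundary in feature space, which is exactly the limitation a tree ensemble such as XGBoost avoids.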
Using these rolling features, we trained our first binary classification model to predict win/loss outcomes. We split the data into training and test sets and evaluated the model on the held-out games using accuracy and log loss.
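The two metrics measure different things. Accuracy only checks whether the pick was right, while log loss also checks how well calibrated the probability was. The sketch below implements both in plain Python. The `chronological_split` helper is an assumption: the post does not say how the split was made, but holding out the most recent games is one reasonable choice for sports data, because it prevents the model from training on games played after the ones it is tested on.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of games where the thresholded probability matches the real outcome."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_prob))
    return hits / len(y_true)


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy: confident wrong picks are penalized heavily."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) can never occur
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def chronological_split(games: List[T], test_fraction: float = 0.2) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: hold out the most recent games (input must be sorted by date)."""
    cut = round(len(games) * (1.0 - test_fraction))
    return games[:cut], games[cut:]


y_true = [1, 0, 1, 1]
y_prob = [0.8, 0.3, 0.4, 0.9]
print(accuracy(y_true, y_prob), log_loss(y_true, y_prob))
```

In this toy example, the third game is a miss at 0.4. It costs one point of accuracy but adds about 0.92 to that game's log loss term, which shows how log loss rewards calibrated probabilities and not just correct picks.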
Week 4 — Putting the Model to Work
A model is only as useful as its interface — and Week 4 was all about making our model accessible and testable.
We built a Streamlit interface that allowed users to either manually input data or upload CSVs to simulate NBA matchups. The key function predict_game_result() made it possible to generate outcomes for individual games or entire seasons in batch mode.
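The post names `predict_game_result()` but does not show its signature, so the one below is hypothetical. So are the feature column names. The sketch illustrates the design described: a single-game function that the batch CSV path simply loops over, so both modes share one code path. The model is passed in as any callable that returns a win probability.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical feature columns; the real app's inputs are not listed in the post.
FEATURES = ["PTS_ROLL", "FG_PCT_ROLL", "REST_DAYS"]

ProbaFn = Callable[[List[float]], float]  # any trained model wrapped to return P(win)


def predict_game_result(
    model_proba: ProbaFn, row: Dict[str, str], threshold: float = 0.5
) -> Dict[str, object]:
    """Hypothetical signature: score one team-game row and label it W or L."""
    features = [float(row[name]) for name in FEATURES]
    prob = model_proba(features)
    return {
        "TEAM": row["TEAM"],
        "OPPONENT": row["OPPONENT"],
        "WIN_PROB": prob,
        "PREDICTION": "W" if prob >= threshold else "L",
    }


def predict_csv(model_proba: ProbaFn, csv_text: str) -> List[Dict[str, object]]:
    """Batch mode: run the single-game function over every row of an uploaded CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [predict_game_result(model_proba, row) for row in reader]
```

In the Streamlit app, the manual-input form could build a single `row` dict. The contents of a file from `st.file_uploader(..., type="csv")` could be decoded to text and passed to the batch helper. Both paths are assumptions about how the interface might be wired.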
We also created a batch inference pipeline to simulate entire weeks of the NBA schedule, calculating predicted wins and comparing them with actual outcomes.
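A minimal sketch of the weekly comparison step, assuming one row per team per game with the model's win probability already attached. "Predicted wins" here counts games with a probability of at least 0.5. "Expected wins" sums the probabilities instead. That second column is an extra option, not something the post describes. The matchups and probabilities in the usage lines are made up purely for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

# One row per team per game: (game date, team, predicted win probability, actually won?)
TeamGame = Tuple[date, str, float, bool]


def compare_wins(games: Iterable[TeamGame], threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Tally games, predicted wins, expected wins, and actual wins for each team."""
    table: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"games": 0, "predicted_wins": 0, "expected_wins": 0.0, "actual_wins": 0}
    )
    for _, team, prob, won in games:
        row = table[team]
        row["games"] += 1
        row["predicted_wins"] += int(prob >= threshold)  # hard W/L call
        row["expected_wins"] += prob                      # sum of win probabilities
        row["actual_wins"] += int(won)
    return dict(table)


def simulate_week(games: Iterable[TeamGame], week_start: date) -> Dict[str, Dict[str, float]]:
    """Run the comparison over the 7 days starting at week_start."""
    week_end = week_start + timedelta(days=7)
    return compare_wins(g for g in games if week_start <= g[0] < week_end)


# Made-up example rows, for illustration only.
games = [
    (date(2024, 11, 4), "BOS", 0.72, True),
    (date(2024, 11, 6), "BOS", 0.40, True),
    (date(2024, 11, 5), "NYK", 0.60, False),
    (date(2024, 11, 12), "BOS", 0.55, False),  # falls in the following week
]
print(simulate_week(games, date(2024, 11, 4)))
```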
To visualize this data, we published Tableau dashboards displaying:
Predicted vs Actual Wins per team
Team-level shooting percentages in wins vs losses
Simulation performance charts over time
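Data for the shooting-split dashboard could be prepared with a small aggregation like the one below. The `TEAM`, `WL`, `FGM`, and `FGA` field names are assumptions modeled on common NBA box-score columns. One design choice matters here: pooling makes and attempts gives the true field goal percentage over a set of games, while averaging per-game percentages would weight a 70-attempt night the same as a 95-attempt one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def fg_pct_by_outcome(box_scores: Iterable[Mapping]) -> Dict[Tuple[str, str], float]:
    """Pooled field goal percentage per (team, 'W' or 'L'); field names are assumed."""
    made: Dict[Tuple[str, str], int] = defaultdict(int)
    attempted: Dict[Tuple[str, str], int] = defaultdict(int)
    for game in box_scores:
        key = (game["TEAM"], game["WL"])
        made[key] += game["FGM"]
        attempted[key] += game["FGA"]
    # Pool makes and attempts rather than averaging per-game FG%.
    return {key: made[key] / attempted[key] for key in attempted if attempted[key] > 0}


# Made-up box scores, for illustration only.
rows = [
    {"TEAM": "BOS", "WL": "W", "FGM": 45, "FGA": 90},
    {"TEAM": "BOS", "WL": "W", "FGM": 40, "FGA": 85},
    {"TEAM": "BOS", "WL": "L", "FGM": 38, "FGA": 92},
]
print(fg_pct_by_outcome(rows))
```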
Week 5 — Smarter Features, Smarter Model
This was the week we leveled up — transforming our model into something context-aware and much more accurate.
We introduced several new game context features, including:
WIN_STREAK: Current streak to reflect momentum
REST_DAYS: Days since last game
Opponent_WinRate: Cumulative win rate of the opposing team up to that game
DateOrdinal: Numeric representation of game date for time-aware modeling
Team_WinRate: Rolling win rate of the team
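The post does not give exact definitions for these features, so the sketch below makes labeled assumptions. `WIN_STREAK` is signed (+n for n straight wins, -n for n straight losses). `Team_WinRate` uses a hypothetical 10-game window. `Opponent_WinRate` is the opponent's full record so far. `DateOrdinal` uses Python's `date.toordinal()`. Games are processed league-wide in date order, and both teams' features are computed *before* either team's state is updated with the result. Otherwise the game being predicted would leak into its own features.

```python
from __future__ import annotations

from collections import defaultdict, deque
from datetime import date


def build_context_features(games: list[dict], window: int = 10) -> list[dict]:
    """games: one dict per game with DATE, HOME, AWAY, HOME_WIN, sorted by DATE.

    Returns one row per team per game, built only from earlier results.
    Column names follow the post; definitions and window size are assumptions.
    """
    streak: dict[str, int] = defaultdict(int)  # +n = n straight wins, -n = n straight losses
    last_date: dict[str, date] = {}
    wins: dict[str, int] = defaultdict(int)    # cumulative record, for Opponent_WinRate
    played: dict[str, int] = defaultdict(int)
    recent: dict[str, deque] = defaultdict(lambda: deque(maxlen=window))  # for Team_WinRate

    rows = []
    for g in games:
        d = g["DATE"]
        sides = [(g["HOME"], g["AWAY"], g["HOME_WIN"]), (g["AWAY"], g["HOME"], not g["HOME_WIN"])]
        # 1) Pre-game features for both teams, using state from earlier games only.
        for team, opp, won in sides:
            rows.append({
                "TEAM": team,
                "OPPONENT": opp,
                "WIN_STREAK": streak[team],
                "REST_DAYS": (d - last_date[team]).days if team in last_date else None,
                "Opponent_WinRate": wins[opp] / played[opp] if played[opp] else None,
                "DateOrdinal": d.toordinal(),
                "Team_WinRate": sum(recent[team]) / len(recent[team]) if recent[team] else None,
                "WIN": int(won),  # target label
            })
        # 2) Only now fold this game's result into each team's running state.
        for team, _, won in sides:
            if won:
                streak[team] = streak[team] + 1 if streak[team] > 0 else 1
            else:
                streak[team] = streak[team] - 1 if streak[team] < 0 else -1
            last_date[team] = d
            wins[team] += int(won)
            played[team] += 1
            recent[team].append(int(won))
    return rows


# Made-up three-game schedule, for illustration only.
schedule = [
    {"DATE": date(2024, 10, 22), "HOME": "BOS", "AWAY": "NYK", "HOME_WIN": True},
    {"DATE": date(2024, 10, 24), "HOME": "NYK", "AWAY": "BOS", "HOME_WIN": False},
    {"DATE": date(2024, 10, 27), "HOME": "BOS", "AWAY": "MIA", "HOME_WIN": False},
]
for row in build_context_features(schedule):
    print(row)
```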
After updating our feature set, we retrained the model using GridSearchCV to fine-tune hyperparameters like max_depth, learning_rate, and n_estimators.
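Under the hood, grid search is an exhaustive loop over every combination in the grid. scikit-learn's `GridSearchCV` scores each combination by cross-validation and exposes the winner as `best_params_` and `best_score_`. XGBoost's `XGBClassifier` follows the scikit-learn estimator interface, so it can be passed in directly. For time-ordered game data, passing `TimeSeriesSplit` as `cv` keeps each validation fold after its training fold, although the post does not say whether that was done. The stdlib sketch below shows only the search loop. The grid values are hypothetical, and `score_fn` stands in for the cross-validated score.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical grid: the post names these hyperparameters but not the values searched.
PARAM_GRID: Dict[str, List[Any]] = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 400],
}


def grid_search(
    grid: Dict[str, List[Any]], score_fn: Callable[[Dict[str, Any]], float]
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try every combination in the grid and keep the highest-scoring one."""
    names = list(grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("-inf")
    # For PARAM_GRID this loop runs 3 * 2 * 2 = 12 times.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # in GridSearchCV: mean score across the CV folds
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The cost grows multiplicatively with each added hyperparameter, which is why grids are usually kept small or replaced by randomized search.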
We evaluated the new model on both the 2023–24 season (seen data) and the 2024–25 season (completely unseen data).
The results were promising: over 70% prediction accuracy on the unseen 2024–25 data, a huge leap forward.
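Reporting seen and unseen seasons separately only works if every game is assigned to the right season. The 2024–25 number is truly out-of-sample only if none of those games reached training, rolling features, or hyperparameter search. The sketch below tags games by season and reports accuracy per season. The August cutoff for the season boundary and the `predict` callable are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, Mapping


def season_of(d: date) -> str:
    """Label a game date with its NBA season, e.g. 2024-03-01 -> '2023-24'.

    Assumes a season starts in the fall and ends by summer (cutoff: August).
    """
    start = d.year if d.month >= 8 else d.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


def accuracy_by_season(
    rows: Iterable[Mapping], predict: Callable[[Mapping], int]
) -> Dict[str, float]:
    """Accuracy per season label; rows need DATE and WIN keys (assumed names)."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        season = season_of(row["DATE"])
        hits[season] += int(predict(row) == row["WIN"])
        totals[season] += 1
    return {season: hits[season] / totals[season] for season in totals}
```

Keeping the season label as an explicit column also makes it easy to assert, before training, that no 2024–25 rows are present in the training set.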
Stay Connected
This project is fully documented through videos and blog posts, so you can follow every step:
Access Code: A step-by-step project implementation guide is available on my GitLab.
YouTube Playlist:

