
Building NBA Insights AI — Weeks 3 to 5 Recap

  • Writer: Aykut Onat
  • Jul 6, 2025
  • 2 min read

Updated: Jul 10, 2025

When I started this project, I set out to answer one question: Can Machine Learning reliably predict NBA games using only past stats and context? Weeks 3 through 5 were when that vision started turning into a working system.


Week 3 — Teaching the Model to Think Like a Coach


By Week 3, I had already collected and cleaned data from the 2023–2024 NBA season. It was time to train the model. But I didn’t want to just throw raw numbers at it. I engineered rolling averages — points, assists, rebounds, turnovers, and field goal percentages — to simulate how a coach might evaluate a team’s current form rather than season-long stats.
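
Here's a minimal sketch of that rolling-average step in pandas. The 5-game window and the column names (TEAM, GAME_DATE, PTS, AST, REB, TOV, FG_PCT) are illustrative assumptions, not necessarily what the project uses:

```python
import pandas as pd

# games: one row per team per game, with the assumed column names above.
STATS = ["PTS", "AST", "REB", "TOV", "FG_PCT"]

games = games.sort_values(["TEAM", "GAME_DATE"])

for col in STATS:
    # Rolling mean over the previous 5 games, shifted by one so the average
    # only reflects games played *before* the one being predicted.
    games[f"{col}_ROLL5"] = games.groupby("TEAM")[col].transform(
        lambda s: s.shift(1).rolling(window=5, min_periods=1).mean()
    )
```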

We experimented with two modeling approaches: Logistic Regression for interpretability and XGBoost for performance. Logistic regression gave us a baseline, but XGBoost — with its ability to capture non-linear relationships — quickly outperformed it.

Using these features, we trained our first binary classification model to predict win/loss outcomes. We split the data into training and testing sets and evaluated the model on accuracy and log loss.
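
In code, that comparison looks roughly like this, assuming a feature matrix X built from the rolling averages and a binary target y (1 = win); the split ratio and model parameters are placeholders:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from xgboost import XGBClassifier

# Hold out 20% of games for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]   # P(win)
    preds = (proba >= 0.5).astype(int)
    print(f"{name}: accuracy={accuracy_score(y_test, preds):.3f}, "
          f"log loss={log_loss(y_test, proba):.3f}")
```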



Week 4 — Putting the Model to Work


A model is only as useful as its interface — and Week 4 was all about making our model accessible and testable.

We built a Streamlit interface that allowed users to either manually input data or upload CSVs to simulate NBA matchups. The key function predict_game_result() made it possible to generate outcomes for individual games or entire seasons in batch mode.
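
A stripped-down sketch of that Streamlit flow: predict_game_result() is the project's function mentioned above, but the module it is imported from, its signature, and the input fields shown here are assumptions.

```python
import pandas as pd
import streamlit as st

from nba_insights.model import predict_game_result  # module path is assumed

st.title("NBA Insights AI Matchup Simulator")

mode = st.radio("Input mode", ["Manual entry", "Upload CSV"])

if mode == "Upload CSV":
    uploaded = st.file_uploader("Game features CSV", type="csv")
    if uploaded is not None:
        games = pd.read_csv(uploaded)
        # Assumes predict_game_result() accepts one row of features
        # and returns a win/loss label.
        games["PREDICTED_WIN"] = games.apply(predict_game_result, axis=1)
        st.dataframe(games)
else:
    pts = st.number_input("Rolling avg points", value=110.0)
    fg_pct = st.number_input("Rolling avg FG%", value=0.47)
    if st.button("Predict"):
        row = pd.Series({"PTS_ROLL5": pts, "FG_PCT_ROLL5": fg_pct})
        st.write(predict_game_result(row))
```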

We also created a batch inference pipeline to simulate entire weeks of the NBA schedule, calculating predicted wins and comparing them with actual outcomes.
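
The batch path is the same idea applied to a whole schedule file. A rough sketch, with an assumed file layout and column names:

```python
import pandas as pd

# Illustrative feature list; the project's actual feature set is larger.
FEATURE_COLUMNS = ["PTS_ROLL5", "AST_ROLL5", "REB_ROLL5", "TOV_ROLL5", "FG_PCT_ROLL5"]

def simulate_week(schedule_csv: str, model) -> pd.DataFrame:
    """Score every game in a schedule file and attach predictions
    next to the actual outcomes for comparison."""
    week = pd.read_csv(schedule_csv)                    # one row per game
    week["PREDICTED_WIN"] = model.predict(week[FEATURE_COLUMNS])
    week["CORRECT"] = week["PREDICTED_WIN"] == week["ACTUAL_WIN"]
    return week

# results = simulate_week("week_12_schedule.csv", xgb_model)
# print(results["CORRECT"].mean())  # fraction of games predicted correctly
```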

To visualize this data, we published Tableau dashboards (the export step is sketched after this list) displaying:

  • Predicted vs Actual Wins per team

  • Team-level shooting percentages in wins vs losses

  • Simulation performance charts over time
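
The dashboards read a simple per-team summary exported from the batch results. A sketch of that export, where results_df stands for the concatenated output of the batch runs above:

```python
import pandas as pd

# results_df: one row per simulated game with TEAM, PREDICTED_WIN (0/1)
# and ACTUAL_WIN (0/1) columns, produced by the batch pipeline above.
summary = (
    results_df.groupby("TEAM")[["PREDICTED_WIN", "ACTUAL_WIN"]]
    .sum()
    .rename(columns={"PREDICTED_WIN": "Predicted Wins",
                     "ACTUAL_WIN": "Actual Wins"})
    .reset_index()
)

# Tableau uses this CSV as its data source.
summary.to_csv("predicted_vs_actual_wins.csv", index=False)
```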



Week 5 — Smarter Features, Smarter Model


This was the week we leveled up — transforming our model into something context-aware and much more accurate.

We introduced several new game context features (a code sketch follows the list), including:


  • WIN_STREAK: Current streak to reflect momentum

  • REST_DAYS: Days since last game

  • Opponent_WinRate: Cumulative win rate of the opposing team up to that game

  • DateOrdinal: Numeric representation of game date for time-aware modeling

  • Team_WinRate: Rolling win rate of the team
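
Here's one way these features could be derived with pandas, under an assumed schema where games has TEAM, OPPONENT, GAME_DATE (datetime) and WIN (0/1) columns; the project's actual implementation may differ:

```python
import pandas as pd

# Assumed schema: TEAM, OPPONENT, GAME_DATE (datetime64), WIN (1/0).
games = games.sort_values(["TEAM", "GAME_DATE"])
grouped = games.groupby("TEAM")

# Team_WinRate: win rate up to, but not including, the current game.
games["Team_WinRate"] = grouped["WIN"].transform(
    lambda s: s.shift(1).expanding().mean()
)

# WIN_STREAK: consecutive wins entering the game (a loss resets it to 0).
def streak_before(wins: pd.Series) -> pd.Series:
    prior = wins.shift(1, fill_value=0)
    return prior.groupby((prior == 0).cumsum()).cumsum()

games["WIN_STREAK"] = grouped["WIN"].transform(streak_before)

# REST_DAYS: days since the team's previous game.
games["REST_DAYS"] = grouped["GAME_DATE"].transform(lambda d: d.diff().dt.days)

# DateOrdinal: numeric game date for time-aware modeling.
games["DateOrdinal"] = games["GAME_DATE"].map(pd.Timestamp.toordinal)

# Opponent_WinRate: look up the opponent's own Team_WinRate for that date.
opp = games[["TEAM", "GAME_DATE", "Team_WinRate"]].rename(
    columns={"TEAM": "OPPONENT", "Team_WinRate": "Opponent_WinRate"}
)
games = games.merge(opp, on=["OPPONENT", "GAME_DATE"], how="left")
```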


After updating our feature set, we retrained the model using GridSearchCV to fine-tune hyperparameters like max_depth, learning_rate, and n_estimators.
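
A sketch of that tuning step with scikit-learn's GridSearchCV wrapped around the XGBoost classifier; the value grids below are illustrative, since the exact ranges searched aren't listed here:

```python
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 300, 500],
}

search = GridSearchCV(
    estimator=XGBClassifier(eval_metric="logloss"),
    param_grid=param_grid,
    scoring="neg_log_loss",
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)   # training data from the 2023-24 season

print(search.best_params_)
best_model = search.best_estimator_
```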

We evaluated the new model on both the 2023–24 season (seen data) and the 2024–25 season (completely unseen data).
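
Evaluating on 2024–25 then comes down to loading that season's feature table as a pure holdout and scoring the tuned model on it (the file name and columns are assumptions):

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# best_model and FEATURE_COLUMNS come from the earlier sketches.
unseen = pd.read_csv("games_2024_25_features.csv")   # assumed file name

preds = best_model.predict(unseen[FEATURE_COLUMNS])
print(f"2024-25 accuracy: {accuracy_score(unseen['WIN'], preds):.3f}")
```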


The results were promising: over 70% prediction accuracy on unseen data — a huge leap forward.



Stay Connected

This project is fully documented through videos and blog posts. You can follow every step:

  • Access Code: A step-by-step project implementation guide is available on my GitLab.



YouTube Playlist:


