From Passion to Prediction: Building the NBA Insights System (Weeks 1 & 2 Recap)
- Aykut Onat
Why We Started This
As both a basketball enthusiast and a data professional, I’ve always wondered: Can we accurately predict NBA game outcomes using historical data?
That question became the foundation of NBA Insights, a machine learning-driven system designed to forecast NBA games using Python, nba_api, and core data science techniques. This post covers the progress we’ve made over the first two weeks of the project—and what we’ve learned so far.
Week 1: Setting the Foundation
We began with two important goals:
Define a clear roadmap: The project was broken into structured phases, starting with data preparation and feature engineering, then progressing to modeling, evaluation, and advanced modules such as playoff simulation.
Collect and explore NBA data: Using the nba_api Python library, we pulled all regular-season game data for the 2023–2024 season. We focused on key variables such as:
Points scored (PTS)
Plus-minus differential
Home vs. away matchups
Win/loss results
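In code, the pull goes through nba_api (the library exposes game-log endpoints such as `LeagueGameFinder`, which return one row per team per game). The toy DataFrame below stands in for a slice of that response, just to show how the home/away and win/loss variables can be read out of the `MATCHUP` and `WL` columns; the exact rows are made up for illustration:

```python
import pandas as pd

# Stand-in for a few rows of an nba_api game-log response
# (one row per team per game; MATCHUP uses "vs." for home, "@" for away).
games = pd.DataFrame({
    "GAME_ID": ["0022300001"] * 2 + ["0022300002"] * 2,
    "TEAM_ABBREVIATION": ["LAL", "DEN", "PHX", "GSW"],
    "MATCHUP": ["LAL @ DEN", "DEN vs. LAL", "PHX @ GSW", "GSW vs. PHX"],
    "WL": ["L", "W", "W", "L"],
    "PTS": [107, 119, 108, 104],
    "PLUS_MINUS": [-12, 12, 4, -4],
})

# Derive a home/away flag and a binary win column
games["IS_HOME"] = games["MATCHUP"].str.contains("vs.", regex=False).astype(int)
games["WIN"] = (games["WL"] == "W").astype(int)

# Quick look at the home-court question from our exploration
print(games.groupby("IS_HOME")["WIN"].mean())
```

On the full season's rows, that last groupby gives a first estimate of home-court advantage.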
Our initial exploratory analysis led to important questions, including:
How strong is the home-court advantage?
Can recent team trends explain future performance?
What role do stats like assists, turnovers, and FG% play in predicting outcomes?
We visualized these patterns and shared early breakdowns, including performance charts for the Los Angeles Lakers.
Week 2: Feature Engineering
This week was dedicated to crafting the features that would feed our prediction model. We followed a nine-step process that emphasized recent form and matchup dynamics.
Key Features Engineered:
Rolling 5-game averages for each team:
Points, assists, turnovers, rebounds, field goal percentage
Opponent rolling stats, allowing us to account for the quality of the competition
Home/away flag, capturing the impact of venue
Binary win/loss target (TARGET_WIN) for classification modeling
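The rolling features above can be sketched with pandas; the tiny game log here stands in for the full per-team table, and the column names are illustrative. The important detail is the `shift(1)`, which keeps each window strictly pre-game so the feature never leaks information from the game being predicted:

```python
import pandas as pd

# Toy per-team game log (one row per game, sorted by date);
# stands in for the full 2023-24 log pulled via nba_api.
log = pd.DataFrame({
    "TEAM": ["LAL"] * 7,
    "GAME_DATE": pd.date_range("2023-10-24", periods=7, freq="2D"),
    "PTS": [107, 100, 95, 131, 105, 102, 120],
    "AST": [22, 25, 19, 30, 24, 21, 28],
})

log = log.sort_values(["TEAM", "GAME_DATE"])
for col in ["PTS", "AST"]:
    # shift(1) excludes the current game, so the rolling mean only
    # sees the five games played before it.
    log[f"{col}_ROLL5"] = (
        log.groupby("TEAM")[col]
           .transform(lambda s: s.shift(1).rolling(5, min_periods=5).mean())
    )

print(log[["GAME_DATE", "PTS", "PTS_ROLL5"]])
```

The first five rows come out as NaN by design: a team needs five completed games before its rolling form is defined.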
We also introduced structured identifiers like home_team_game_id and away_team_game_id to support a reliable merge process and prevent duplication—especially on days when multiple games occurred.
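A minimal sketch of that merge idea, assuming a per-team table with a home/away flag (the identifiers and rows here are invented for illustration): splitting into home and away halves and joining on the game id means each matchup collapses to exactly one row, even when several games fall on the same date.

```python
import pandas as pd

# Toy per-team rows for two games played on the same day; merging on
# the game id (rather than on date + team) prevents duplication.
rows = pd.DataFrame({
    "GAME_ID": ["001", "001", "002", "002"],
    "TEAM": ["DEN", "LAL", "BOS", "NYK"],
    "IS_HOME": [1, 0, 1, 0],
    "PTS": [119, 107, 108, 104],
})

home = rows[rows["IS_HOME"] == 1].add_prefix("HOME_")
away = rows[rows["IS_HOME"] == 0].add_prefix("AWAY_")
games = home.merge(away, left_on="HOME_GAME_ID", right_on="AWAY_GAME_ID")

print(len(games))  # one row per game
```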
Sanity Checks
Before moving to modeling, we verified:
Feature distributions and value ranges
No missing data
Balanced target variable (50% win, 50% loss)
Dataset integrity for both home and away team perspectives
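Checks like these are cheap to encode as assertions that run before any model training. A sketch, with a tiny invented feature table standing in for the merged dataset and column names that mirror the post:

```python
import pandas as pd

# Stand-in for the merged feature table
features = pd.DataFrame({
    "HOME_PTS_ROLL5": [110.2, 104.6, 99.8, 112.0],
    "AWAY_PTS_ROLL5": [101.4, 108.8, 103.2, 97.6],
    "TARGET_WIN": [1, 0, 1, 0],
})

# No missing data
assert features.notna().all().all(), "missing values found"
# Target is strictly binary
assert features["TARGET_WIN"].isin([0, 1]).all(), "target must be binary"
# Target roughly balanced
balance = features["TARGET_WIN"].mean()
assert 0.45 <= balance <= 0.55, f"target imbalance: {balance:.2f}"

print("all sanity checks passed")
```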
Lessons Learned
Recent performance matters: Season averages don’t capture team momentum; rolling windows do.
Matchup awareness is key: Understanding both teams’ recent form improves prediction potential.
Planning prevents errors: A well-defined step-by-step process saved time, especially in handling merges and feature generation.
What’s Next
We’re now entering Week 3, which shifts our focus to predictive modeling:
Start with logistic regression for a quick, interpretable baseline
Progress to XGBoost to capture more complex, non-linear patterns
We'll test both, compare results, and prepare for evaluation and optimization.
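The baseline step looks roughly like the sketch below, using scikit-learn; the synthetic features stand in for the engineered table (e.g. rolling-stat differentials plus the home flag), and XGBoost would slot into the same train/evaluate loop.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features and binary target
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))  # e.g. rolling PTS/AST diffs + home flag
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Hold out a test set, fit the interpretable baseline, report accuracy
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print(f"baseline accuracy: {accuracy_score(y_te, model.predict(X_te)):.2f}")
```

One nice property of this baseline: the fitted coefficients directly show which features push the win probability up or down, which is harder to read off a boosted ensemble.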
Stay Connected
This project is fully documented through videos and blog posts. You can follow every step:
Access the code: A step-by-step project implementation guide is available on my GitLab.
YouTube Playlist: