NBA data intake & prop modeling — project report
A readable surface for our NBA sports betting build: what we’re collecting, what’s working today, and how we’ll turn it into a prop-evaluation engine.
What this report is
This is a lightweight, developer-friendly project surface for our NBA sports betting work. It’s written for two audiences at once:
- Beginners (Dustin): what these betting terms mean, what data we’re collecting, and what we’re trying to achieve.
- Developers (Jeffrey): how the pipeline is shaped, what tables exist, and how we’ll turn data into repeatable prop evaluation.
We’re intentionally starting with a small “truth-layer” fetcher that proves we can collect stable game/player/event data and store it in a canonical format. Once that foundation is stable, we will scale collection and add market/odds snapshots.
Beginner primer: what is a “player prop”?
A player prop is a bet on a player’s stat line rather than the final score. Examples:
- Points: “Player X over 24.5 points”
- Assists: “Player Y to record at least 8 assists”
- Rebounds: “Player Z under 10.5 rebounds”
Sportsbooks publish a line (e.g., 24.5 points) and an odds price (how much you win relative to your stake). Our job is to estimate the distribution of the player’s possible outcomes and decide whether the offered line and price imply a lower probability for our side than our estimate of the true probability. When they do, the bet is underpriced.
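As a minimal illustration of “estimating a distribution,” here is the crudest possible estimate: the fraction of a player’s recent games that finished over the line. The function name and the sample numbers are hypothetical, not project data, and a raw hit rate like this ignores minutes, opponent, and sample size.

```python
def empirical_over_probability(outcomes: list[float], line: float) -> float:
    """Fraction of past games in which the stat finished strictly over the line."""
    if not outcomes:
        raise ValueError("need at least one past game")
    overs = sum(1 for x in outcomes if x > line)
    return overs / len(outcomes)

# Hypothetical points totals from a player's last 10 games (illustrative, not real data).
recent_points = [22, 31, 18, 27, 25, 19, 30, 24, 26, 21]
p_over = empirical_over_probability(recent_points, 24.5)
print(f"P(over 24.5) ~ {p_over:.2f}")
```

Half-point lines such as 24.5 are used so that every game lands cleanly on one side (no push).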
Where we are today
Already working
- Fetches a date’s NBA games (schedule/game IDs)
- Fetches traditional box score player lines per game
- Optionally fetches play-by-play (PBP) event logs
- Writes results into a local SQLite database with canonical tables
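One possible shape for the canonical SQLite tables, sketched with Python’s built-in sqlite3 module. The table names (games, player_box, pbp_events) and their columns are assumptions for illustration; the project’s actual schema may differ. Composite primary keys encode “one row per thing,” which is also what makes idempotent re-runs possible.

```python
import sqlite3

# Hypothetical canonical schema; table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    game_id    TEXT PRIMARY KEY,   -- upstream game ID
    game_date  TEXT NOT NULL,      -- ISO date, e.g. '2024-01-15'
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS player_box (
    game_id    TEXT NOT NULL REFERENCES games(game_id),
    player_id  TEXT NOT NULL,
    minutes    REAL,
    points     INTEGER,
    rebounds   INTEGER,
    assists    INTEGER,
    PRIMARY KEY (game_id, player_id)   -- one box score line per player per game
);
CREATE TABLE IF NOT EXISTS pbp_events (
    game_id      TEXT NOT NULL REFERENCES games(game_id),
    event_num    INTEGER NOT NULL,     -- upstream event sequence number
    period       INTEGER,
    clock        TEXT,
    description  TEXT,
    PRIMARY KEY (game_id, event_num)
);
"""

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn

conn = open_db()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```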
Actively hardening
- PBP idempotency (safe re-runs without UNIQUE constraint errors)
- Stable identifiers across multiple upstream sources
- Versioned “contract manifests” so core functions don’t drift silently
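A common way to get PBP idempotency in SQLite is an upsert keyed on a natural identifier, so re-running a date updates existing rows instead of raising a UNIQUE constraint error. This sketch assumes a hypothetical pbp_events table keyed on (game_id, event_num) and made-up game IDs; the INSERT ... ON CONFLICT upsert syntax requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical PBP table keyed on (game_id, event_num); the real key may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pbp_events (
        game_id     TEXT NOT NULL,
        event_num   INTEGER NOT NULL,
        description TEXT,
        PRIMARY KEY (game_id, event_num)
    )
""")

def write_pbp(conn: sqlite3.Connection, rows: list[tuple[str, int, str]]) -> None:
    """Upsert events so re-fetching a game updates rows instead of raising IntegrityError."""
    with conn:  # one transaction per batch: commits on success, rolls back on error
        conn.executemany(
            """
            INSERT INTO pbp_events (game_id, event_num, description)
            VALUES (?, ?, ?)
            ON CONFLICT (game_id, event_num) DO UPDATE SET description = excluded.description
            """,
            rows,
        )

events = [("game-1", 1, "Jump ball"), ("game-1", 2, "Made shot")]
write_pbp(conn, events)
write_pbp(conn, events)  # second run: no UNIQUE constraint error, no duplicate rows
count = conn.execute("SELECT COUNT(*) FROM pbp_events").fetchone()[0]
print(count)
```

If upstream events never change once a game ends, INSERT OR IGNORE is a simpler alternative; the upsert form is safer when a source corrects events after the fact.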
How this becomes betting advice
We are building toward a repeatable loop:
- Collect truth data: game context, player minutes/production, and event detail (optional).
- Add market context: sportsbook lines/odds for each prop market, captured over time.
- Model outcomes: combine projected minutes with per-minute rates, then adjust for pace, role, opponent, and rest to produce an outcome distribution.
- Compare to the line: convert the odds to an implied probability, compare it with our estimated probability, and compute expected value (EV).
- Track quality: measure closing line value (CLV) and ROI over a meaningful sample.
Early on, we’ll use simple, interpretable models and gradually add complexity only when it improves performance and stability.
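A deliberately simple, interpretable version of the “model outcomes” step: projected mean = minutes × per-minute rate × an adjustment multiplier, with a normal approximation around that mean to get P(over). The Projection class, the single combined adjustment factor, and the default standard deviation are placeholder assumptions, not fitted values; counting stats such as assists or rebounds may be better served by a Poisson or negative binomial distribution.

```python
from dataclasses import dataclass
from statistics import NormalDist

@dataclass
class Projection:
    minutes: float           # projected minutes
    rate_per_min: float      # projected stat per minute (e.g., points per minute)
    adjustment: float = 1.0  # hypothetical combined multiplier for pace, role, opponent, rest
    sd: float = 6.0          # assumed standard deviation of the stat (placeholder)

    @property
    def mean(self) -> float:
        return self.minutes * self.rate_per_min * self.adjustment

def prob_over(p: Projection, line: float) -> float:
    """P(stat > line) under a normal approximation to the outcome distribution."""
    return 1.0 - NormalDist(p.mean, p.sd).cdf(line)

proj = Projection(minutes=34.0, rate_per_min=0.75, adjustment=1.02)
print(round(proj.mean, 2), round(prob_over(proj, 24.5), 3))
```

Keeping minutes and rate separate makes it easy to see which assumption moved a projection when a role or rotation changes.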
Glossary (quick)
- Line: the threshold (e.g., 24.5 points) the bet is evaluated against.
- Odds price: the payout format (often American odds like -110 / +120), which converts to an implied probability.
- Implied probability: the break-even win probability implied by the odds price. It includes the sportsbook’s margin (the vig), so the implied probabilities of both sides of a market usually sum to more than 100%.
- Edge: our estimated probability minus the implied probability.
- EV (expected value): the average profit per bet we expect if our probability estimate is correct.
- CLV (closing line value): whether we got a better line or price than the market’s closing one (often a better long-run signal than short-run wins).
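The glossary terms fit together in a few lines of arithmetic. This sketch assumes American odds as input, removes the margin by proportional normalization (one of several de-vig methods), and measures CLV as the closing no-vig probability of our side minus the implied probability of the price we took. The function names and example prices are illustrative.

```python
def american_to_decimal(odds: int) -> float:
    """Decimal odds: total return per 1 unit staked, stake included."""
    if -100 < odds < 100:
        raise ValueError("American odds must be <= -100 or >= +100")
    return 1 + (odds / 100 if odds > 0 else 100 / -odds)

def implied_probability(odds: int) -> float:
    """Break-even win probability for a price (still includes the book's margin)."""
    return 1 / american_to_decimal(odds)

def no_vig_probabilities(odds_a: int, odds_b: int) -> tuple[float, float]:
    """Remove the margin by scaling both sides' implied probabilities to sum to 1."""
    pa, pb = implied_probability(odds_a), implied_probability(odds_b)
    total = pa + pb  # greater than 1 when the book has a margin
    return pa / total, pb / total

def expected_value(p_win: float, odds: int, stake: float = 1.0) -> float:
    """EV per bet: win (decimal - 1) * stake with prob p_win, lose the stake otherwise."""
    profit = (american_to_decimal(odds) - 1) * stake
    return p_win * profit - (1 - p_win) * stake

def clv(bet_odds: int, close_our_side: int, close_other_side: int) -> float:
    """Closing no-vig probability of our side minus the implied probability we paid.
    Positive means we got a better price than the market's closing estimate."""
    close_p, _ = no_vig_probabilities(close_our_side, close_other_side)
    return close_p - implied_probability(bet_odds)

# Example: we estimate 55% for an over offered at -110; it later closes -125 / +105.
p_ours = 0.55
edge = p_ours - implied_probability(-110)
ev = expected_value(p_ours, -110)
print(f"edge={edge:.4f} ev={ev:.4f} clv={clv(-110, -125, 105):.4f}")
```

At -110 a 1-unit winning bet returns 100/110 units of profit, so with a 55% estimate EV = 0.55 × (100/110) - 0.45 = 0.05 units per unit staked.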
Important note
This project is about building a rigorous, testable approach to prop evaluation. Nothing here is a guarantee of profit; variance is real, and any strategy must be tested over enough volume to matter.