NBA data intake & prop modeling — project report
Checkpoint: Decision Packet Enriched (v5). We now have vendor links + published pricing snapshots, a filled comparison matrix, and a market landscape page to align expectations. The remaining dependency is still Jeffrey’s vendor/budget decision.
What changed in v5
- Provider comparison matrix fixed: no more empty cells; unknowns are labeled as verification tasks.
- Vendor links + pricing snapshots: direct links to each option so Jeffrey can evaluate quickly.
- Market landscape page: benchmarks existing edge tools and clarifies our differentiation thesis.
Start here
Decision packet
Pick a provider stack (lean / mid / enterprise) so we can ingest odds + player props and start measuring CLV/EV.
What we’re building
Truth POC fetcher status, data points captured, and how this becomes a modeling + edge loop.
Vendors & pricing appendix
Deeper vendor list + a concrete checklist for validating coverage, history, and licensing.
Market landscape
Source-led overview of existing edge tools and where we can match/outperform.
Reality check
We already have a season-to-date performance truth base (schedule + box + play-by-play). The bottleneck is market data: odds + player props + book IDs + timestamps. Once that is chosen, we move to “Market Ingestion First Light.”
Past releases
Release 2025-12-21 (bundle v4) — previous homepage
Checkpoint: Decision Packet Ready. We have proven performance truth capture (schedule + box + play-by-play). We are now ready to ingest market/props truth once Jeffrey selects the provider path and constraints.
Latest release — Decision Packet Ready
- Performance truth: validated at scale (season-to-date pulls, coverage reports, resume behavior).
- Decision packet: a short set of “pick A/B/C” choices that let Jeffrey decide provider path, scope, latency, history depth, and budget.
- Market adapter readiness: canonical schema + adapter contract drafted; implementation is plug-in once provider is chosen.
- Next milestone: “Market Ingestion First Light” (first snapshot pull + market coverage report).
Start here: Decision point and Why PBP + novelty.
Pending decision point
We need Jeffrey’s quick picks before we can compute real betting edges:
- Provider path (enterprise bundle vs developer API vs hybrid)
- FanDuel-only vs multi-book
- Realtime vs delayed snapshots
- History depth (season-to-date vs multi-season)
- Budget + storage/backtest rights constraints
What becomes possible after the decision
- Capture time-stamped prop/odds snapshots into canonical tables.
- Run a market coverage report (market equivalent of the PBP coverage report).
- Compute EV vs line, track outcomes, and measure CLV (closing line value).
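As a rough illustration of the CLV measurement listed above, the sketch below compares the price we bet at with the closing price for the same side. The function names are hypothetical and not part of the project code. It assumes American odds and an unchanged line; a prop line that moves (say 24.5 to 25.5) would need a separate adjustment.

```python
def american_to_decimal(odds: int) -> float:
    """Convert American odds to decimal odds (total return per unit staked)."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds < 0:
        # Favorite: risk |odds| to win 100.
        return 1 + 100 / -odds
    # Underdog: risk 100 to win `odds`.
    return 1 + odds / 100


def clv_pct(bet_odds: int, close_odds: int) -> float:
    """Price-only CLV: how much better our payout is than the closing payout.

    Positive means we beat the close. Assumes both prices refer to the
    same side and the same line (an assumption of this sketch).
    """
    return american_to_decimal(bet_odds) / american_to_decimal(close_odds) - 1


if __name__ == "__main__":
    # We bet the over at -110 and the market closed at -130 on the same line.
    print(f"CLV: {clv_pct(-110, -130):+.2%}")
```

A positive value means the market moved toward our side after we bet, which is the signal CLV tracking is meant to capture.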
Past releases
Release 2025-12-18 (bundle v3)
This report summarizes our current build state, what we have proven in the truth layer, and the remaining decision point (odds/props markets) before we can compute actionable “edges” for NBA betting.
Latest release
- Truth layer is now season-to-date capable: we backfilled 387 games across 55 dates with 0 missing box scores and 0 missing play-by-play in our coverage reports.
- Idempotent ingest + resume verified: re-running the same date ranges does not duplicate rows, and a --resume run can safely pick up where a prior run left off.
- We are now ready for market ingestion: the next gating decision is selecting a provider for odds + player props (and, ideally, historical line movement / closing lines).
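One common way to get the re-run safety and resume behavior described above is a primary key plus INSERT OR IGNORE, together with a small log of completed dates. The sketch below is a minimal illustration under that assumption; the table names, columns, and function names are hypothetical and do not reflect the project's actual schema.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical tables: the primary keys are what make re-runs safe.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS box_scores (
            game_id   TEXT NOT NULL,
            player_id TEXT NOT NULL,
            points    INTEGER,
            PRIMARY KEY (game_id, player_id)
        );
        CREATE TABLE IF NOT EXISTS run_log (
            game_date TEXT PRIMARY KEY
        );
        """
    )


def ingest_date(conn, game_date, rows, resume=False) -> bool:
    """Write one date's rows. Returns False if the date was skipped by resume."""
    if resume:
        done = conn.execute(
            "SELECT 1 FROM run_log WHERE game_date = ?", (game_date,)
        ).fetchone()
        if done:
            return False
    # One transaction per date: rows and the log entry commit together.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO box_scores (game_id, player_id, points) "
            "VALUES (?, ?, ?)",
            rows,
        )
        conn.execute(
            "INSERT OR IGNORE INTO run_log (game_date) VALUES (?)", (game_date,)
        )
    return True
```

Because the log entry is written in the same transaction as the rows, an interrupted run leaves the date unlogged and a later resume run will retry it.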
Where we are in the process
Phase 6 (Market + availability ingestion). Last gate passed: Gate 4C — Observability + coverage reporting. Next gate: Gate 4A — Idempotent snapshots. Active ritualset reference: sportsbetting_ritualset_v2_activated_v1.
In plain terms: schedule/box/PBP data capture is stable at scale. The next work is adding market “truth” (lines, prices, books, timestamps) so we can measure EV/CLV and move from “projections” to “bet decisions.”
Fast map of what we built
- truth_poc.py pulls (a) schedule, (b) box score, and (c) play-by-play, and writes to a local SQLite DB.
- coverage_report.py reads the DB and outputs coverage metrics (games seen, missing box, missing PBP), per date and overall.
- Run logs + primary keys enforce re-run safety (idempotency) and support long backfills via --resume.
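To make the coverage metrics concrete, here is a minimal sketch of how a per-date report could be computed from SQLite. It is not the actual coverage_report.py; the schema (games, box_scores, pbp_events) and the function name are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema used only for this sketch.
DEMO_SCHEMA = """
CREATE TABLE games      (game_id TEXT PRIMARY KEY, game_date TEXT NOT NULL);
CREATE TABLE box_scores (game_id TEXT NOT NULL, player_id TEXT NOT NULL,
                         PRIMARY KEY (game_id, player_id));
CREATE TABLE pbp_events (game_id TEXT NOT NULL, event_num INTEGER NOT NULL,
                         PRIMARY KEY (game_id, event_num));
"""

COVERAGE_SQL = """
SELECT g.game_date,
       COUNT(*) AS games_seen,
       SUM(CASE WHEN EXISTS (SELECT 1 FROM box_scores b
                             WHERE b.game_id = g.game_id)
                THEN 0 ELSE 1 END) AS missing_box,
       SUM(CASE WHEN EXISTS (SELECT 1 FROM pbp_events e
                             WHERE e.game_id = g.game_id)
                THEN 0 ELSE 1 END) AS missing_pbp
FROM games g
GROUP BY g.game_date
ORDER BY g.game_date
"""


def coverage_by_date(conn: sqlite3.Connection):
    """Return (game_date, games_seen, missing_box, missing_pbp) per date."""
    return conn.execute(COVERAGE_SQL).fetchall()
```

An overall summary can be derived by summing the per-date rows, so the same query serves both views.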
Pending decision point
To compute edges for FanDuel-style markets, we still need stable access to:
- Game lines: moneyline, spread, total (and ideally alternates)
- Player props: points, assists, rebounds, threes, combos (PRA, etc.)
- Historical snapshots: open/close, line movement, book-by-book pricing (for CLV + backtesting)
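If a provider only delivers point-in-time snapshots, open and close can be derived from them. The sketch below assumes a hypothetical odds_snapshots table keyed by book, market, and capture time, with timestamps stored as UTC ISO-8601 strings in one consistent format so that string comparison matches time order. None of these names come from a real vendor API or from the project's schema.

```python
import sqlite3

# Hypothetical snapshot table; the composite key keeps re-pulls idempotent.
SNAPSHOT_SCHEMA = """
CREATE TABLE IF NOT EXISTS odds_snapshots (
    book        TEXT NOT NULL,
    market_key  TEXT NOT NULL,   -- e.g. game + player + stat + side
    captured_at TEXT NOT NULL,   -- UTC ISO-8601, fixed format
    line        REAL,
    price       INTEGER,         -- American odds
    PRIMARY KEY (book, market_key, captured_at)
);
"""


def open_and_close(conn, book, market_key, tipoff):
    """Return ((captured_at, line, price) open, same for close), or None.

    Open is the earliest snapshot; close is the latest one at or before tipoff.
    """
    base = (
        "SELECT captured_at, line, price FROM odds_snapshots "
        "WHERE book = ? AND market_key = ? AND captured_at <= ? "
        "ORDER BY captured_at {} LIMIT 1"
    )
    args = (book, market_key, tipoff)
    opening = conn.execute(base.format("ASC"), args).fetchone()
    closing = conn.execute(base.format("DESC"), args).fetchone()
    if opening is None:
        return None
    return opening, closing
```

Storing every snapshot rather than only open and close keeps the full line movement available for later backtests.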
Past releases
Release 2025-12-16 (bundle v2)
A readable surface for our NBA sports betting build: what we’re collecting, what’s working today, and how we’ll turn it into a prop-evaluation engine.
What this report is
This is a lightweight, developer-friendly project surface for our NBA sports betting work. It’s written for two audiences at once:
- Beginners (Dustin): what these betting terms mean, what data we’re collecting, and what we’re trying to achieve.
- Developers (Jeffrey): how the pipeline is shaped, what tables exist, and how we’ll turn data into repeatable prop evaluation.
We’re intentionally starting with a small “truth-layer” fetcher that proves we can collect stable game/player/event data and store it in a canonical format. Once that foundation is stable, we scale collection and add market/odds snapshots.
Beginner primer: what is a “player prop”?
A player prop is a bet on a player’s stat line rather than the final score. Examples:
- Points: “Player X over 24.5 points”
- Assists: “Player Y at least 8 assists”
- Rebounds: “Player Z under 10.5 rebounds”
Sportsbooks publish a line (e.g., 24.5 points) and an odds price (how much you win relative to your stake). Our job is to estimate the player’s distribution of outcomes and decide whether the probability implied by the offered line/price is lower than our estimate of the true probability. When it is, the bet has positive expected value.
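For readers new to odds formats, the conversion from an American odds price to its implied probability can be sketched as follows. The function name is ours, chosen for illustration, and the result still includes the sportsbook's margin.

```python
def american_to_implied_prob(odds: int) -> float:
    """Convert American odds to the implied (margin-inclusive) win probability."""
    if odds == 0:
        raise ValueError("American odds cannot be 0")
    if odds < 0:
        # Favorite: risk |odds| to win 100, so break-even is |odds| / (|odds| + 100).
        return -odds / (-odds + 100)
    # Underdog: risk 100 to win `odds`, so break-even is 100 / (odds + 100).
    return 100 / (odds + 100)


if __name__ == "__main__":
    for price in (-110, 120):
        print(price, round(american_to_implied_prob(price), 4))
```

At -110 the break-even probability is 110/210, a little over 52%, which is why a bettor needs to win more than half of such bets to profit.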
Where we are today
Already working
- Fetches a date’s NBA games (schedule/game IDs)
- Fetches traditional box score player lines per game
- Optionally fetches play-by-play (PBP) event logs
- Writes results into a local SQLite database with canonical tables
Actively hardening
- PBP idempotency (safe re-runs without UNIQUE constraint errors)
- Stable identifiers across multiple upstream sources
- Versioned “contract manifests” so core functions don’t drift silently
How this becomes betting advice
We are building toward a repeatable loop:
- Collect truth data: game context, player minutes/production, and event detail (optional).
- Add market context: sportsbook lines/odds for each prop market, captured over time.
- Model outcomes: project minutes + per-minute rates + adjustments (pace, role, opponent, rest) to produce an outcome distribution.
- Compare to the line: convert odds to implied probability; compute expected value (EV).
- Track quality: measure closing line value (CLV) and ROI over a meaningful sample.
Early on, we’ll use simple, interpretable models and gradually add complexity only when it improves performance and stability.
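The loop above can be sketched end to end with a deliberately simple model. This example assumes a Poisson distribution for the stat, with mean equal to projected minutes times a per-minute rate. That is an illustrative simplification, not the project's chosen model; real scoring data is often more spread out than Poisson. All function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for a Poisson variable with mean mu."""
    return sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k + 1))


def prob_over(line: float, mu: float) -> float:
    """P(stat > line). For a half-point line like 24.5 this is P(X >= 25)."""
    return 1.0 - poisson_cdf(math.floor(line), mu)


def ev_per_unit(p_win: float, american_odds: int) -> float:
    """Expected profit per 1 unit staked, ignoring pushes (half-point lines)."""
    if american_odds == 0:
        raise ValueError("American odds cannot be 0")
    profit = 100 / -american_odds if american_odds < 0 else american_odds / 100
    return p_win * profit - (1 - p_win)


if __name__ == "__main__":
    minutes, rate = 34.0, 0.78          # assumed projection inputs
    mu = minutes * rate                  # projected mean for the stat
    p = prob_over(24.5, mu)
    print(f"P(over 24.5) = {p:.3f}, EV at -110 = {ev_per_unit(p, -110):+.3f}")
```

Swapping in a better distribution later only changes prob_over; the EV step stays the same, which keeps the early models easy to compare.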
Glossary (quick)
- Line
- The threshold (e.g., 24.5 points) the bet is evaluated against.
- Odds price
- The payout quoted for a bet, often in American odds (e.g., -110 or +120). It converts to an implied probability.
- Implied probability
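- The win probability at which the quoted price breaks even. It includes the sportsbook’s margin, so the implied probabilities of both sides of a market sum to more than 100%.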
- Edge
- Our estimated probability minus implied probability.
- EV (expected value)
- Average expected profit per bet if our probability estimate is correct.
- CLV
- Whether we beat the closing line/price (often a better long-run signal than short-run wins).
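To show how the margin and the edge definitions fit together, the sketch below strips the margin from a two-sided market by proportional normalization and then computes the edge against the quoted price. Proportional normalization is only one of several ways to remove the margin, and the function names are hypothetical.

```python
def remove_vig(p_over: float, p_under: float) -> tuple[float, float]:
    """Rescale two implied probabilities so they sum to 1 (proportional method)."""
    total = p_over + p_under
    if total <= 0:
        raise ValueError("implied probabilities must be positive")
    return p_over / total, p_under / total


def edge(model_prob: float, implied_prob: float) -> float:
    """Edge as defined in the glossary: our probability minus the implied one."""
    return model_prob - implied_prob


if __name__ == "__main__":
    # Both sides priced at -110: each implies 110/210, summing to about 104.8%.
    quoted = 110 / 210
    fair_over, fair_under = remove_vig(quoted, quoted)
    print(f"fair over = {fair_over:.3f}, edge vs quoted = {edge(0.56, quoted):+.3f}")
```

Measuring edge against the quoted probability, as here, is the conservative choice because the margin must be overcome before a bet is profitable.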
Important note
This project is about building a rigorous, testable approach to prop evaluation. Nothing here is a guarantee of profit; variance is real, and any strategy must be tested over enough volume to matter.