# Why play-by-play exists, why market APIs exist, and what novelty we bring
This section is written for a software developer (Jeffrey) but also intended to be beginner-friendly. It explains why we capture play-by-play (PBP), why market data is a separate truth feed, and how our approach differentiates from existing “edge” tools.
## Why play-by-play (PBP) exists, and why it's useful
Play-by-play is a chronological event log of the game: shots, rebounds, fouls, turnovers, substitutions, clock, etc.
Analysts use it because box scores cannot answer lineup, rotation, or possession-level questions.
In short: box score = summary; PBP = the ledger.
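To make the "ledger" concrete, here is a minimal sketch of what a PBP event might look like in code. The field names and sample values are hypothetical; real feeds vary by provider.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class PbpEvent:
    # Hypothetical schema; real providers use different field names.
    game_id: str
    period: int
    clock: str        # game clock remaining, e.g. "07:42"
    event_type: str   # "shot", "rebound", "foul", "turnover", "substitution", ...
    team: str
    player: str
    detail: str = ""

events = [
    PbpEvent("0022300001", 1, "11:38", "shot", "BOS", "J. Tatum", "made 3PT"),
    PbpEvent("0022300001", 1, "11:20", "rebound", "NYK", "J. Hart"),
]

# A question a box score can't answer directly:
# which team generated each possession-level event, in order?
event_counts = Counter(e.team for e in events)
```

Because every event carries a period and clock, lineup and rotation questions reduce to filtering and grouping over this log.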
## Why "market data" APIs exist (odds + props)
Sportsbooks publish prices/lines (spreads, totals, player props).
That information is valuable because betting decisions are made against the line, and the line changes over time.
If we want to measure edge rigorously, we need:
- the **line and price at decision time**
- the **closing line** (to compute closing-line value, CLV)
- a **history of movement** (to study when/why edge appears)
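Once the first two ingredients are captured, CLV falls out directly. A minimal sketch, using the standard American-odds-to-implied-probability conversion; the `clv` helper and its sign convention are our own illustrative choices, not an existing library API:

```python
def implied_prob(american_odds: int) -> float:
    """Convert American odds to implied probability (vig included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def clv(bet_odds: int, closing_odds: int) -> float:
    """Closing-line value: implied-probability edge of the price we got
    versus the closing price. Positive = we beat the close."""
    return implied_prob(closing_odds) - implied_prob(bet_odds)

# Example: we bet Over at -105 and the same side closes at -120.
edge = clv(bet_odds=-105, closing_odds=-120)
```

A bettor who consistently shows positive CLV is beating the market's final consensus, even on bets that lose.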
## What we've already solved vs what's next
### ✅ Solved: performance truth (what happened in games)
We have a working truth pipeline that captures:
- schedule + final scores
- box score stats
- play-by-play events
…and we can backfill across date ranges with idempotent resume behavior.
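The idempotent resume behavior can be sketched as a checkpointed day loop. This is an illustration of the pattern, not our actual pipeline code; the checkpoint filename and the `fetch_day` callback are hypothetical.

```python
import json
import os
from datetime import date, timedelta

def backfill(start: date, end: date, fetch_day, checkpoint: str = "backfill.json"):
    """Re-runnable day-by-day backfill: days already recorded in the
    checkpoint file are skipped, so interrupting and restarting is safe."""
    done = set()
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            done = set(json.load(f))
    d = start
    while d <= end:
        key = d.isoformat()
        if key not in done:
            fetch_day(d)  # e.g. pull schedule + box scores + PBP for that date
            done.add(key)
            with open(checkpoint, "w") as f:
                json.dump(sorted(done), f)  # persist progress after each day
        d += timedelta(days=1)
```

Running the same date range twice performs no duplicate fetches, which is what makes large historical ranges safe to process incrementally.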
### ⏳ Pending: market truth + availability truth
To compute betting edges, we still need:
- **market truth**: player props lines + prices, time-stamped snapshots
- **availability truth**: injuries/lineups/role signals that change minutes/usage
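For market truth specifically, a time-stamped snapshot might look like the sketch below. The schema is hypothetical; the two properties that matter are a capture timestamp on our own clock and immutability once captured (hence the frozen dataclass).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: a snapshot is never mutated after capture
class PropSnapshot:
    # Hypothetical schema for one time-stamped player-prop quote.
    captured_at: datetime  # when WE observed the quote (UTC, our clock)
    book: str              # e.g. "fanduel"
    game_id: str
    player: str
    market: str            # e.g. "points"
    line: float            # e.g. 27.5
    over_price: int        # American odds
    under_price: int

snap = PropSnapshot(
    captured_at=datetime.now(timezone.utc),
    book="fanduel", game_id="0022300001", player="J. Tatum",
    market="points", line=27.5, over_price=-110, under_price=-110,
)
```

Storing every observation as a separate immutable row is what later lets us reconstruct "the line and price at decision time" and the full movement history.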
## Is "scrape-first" shallow or deep?
As a developer instinct, it's deep: scraping is often the fastest way to prove feasibility and avoid vendor lock-in.
The problem is that at scale, scraping becomes brittle: blocking, rate limits, schema drift, and ToS risk.
Our approach is to:
1) keep the system provider-agnostic (adapter contracts), and
2) choose an API/bundle when reliability and historical coverage matter.
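The adapter-contract idea can be sketched with a typing `Protocol`. Names and method signatures here are illustrative, not our actual interfaces: the point is that a scraped provider and an official-API provider satisfy the same contract, so callers never change when we swap vendors.

```python
from datetime import date
from typing import Iterable, Protocol, runtime_checkable

@runtime_checkable
class MarketProvider(Protocol):
    """Contract every provider adapter must satisfy (illustrative)."""
    def games_on(self, d: date) -> Iterable[str]: ...
    def prop_snapshots(self, game_id: str) -> Iterable[dict]: ...

class ScrapedProvider:
    """One concrete adapter. An official-API adapter would implement the
    same two methods; swapping vendors is then a one-file change."""
    def games_on(self, d: date) -> list[str]:
        return []  # placeholder: would parse a scraped schedule page here
    def prop_snapshots(self, game_id: str) -> list[dict]:
        return []  # placeholder: would parse scraped odds pages here
```

Structural typing means `ScrapedProvider` never has to inherit from `MarketProvider`; matching the method shapes is enough.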
## What novelty we bring (even though products already exist)
There are many tools that show props and trends. Our differentiation is:
- **auditability**: every recommendation links to frozen truth + frozen market snapshot + model version
- **CLV-first iteration**: we measure whether we beat the market over time, not just whether a bet won
- **join power**: availability → minutes/usage → distribution → compare to line → EV + CLV tracking
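The end of that join chain, comparing a model's distribution to the line, reduces to an expected-value calculation. A minimal sketch using the standard EV-per-unit definition; the 55% example probability is made up, and `ev_per_unit` is our own illustrative helper:

```python
def american_payout(odds: int) -> float:
    """Profit per 1 unit staked at American odds (e.g. +150 -> 1.5, -110 -> 10/11)."""
    return odds / 100 if odds > 0 else 100 / -odds

def ev_per_unit(p_win: float, odds: int) -> float:
    """Expected profit of a 1-unit bet, given our model's win probability."""
    return p_win * american_payout(odds) - (1 - p_win)

# Example: the model says 55% for the Over; the book offers -110.
ev = ev_per_unit(p_win=0.55, odds=-110)
```

Auditability then means persisting the inputs to that number: the frozen truth rows, the frozen `PropSnapshot`-style market row, and the model version that produced `p_win`, so every recommendation can be replayed later.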
## How this changes depending on Jeffrey's choices
Some parts of the thesis change depending on provider scope and constraints. When Jeffrey confirms decisions, we generate a short delta note so the report stays consistent.
- FanDuel-only vs multi-book: affects normalization complexity and whether we can “best-price” shop.
- Realtime vs delayed: affects whether timing/line-move edges are realistic.
- Season-only vs multi-season: affects priors and backtest stability.