Sports Betting — Cloudflare Report
Fetcher deep dive — truth_poc.py
Exactly what the current script does, what it writes, and what’s coming next to connect data → props → EV.
Fetcher overview (truth_poc.py)
This script is our proof-of-collection. It makes small, polite requests, normalizes fields, and writes a durable SQLite snapshot.
Design goals:
- Auditable: small surface area, clear outputs.
- Joinable: stable IDs so we can attach odds/props later.
- Versionable: core functions are contract-hashed so changes are explicit.
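One possible reading of "contract-hashed" is a short digest over each core function's name and signature, so that any change to the call contract shows up as a changed hash. The sketch below assumes that interpretation. `contract_hash`, `fetch_boxscore`, and `fetch_schedule` are hypothetical names for illustration, not taken from truth_poc.py.

```python
import hashlib
import inspect


def contract_hash(fn) -> str:
    """Return a short digest of a function's name and signature.

    Assumption: the "contract" is the function name plus its parameter
    list, defaults, and annotations. The function body is not hashed.
    """
    contract = f"{fn.__name__}{inspect.signature(fn)}"
    return hashlib.sha256(contract.encode("utf-8")).hexdigest()[:12]


# Hypothetical core functions, stand-ins for the real fetcher code.
def fetch_boxscore(game_id: str, timeout: float = 10.0) -> dict:
    return {"game_id": game_id}


def fetch_schedule(season: str, timeout: float = 10.0) -> list:
    return []


if __name__ == "__main__":
    for fn in (fetch_boxscore, fetch_schedule):
        print(fn.__name__, contract_hash(fn))
```

Storing these digests next to each snapshot would let a later reader tell which version of the contract produced a given row. Hashing the function body as well would catch behavioral changes, at the cost of the hash changing on every refactor.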
Data points: what we can compute immediately
From schedule + box score alone we can produce:
- Minutes trend (season vs last 5)
- Per-minute production rates (PTS/min, AST/min, REB/min)
- Basic volatility estimates (game-to-game variance)
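The three box-score features above can be sketched in a few lines. This is a minimal illustration, assuming each game is a dict with `min`, `pts`, `ast`, and `reb` keys ordered oldest first. Those field names and the sample rows are made up for the example and are not the script's actual schema.

```python
from statistics import mean, variance

# Hypothetical box-score rows for one player, oldest game first.
games = [
    {"min": 28.0, "pts": 14, "ast": 3, "reb": 5},
    {"min": 31.0, "pts": 20, "ast": 5, "reb": 7},
    {"min": 33.0, "pts": 22, "ast": 4, "reb": 6},
    {"min": 35.0, "pts": 25, "ast": 6, "reb": 8},
    {"min": 34.0, "pts": 19, "ast": 5, "reb": 4},
    {"min": 36.0, "pts": 27, "ast": 7, "reb": 9},
]


def minutes_trend(rows, window=5):
    """Average minutes over the last `window` games minus the season average."""
    season_avg = mean(r["min"] for r in rows)
    recent_avg = mean(r["min"] for r in rows[-window:])
    return recent_avg - season_avg


def per_minute(rows, stat):
    """Total of `stat` divided by total minutes (0.0 if no minutes played)."""
    total_min = sum(r["min"] for r in rows)
    return sum(r[stat] for r in rows) / total_min if total_min else 0.0


def game_variance(rows, stat):
    """Sample variance of `stat` across games (0.0 with fewer than 2 games)."""
    values = [r[stat] for r in rows]
    return variance(values) if len(values) > 1 else 0.0


if __name__ == "__main__":
    print("minutes trend:", minutes_trend(games))
    print("PTS/min:", per_minute(games, "pts"))
    print("PTS variance:", game_variance(games, "pts"))
```

A positive minutes trend means the player has recently been on the floor more than the season average. Per-minute rates are computed from totals rather than by averaging per-game rates, so short appearances do not dominate.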
With PBP (optional), we can later derive:
- Event-based pace proxies and possession timing
- Foul trouble and substitution timing features
- Shot profile context (if action subtypes are present)
Known issues and hardening
- PBP uniqueness / reruns: if you rerun the same game into the same DB with the same capture timestamp, you can hit UNIQUE constraint conflicts. Our hardening patches focus on idempotency and safer event IDs.
- Endpoint churn: upstream NBA endpoints deprecate versions over time; we’re moving to v3 endpoints when possible and keeping raw_json for schema resilience.
- Scaling: for multi-season backfills we’ll add chunking, retries with jitter, and a resume ledger so interrupted runs continue safely.
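The rerun problem in the first item can be made harmless by letting SQLite skip rows that already exist. The sketch below shows one way to do that with `INSERT OR IGNORE`. The table name `pbp_events`, its columns, and the choice of unique key are assumptions for illustration; the real schema in truth_poc.py may differ.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical PBP table with a composite uniqueness key."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pbp_events (
            game_id     TEXT    NOT NULL,
            event_num   INTEGER NOT NULL,
            captured_at TEXT    NOT NULL,
            raw_json    TEXT    NOT NULL,
            UNIQUE (game_id, event_num, captured_at)
        )
        """
    )


def insert_events(conn: sqlite3.Connection, rows) -> int:
    """Insert rows, silently skipping duplicates. Returns the count of new rows."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO pbp_events "
        "(game_id, event_num, captured_at, raw_json) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    batch = [
        ("0022300001", 1, "2024-01-01T00:00:00Z", "{}"),
        ("0022300001", 2, "2024-01-01T00:00:00Z", "{}"),
    ]
    print("first run inserted:", insert_events(conn, batch))
    print("rerun inserted:", insert_events(conn, batch))
```

With this pattern, a rerun with the same capture timestamp inserts nothing instead of raising a UNIQUE constraint error. Whether ignoring or overwriting the duplicate is the right behavior depends on whether a rerun is expected to carry corrected data.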
Next: odds & props ingestion
To advise bets, we must ingest the sportsbook side too:
- Prop markets (points, assists, rebounds, 3PM, combos, etc.)
- Line (e.g., 24.5 points)
- Odds price (e.g., -110)
- Timestamped snapshots so we can measure movement and closing line value (CLV)
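To make the items above concrete, here is a minimal sketch of a timestamped prop snapshot, the standard conversion from American odds to implied probability, and a simple CLV measure. `PropSnapshot`, `closing_snapshot`, and `clv` are hypothetical names. The CLV definition used here (closing implied probability minus the implied probability at bet time, with the line held fixed and vig not removed) is one common simplification, not necessarily the one this project will adopt.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PropSnapshot:
    captured_at: datetime  # when the price was observed
    market: str            # e.g. "points"
    line: float            # e.g. 24.5
    price: int             # American odds, e.g. -110


def implied_probability(american_odds: int) -> float:
    """Convert American odds to implied probability (vig not removed)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def closing_snapshot(snapshots, tipoff: datetime):
    """Return the latest snapshot captured at or before tipoff, or None."""
    pre_game = [s for s in snapshots if s.captured_at <= tipoff]
    return max(pre_game, key=lambda s: s.captured_at) if pre_game else None


def clv(bet_price: int, closing_price: int) -> float:
    """Closing implied probability minus bet implied probability.

    Positive means the market closed at a worse price than the one taken.
    Assumes the line itself did not move between the two snapshots.
    """
    return implied_probability(closing_price) - implied_probability(bet_price)


if __name__ == "__main__":
    tipoff = datetime(2024, 1, 1, 19, 0)
    snaps = [
        PropSnapshot(datetime(2024, 1, 1, 10, 0), "points", 24.5, -110),
        PropSnapshot(datetime(2024, 1, 1, 18, 55), "points", 24.5, -130),
    ]
    close = closing_snapshot(snaps, tipoff)
    print("closing price:", close.price)
    print("CLV vs opening price:", clv(snaps[0].price, close.price))
```

When the line moves as well as the price (for example 24.5 to 25.5), comparing prices alone understates or overstates the value, so a fuller treatment would need a model of how probability changes with the line.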
We prefer licensed/normalized market data feeds when possible; scraping is a fallback and requires extra stability work.