Valx Sports Betting NBA data intake · truth_poc.py
Sports Betting — Cloudflare Report

Fetcher deep dive — truth_poc.py

Exactly what the current script does, what it writes, and what comes next to connect data → props → expected value (EV).

Fetcher overview (truth_poc.py)

This script is our proof-of-collection. It makes small, polite requests, normalizes fields, and writes a durable SQLite snapshot.

Design goals:

  • Auditable: small surface area, clear outputs.
  • Joinable: stable IDs so we can attach odds/props later.
  • Versionable: core functions are contract-hashed so changes are explicit.
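One plausible reading of "contract-hashed" is a fingerprint of each core function that is recorded and compared on later runs, so any change to the function shows up as a changed digest. The sketch below illustrates that idea only; it is an assumption, not the mechanism truth_poc.py actually uses. The names `contract_hash` and `parse_minutes` are hypothetical.

```python
import hashlib
import inspect


def contract_hash(func) -> str:
    """Return a SHA-256 hex digest of a function's signature and compiled body.

    Hypothetical helper. It hashes bytecode, referenced names, and constants,
    so the digest is only comparable within one Python version.
    """
    code = func.__code__
    # Skip nested code objects: their repr includes a memory address.
    consts = tuple(c for c in code.co_consts if not inspect.iscode(c))
    payload = repr((str(inspect.signature(func)), code.co_code, code.co_names, consts))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def parse_minutes(raw):
    """Convert 'MM:SS' to whole minutes (illustrative core function)."""
    return int(raw.split(":")[0])


# Record this value next to the code; a mismatch on a later run means the
# function's contract changed and the change should be reviewed explicitly.
recorded = contract_hash(parse_minutes)
```

Hashing the source text instead of the bytecode would be stable across Python versions, at the cost of flagging comment-only and whitespace-only edits.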

Data points: what we can compute immediately

From schedule + box score alone we can produce:

  • Minutes trend (season average vs. last 5 games)
  • Per-minute production rates (PTS/min, AST/min, REB/min)
  • Basic volatility estimates (game-to-game variance)
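As a rough illustration of the three items above, the sketch below computes them from a list of per-game box score rows. The row keys (`min`, `pts`, `ast`, `reb`) and the function name `player_features` are assumptions made for the example, not the script's actual schema.

```python
from statistics import mean, variance


def player_features(games, recent_n=5):
    """Compute simple features from box score rows ordered oldest to newest.

    Each row is a dict with numeric keys: min, pts, ast, reb (assumed names).
    Returns None when the player has no minutes on record.
    """
    played = [g for g in games if g["min"] > 0]
    if not played:
        return None

    total_min = sum(g["min"] for g in played)
    season_min = mean(g["min"] for g in played)
    recent_min = mean(g["min"] for g in played[-recent_n:])

    features = {
        "min_season": season_min,
        "min_last_n": recent_min,
        # Positive trend means the player's role is growing.
        "min_trend": recent_min - season_min,
    }
    for stat in ("pts", "ast", "reb"):
        values = [g[stat] for g in played]
        features[f"{stat}_per_min"] = sum(values) / total_min
        # Sample variance needs at least two games.
        features[f"{stat}_var"] = variance(values) if len(values) > 1 else None
    return features


sample_games = [
    {"min": 20, "pts": 10, "ast": 2, "reb": 4},
    {"min": 30, "pts": 15, "ast": 3, "reb": 6},
    {"min": 30, "pts": 15, "ast": 3, "reb": 6},
    {"min": 30, "pts": 15, "ast": 3, "reb": 6},
    {"min": 30, "pts": 15, "ast": 3, "reb": 6},
    {"min": 30, "pts": 15, "ast": 3, "reb": 6},
]
features = player_features(sample_games)
```

Rates are pooled (total stat divided by total minutes) rather than averaged per game, so short appearances do not distort them.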

With PBP (optional), we can later derive:

  • Event-based pace proxies and possession timing
  • Foul trouble and substitution timing features
  • Shot profile context (if action subtypes are present)

Known issues and hardening

  • PBP uniqueness / reruns: rerunning the same game into the same DB with the same capture timestamp can trigger UNIQUE constraint violations. Our hardening patches focus on idempotency and safer event IDs.
  • Endpoint churn: upstream NBA endpoint versions are deprecated over time; we’re moving to v3 endpoints where possible and keeping raw_json so stored rows survive schema changes.
  • Scaling: for multi-season backfills we’ll add chunking, retries with jitter, and a resume ledger so interrupted runs continue safely.
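The idempotency point can be illustrated with a deterministic event ID plus SQLite's `INSERT OR IGNORE`, so a rerun skips rows that already exist instead of raising. This is a minimal sketch under stated assumptions, not the actual hardening patch: the table `pbp_events`, its columns other than `raw_json`, and the helpers `event_id` and `store_events` are hypothetical.

```python
import hashlib
import json
import sqlite3


def event_id(game_id, event):
    """Derive a stable ID from the game ID and the event's canonical JSON.

    Assumes every event carries a distinguishing field (for example a
    sequence number); otherwise identical events would collapse into one row.
    """
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{game_id}|{canonical}".encode("utf-8")).hexdigest()


def store_events(conn, game_id, events, captured_at):
    """Insert events, skipping any already stored. Returns rows added."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pbp_events (
               event_id    TEXT PRIMARY KEY,
               game_id     TEXT NOT NULL,
               captured_at TEXT NOT NULL,
               raw_json    TEXT NOT NULL
           )"""
    )
    rows = [
        (event_id(game_id, e), game_id, captured_at, json.dumps(e, sort_keys=True))
        for e in events
    ]
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO pbp_events VALUES (?, ?, ?, ?)", rows
        )
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
events = [
    {"seq": 1, "desc": "Jump ball"},
    {"seq": 2, "desc": "Made 2PT"},
]
first_run = store_events(conn, "G1", events, "2024-01-01T00:00:00Z")
second_run = store_events(conn, "G1", events, "2024-01-01T00:00:00Z")
```

Because the ID does not include the capture timestamp, a rerun with either the same or a later timestamp leaves the first stored copy in place.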

Next: odds & props ingestion

To advise on bets, we must also ingest the sportsbook side:

  • Prop markets (points, assists, rebounds, 3PM, combos, etc.)
  • Line (e.g., 24.5 points)
  • Odds price (e.g., -110)
  • Timestamped snapshots so we can measure movement and closing line value (CLV)
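To show how a price and a pair of snapshots turn into numbers, the sketch below converts American odds to an implied probability, computes expected value for a model probability, and measures CLV as the change in implied probability between the bet price and the closing price. The function names are hypothetical, the implied probabilities still include the bookmaker's margin, and CLV has several conventions; this is only one of them.

```python
def implied_probability(american_odds):
    """Convert American odds (e.g. -110, +150) to an implied probability."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)


def expected_value(model_prob, american_odds, stake=1.0):
    """Expected profit of a bet given our own win probability estimate."""
    if american_odds == 0:
        raise ValueError("American odds cannot be zero")
    if american_odds < 0:
        profit_if_win = stake * 100 / -american_odds
    else:
        profit_if_win = stake * american_odds / 100
    return model_prob * profit_if_win - (1 - model_prob) * stake


def closing_line_value(bet_odds, closing_odds):
    """CLV in probability points; positive means we beat the closing price."""
    return implied_probability(closing_odds) - implied_probability(bet_odds)


# Over 24.5 points at -110, with a model probability of 55%.
break_even = implied_probability(-110)   # 110 / 210
ev_per_unit = expected_value(0.55, -110)  # 0.55 * (100 / 110) - 0.45
clv = closing_line_value(-110, -125)      # 125 / 225 - 110 / 210
```

This version compares prices at a fixed line; if the line itself moves between snapshots (for example 24.5 to 25.5), the comparison needs a model of how probability changes with the line.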

We prefer licensed/normalized market data feeds when possible; scraping is a fallback and requires extra stability work.