Valx Sports Betting NBA data intake · Build surface
Sports Betting — Cloudflare Report

What we’re building now — detailed

A deeper look at the current fetcher, the exact fields we collect, and how those fields turn into prop evaluation.

Current state of the fetcher

truth_poc.py is our “truth-layer” fetcher. It is intentionally small and auditable. Its job is to:

  • Take a date (YYYY-MM-DD) and pull that day’s NBA games.
  • For each game, pull player box score rows (minutes + key stats).
  • Optionally pull play-by-play (event log) for deeper derived metrics later.
  • Write everything into a local SQLite database in canonical tables.

Right now, the script is reliable enough for small test pulls (1–2 games) and is being hardened for safe re-runs and larger backfills.

What data we’re gathering (today)

The fetcher writes three canonical tables. These are the “truth spine” we’ll join against odds/props later.

1) canonical_schedule_rest

One row per game per capture timestamp. Key columns:

  • captured_time_utc
  • game_id
  • raw_game_date
  • start_time_utc
  • home_team_id
  • away_team_id
  • source
  • requested_date
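
As a rough illustration, the schedule table could be declared in SQLite as follows. The column names are taken from the list above; the column types and the composite primary key are assumptions for this sketch, not the actual DDL in truth_poc.py.

```python
import sqlite3

# Column names come from the list above.
# Types and the primary key are ASSUMPTIONS made for this sketch.
SCHEDULE_DDL = """
CREATE TABLE IF NOT EXISTS canonical_schedule_rest (
    captured_time_utc TEXT NOT NULL,
    game_id           TEXT NOT NULL,
    raw_game_date     TEXT,
    start_time_utc    TEXT,
    home_team_id      INTEGER,
    away_team_id      INTEGER,
    source            TEXT,
    requested_date    TEXT,
    PRIMARY KEY (game_id, captured_time_utc)
)
"""


def create_schedule_table(conn: sqlite3.Connection) -> None:
    """Create the schedule table if it does not exist yet."""
    conn.execute(SCHEDULE_DDL)
    conn.commit()


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
create_schedule_table(conn)

# PRAGMA table_info returns one row per column; index 1 is the column name.
schedule_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(canonical_schedule_rest)")
]
```

Keying on (game_id, captured_time_utc) matches the "one row per game per capture timestamp" description: a later capture of the same game adds a row instead of overwriting the earlier one.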

2) canonical_box_score

One row per player stat line per game. Key columns:

  • captured_time_utc
  • game_id
  • team_id
  • player_id
  • minutes
  • points
  • rebounds
  • assists
  • threes_made
  • steals
  • blocks
  • turnovers
  • source

Why this matters for props: minutes + production are the starting point for every prop model. Minutes are the volume lever; per-minute rates are the efficiency lever.
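
The volume/efficiency split can be shown with a few lines of arithmetic. This is a minimal sketch that assumes minutes is stored as a decimal number of minutes; the rows and the 32-minute projection are made-up values for illustration only.

```python
from typing import Optional


def per_minute_rate(stat_total: float, minutes: float) -> Optional[float]:
    """Return the stat per minute played, or None if no minutes were played."""
    if minutes <= 0:
        return None
    return stat_total / minutes


# Hypothetical box score rows for one player (not real data).
rows = [
    {"player_id": 1, "minutes": 30.0, "points": 24},
    {"player_id": 1, "minutes": 34.0, "points": 20},
]

# Pool totals across games rather than averaging per-game rates,
# so short appearances do not get the same weight as long ones.
total_minutes = sum(r["minutes"] for r in rows)
total_points = sum(r["points"] for r in rows)

# Efficiency lever: 44 points over 64 minutes.
points_per_minute = per_minute_rate(total_points, total_minutes)

# Volume lever: an assumed minutes projection for the next game.
projected_minutes = 32.0
projected_points = points_per_minute * projected_minutes
```

With these numbers the rate is 44 / 64 = 0.6875 points per minute, which projects to 22.0 points over 32 minutes.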

3) canonical_pbp

One row per play-by-play event (optional). Key columns:

  • captured_time_utc
  • game_id
  • event_num
  • period
  • clock
  • event_type
  • score
  • description
  • source
  • raw_json

Why PBP matters: it allows deeper features later (usage proxies, possession timing, lineup segments, foul trouble effects, etc.). We keep raw_json to stay forward-compatible as upstream schemas evolve.
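
A small sketch of why raw_json helps: a field with no dedicated column is still recoverable later. The column types, the primary key, the insert_pbp_event helper, and the event payload (including shot_distance_ft) are all hypothetical and do not reflect the real upstream schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names come from the list above; types and key are ASSUMPTIONS.
conn.execute(
    """
    CREATE TABLE canonical_pbp (
        captured_time_utc TEXT NOT NULL,
        game_id           TEXT NOT NULL,
        event_num         INTEGER NOT NULL,
        period            INTEGER,
        clock             TEXT,
        event_type        TEXT,
        score             TEXT,
        description       TEXT,
        source            TEXT,
        raw_json          TEXT,
        PRIMARY KEY (game_id, event_num, captured_time_utc)
    )
    """
)


def insert_pbp_event(conn, captured_time_utc, game_id, source, event):
    """Store the typed columns plus the full upstream payload as JSON text."""
    conn.execute(
        "INSERT INTO canonical_pbp VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (
            captured_time_utc,
            game_id,
            event["event_num"],
            event.get("period"),
            event.get("clock"),
            event.get("event_type"),
            event.get("score"),
            event.get("description"),
            source,
            json.dumps(event, sort_keys=True),
        ),
    )


# Hypothetical upstream event; shot_distance_ft has no canonical column.
event = {
    "event_num": 7,
    "period": 1,
    "clock": "11:42",
    "event_type": "shot",
    "score": "2-0",
    "description": "made layup",
    "shot_distance_ft": 2,
}
insert_pbp_event(conn, "2025-12-16T05:00:00Z", "0000000001", "nba_api", event)

raw = conn.execute(
    "SELECT raw_json FROM canonical_pbp WHERE game_id = ? AND event_num = ?",
    ("0000000001", 7),
).fetchone()[0]
recovered = json.loads(raw)
extra_field = recovered["shot_distance_ft"]
```

Because the whole payload is kept, a new feature can be backfilled from stored rows without re-fetching from the upstream source.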

Command to run (developer-friendly)

Jeffrey can run this from a shell in any Python environment after installing the dependencies:

python truth_poc.py --date 2025-12-15 --max-games 1 --db pilot.sqlite --pbp-mode nba_api --sleep 1.5
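
For readers who want to see how those flags might be parsed, here is a hypothetical argparse sketch. The flag names come from the command above; the types, defaults, help text, and the parse_date helper are assumptions and may differ from what truth_poc.py actually does.

```python
import argparse
from datetime import date, datetime


def parse_date(value: str) -> date:
    """Parse YYYY-MM-DD; argparse turns a ValueError into a usage error."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def build_parser() -> argparse.ArgumentParser:
    # Flag names match the command above; defaults are ASSUMPTIONS.
    parser = argparse.ArgumentParser(prog="truth_poc.py")
    parser.add_argument("--date", type=parse_date, required=True,
                        help="game date, YYYY-MM-DD")
    parser.add_argument("--max-games", type=int, default=None,
                        help="cap on games pulled for the date")
    parser.add_argument("--db", default="pilot.sqlite",
                        help="path to the SQLite database file")
    parser.add_argument("--pbp-mode", default=None,
                        help="play-by-play source; omitted means no PBP")
    parser.add_argument("--sleep", type=float, default=1.0,
                        help="seconds to pause between upstream requests")
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    ["--date", "2025-12-15", "--max-games", "1", "--db", "pilot.sqlite",
     "--pbp-mode", "nba_api", "--sleep", "1.5"]
)
```

argparse converts dashes to underscores, so the values are read as args.max_games and args.pbp_mode.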

Tip: If you re-run often, use a new DB filename per run or apply the idempotency patch so re-runs don’t collide on primary keys.
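
One possible shape for idempotent writes is sketched below, not necessarily the patch referred to above. It assumes the box score table is keyed on (game_id, player_id) and uses SQLite's INSERT OR REPLACE so a second run overwrites the row instead of raising an integrity error. The upsert_box_row helper and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column names come from the canonical_box_score list; the types and the
# (game_id, player_id) primary key are ASSUMPTIONS for this sketch.
conn.execute(
    """
    CREATE TABLE canonical_box_score (
        captured_time_utc TEXT NOT NULL,
        game_id     TEXT NOT NULL,
        team_id     INTEGER,
        player_id   INTEGER NOT NULL,
        minutes     REAL,
        points      INTEGER,
        rebounds    INTEGER,
        assists     INTEGER,
        threes_made INTEGER,
        steals      INTEGER,
        blocks      INTEGER,
        turnovers   INTEGER,
        source      TEXT,
        PRIMARY KEY (game_id, player_id)
    )
    """
)


def upsert_box_row(conn: sqlite3.Connection, row: dict) -> None:
    """Insert a stat line, replacing any existing row with the same key."""
    conn.execute(
        """
        INSERT OR REPLACE INTO canonical_box_score VALUES (
            :captured_time_utc, :game_id, :team_id, :player_id, :minutes,
            :points, :rebounds, :assists, :threes_made, :steals, :blocks,
            :turnovers, :source
        )
        """,
        row,
    )


# Hypothetical stat line (not real data).
row = {
    "captured_time_utc": "2025-12-16T05:00:00Z", "game_id": "0000000001",
    "team_id": 10, "player_id": 1, "minutes": 30.0, "points": 24,
    "rebounds": 5, "assists": 6, "threes_made": 3, "steals": 1,
    "blocks": 0, "turnovers": 2, "source": "nba_api",
}
upsert_box_row(conn, row)
# Simulated re-run: same key, later capture time, corrected points.
upsert_box_row(conn, {**row, "captured_time_utc": "2025-12-16T06:00:00Z", "points": 26})
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM canonical_box_score").fetchone()[0]
stored_points = conn.execute("SELECT points FROM canonical_box_score").fetchone()[0]
```

The trade-off is that replacing rows discards earlier captures. If capture history matters, adding captured_time_utc to the key keeps every snapshot.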

How we use these tables to build a prop model

The canonical tables are not the model itself; they are the data substrate. The modeling pipeline will:

  1. Compute baseline rates (e.g., points per minute, assists per minute) from historical box scores.
  2. Project minutes for the upcoming game (recent minutes + role + rotation changes).
  3. Adjust for context: pace, rest, home/away, injuries/usage shifts, and matchup factors (weighted lightly).
  4. Generate a distribution of outcomes (not just a point estimate) so we can price “over/under” lines.
  5. Compare to the market once we ingest sportsbook odds/line snapshots; compute expected value (EV) and recommend only when the edge is meaningful.
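
The five steps above can be sketched end to end in a few functions. This is an illustrative toy, not the planned model. It assumes a Poisson distribution for a counting stat such as assists, a single multiplicative context adjustment, American-style odds, and made-up input numbers. It also ignores pushes on whole-number lines.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(
        math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1)
    )


def prob_over(line: float, mean: float) -> float:
    """P(X > line); for a half-point line like 5.5 this is P(X >= 6)."""
    return 1.0 - poisson_cdf(math.floor(line), mean)


def american_to_decimal(odds: int) -> float:
    """Convert American odds (e.g. -110, +150) to decimal odds."""
    if odds > 0:
        return 1.0 + odds / 100.0
    return 1.0 + 100.0 / -odds


def expected_value(p_win: float, decimal_odds: float) -> float:
    """EV per 1 unit staked: a win pays (decimal_odds - 1), a loss costs 1."""
    return p_win * (decimal_odds - 1.0) - (1.0 - p_win)


# Steps 1-3: baseline rate x projected minutes x context (all ASSUMED values).
assists_per_minute = 0.20
projected_minutes = 32.0
context_multiplier = 1.05
mean_assists = assists_per_minute * projected_minutes * context_multiplier

# Step 4: a full distribution, so any line can be priced.
p_over = prob_over(5.5, mean_assists)

# Step 5: compare with a hypothetical market price of -110 on the over.
ev_over = expected_value(p_over, american_to_decimal(-110))

# Recommend only above an assumed minimum-edge threshold.
MIN_EDGE = 0.03
recommend = ev_over > MIN_EDGE
```

Poisson is a convenient starting point for low-count stats, but real stat lines are often more dispersed than it allows, so a production model would likely need a wider distribution.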