AI Analyst architecture

The AI Analyst is an agentic Claude session with read-only SQL access to the analytics database. This page covers how it works — the loop, the tools, the data-flow model.

The agent loop

A single report is a turn-based conversation between our backend and Claude (Sonnet 4.6). Each turn:

  1. Our server calls client.messages.create() with the current conversation + available tools
  2. Claude either emits plain text, calls a tool, or signals completion
  3. If it called a tool, our server executes it and appends the result to the conversation
  4. Loop until Claude calls finish_report — or hits the iteration / wall-clock cap

Hard limits enforced on the server side:

  • 25 tool-call iterations — prevents infinite loops
  • 15-minute wall-clock — prevents cost runaway
  • 4k output tokens per turn — keeps responses focused
  • 6 SDK-level retries — absorbs transient 429 rate-limit responses

The 5-tool contract

Claude doesn't see a blank SQL prompt — it works through a small, typed toolbox.

list_tables

Returns table names + row-count hints + a one-line purpose per table. Always the first call for an unfamiliar question.

describe_table(table, include_stats?)

Full schema for one table: columns, types, canonical rules, CAVEATS (hard-won gotchas from the data), plus a live null-rate / cardinality probe. Claude calls this before writing SQL against any table with non-obvious quirks.

sample(table, columns?, where?, limit≤20)

Up to 20 raw rows — for data-shape understanding only, never for analysis. Hard-capped to keep tokens low.

query_sql(sql, purpose)

The workhorse. Claude writes a SELECT or WITH query; our server:

  • Parses + rejects DDL/DML/multi-statement via regex before any DB connection opens
  • Allows leading SQL comments (they're stripped before the prefix check)
  • Wraps with LIMIT 10000 if Claude didn't specify (hard ceiling 50k)
  • Runs with statement_timeout = 30s
  • Surfaces Postgres hint + detail fields on errors (for "did you mean…" suggestions)
  • On 42703 (undefined column) errors, attaches the full real column list of every known table in the failed query — so Claude self-corrects in one shot

finish_report(...)

Terminal. Emits the structured report: narrative, sections (stats | chart | table | text), recommendations, caveats. The agent loop stops here.

Smart data delivery — the data_ref model

A naive agentic BI tool quickly chokes on Anthropic's 30k-input-tokens/minute limit: every tool call round-trip re-sends the whole conversation, so 50-row query previews compound fast.

Luma BI's model:

  1. Every query_sql / sample result caches the full rows server-side in ctx.dataRefs[ref_id]
  2. The response Claude sees includes only: { summary, preview: rows.slice(0, 10), row_count, data_ref }
  3. In finish_report, Claude references chart/table data by data_ref — not by pasting rows inline
  4. Our frontend pulls full rows from the persisted data_refs JSONB column when rendering

Effect: a report can operate on 10k rows of evidence without ever putting more than 10 of them into Claude's prompt. ~80% reduction in per-turn tokens vs. naive approach.

Cost guardrails

  • Per-admin daily dollar cap (default $5, override AI_DAILY_BUDGET_USD)
  • Per-non-admin daily run cap (default 2, override AI_USER_DAILY_LIMIT)
  • Error-exempt counters — errored runs don't burn quota (transient rate limits shouldn't punish the user)
  • Admin spending dashboard — per-user totals, daily trend, top-cost reports

Costs are tracked on every agent turn via response.usage.input_tokens + output_tokens and persisted on the ai_reports row.

Report lifecycle

| State | Meaning | | --- | --- | | pending | Row inserted, agent not yet running | | running | Agent loop is active | | done | finish_report was called, artefact persisted | | error | Exception or deadline exceeded |

Reports default to private (owner-only). One click promotes to public — any signed-in user can then read, star, or comment. Owners can rename, tag, pin (admins), schedule (daily/weekly/monthly), post to Slack, export as Markdown, or delete.


Next: schema reference for the database tables Claude queries, or deployment for host-environment details.