AI Analyst architecture
The AI Analyst is an agentic Claude session with read-only SQL access to the analytics database. This page covers how it works — the loop, the tools, the data-flow model.
The agent loop
A single report is a turn-based conversation between our backend and Claude (Sonnet 4.6). Each turn:
- Our server calls
client.messages.create()with the current conversation + available tools - Claude either emits plain text, calls a tool, or signals completion
- If it called a tool, our server executes it and appends the result to the conversation
- Loop until Claude calls
finish_report— or hits the iteration / wall-clock cap
Hard limits enforced on the server side:
- 25 tool-call iterations — prevents infinite loops
- 15-minute wall-clock — prevents cost runaway
- 4k output tokens per turn — keeps responses focused
- 6 SDK-level retries — absorbs transient 429 rate-limit responses
The 5-tool contract
Claude doesn't see a blank SQL prompt — it works through a small, typed toolbox.
list_tables
Returns table names + row-count hints + a one-line purpose per table. Always the first call for an unfamiliar question.
describe_table(table, include_stats?)
Full schema for one table: columns, types, canonical rules, CAVEATS (hard-won gotchas from the data), plus a live null-rate / cardinality probe. Claude calls this before writing SQL against any table with non-obvious quirks.
sample(table, columns?, where?, limit≤20)
Up to 20 raw rows — for data-shape understanding only, never for analysis. Hard-capped to keep tokens low.
query_sql(sql, purpose)
The workhorse. Claude writes a SELECT or WITH query; our server:
- Parses + rejects DDL/DML/multi-statement via regex before any DB connection opens
- Allows leading SQL comments (they're stripped before the prefix check)
- Wraps with
LIMIT 10000if Claude didn't specify (hard ceiling 50k) - Runs with
statement_timeout = 30s - Surfaces Postgres
hint+detailfields on errors (for "did you mean…" suggestions) - On
42703(undefined column) errors, attaches the full real column list of every known table in the failed query — so Claude self-corrects in one shot
finish_report(...)
Terminal. Emits the structured report: narrative, sections (stats | chart | table | text), recommendations, caveats. The agent loop stops here.
Smart data delivery — the data_ref model
A naive agentic BI tool quickly chokes on Anthropic's 30k-input-tokens/minute limit: every tool call round-trip re-sends the whole conversation, so 50-row query previews compound fast.
Luma BI's model:
- Every
query_sql/sampleresult caches the full rows server-side inctx.dataRefs[ref_id] - The response Claude sees includes only:
{ summary, preview: rows.slice(0, 10), row_count, data_ref } - In
finish_report, Claude references chart/table data bydata_ref— not by pasting rows inline - Our frontend pulls full rows from the persisted
data_refsJSONB column when rendering
Effect: a report can operate on 10k rows of evidence without ever putting more than 10 of them into Claude's prompt. ~80% reduction in per-turn tokens vs. naive approach.
Cost guardrails
- Per-admin daily dollar cap (default $5, override
AI_DAILY_BUDGET_USD) - Per-non-admin daily run cap (default 2, override
AI_USER_DAILY_LIMIT) - Error-exempt counters — errored runs don't burn quota (transient rate limits shouldn't punish the user)
- Admin spending dashboard — per-user totals, daily trend, top-cost reports
Costs are tracked on every agent turn via response.usage.input_tokens + output_tokens and persisted on the ai_reports row.
Report lifecycle
| State | Meaning |
| --- | --- |
| pending | Row inserted, agent not yet running |
| running | Agent loop is active |
| done | finish_report was called, artefact persisted |
| error | Exception or deadline exceeded |
Reports default to private (owner-only). One click promotes to public — any signed-in user can then read, star, or comment. Owners can rename, tag, pin (admins), schedule (daily/weekly/monthly), post to Slack, export as Markdown, or delete.
Next: schema reference for the database tables Claude queries, or deployment for host-environment details.