AI Analyst architecture

The AI Analyst is an agentic Claude session with read-only SQL access to the analytics database. This page covers how it works — the loop, the tools, the data-flow model.

The agent loop

A single report is a turn-based conversation between our backend and Claude (Sonnet 4.6). Each turn:

Our server calls client.messages.create() with the current conversation + available tools
Claude either emits plain text, calls a tool, or signals completion
If it called a tool, our server executes it and appends the result to the conversation
Loop until Claude calls finish_report — or hits the iteration / wall-clock cap

Hard limits enforced on the server side:

25 tool-call iterations — prevents infinite loops
15-minute wall-clock — prevents cost runaway
4k output tokens per turn — keeps responses focused
6 SDK-level retries — absorbs transient 429 rate-limit responses

The 5-tool contract

Claude doesn't see a blank SQL prompt — it works through a small, typed toolbox.

`list_tables`

Returns table names + row-count hints + a one-line purpose per table. Always the first call for an unfamiliar question.

`describe_table(table, include_stats?)`

Full schema for one table: columns, types, canonical rules, CAVEATS (hard-won gotchas from the data), plus a live null-rate / cardinality probe. Claude calls this before writing SQL against any table with non-obvious quirks.

`sample(table, columns?, where?, limit≤20)`

Up to 20 raw rows — for data-shape understanding only, never for analysis. Hard-capped to keep tokens low.

`query_sql(sql, purpose)`

The workhorse. Claude writes a SELECT or WITH query; our server:

Parses + rejects DDL/DML/multi-statement via regex before any DB connection opens
Allows leading SQL comments (they're stripped before the prefix check)
Wraps with LIMIT 10000 if Claude didn't specify (hard ceiling 50k)
Runs with statement_timeout = 30s
Surfaces Postgres hint + detail fields on errors (for "did you mean…" suggestions)
On 42703 (undefined column) errors, attaches the full real column list of every known table in the failed query — so Claude self-corrects in one shot

`finish_report(...)`

Terminal. Emits the structured report: narrative, sections (stats | chart | table | text), recommendations, caveats. The agent loop stops here.

Smart data delivery — the data_ref model

A naive agentic BI tool quickly chokes on Anthropic's 30k-input-tokens/minute limit: every tool call round-trip re-sends the whole conversation, so 50-row query previews compound fast.

Luma BI's model:

Every query_sql / sample result caches the full rows server-side in ctx.dataRefs[ref_id]
The response Claude sees includes only: { summary, preview: rows.slice(0, 10), row_count, data_ref }
In finish_report, Claude references chart/table data by data_ref — not by pasting rows inline
Our frontend pulls full rows from the persisted data_refs JSONB column when rendering

Effect: a report can operate on 10k rows of evidence without ever putting more than 10 of them into Claude's prompt. ~80% reduction in per-turn tokens vs. naive approach.

Cost guardrails

Per-admin daily dollar cap (default $5, override AI_DAILY_BUDGET_USD)
Per-non-admin daily run cap (default 2, override AI_USER_DAILY_LIMIT)
Error-exempt counters — errored runs don't burn quota (transient rate limits shouldn't punish the user)
Admin spending dashboard — per-user totals, daily trend, top-cost reports

Costs are tracked on every agent turn via response.usage.input_tokens + output_tokens and persisted on the ai_reports row.

Report lifecycle

| State | Meaning | | --- | --- | | pending | Row inserted, agent not yet running | | running | Agent loop is active | | done | finish_report was called, artefact persisted | | error | Exception or deadline exceeded |

Reports default to private (owner-only). One click promotes to public — any signed-in user can then read, star, or comment. Owners can rename, tag, pin (admins), schedule (daily/weekly/monthly), post to Slack, export as Markdown, or delete.

Next: schema reference for the database tables Claude queries, or deployment for host-environment details.