- Why not just total token usage in my application code?
- Because the questions are aggregations — spend per user, tokens per model, cost per day — and pulling rows back to sum them in app code is fragile and slow as volume grows. Asking the LLM to add them up is worse: arithmetic over a list is a hallucination generator. nlqdb runs the GROUP BY in Postgres and shows you the SQL it ran.
- How do the token and cost numbers get into the database?
- Write one row per LLM call (user, model, prompt and completion tokens, computed cost, timestamp) with the deterministic `nlqdb_remember` MCP tool or a parameterised INSERT through `POST /v1/run` (`GLOBAL-015`). The row shape is a trust boundary: it is built server-side, never guessed by the LLM. Then ask the cost questions in English over the same table.
- Can I see the SQL behind the cost numbers?
- Always — every answer returns the result rows plus the compiled SQL under a trace toggle (`SK-WEB-005`), so you can verify the grain (per call vs per user) before trusting a spend figure. nlqdb never hides the SQL behind the answer.
- Is this a replacement for an LLM observability tool like Langfuse or Helicone?
- No. Those tools proxy or instrument your calls and capture token counts and cost automatically, with tracing UIs built for the job. nlqdb is the database half: you decide what to log, and you ask ad-hoc 'spend per X' questions in English that are compiled to SQL over it, without a per-seat dashboard tool. The two compose; nlqdb doesn't proxy your traffic.
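
The logging and aggregation flow described in the answers above can be sketched roughly as follows. This is an illustration under stated assumptions, not nlqdb's API or schema: the `llm_calls` table, its column names, the price table, and the helpers `build_row`, `log_call`, and `spend_per_user` are all hypothetical, and an in-memory SQLite database stands in for Postgres so the example runs on its own. Cost is stored as integer micro-USD to keep the sums exact.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical prices in micro-USD per token (1 == $1 per million tokens).
# Substitute your provider's real rates.
PRICE_MICRO_USD_PER_TOKEN = {
    "model-a": {"prompt": 1, "completion": 2},
    "model-b": {"prompt": 10, "completion": 30},
}

# Hypothetical table: one row per LLM call.
CREATE_TABLE = """
CREATE TABLE llm_calls (
    user_id           TEXT    NOT NULL,
    model             TEXT    NOT NULL,
    prompt_tokens     INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    cost_micro_usd    INTEGER NOT NULL,
    called_at         TEXT    NOT NULL
)
"""


def build_row(user_id, model, prompt_tokens, completion_tokens, ts=None):
    """Build the row in application code; the cost is computed, never guessed."""
    price = PRICE_MICRO_USD_PER_TOKEN[model]
    cost = prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    ts = ts or datetime.now(timezone.utc).isoformat()
    return (user_id, model, prompt_tokens, completion_tokens, cost, ts)


def log_call(conn, row):
    """Parameterised INSERT: values travel as bound parameters, not string-built SQL."""
    conn.execute("INSERT INTO llm_calls VALUES (?, ?, ?, ?, ?, ?)", row)


def spend_per_user(conn):
    """Let the database do the arithmetic with GROUP BY instead of summing in app code."""
    return conn.execute(
        """
        SELECT user_id,
               SUM(prompt_tokens + completion_tokens) AS total_tokens,
               SUM(cost_micro_usd)                    AS total_cost_micro_usd
        FROM llm_calls
        GROUP BY user_id
        ORDER BY user_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(CREATE_TABLE)
for call in [
    ("alice", "model-a", 1000, 500),
    ("alice", "model-b", 200, 100),
    ("bob", "model-a", 300, 50),
]:
    log_call(conn, build_row(*call))

print(spend_per_user(conn))  # [('alice', 1800, 7000), ('bob', 350, 400)]
```

The query is the kind of statement you would expect to see under the SQL trace for a question like "spend per user": checking that it groups by `user_id` rather than returning one row per call is the grain check mentioned above. Against Postgres the placeholder syntax depends on the driver (for example `%s` or `$1` instead of `?`).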