- Why not just count tool failures in my application code?
- Because the questions are aggregations (failures per tool, p95 latency, calls per session) and pulling rows back to tally them in app code is fragile and gets slower as call volume grows. Asking the LLM to count a log is worse: models are unreliable at arithmetic over long lists and can hallucinate totals. nlqdb runs the GROUP BY in Postgres and shows you the SQL it ran, so you can verify the grain.
- How do the tool-call records get into the database?
- Write one row per tool call (tool name, session id, status, latency, timestamp) with the deterministic `nlqdb_remember` MCP tool or a parameterised INSERT through `POST /v1/run` (`GLOBAL-015`). The row shape stays a trust boundary: it is built server-side, not guessed by the LLM. Then ask the reliability questions in English over the same table.
- Can I see the SQL behind the error rates?
- Always — every answer returns the result rows plus the compiled SQL under a trace toggle (`SK-WEB-005`), so you can check the grain (per call vs per session) before trusting a failure rate. nlqdb never hides the SQL behind the answer.
- Is this a replacement for an agent-observability tool like Langfuse or AgentOps?
- No — those instrument your agent and capture every span automatically, with nested trace-tree UIs built for debugging one run. nlqdb is the database half: you decide what to log, and you get a SQL query planner over it for ad-hoc 'per tool / per week' questions without a per-seat dashboard. They compose; nlqdb doesn't trace your runs.
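
To make "check the grain" concrete, here is a minimal sketch of the two aggregations the answers above contrast: failures per call versus failures per session. It uses Python's built-in `sqlite3` as a local stand-in for Postgres. The table name `tool_calls`, its columns, and the sample rows are assumptions for illustration, not nlqdb's actual schema or the SQL it compiles.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not nlqdb's real row shape.
SCHEMA = """
CREATE TABLE tool_calls (
    tool       TEXT    NOT NULL,
    session_id TEXT    NOT NULL,
    status     TEXT    NOT NULL,   -- 'ok' or 'error'
    latency_ms INTEGER NOT NULL,
    called_at  TEXT    NOT NULL    -- ISO 8601 timestamp
)
"""

# Made-up sample data: one row per tool call.
SAMPLE_ROWS = [
    ("search", "s1", "ok",    120,  "2024-01-01T10:00:00Z"),
    ("search", "s1", "error", 900,  "2024-01-01T10:00:05Z"),
    ("search", "s1", "error", 950,  "2024-01-01T10:00:09Z"),
    ("search", "s2", "ok",    110,  "2024-01-01T11:00:00Z"),
    ("search", "s3", "ok",    130,  "2024-01-01T12:00:00Z"),
    ("fetch",  "s1", "ok",    300,  "2024-01-01T10:01:00Z"),
    ("fetch",  "s2", "error", 1200, "2024-01-01T11:01:00Z"),
]

# Grain: one call. Each row counts once.
PER_CALL_SQL = """
SELECT tool,
       COUNT(*) AS calls,
       SUM(CASE WHEN status = 'error' THEN 1 ELSE 0 END) AS failures
FROM tool_calls
GROUP BY tool
ORDER BY tool
"""

# Grain: one session. A session counts as failed if any of its calls failed.
PER_SESSION_SQL = """
SELECT tool,
       COUNT(*) AS sessions,
       SUM(had_error) AS failed_sessions
FROM (
    SELECT tool,
           session_id,
           MAX(CASE WHEN status = 'error' THEN 1 ELSE 0 END) AS had_error
    FROM tool_calls
    GROUP BY tool, session_id
) AS per_session
GROUP BY tool
ORDER BY tool
"""


def build_db(rows):
    """Create an in-memory database and load rows with a parameterised INSERT."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO tool_calls VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def failures_per_call(conn):
    """Return (tool, calls, failures) tuples, one per tool."""
    return conn.execute(PER_CALL_SQL).fetchall()


def failures_per_session(conn):
    """Return (tool, sessions, failed_sessions) tuples, one per tool."""
    return conn.execute(PER_SESSION_SQL).fetchall()


if __name__ == "__main__":
    conn = build_db(SAMPLE_ROWS)
    print("per call:   ", failures_per_call(conn))
    print("per session:", failures_per_session(conn))
```

On this sample data, `search` fails on 2 of 5 calls but in only 1 of 3 sessions, because both failures happened in the same session. The same table gives a different "failure rate" depending on the grain, which is why reading the SQL before trusting the number matters.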