# harnessed.md

Observation is the harness’s third pillar — what closes the loop. Verification catches what’s broken before it ships; observation catches what slips through. Without it, the agent never learns from production.

## The observation stack

| Layer | What it watches | When it fires |
| --- | --- | --- |
| Error tracking | Exceptions, failed requests, crashes | Continuous |
| Usage analytics | Adoption, drop-off, feature use | Continuous |
| Agentic investigation | Anomalies, root cause | On alert |

The first two surface that something is wrong. The third tells you why.

## Error tracking

The minimum bar. Every change carries some risk of a runtime failure; error tracking is what tells you it happened. Sentry-class tools attach source maps to stack traces, group similar errors, and link incidents back to the deploy that caused them.

What to capture beyond the exception itself:

Tools: Sentry, Bugsnag, Rollbar, or platform-native equivalents.
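
To make "group similar errors" and "link incidents back to the deploy" concrete, here is a minimal sketch of the idea. It is not how Sentry or any other tool does it internally. The `ErrorEvent` shape, the `fingerprint` heuristic (exception type plus traceback function names), and the release labels are all assumptions made for illustration.

```python
import hashlib
import traceback
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorEvent:
    fingerprint: str  # stable key used to group similar errors
    release: str      # deploy identifier, e.g. a version tag (assumed)
    message: str


def fingerprint(exc: BaseException) -> str:
    """Hash the exception type plus the function names in its traceback.

    Messages and line numbers are left out on purpose, so the same failure
    with different input values still lands in one group.
    """
    frames = traceback.extract_tb(exc.__traceback__)
    key = type(exc).__name__ + "|" + "|".join(frame.name for frame in frames)
    return hashlib.sha256(key.encode()).hexdigest()[:12]


def capture(exc: BaseException, release: str) -> ErrorEvent:
    """Turn a caught exception into an event tagged with the running release."""
    return ErrorEvent(fingerprint(exc), release, str(exc))


def first_seen_release(events):
    """Map each fingerprint to the release where it first appeared.

    Assumes `events` is in chronological order.
    """
    first = {}
    for event in events:
        first.setdefault(event.fingerprint, event.release)
    return first


def parse_price(raw: str) -> int:
    return int(raw)  # raises ValueError on non-numeric input


events = []
for release, raw in [("v1", "12"), ("v2", "abc"), ("v2", "xyz"), ("v3", "n/a")]:
    try:
        parse_price(raw)
    except ValueError as exc:
        events.append(capture(exc, release))

introduced_in = first_seen_release(events)
```

The three failures carry different messages but share one fingerprint, and `introduced_in` points that group at `v2`, the first release where it showed up. A real tracker uses richer grouping rules; the point is only that a stable key plus a release tag is enough to ask "which deploy started this?"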

## Usage analytics

Errors tell you when things break. Usage tells you about the silent failures — a new feature ships and adoption stays flat, a working one starts losing traction, a funnel step drops users. Things technically work but something is off.

Patterns that earn their place:

Tools: PostHog (open source, self-hostable), Amplitude, Mixpanel. The pattern matters more than the vendor — pick one and instrument consistently.
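
A funnel drop-off check is simple enough to sketch by hand, which helps when deciding what to instrument. The example below is a rough illustration over an in-memory list of `(user_id, event_name)` pairs. The event names are hypothetical, and it ignores event ordering and time windows, which real analytics tools do account for.

```python
from collections import defaultdict

# Hypothetical event names for a three-step onboarding funnel.
FUNNEL = ["signup_started", "email_verified", "first_project_created"]


def funnel_counts(events, steps):
    """Count users who reached each step and every step before it.

    `events` is an iterable of (user_id, event_name) pairs.
    """
    users_by_event = defaultdict(set)
    for user_id, name in events:
        users_by_event[name].add(user_id)

    counts = []
    reached = None
    for step in steps:
        # A user only counts for a step if they also hit all earlier steps.
        reached = users_by_event[step] if reached is None else reached & users_by_event[step]
        counts.append(len(reached))
    return counts


def drop_off(counts):
    """Fraction of users lost between each pair of consecutive steps."""
    return [
        0.0 if prev == 0 else 1 - cur / prev
        for prev, cur in zip(counts, counts[1:])
    ]


events = [
    ("u1", "signup_started"), ("u1", "email_verified"), ("u1", "first_project_created"),
    ("u2", "signup_started"), ("u2", "email_verified"),
    ("u3", "signup_started"),
    ("u4", "signup_started"),
]

counts = funnel_counts(events, FUNNEL)
losses = drop_off(counts)
```

Here `counts` comes out as `[4, 2, 1]` and `losses` as `[0.5, 0.5]`: half the users are lost at each step. Tracking these fractions per release is one way to notice a funnel step that quietly got worse.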

## Agentic investigation

When an alert fires, an LLM agent queries your logs and metrics around the time of the incident, looks for what changed, and surfaces a root-cause hypothesis. The shift is from “alert fired, dig manually” to “alert fired, here’s the analysis.”
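
The data-gathering half of that loop can be sketched without any model in the picture. The example below is an assumption-laden illustration, not any vendor's implementation: `LogLine`, `Deploy`, and `incident_context` are hypothetical names, and the 15-minute window is arbitrary. It collects what an agent would likely want first: errors near the alert, error messages that are new relative to earlier history, and deploys that landed shortly before.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogLine:
    ts: datetime
    level: str
    message: str


@dataclass(frozen=True)
class Deploy:
    ts: datetime
    sha: str


def incident_context(alert_ts, logs, deploys, window=timedelta(minutes=15)):
    """Collect what changed around an alert, as plain data for an agent to reason over."""
    start, end = alert_ts - window, alert_ts + window

    # Error messages already present before the window form the baseline.
    baseline = {line.message for line in logs if line.level == "error" and line.ts < start}
    in_window = [line for line in logs if line.level == "error" and start <= line.ts <= end]

    return {
        "window": (start.isoformat(), end.isoformat()),
        "error_count": len(in_window),
        # Messages seen in the window that never appeared in the baseline.
        "new_errors": sorted({line.message for line in in_window} - baseline),
        # Deploys that landed inside the window, up to the alert itself.
        "recent_deploys": [d.sha for d in deploys if start <= d.ts <= alert_ts],
    }


alert = datetime(2024, 1, 1, 12, 0)
logs = [
    LogLine(alert - timedelta(hours=2), "error", "timeout calling billing"),
    LogLine(alert - timedelta(minutes=5), "error", "timeout calling billing"),
    LogLine(alert - timedelta(minutes=3), "error", "TypeError in checkout"),
    LogLine(alert - timedelta(minutes=1), "info", "request ok"),
]
deploys = [
    Deploy(alert - timedelta(minutes=10), "abc123"),
    Deploy(alert - timedelta(days=1), "old999"),
]

context = incident_context(alert, logs, deploys)
```

The resulting `context` flags `TypeError in checkout` as new and `abc123` as the only recent deploy. That is the raw material for a root-cause hypothesis, and the model's job is to turn it into one.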

Vercel Agent Investigation is the clearest GA example: it runs automatically when an anomaly alert fires, queries logs and metrics, looks for related errors, and posts findings to the alert. It is built on the Vercel MCP server, so the agent has authenticated access to runtime data.

The emerging pattern beyond alert-triggered: hand a coding agent a bug report — screenshots, repro details, severity — and give it tools to query logs, deploy a debug branch with added instrumentation, and compare against baseline. Compose it from primitives: vercel-labs/agent-browser for screenshot capture, vercel logs for queryable history, MCP for live data.

## Closing the loop

Production signals feed two flows. Recurring patterns become harness improvements; one-off signals become tasks. The work is sorting which is which — see Verification: Closing the loop for the three-question framing that applies here verbatim.
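
As a crude first pass at that sort, recurrence alone already separates a lot of signals. The sketch below is an assumption, not the three-question framing referenced above: it counts how often a pattern key shows up and routes anything at or above a threshold toward harness improvements. The `pattern` field and the threshold of 3 are hypothetical.

```python
from collections import Counter


def sort_signals(signals, recurring_at=3):
    """Split signal patterns into harness improvements and one-off tasks.

    `signals` is a list of dicts with a "pattern" key (assumed shape).
    A pattern seen at least `recurring_at` times counts as recurring.
    """
    counts = Counter(signal["pattern"] for signal in signals)
    harness, tasks = [], []
    for pattern, seen in counts.items():
        (harness if seen >= recurring_at else tasks).append(pattern)
    return sorted(harness), sorted(tasks)


signals = [
    {"pattern": "missing-null-check"},
    {"pattern": "missing-null-check"},
    {"pattern": "missing-null-check"},
    {"pattern": "stale-cache-after-deploy"},
]

harness_improvements, one_off_tasks = sort_signals(signals)
```

Here `missing-null-check` lands in `harness_improvements` and `stale-cache-after-deploy` in `one_off_tasks`. A count is only a starting filter; the judgment about what a recurring pattern should change in the harness still needs a person or an agent.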

The signals you have to wire up are different, though. Errors and analytics live in third-party tools; the harness lives in your repo. Four ways to close the gap, ordered by friction:

Observation closes the loop. Without it, the harness stops learning the moment something passes verification.


## Further reading