# Audit

The audit is a rubric your agent runs against your codebase. It scores each pillar — Guides, Verification, Observation, Closing the loop — flags where the harness is thin, and recommends a next step grounded in the tooling you already use.

## The prompt

Audit this repository against the rubric at
https://www.harnessed.md/audit/llms.txt and follow the instructions there.

## Scoring

Score each item from evidence in the repo, not from intent. For each item, the agent assigns one number:

- 0: not present
- 1: token (exists but unused or trivially empty)
- 2: partial (inconsistent or incomplete coverage)
- 3: solid baseline
- 4: thorough, well-maintained
- 5: exemplary, no areas to improve

## Output

One markdown table per section:

| Item | Score | Evidence | Next step |
| --- | --- | --- | --- |

The Next step column should reference the tooling already in the repo (for example, extend the existing ESLint config rather than introducing Biome alongside it). Give one concrete action per row, or leave the cell blank if the score is 5.
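To make the table shape concrete, here is a minimal Python sketch of how one section could be rendered. The `Row` type, the `render_section` helper, and the sample evidence strings are hypothetical illustrations, not part of the rubric; the agent itself simply writes the table.

```python
from dataclasses import dataclass


@dataclass
class Row:
    item: str
    score: int
    evidence: str
    next_step: str = ""


def render_section(rows: list[Row]) -> str:
    """Render one rubric section as a markdown table."""
    lines = [
        "| Item | Score | Evidence | Next step |",
        "| --- | --- | --- | --- |",
    ]
    for row in rows:
        # A score of 5 leaves nothing to recommend, so the cell stays blank.
        next_step = "" if row.score == 5 else row.next_step
        cells = [row.item, str(row.score), row.evidence, next_step]
        # Escape pipes so cell text cannot break the table layout.
        lines.append("| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |")
    return "\n".join(lines)


# Hypothetical findings, for illustration only.
print(render_section([
    Row("Types", 5, "`strict: true` in tsconfig.json"),
    Row("Linting", 3, "ESLint config present; no rule bans `any`",
        "Enable @typescript-eslint/no-explicit-any in the existing ESLint config"),
]))
```

Note that the second row's next step extends the linter the repo already has instead of proposing a new one.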

Then an aggregate, computed the same way every time:

- Pillar score: the mean of that pillar’s item scores, rounded to one decimal place.
- Overall score: the unweighted mean of the four pillar scores, rounded to one decimal place.

Average the pillars, not the items, so each pillar counts equally regardless of how many items it has — and don’t weight any pillar above the others.
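The aggregation can be sketched in a few lines of Python. Two details here are assumptions, since the text above does not pin them down: ties round half up, and the overall score averages the pillar scores as reported (already rounded). The function names and the sample item scores are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round1(value: Decimal) -> Decimal:
    """Round to one decimal place, half up (the rounding mode is an assumption)."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def pillar_score(item_scores: list[int]) -> Decimal:
    """Mean of one pillar's item scores, rounded to one decimal place."""
    if not item_scores:
        raise ValueError("a pillar needs at least one scored item")
    if any(s not in range(6) for s in item_scores):
        raise ValueError("item scores must be integers from 0 to 5")
    return round1(Decimal(sum(item_scores)) / Decimal(len(item_scores)))


def overall_score(pillars: dict[str, list[int]]) -> Decimal:
    """Unweighted mean of the pillar scores, rounded to one decimal place."""
    # Assumption: average the rounded pillar scores, as they appear in the report.
    scores = [pillar_score(items) for items in pillars.values()]
    return round1(sum(scores, Decimal(0)) / Decimal(len(scores)))


# Hypothetical item scores, listed in rubric order.
example = {
    "Guides": [3, 4, 2, 0, 1, 2],
    "Verification": [4, 3, 2, 3, 0, 0, 1],
    "Observation": [3, 2, 0],
    "Closing the loop": [1, 2],
}

for name, items in example.items():
    print(f"{name}: {pillar_score(items)}")
print(f"Overall: {overall_score(example)}")
```

Because the mean is taken over pillars, the two-item Closing the loop pillar carries the same weight as the seven-item Verification pillar.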

## The rubric

Each item links to where the best-practice guidance lives on /guides, /verification, or /observation. Those sections are bundled into the rubric page the agent reads, so it can ground each score and recommendation without fetching anything else.

### Guides

- AGENTS.md at root: exists in the portable form, with tool aliases like CLAUDE.md symlinked to it; focused enough that the agent will actually read it.
- Commands documented: build, test, lint, and deploy paths the agent can run.
- Boundaries declared: explicit “always do / never do / ask first.”
- Path-scoped rules: used where the codebase has distinct subtrees (e.g. apps/api vs apps/web).
- Skills: repeatable workflows captured in SKILL.md format per the agentskills.io spec, placed in .agents/skills/ with tool-specific paths configured or symlinked.
- Hooks: mechanical enforcement for what an advisory rule can’t reliably guarantee.
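As an illustration of scoring from evidence, the first Guides item can be checked mechanically. The sketch below is one possible check, not the audit's own implementation; the `check_agents_md` helper and its report strings are hypothetical, and the alias list defaults to CLAUDE.md only because that is the example named above.

```python
from pathlib import Path


def check_agents_md(root: Path, aliases: tuple[str, ...] = ("CLAUDE.md",)) -> dict[str, str]:
    """Report whether AGENTS.md exists at the repo root and each alias resolves to it."""
    canonical = root / "AGENTS.md"
    report = {"AGENTS.md": "present" if canonical.is_file() else "missing"}
    for name in aliases:
        alias = root / name
        if not alias.exists() and not alias.is_symlink():
            report[name] = "missing"
        elif alias.is_symlink() and alias.resolve() == canonical.resolve():
            report[name] = "symlinked to AGENTS.md"
        else:
            # A regular file (or a link elsewhere) can drift from AGENTS.md over time.
            report[name] = "separate file (may drift from AGENTS.md)"
    return report


if __name__ == "__main__":
    # Run from the repository root.
    print(check_agents_md(Path(".")))
```

A check like this only establishes presence; whether the file is focused enough for the agent to read still takes judgment.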

### Verification

- Types: strict mode on; escape hatches (any, untyped) blocked at the linter.
- Linting: present, with rules targeting agent anti-patterns (stubs, untyped code, etc.).
- Security scanning: security ruleset wired into CI (Semgrep, Snyk, or equivalent).
- Tests: unit and integration tests in place; tests don’t mock the system they’re verifying.
- Agentic review: PR review by other agents, either hosted (e.g. CodeRabbit, Greptile) or a local subagent set.
- Mutation testing: applied to high-stakes code, for signal beyond line coverage.
- Guardrails: autonomous or background loops have explicit stopping conditions: a max-iteration ceiling, no-progress detection, and a token/dollar budget cap.
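The three stopping conditions in the Guardrails item can be combined in one loop. The following is a minimal Python sketch under stated assumptions: `StepResult`, `run_guarded`, the progress metric, and the default limits are all hypothetical, and a real harness would plug in its own step function and cost accounting.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    done: bool        # the task reported success
    progress: float   # hypothetical progress metric, e.g. number of passing tests
    cost_usd: float   # spend attributed to this step


def run_guarded(
    step: Callable[[int], StepResult],
    max_iterations: int = 20,
    patience: int = 3,
    budget_usd: float = 5.0,
) -> str:
    """Run `step` until it finishes or a guardrail trips; return the stop reason."""
    spent = 0.0
    best: Optional[float] = None
    stalled = 0
    for i in range(max_iterations):  # max-iteration ceiling
        result = step(i)
        spent += result.cost_usd
        if result.done:
            return "done"
        # Budget cap, checked after each step, so it can overshoot by one step's cost.
        if spent >= budget_usd:
            return "budget_exceeded"
        # No-progress detection: stop after `patience` steps without a new best.
        if best is None or result.progress > best:
            best = result.progress
            stalled = 0
        else:
            stalled += 1
            if stalled >= patience:
                return "no_progress"
    return "max_iterations"


# A loop that never improves stops on the no-progress guardrail.
print(run_guarded(lambda i: StepResult(False, 1.0, 0.01)))
```

Returning the stop reason, not just halting, gives the audit something concrete to cite as evidence that each guardrail exists.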

### Observation

- Error tracking: exceptions captured with stack traces, source maps, and deploy markers that tie incidents back to the PR that caused them.
- Usage analytics: meaningful user actions instrumented per change; activation cohorts and drop-off measurable, not just pageviews.
- Agentic investigation: an agent (hosted, like Vercel Agent, or a custom loop with MCP access to logs) runs on alert and posts a root-cause hypothesis before a human looks.

### Closing the loop

- Durable learnings: recurring defects get extracted into rules / tests / hooks, not absorbed as one-off prompt fixes.
- Persistence: a mechanism that carries learnings across sessions (memory file, /learn skill, SessionEnd hook, or similar).