This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.
npx playbooks add skill plaited/agent-eval-harness --skill agent-eval-harness
This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.
This compact 19-word instruction set is purpose-built for testing & qa work in AI coding agents. Install with a single command.
This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.
Agent Eval Harness is a free testing & qa skill for AI coding agents. This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.. It provides a specialized system prompt that configures your agent with testing & qa expertise.
Run npx playbooks add skill plaited/agent-eval-harness --skill agent-eval-harness in your terminal to install Agent Eval Harness into your Claude Code session. It works immediately after installation.
Agent Eval Harness is compatible with Claude Code, Cursor, GitHub Copilot, Windsurf, OpenClaw, Cline, and any AI agent that supports custom system prompts or .cursorrules files.
Yes, Agent Eval Harness is completely free and open source. The full source is available on GitHub at https://github.com/plaited/agent-eval-harness/tree/main/.agents/skills/agent-eval-harness. You only need a subscription to the AI agent you use it with.
Weekly roundup of top Claude Code skills, MCP servers, and AI coding tips.