This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.
npx playbooks add skill plaited/agent-eval-harness --skill agent-eval-harness
This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.. This skill provides a specialized system prompt that configures your AI coding agent as an agent eval harness expert, with detailed methodology and structured output formats.
Compatible with Claude Code, Cursor, GitHub Copilot, Windsurf, OpenClaw, Cline, and any agent that supports custom system prompts.
This skill helps you evaluate CLI agent trajectories by capturing full runs and providing structured JSONL for downstream scoring.