This skill helps you build robust LLM evaluation systems by applying direct scoring, pairwise comparisons, rubrics, and bias mitigation.
npx playbooks add skill ken-cavanagh-glean/fieldkit --skill advanced-evaluation
The 19-word prompt provides structured testing and QA guidance, covering detailed methodology and consistent output formats. Install it in one command.
Advanced Evaluation is a free testing and QA skill for AI coding agents. It helps you build robust LLM evaluation systems by applying direct scoring, pairwise comparisons, rubrics, and bias mitigation. It provides a specialized system prompt that configures your agent with testing and QA expertise.
Run npx playbooks add skill ken-cavanagh-glean/fieldkit --skill advanced-evaluation in your terminal to install Advanced Evaluation into your Claude Code session. It works immediately after installation.
Advanced Evaluation is compatible with Claude Code, Cursor, GitHub Copilot, Windsurf, OpenClaw, Cline, and any AI agent that supports custom system prompts or .cursorrules files.
Yes, Advanced Evaluation is completely free and open source. The full source is available on GitHub at https://github.com/ken-cavanagh-glean/fieldkit/tree/main/plugins/context-engineering/skills/context-engineering/advanced-evaluation. You only need a subscription to the AI agent you use it with.