📝 Ai Multimodal

22 installs

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF e

QUICK INSTALL
npx playbooks add skill samhvw8/dot-claude --skill ai-multimodal

About Ai Multimodal

The 38-word prompt provides structured documentation guidance — covering detailed methodology and consistent output formats. Install it in one command.

Use Cases

  • Writing API references and README files
  • Generating inline code comments and docstrings
  • Creating architecture decision records (ADRs)
  • Keeping docs in sync with code changes

Example Prompts

Get started: Help me use the Ai Multimodal skill effectively.

System Prompt (38 words)

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF e
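The duration limits quoted in the prompt (9.5 hours for audio, 6 hours for video) could be checked locally before a file is sent for processing. The following is a minimal sketch under that assumption, not part of the skill itself: the names `MAX_HOURS` and `fits_limit` are hypothetical, and the limits are copied from the description above rather than verified against the Gemini API.

```python
# Limits as quoted in the skill description (hours). These values are taken
# from the listing text and are an assumption, not a verified API constraint.
MAX_HOURS = {"audio": 9.5, "video": 6.0}


def fits_limit(kind: str, duration_seconds: float) -> bool:
    """Return True if a media file of the given kind is within the quoted limit."""
    if kind not in MAX_HOURS:
        raise ValueError(f"no quoted limit for media kind: {kind!r}")
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds <= MAX_HOURS[kind] * 3600


if __name__ == "__main__":
    # A 9-hour recording and a 7-hour video, both hypothetical inputs.
    print(fits_limit("audio", 9 * 3600))
    print(fits_limit("video", 7 * 3600))
```

A check like this only guards against obviously oversized inputs; token limits and file-size limits are separate concerns that it does not cover.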

Frequently Asked Questions

What is Ai Multimodal?

Ai Multimodal is a free documentation skill for AI coding agents. Its listing describes it as: "Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF e…" (the listing truncates the description here). It provides a specialized system prompt that configures your agent with documentation expertise.
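For orientation, a visual Q&A request of the kind the description mentions might be assembled roughly as follows. This is a sketch under stated assumptions, not the skill's own code: the helper name `build_image_question` is hypothetical, and the body follows the `contents` / `parts` / `inline_data` shape of the Gemini REST `generateContent` request as commonly documented. The block only builds the JSON payload; it does not send anything over the network.

```python
import base64
import json


def build_image_question(image_bytes: bytes, mime_type: str, question: str) -> dict:
    """Build a request body that pairs one inline image with a text question.

    Hypothetical helper for illustration; the field names assume the Gemini
    REST generateContent request shape.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": question},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file.
    payload = build_image_question(b"abc", "image/png", "What is in this image?")
    print(json.dumps(payload, indent=2))
```

Inline data suits small files; larger audio or video inputs would typically be uploaded separately and referenced from the request, which this sketch does not attempt.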

How do I use Ai Multimodal with Claude Code?

Run npx playbooks add skill samhvw8/dot-claude --skill ai-multimodal in your terminal to install Ai Multimodal into your Claude Code session. It works immediately after installation.

Which AI coding agents work with Ai Multimodal?

Ai Multimodal is compatible with Claude Code, Cursor, GitHub Copilot, Windsurf, OpenClaw, Cline, and any AI agent that supports custom system prompts or .cursorrules files.

Is Ai Multimodal free to use?

Yes, Ai Multimodal is completely free and open source. The full source is available on GitHub at https://github.com/samhvw8/dot-claude/tree/main/skills/ai-multimodal. You only need a subscription to the AI agent you use it with.
