Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF e
npx playbooks add skill samhvw8/dot-claude --skill ai-multimodal
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF e. This skill provides a specialized system prompt that configures your AI coding agent as an ai multimodal expert, with detailed methodology and structured output formats.
Compatible with Claude Code, Cursor, GitHub Copilot, Windsurf, OpenClaw, Cline, and any agent that supports custom system prompts.
Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF e
Weekly roundup of top Claude Code skills, MCP servers, and AI coding tips.