LLM SVG Generation Benchmark

Comparing 9 leading AI models (plus a partial tenth) on 30 creative SVG generation prompts

This benchmark tests Claude Sonnet 4.5 (Anthropic), Claude Opus 4.5 (Anthropic), Grok Code Fast 1 (xAI, 314B MoE), Gemini 2.5 Pro (Google), Gemini 3.0 Pro Preview (Google), DeepSeek V3.2-Exp (685B/37B MoE), GLM-4.6 (Zhipu AI, 355B/32B MoE), Qwen3-VL-235B-A22B-Thinking (Alibaba, 235B/22B MoE), and GPT-5.1 (OpenAI) on their ability to generate creative SVG graphics from natural language prompts.

Gemini 3.0 Pro Preview images were added on November 19, 2025, the day after Gemini 3.0's release on November 18, 2025; GPT-5.1 images were added the same day. Claude Opus 4.5 images were added on November 25, 2025, bringing the total to 9 models. On December 12, 2025, the first three prompts were additionally run through GPT-5.2 Pro, at an average cost of about 80 cents per prompt.
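The benchmark itself is simple: each model receives a natural language prompt and must return SVG markup. Below is a minimal sketch of how a single prompt could be run through one model; it is an illustration under stated assumptions, not the benchmark's actual harness. It assumes the Anthropic Python SDK is installed, `ANTHROPIC_API_KEY` is set in the environment, and the model id shown is a placeholder to be checked against current documentation.

```python
# Minimal sketch: run one SVG prompt through one model and save the result.
# Assumptions: Anthropic Python SDK installed, ANTHROPIC_API_KEY set,
# model id is a placeholder (verify against the provider's docs).
import re

import anthropic

PROMPT = "Generate an SVG of a pelican riding a bicycle."  # example prompt

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder model id
    max_tokens=4096,
    messages=[{"role": "user", "content": PROMPT}],
)
text = message.content[0].text

# Keep only the <svg>...</svg> portion in case the model adds commentary.
match = re.search(r"<svg.*?</svg>", text, re.DOTALL)
svg = match.group(0) if match else text

with open("pelican.svg", "w", encoding="utf-8") as f:
    f.write(svg)
```

The saved file can then be opened in any browser or rasterized for side-by-side comparison across models.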

Inspired by Simon Willison's pelican-riding-a-bicycle benchmark.