xAI shipped Custom Voices on April 30, 2026, adding voice cloning to its Grok API platform. Users record about 60 seconds of natural speech through the xAI console, and the system returns a production-ready voice model in under two minutes. The cloned voice works across Grok’s Text-to-Speech and Voice Agent APIs at standard API rates. xAI also expanded its built-in voice catalog to over 80 options in 28 languages.
At $3/hour for voice agents, xAI is undercutting ElevenLabs and OpenAI on price by a wide margin. The feature set is thinner, but the economics change the math for anyone building voice into a product.
The cloning process runs entirely through the xAI console. Users read aloud several passages of unrelated dialog while the system records. A two-stage verification pipeline handles the rest: first, the speaker reads a verification phrase that Grok’s speech-to-text engine transcribes and matches in real time, confirming intent and presence. Then the system computes speaker embeddings from the verification clip and the full recording to confirm both belong to the same person.
This design means you cannot clone a voice from a pre-existing audio file, and you cannot clone someone else’s voice. Once verified, the system processes the recording and delivers an 8-character alphanumeric voice_id that works wherever xAI’s built-in voices do. Each team can create up to 30 custom voices simultaneously, and any voice can be deleted with a single click.
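The ID format and the per-team limit described above are easy to check on the client before calling the API. The following is a minimal sketch, not xAI's own validation: it assumes "8-character alphanumeric" means exactly eight ASCII letters or digits, and the function names are hypothetical.

```python
import re
from typing import List

# Assumption: "8-character alphanumeric" means exactly eight ASCII letters or digits.
VOICE_ID_PATTERN = re.compile(r"[A-Za-z0-9]{8}")

# Limit stated in the article: up to 30 custom voices per team.
MAX_CUSTOM_VOICES_PER_TEAM = 30


def is_valid_voice_id(voice_id: str) -> bool:
    """Return True if voice_id looks like an 8-character alphanumeric ID."""
    return VOICE_ID_PATTERN.fullmatch(voice_id) is not None


def can_create_voice(existing_voice_ids: List[str]) -> bool:
    """Return True if the team is still under the 30-voice limit."""
    return len(existing_voice_ids) < MAX_CUSTOM_VOICES_PER_TEAM
```

A check like this only catches obviously malformed IDs early; the API remains the authority on whether a given voice_id exists.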
Record about a minute of natural speech. The system delivers a production-ready voice model in under two minutes.
Speaker verification via real-time phrase matching and embedding comparison blocks unauthorized cloning.
Custom voices inherit multilingual TTS capabilities including speech tags, laughter, whispers, and pauses.
Every custom voice is private to your team and is never shared with other users or used for model training.
The Voice Library is a new section in the xAI console that puts all available voices in one place. Custom voices show up next to the five built-in options (Eve, Ara, Rex, Sal, and Leo). With this launch, xAI also expanded the pre-built catalog to over 80 voices in 28 languages. You can preview any voice across different scenarios before picking one.
Each built-in voice has a different personality: Eve is energetic, Ara is warm and conversational, Rex leans professional, Sal is smooth, and Leo sounds authoritative. Custom voices get the same TTS capabilities as built-ins, including inline speech tags for whispers, laughter, sighs, and emphasis. Output works over both REST and WebSocket streaming.
There is no extra charge for using custom voices. The pricing follows standard xAI API rates:
xAI Voice API pricing as of May 2026
| Service | Pricing | Notes |
|---|---|---|
| Text-to-Speech | $4.20 / 1M characters | 5 built-in + custom voices, 28 languages |
| Voice Agent (real-time) | $3.00 / hour ($0.05/min) | Speech-to-speech via WebSocket |
| Speech-to-Text (streaming) | $0.20 / hour | Real-time transcription |
| Speech-to-Text (batch) | $0.10 / hour | Offline processing |
| Custom Voice creation | Free | Included with API access |
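To see what these rates mean for a monthly bill, the table can be turned into a small estimator. This is a rough sketch that uses only the rates listed above; the function name and parameters are illustrative, and it ignores anything the table does not cover, such as taxes or volume discounts.

```python
from decimal import Decimal

# Rates copied from the pricing table above (USD, as of May 2026).
TTS_USD_PER_MILLION_CHARS = Decimal("4.20")
VOICE_AGENT_USD_PER_HOUR = Decimal("3.00")
STT_STREAMING_USD_PER_HOUR = Decimal("0.20")
STT_BATCH_USD_PER_HOUR = Decimal("0.10")


def estimate_monthly_cost(tts_chars=0, agent_hours=0,
                          stt_streaming_hours=0, stt_batch_hours=0) -> Decimal:
    """Estimate a monthly bill in USD from usage figures."""
    tts = TTS_USD_PER_MILLION_CHARS * Decimal(str(tts_chars)) / Decimal("1000000")
    agent = VOICE_AGENT_USD_PER_HOUR * Decimal(str(agent_hours))
    streaming = STT_STREAMING_USD_PER_HOUR * Decimal(str(stt_streaming_hours))
    batch = STT_BATCH_USD_PER_HOUR * Decimal(str(stt_batch_hours))
    # Custom voice creation is free, so it adds nothing to the total.
    return (tts + agent + streaming + batch).quantize(Decimal("0.01"))
```

For example, 2 million TTS characters plus 100 voice agent hours comes to $8.40 + $300.00 = $308.40.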
The Voice Agent API runs on grok-voice-think-fast-1.0, which combines reasoning with real-time speech. It supports tool use (web search, X search, file search, and external MCP server connections), so the agent can act mid-conversation rather than only talk. For client-side applications, Ephemeral Tokens let you open WebSocket connections without exposing your main API key.
Programmatic access to the custom voice creation endpoint (POST /v1/custom-voices) is currently limited to teams on an Enterprise plan. The console-based voice creation tool is open to all users with API access.
Custom Voices is available through the xAI console. Full API documentation and voice creation tools are at docs.x.ai/docs/guides/voice.
The pricing difference between xAI and ElevenLabs is large, though they’re not selling the exact same thing:
Comparison based on publicly available pricing as of May 2026
| Feature | xAI Custom Voices | ElevenLabs |
|---|---|---|
| Voice Agent (per hour) | $3.00 | $10.80 - $18.00 |
| TTS (per 1M chars) | $4.20 | ~$3.00 - $18.00 (varies by plan) |
| Built-in Voice Library | 80+ voices, 28 languages | 3,000+ voices, 32+ languages |
| Clone Recording Length | ~60 seconds | ~30 seconds |
| Clone API Access | Enterprise plan only | Starter plan and above |
| Geographic Availability | US only (excl. Illinois) | Global |
| Safety Verification | Two-stage speaker verification | Voice consent system |
| Marketplace | No | Iconic Marketplace (licensed voices) |
ElevenLabs still has the bigger voice library, works everywhere, and runs the Iconic Marketplace for licensed celebrity voices. xAI wins on voice agent pricing and doesn’t charge for custom voice creation. ElevenLabs requires at least a Starter subscription ($5/month) before you can clone anything.
xAI Custom Voices are currently restricted to users in the United States, with Illinois excluded due to the state’s Biometric Information Privacy Act (BIPA). ElevenLabs operates globally with no geographic restrictions on voice cloning access.
If you’re outside the US or need access to a bigger voice catalog, ElevenLabs works globally and has 3,000+ voices available today.
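An app that offers voice cloning through xAI would need to gate the feature on the availability rule described above. Here is a minimal sketch under stated assumptions: the function name is hypothetical, locations are given as ISO-style country and US state codes, and an unknown state is treated as ineligible to stay on the safe side.

```python
from typing import Optional

# Per the article: custom voices are US-only, with Illinois excluded.
EXCLUDED_US_STATES = {"IL"}


def is_custom_voice_eligible(country_code: str,
                             us_state_code: Optional[str] = None) -> bool:
    """Return True if a user at this location may create a custom voice."""
    if country_code.upper() != "US":
        return False
    if us_state_code is None:
        # Assumption: without a known state we cannot rule out Illinois.
        return False
    return us_state_code.upper() not in EXCLUDED_US_STATES
```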
xAI’s two-stage verification is stricter than what most voice cloning platforms require. The real-time phrase matching confirms the speaker is physically present during the cloning session, not submitting a pre-recorded file. The embedding comparison then checks that the verification phrase and the full recording actually come from the same person.
Custom voices stay private to the team that created them. xAI says audio data is processed in real time and never stored or used for training. The platform has SOC 2 Type II certification, HIPAA eligibility, and GDPR compliance for European data, though the cloning feature itself is still US-only.
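The two checks can be illustrated with a toy version. xAI has not published how it compares transcripts or embeddings, so everything here is an assumption for illustration: exact matching on normalized text, cosine similarity as the distance measure, the 0.75 threshold, and all function names.

```python
import math
import re
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def phrase_matches(expected: str, transcript: str) -> bool:
    """Stage 1: the transcribed speech must match the verification phrase."""
    return normalize(expected) == normalize(transcript)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(expected_phrase: str, transcript: str,
                   phrase_embedding: Sequence[float],
                   recording_embedding: Sequence[float],
                   threshold: float = 0.75) -> bool:
    """Stage 2 runs only if stage 1 passes; the threshold is an arbitrary placeholder."""
    if not phrase_matches(expected_phrase, transcript):
        return False
    return cosine_similarity(phrase_embedding, recording_embedding) >= threshold
```

Running the phrase check first mirrors the order in the article: a session that fails the live passphrase never reaches the embedding comparison.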
$3/hour voice agents change the economics for anyone running voice at volume. Customer support bots and IVR systems that cost $10-18/hour on ElevenLabs suddenly make more sense on xAI’s stack. The OpenAI Realtime API compatibility also means existing voice apps built for OpenAI can switch over without rewriting much code.
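The gap is easier to judge at volume. The sketch below uses only the hourly figures quoted in this article ($3.00 for xAI, $10.80 to $18.00 for ElevenLabs) and a hypothetical function name; real bills depend on plan tiers and usage patterns that these numbers do not capture.

```python
from decimal import Decimal
from typing import Tuple

# Hourly voice agent rates quoted in the article (USD, as of May 2026).
XAI_USD_PER_HOUR = Decimal("3.00")
ELEVENLABS_USD_PER_HOUR_LOW = Decimal("10.80")
ELEVENLABS_USD_PER_HOUR_HIGH = Decimal("18.00")


def monthly_savings_range(agent_hours: int) -> Tuple[Decimal, Decimal]:
    """Return the (low, high) monthly savings in USD from using xAI instead."""
    hours = Decimal(agent_hours)
    xai_cost = XAI_USD_PER_HOUR * hours
    low = ELEVENLABS_USD_PER_HOUR_LOW * hours - xai_cost
    high = ELEVENLABS_USD_PER_HOUR_HIGH * hours - xai_cost
    return low, high
```

At 1,000 agent hours a month, xAI costs $3,000 against $10,800 to $18,000 on ElevenLabs, a difference of $7,800 to $15,000.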
Voice cloning now has three tiers. ElevenLabs has the most features, the biggest library, and global reach — we cover the full landscape in our best AI voice generators roundup. OpenAI sits in the middle with TTS in ChatGPT. xAI is the cheapest option by far, with stricter verification than either competitor.
The US-only restriction matters a lot. Anyone outside the States still can’t create custom voices, which keeps ElevenLabs as the default internationally. For free alternatives, see our best free voice cloning tools guide. If xAI opens this up to more countries, the pricing pressure on everyone else gets real.
xAI Custom Voices lets users clone their voice by recording about 60 seconds of natural speech through the xAI console. The system runs a two-stage verification process: first matching a spoken passphrase in real time, then comparing speaker embeddings to confirm identity. The result is an 8-character voice ID that works across all xAI voice APIs including Text-to-Speech and Voice Agent.
Creating a custom voice on xAI is free. The cost comes from API usage: Text-to-Speech runs $4.20 per million characters, and the Voice Agent API costs $3.00 per hour ($0.05 per minute) for real-time speech-to-speech interactions. There is no additional charge for using a custom voice instead of a built-in one.
xAI Custom Voices is not available outside the US. As of May 2026, it is restricted to users in the United States, with Illinois excluded due to the state's Biometric Information Privacy Act. xAI has not announced a timeline for international expansion. Users outside the US can still access xAI's built-in TTS voices but cannot create custom voice clones.
xAI undercuts ElevenLabs on pricing: $3/hour for voice agents vs $10-18/hour for ElevenLabs. ElevenLabs leads on features with 3,000+ voices, 32+ languages, global availability, and the Iconic Marketplace for licensed voices. xAI has stricter safety verification with two-stage speaker matching but is currently limited to the US market.
You cannot clone someone else's voice with xAI Custom Voices. The two-stage verification process requires the speaker to be physically present during cloning. The user must read a verification phrase aloud in real time, and the system compares voice embeddings between the passphrase and the full recording to confirm they match. Pre-existing recordings cannot be used, and cloning someone else's voice is blocked by the verification pipeline.