ElevenLabs Extends Google Cloud Deal, Gains NVIDIA Blackwell GPUs

By GenMediaLab

ElevenLabs has signed a multi-year extension of its Google Cloud partnership, gaining access to G4 virtual machines powered by NVIDIA RTX PRO 6000 Blackwell GPUs. The deal also integrates Google’s Gemini models into ElevenLabs’ Agents Platform and Veo into its Creative Platform for synchronized video and audio production.

Key Takeaways

  • Multi-year Google Cloud extension brings NVIDIA Blackwell GPUs for faster voice model training and inference
  • Gemini models now power reasoning and multi-step planning inside ElevenLabs voice agents
  • Veo integration enables teams to produce synchronized video and audio content from one workflow
  • ElevenLabs solutions are now available on Google Cloud Marketplace with GCP commit credit support
  • Enterprise customers get faster inference, lower latency, and real-time voice agents in 70+ languages

What the Partnership Includes

The expanded collaboration covers three core areas: infrastructure, model integration, and enterprise distribution.

  • Languages supported: 70+
  • Infrastructure: G4 VMs with NVIDIA Blackwell GPUs
  • GenMediaLab rating: 4.7/5
  • Partnership duration: multi-year

Infrastructure: ElevenLabs will run its voice models on Google Cloud G4 virtual machines equipped with NVIDIA RTX PRO 6000 Blackwell GPUs. These VMs offer 96 GB of GDDR7 memory per GPU, up to 768 GB in total on the largest eight-GPU configuration, and up to 9x the throughput of previous-generation G2 instances. The added capacity supports faster training cycles and lower-latency inference for enterprise deployments.
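The memory figures above are consistent with one another: eight GPUs at 96 GB each come to 768 GB. A minimal sketch of that arithmetic follows. The function name and the sample GPU counts are illustrative assumptions, not an official list of G4 machine shapes.

```python
GPU_MEMORY_GB = 96    # per-GPU memory quoted in the article
MAX_GPUS_PER_VM = 8   # largest configuration quoted in the article


def total_gpu_memory_gb(gpu_count: int) -> int:
    """Return the aggregate GPU memory for a VM with `gpu_count` GPUs."""
    if not 1 <= gpu_count <= MAX_GPUS_PER_VM:
        raise ValueError(f"gpu_count must be between 1 and {MAX_GPUS_PER_VM}")
    return gpu_count * GPU_MEMORY_GB


if __name__ == "__main__":
    # Sample counts chosen for illustration only.
    for count in (1, 2, 4, 8):
        print(f"{count} GPU(s): {total_gpu_memory_gb(count)} GB")
```

The largest configuration is the one that reaches the 768 GB total cited in the article.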

Model Integration: Google’s Gemini models are being integrated into ElevenLabs’ Agents Platform for advanced reasoning and multi-step planning in voice assistants. Separately, Google’s Veo video generation model is being added to ElevenLabs’ Creative Platform, allowing teams to produce video and audio content together.

Enterprise Distribution: ElevenLabs solutions are now listed on Google Cloud Marketplace, enabling enterprises to purchase and deploy voice AI tools with simplified billing and compliance. Existing GCP commit credits can be applied toward ElevenLabs services.

NVIDIA Blackwell: What It Means for Voice AI

The G4 VMs represent a significant hardware upgrade for ElevenLabs’ infrastructure. NVIDIA RTX PRO 6000 Blackwell GPUs include fifth-generation Tensor Cores, which accelerate AI workloads, and fourth-generation RT Cores for ray tracing.

  • Faster Inference: Up to 9x throughput vs. G2 instances for lower-latency voice generation
  • Larger Model Training: Up to 768 GB of GDDR7 memory supports training bigger multimodal models
  • Flexible Scaling: Configurations from 1 to 8 GPUs with MIG partitioning for workload isolation
  • Global Reach: Google Cloud’s infrastructure delivers consistent performance across regions

ElevenLabs co-founder Mati Staniszewski said the hardware upgrade directly impacts product quality: “Now with G4 VMs powered by NVIDIA Blackwell, we’re pushing our multimodal models even further - faster inference, better reliability, instant replies across languages. The goal stays the same: making voice agents that work at enterprise scale without compromise.”

Ian Buck, VP and GM of Hyperscale and HPC at NVIDIA, added: “This is exactly the kind of ecosystem innovation we envisioned with Blackwell - helping pioneers like ElevenLabs bring smarter, more responsive AI agents and media tools to every industry.”

Gemini Powers ElevenLabs Voice Agents

The Agents Platform integration brings Gemini’s reasoning capabilities to ElevenLabs voice assistants. Gemini handles the “thinking” layer - understanding context, planning multi-step responses, and calling functions - while ElevenLabs handles the voice layer with low-latency text-to-speech.

This combination targets enterprise use cases where voice agents need to handle complex conversations: customer support with multiple systems, sales calls that pull product data, and training simulations that adapt to learner responses.

How It Works

Gemini acts as the reasoning layer behind the voice agent: it interprets the user’s request, decides which function to call, and composes the reply text. ElevenLabs then converts that text into human-like speech. Together, they create conversational AI that can understand intent, retrieve information, and respond naturally in real time.
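The division of labor described above can be sketched as a three-step turn handler: reason, call a function, synthesize speech. This is only an illustration under stated assumptions. Every name here (`reason`, `lookup_order`, `synthesize`, `handle_turn`) is hypothetical, the reasoning step is a keyword stub rather than a Gemini call, and the speech step returns placeholder bytes rather than ElevenLabs audio. No real API from either company is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Plan:
    """Output of the reasoning layer: which tool to call and with what argument."""
    tool: str
    argument: str


def reason(user_text: str) -> Plan:
    """Hypothetical stand-in for the reasoning model (Gemini in the article)."""
    # A real model would infer intent; this stub uses a simple keyword check.
    if "order" in user_text.lower():
        return Plan(tool="lookup_order", argument=user_text.split()[-1])
    return Plan(tool="small_talk", argument=user_text)


def lookup_order(order_id: str) -> str:
    """Hypothetical backend lookup that returns canned data."""
    return f"Order {order_id} ships tomorrow."


def small_talk(_text: str) -> str:
    """Fallback reply when no tool is needed."""
    return "How can I help you today?"


# Registry of functions the reasoning layer is allowed to call.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_order": lookup_order,
    "small_talk": small_talk,
}


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for text-to-speech; returns placeholder bytes, not audio."""
    return text.encode("utf-8")


def handle_turn(user_text: str) -> bytes:
    """Run one conversational turn: reason, call the chosen function, then speak."""
    plan = reason(user_text)
    reply_text = TOOLS[plan.tool](plan.argument)
    return synthesize(reply_text)


if __name__ == "__main__":
    print(handle_turn("Where is order A123"))
```

Keeping the reasoning and voice steps behind separate functions mirrors the split the article describes, so either layer could be swapped out without touching the other.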

Veo Integration: Video Meets Voice

The Creative Platform integration brings Google’s Veo video generation model alongside ElevenLabs’ audio tools. Teams can generate video content and add voiceovers, sound effects, and narration within one production workflow.

Target use cases include advertising, corporate training, internal communications, and customer education - scenarios where organizations need both professional video and voice content at scale.

Matt Renner, President and Chief Revenue Officer at Google Cloud, framed the partnership in enterprise terms: “By leveraging Google Cloud’s full AI stack, including our leading AI models, as well as cutting-edge accelerated computing platforms from NVIDIA, ElevenLabs is making it possible for companies to transform how they interact with users.”

Google Cloud Marketplace Availability

ElevenLabs’ text-to-speech, conversational AI, and dubbing solutions are now available directly through Google Cloud Marketplace. This matters for enterprise procurement because it means:

  • Simplified billing through existing Google Cloud accounts
  • GCP commit credits can be applied toward ElevenLabs services
  • Compliance alignment with Google Cloud’s security certifications
  • Faster deployment without separate vendor onboarding

Dai Vu, Managing Director of Marketplace and ISV GTM Programs at Google Cloud, noted: “Bringing ElevenLabs’ solution to Google Cloud Marketplace will help customers quickly deploy, manage, and grow the text-to-speech, dubbing, and conversational AI on Google Cloud’s trusted, global infrastructure.”

What This Means

This partnership reflects a broader trend in AI: voice technology is moving from standalone APIs to deeply integrated enterprise infrastructure. ElevenLabs is no longer just a text-to-speech provider - following moves like Scribe v2 for speech-to-text and the Iconic Voice Marketplace, it is positioning itself as a full voice AI platform backed by hyperscaler compute.

For creators and businesses evaluating voice AI tools, the practical implications are:

  • Lower latency for real-time applications like live dubbing and voice agents
  • Better model quality from training on more powerful hardware
  • Easier procurement for organizations already on Google Cloud
  • Multimodal workflows combining Veo video with ElevenLabs audio

The Gemini integration is particularly significant. Voice agents that can reason through complex requests and pull data from multiple systems represent the next phase of conversational AI beyond simple question-and-answer chatbots.

FAQ

What are NVIDIA Blackwell GPUs used for in this partnership?

ElevenLabs uses NVIDIA RTX PRO 6000 Blackwell GPUs through Google Cloud G4 virtual machines to train and serve its voice AI models. These GPUs provide up to 9x throughput compared to previous-generation instances, resulting in faster inference, lower latency, and support for training larger multimodal models.

How does Gemini integrate with ElevenLabs?

Google's Gemini models are integrated into ElevenLabs' Agents Platform to handle reasoning and multi-step planning for voice assistants. Gemini acts as the AI brain that understands context and calls functions, while ElevenLabs provides the human-like voice output for the conversation.

Can I use GCP credits for ElevenLabs services?

Yes, enterprise customers with existing Google Cloud Platform commit credits can apply them toward ElevenLabs voice AI services purchased through Google Cloud Marketplace. This includes text-to-speech, conversational AI, and dubbing solutions.

What is the Veo integration for?

Google's Veo video generation model is being integrated into ElevenLabs' Creative Platform, allowing teams to produce both video and audio content within one workflow. This targets use cases like advertising, corporate training, and customer education where organizations need synchronized video and voice content.

How many languages does ElevenLabs support?

ElevenLabs supports content creation and localization in over 70 languages. The expanded Google Cloud partnership provides the infrastructure to deliver real-time voice agents and text-to-speech across all supported languages with consistent low latency.


Sources

  1. ElevenLabs Blog: ElevenLabs and Google Cloud
  2. PR Newswire: ElevenLabs Partners with Google Cloud
  3. Business Today: ElevenLabs Doubles Down On Google Cloud
  4. Google Cloud Blog: G4 VMs Powered by NVIDIA Blackwell
