NVIDIA has released PersonaPlex-7B-v1, a 7 billion parameter speech-to-speech model that fundamentally changes how voice AI handles conversation. Unlike every voice assistant you have used before, PersonaPlex does not wait for you to finish talking before it starts responding. It listens and speaks at the same time.
This is called full-duplex interaction, and it is the same way humans naturally converse. You can interrupt it mid-sentence, and it adapts. It produces backchannels like “uh-huh” and “oh, okay” while you are still speaking. It pauses when appropriate. No rigid turn-taking. No awkward silence while the AI processes your words.
PersonaPlex-7B-v1 is released under the NVIDIA Open Model License (weights) and MIT License (code). Both permit commercial use. Download from Hugging Face or GitHub.
Traditional voice assistants run a three-stage pipeline that creates an unnatural conversation flow:
The cascaded pipeline behind Siri, Alexa, and Google Assistant
| Stage | Process | Problem |
|---|---|---|
| 1. ASR | Automatic Speech Recognition converts speech to text | Adds latency |
| 2. LLM | Language model generates a text response | Cannot hear you while thinking |
| 3. TTS | Text-to-Speech converts response to audio | More latency, no overlap |
Each stage adds delay, and the system cannot hear you while it is generating a response. This is why conversations with Siri, Alexa, or Google Assistant feel robotic. You speak, wait, get a response, speak again.
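To make the additive cost concrete, here is a toy latency model. The stage timings are illustrative assumptions for the sketch, not measurements from any real system:

```python
# Toy model of cascaded vs. full-duplex response latency.
# Stage timings below are illustrative assumptions, not measurements.

def cascaded_latency(asr=0.3, llm=0.6, tts=0.3):
    """In a cascaded pipeline the user hears nothing until every
    stage has finished, so the per-stage delays add up."""
    return asr + llm + tts

def full_duplex_latency(first_audio=0.2):
    """A single speech-to-speech model can start emitting audio as
    soon as its first output tokens are ready."""
    return first_audio

print(f"cascaded:    {cascaded_latency():.2f}s")
print(f"full-duplex: {full_duplex_latency():.2f}s")
```

The point of the sketch is structural: a cascade's floor is the sum of its stages, while a single-model design's floor is just its time to first audio.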
PersonaPlex replaces this entire pipeline with a single Transformer model that processes incoming audio and generates speech simultaneously.
- Listens and speaks simultaneously with natural interruptions, backchannels, and rapid turn-taking, with no waiting required
- Define any role through text prompts (personality, business rules) plus audio voice conditioning (accent, tone, prosody)
- Average response time of 0.205-0.265 seconds, 5.7x faster than Moshi, the model it builds on
- Handles scenarios outside its training data, like technical crisis management, thanks to the Helium language model backbone
- Produces pauses, emotional tones, stress, urgency, and contextual responses that mirror human conversation patterns
- NVIDIA Open Model License (weights) and MIT (code) allow full commercial deployment and modification
PersonaPlex is built on the Moshi architecture from Kyutai, with Helium as the underlying language model backbone. The architecture uses two parallel audio streams: one that continuously encodes the user's incoming speech, and one that generates the model's own speech output.
Both streams share the same model state. This means PersonaPlex can adjust its response in real time as the user speaks, enabling barge-in, overlapping speech, rapid turn-taking, and contextual backchannels.
The Mimi neural audio codec handles audio encoding and decoding at 24 kHz, converting waveforms into discrete tokens that the Transformer can process.
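Some back-of-the-envelope framing arithmetic helps show what "discrete tokens" means here. The 12.5 Hz frame rate and 8 codebooks per frame are figures from Kyutai's Moshi/Mimi work, not stated in this article, so treat them as assumptions:

```python
# Back-of-the-envelope token framing for the Mimi codec.
# Assumed figures (from Kyutai's Moshi/Mimi work, not this article):
# 12.5 frames per second, 8 codebooks per frame.
SAMPLE_RATE_HZ = 24_000   # stated in the article
FRAME_RATE_HZ = 12.5      # assumed
CODEBOOKS = 8             # assumed

samples_per_frame = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)
tokens_per_second = FRAME_RATE_HZ * CODEBOOKS

print(samples_per_frame)   # raw audio samples compressed per frame
print(tokens_per_second)   # discrete tokens the Transformer sees per second
```

Under these assumptions, each frame compresses 1,920 raw samples into a handful of discrete tokens, which is what makes real-time Transformer processing of audio tractable.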
PersonaPlex uses two inputs to define conversational identity: a text prompt that specifies the role, background, and business rules, and an audio voice prompt that conditions vocal characteristics such as accent, tone, and prosody.
This hybrid approach lets you create a customer service agent for a specific company with a specific voice, a wise teacher who sounds warm and patient, or a fantasy character with dramatic inflection. The persona stays consistent throughout the entire conversation.
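As a sketch of what hybrid conditioning could look like in practice, here is a hypothetical persona definition. The field names and file path are invented for illustration and do not reflect the actual PersonaPlex API:

```python
# Hypothetical persona definition. The dictionary keys and the file
# path are invented for illustration; the real PersonaPlex interface
# may differ.
persona = {
    # Text prompt: role, background, and business rules
    "text_prompt": (
        "You work for First Neuron Bank and your name is Sanni Virtanen. "
        "Be warm and concise, and follow the bank's verification script "
        "before discussing any account details."
    ),
    # Voice prompt: a reference clip conditioning accent, tone, prosody
    "voice_prompt_wav": "voices/reference_speaker.wav",
}

print(persona["text_prompt"].split(".")[0])
```

The split of responsibilities is the key idea: the text prompt controls *what* the agent says and does, while the voice prompt controls *how* it sounds.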
PersonaPlex maintains persona consistency across extended conversations
The astronaut scenario, in which the model role-plays an astronaut managing a reactor emergency, is particularly notable. Emergency crisis management, reactor physics vocabulary, and emotional urgency were never in the training data; PersonaPlex generalized from its Helium language model backbone to handle an entirely new domain.
NVIDIA evaluated PersonaPlex on FullDuplexBench and a new extension called ServiceDuplexBench for customer service scenarios. The results show clear advantages over both open-source and commercial alternatives.
Success rate (higher is better)
| Metric | PersonaPlex | Moshi | Gemini Live | Qwen 2.5 Omni |
|---|---|---|---|---|
| Smooth Turn Taking | 90.8% | 1.8% | 43.9% | N/A |
| User Interruption | 95.0% | 65.3% | 54.7% | N/A |
| Pause Handling | 60.6% | 33.6% | 65.5% | N/A |
Response time in seconds (lower is better)
| Metric | PersonaPlex | Moshi | Gemini Live |
|---|---|---|---|
| Smooth Turn Taking | 0.170s | 0.953s | N/A |
| User Interruption | 0.240s | 1.409s | N/A |
| Average | 0.205s | 1.181s | N/A |
GPT-4o judge score out of 5 (higher is better)
| Benchmark | PersonaPlex | Moshi | Gemini Live | Qwen 2.5 Omni |
|---|---|---|---|---|
| FullDuplexBench | 4.29 | 0.77 | 3.38 | 4.59 |
| ServiceDuplexBench | 4.40 | 1.75 | 4.73 | 2.76 |
| Average | 4.34 | 1.26 | 4.05 | 3.68 |
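As a quick consistency check, the "Average" rows can be recomputed from the per-benchmark figures quoted in the tables above:

```python
# Sanity-check the "Average" rows against the per-benchmark scores
# quoted in the tables above: (FullDuplexBench, ServiceDuplexBench,
# reported average).
judge_scores = {
    "PersonaPlex": (4.29, 4.40, 4.34),
    "Moshi": (0.77, 1.75, 1.26),
    "Gemini Live": (3.38, 4.73, 4.05),
    "Qwen 2.5 Omni": (4.59, 2.76, 3.68),
}
for model, (full, service, reported_avg) in judge_scores.items():
    computed = (full + service) / 2
    # Allow for rounding to two decimals in the published table
    assert abs(computed - reported_avg) <= 0.0051, model

# Latency table: (0.170 + 0.240) / 2 should match the 0.205 s average
assert abs((0.170 + 0.240) / 2 - 0.205) < 1e-9
print("averages consistent")
```

All four judge-score averages and the latency average reproduce from the per-benchmark numbers to within two-decimal rounding.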
PersonaPlex is the only model that scores above 4.0 on both benchmarks, combining strong general knowledge with reliable task-following in structured business scenarios.
PersonaPlex was trained in a single stage using a carefully designed blend of real and synthetic conversations.
7,303 calls (1,217 hours) from the Fisher English corpus provided natural conversational patterns - backchannels, disfluencies, emotional responses, and authentic turn-taking behavior. These recordings were back-annotated with persona prompts using GPT-OSS-120B at varying levels of detail.
The training design disentangles two qualities: naturalness from real conversations and task adherence from synthetic scenarios. The hybrid prompt format bridges both data sources, letting the model combine natural speech patterns with precise instruction following.
PersonaPlex represents a significant shift in what open-source voice AI can do. Until now, the choice was between customizable but robotic cascaded systems and natural but inflexible full-duplex models. PersonaPlex eliminates that trade-off.
The model is ready for commercial use. Developers building voice agents, customer service bots, or interactive characters now have an open-source foundation that rivals proprietary systems. The MIT-licensed code and the commercially permissive weight license mean full freedom to modify and deploy.
Full-duplex interaction has been the holy grail of conversational AI. Google, OpenAI, and others have invested heavily in making voice assistants feel more natural. NVIDIA has now open-sourced a model that achieves this at the 7B parameter scale, lowering the barrier for anyone to build truly conversational voice interfaces.
Voice-first interfaces are accelerating across customer service, accessibility tools, gaming, and content creation. PersonaPlex’s persona control makes it practical for specific business use cases where the AI needs to sound on-brand and follow structured scripts while still feeling human.
PersonaPlex-7B-v1 is an impressive first release, but there are constraints to be aware of before deploying.
Everything you need to run PersonaPlex
Requires a Linux machine with an NVIDIA GPU (Ampere or Hopper) and Python installed.
1. Install the audio codec and clone the repo:

```shell
# Ubuntu/Debian
sudo apt install libopus-dev

# Clone and install
git clone https://github.com/NVIDIA/personaplex.git
cd personaplex
pip install moshi/.
```
2. Accept the model license on Hugging Face, then set your token:

```shell
export HF_TOKEN=your_token_here
```
3. Launch the server (auto-generates temporary SSL certs):

```shell
SSL_DIR=$(mktemp -d)
python -m moshi.server --ssl "$SSL_DIR"
```
4. Open https://localhost:8998 in your browser. Start talking — PersonaPlex responds in real time.
Add `--cpu-offload` to the server command to offload layers to CPU. Requires `pip install accelerate` first.
**What is PersonaPlex-7B-v1?** PersonaPlex-7B-v1 is a 7 billion parameter speech-to-speech AI model from NVIDIA that enables real-time, full-duplex voice conversations. It can listen and speak simultaneously, handle interruptions naturally, and maintain customizable personas through hybrid prompting.
**How is PersonaPlex different from traditional voice assistants?** Traditional voice assistants use a three-stage pipeline (speech recognition, language model, text-to-speech) that adds delay at every stage and cannot handle overlapping speech. PersonaPlex uses a single model that processes audio in real time, enabling natural conversation with sub-second latency of 0.205-0.265 seconds.
**Is PersonaPlex free for commercial use?** Yes. The model weights are released under the NVIDIA Open Model License and the code is MIT-licensed. Both permit commercial use. You can download everything from Hugging Face and GitHub at no cost.
**What hardware does PersonaPlex require?** PersonaPlex requires NVIDIA GPUs, specifically Ampere or Hopper architecture cards like the A100 or H100. It is not currently optimized for consumer GPUs or non-NVIDIA hardware.
**Does PersonaPlex support languages other than English?** Not yet. The current release is English-only. The training data is entirely in English, using the Fisher English corpus plus English synthetic conversations.
**How does persona customization work?** PersonaPlex uses hybrid prompting. A text prompt defines the role, background, and scenario (such as 'You work for First Neuron Bank and your name is Sanni Virtanen'). A voice prompt provides an audio embedding that controls vocal characteristics like accent, tone, and speaking style. Together, they create a consistent persona.