A Major Leap in Emotional and Multilingual Text-to-Speech Tech
ElevenLabs, the company behind some of the internet’s most viral AI voices, has unveiled a new frontier in text-to-speech generation with Eleven v3 (alpha) — a model the company calls its most expressive to date.
Announced on June 3, 2025, Eleven v3 marks a significant shift in synthetic voice capabilities, enabling nuanced emotional delivery, natural multi-speaker dialogue, and support for more than 70 languages. But this isn’t just about broader linguistic coverage; it’s about realism and responsiveness.
The new model introduces inline audio tags, such as [whispers], [sighs], and [excited], giving creators granular control over the tone and delivery of generated speech. The result: output that doesn’t just sound human, but feels human. In example clips, voices whisper conspiratorially, erupt in laughter mid-sentence, and capture subtle emotional cues that were previously unattainable in synthetic speech.
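For developers, these tags are written straight into the request text rather than passed as separate parameters. Here is a minimal sketch of what a v3 request with inline tags might look like against ElevenLabs’ public text-to-speech endpoint; the "eleven_v3" model identifier, API key, and voice ID are placeholder assumptions, so verify them against the official API reference before use.

```python
# Minimal sketch: a text-to-speech request with inline audio tags.
# The API key, voice ID, and "eleven_v3" model identifier are placeholders.
import requests

API_KEY = "YOUR_XI_API_KEY"   # hypothetical placeholder
VOICE_ID = "YOUR_VOICE_ID"    # any voice from your ElevenLabs library

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "model_id": "eleven_v3",  # assumed model identifier for v3 (alpha)
        # Audio tags go directly inside the text to be spoken.
        "text": "[whispers] Did you hear that? [sighs] "
                "I suppose we should check... [excited] Wait, look over there!",
    },
)
response.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(response.content)  # the endpoint returns audio bytes (MP3 by default)
```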
According to ElevenLabs, the release is tailored for use cases that demand emotional depth, such as audiobooks, video narration, gaming dialogue, and immersive storytelling. For live or real-time interactions, the company recommends sticking with its v2.5 Turbo or Flash models, since v3 has higher latency and requires more deliberate prompt engineering.
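That guidance reduces to a simple selection rule when wiring up an application. A quick sketch, assuming ElevenLabs’ published model identifiers (the exact IDs below should be confirmed against the current model list):

```python
# Rule of thumb from the announcement: expressive, pre-rendered content
# goes to v3; latency-sensitive, real-time interactions stay on v2.5-era models.
def pick_model_id(realtime: bool) -> str:
    # Model IDs are assumptions; confirm against ElevenLabs' model list.
    return "eleven_flash_v2_5" if realtime else "eleven_v3"

print(pick_model_id(realtime=False))  # audiobook narration -> eleven_v3
print(pick_model_id(realtime=True))   # live voice agent -> eleven_flash_v2_5
```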
Key Upgrades in Eleven v3 (alpha) and What They Enable:
Audio Tags: Control tone, emotion, and non-verbal reactions inline
Dialogue Mode: Seamless multi-speaker conversations with interruptions
70+ Language Support: Broad accessibility and localization for global creators
Deeper Text Understanding: Improved cadence, stress, and contextual nuance from input
This launch comes amid growing demand for AI-generated voices that go beyond robotic clarity. ElevenLabs’ previous models had already been adopted by professionals in film, gaming, education, and accessibility, but the missing ingredient, the company admits, was expressiveness. The v3 model was built to close that gap, aiming for voices that “sigh, whisper, laugh, and react.”
Early demos show remarkable emotional range — from whispery suspense to chaotic laughter — and new Text to Dialogue API support lets developers stitch full conversations together using structured JSON prompts.
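Based on that description, a dialogue request would pair each turn’s text (tags included) with a voice. The sketch below is hedged: the endpoint path, payload shape, and model ID reflect the announcement rather than verified documentation, so treat them as assumptions.

```python
# Hedged sketch of a Text to Dialogue request: a structured list of speaker
# turns, each pairing a voice with tagged text. Endpoint path, payload shape,
# and model ID are assumptions; consult the ElevenLabs API reference.
import requests

API_KEY = "YOUR_XI_API_KEY"  # hypothetical placeholder

dialogue = {
    "model_id": "eleven_v3",  # assumed v3 model identifier
    "inputs": [
        {"voice_id": "VOICE_A", "text": "[excited] You made it! I was starting to worry."},
        {"voice_id": "VOICE_B", "text": "[laughs] Traffic was a nightmare. [sighs] But I'm here now."},
    ],
}

response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-dialogue",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json=dialogue,
)
response.raise_for_status()

with open("dialogue.mp3", "wb") as f:
    f.write(response.content)  # stitched multi-speaker audio
```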
However, ElevenLabs is transparent about the challenges: v3 requires more advanced prompt engineering and isn’t yet optimized for Professional Voice Clones (PVCs). Instead, users are encouraged to explore Instant Voice Clones or designed voices for now.
Availability and Access:
Eleven v3 is live in the ElevenLabs Studio app with an 80% discount through the end of June.
A public API and improved Studio integration are expected soon.
For commercial use or early access to future features, teams can contact sales.
For those who’ve only known robotic monotony in TTS tools, v3 offers something rare: emotionally intelligent voices that sound like they’re truly responding to a moment. Whether you’re animating a cutscene, voicing a character arc, or adding soul to narration, Eleven v3 opens the door to a new generation of AI audio storytelling. Explore more: https://elevenlabs.io/v3