What is voice cloning?

Matt Hardy, SVP, Linguistic AI | 31 May 2023 | 12 mins
If you’ve ever heard a voice in a video and thought, “Wait… that sounds exactly like them – but they definitely didn’t record this,” you’ve met voice cloning. 
 
It’s one of the more fascinating and sometimes controversial branches of generative AI. The same family of technology that can write articles, create ‘art’ or generate deepfake videos can now also replicate the sound of a specific human voice – pitch, tone, accent, quirks and all. 
 
Done well, it’s uncanny. And it’s opening new possibilities in content creation, localization and accessibility. But it also raises big questions about ethics, consent and trust. 
 
Let’s explore what it is, how it works, where it’s used and when it might not be the best fit for your needs. 

The basics: what is voice cloning?

Voice cloning is the process of creating an AI-generated voice that sounds just like a specific person. Using a dataset of recorded speech – sometimes minutes, sometimes hours – AI systems learn the unique markers of that voice: pronunciation patterns, pacing, timbre, even age and accent. 
 
Once trained, the system can generate entirely new sentences in that cloned voice. The person never has to step into a recording booth. 
 
In technical terms, voice cloning uses deep learning models to identify and replicate the vocal ‘fingerprint’ of an individual. In practical terms, it’s like building a convincing voice double who can speak any script you give them. 

How does it work?

The process usually follows these steps: 
 
  1. Voice data collection – Audio of the target speaker is gathered. This could be existing recordings, interviews or studio-captured lines. 
  2. Model training – AI systems map the speaker’s vocal characteristics – tone, rhythm, inflection – into a mathematical model. 
  3. Synthesis – The trained model generates new audio, applying the cloned voice to any text input. 
  4. Refinement – Human review can adjust pronunciation, pacing and emotion to ensure the result is both accurate and natural. 

How realistic the clone sounds depends heavily on how much training audio is available and how clean it is. A few minutes of clear audio can produce a workable clone, but more data generally leads to greater realism.
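
For readers who like to see the idea in code, here is a deliberately simplified sketch of the collection and training steps. It treats each recording as a short list of numbers, averages those lists into a single voice profile, and then uses cosine similarity to check how closely a new clip matches that profile. This is a toy illustration only: real voice cloning systems learn their features from raw audio with deep neural networks, and the function names (`build_voice_profile`, `cosine_similarity`) and the three made-up features are assumptions chosen for readability, not part of any actual product or library.

```python
import math


def build_voice_profile(clips):
    """Average per-clip feature vectors into one profile (a toy stand-in for training)."""
    if not clips:
        raise ValueError("at least one clip is required")
    dims = len(clips[0])
    return [sum(clip[i] for clip in clips) / len(clips) for i in range(dims)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical features per clip: (pitch, speaking rate, brightness). Values are invented.
target_clips = [
    [0.80, 0.40, 0.60],
    [0.78, 0.42, 0.58],
    [0.82, 0.38, 0.62],
]
profile = build_voice_profile(target_clips)

same_speaker = [0.79, 0.41, 0.61]   # a new clip from the target speaker
other_speaker = [0.30, 0.90, 0.20]  # a clip from someone else

# The target speaker's new clip should sit closer to the profile than the other voice.
print(cosine_similarity(profile, same_speaker) > cosine_similarity(profile, other_speaker))
```

With these invented numbers the script prints `True`: the new clip from the target speaker scores far closer to the profile than the other voice does. In a production system the synthesis step would then condition a speech generator on a learned profile of this kind, which is well beyond what this sketch attempts.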

When would you use voice cloning?

Voice cloning is most often chosen when there’s a clear need for audio to match a specific, recognizable voice. For some, that means preserving an iconic performance – think of an actor whose voice is central to a role or brand identity but who is no longer able to record new material. A clone allows the sound to stay consistent, keeping audiences connected to something familiar. 
 
It can also be a way for busy public figures, executives or content creators to scale their reach. Cloning a voice means they can appear in more content – even in multiple languages – without physically recording each line. In certain cases, estates or rights holders may approve a posthumous voice clone to complete a project or honour a legacy. 
 
In localization, voice cloning is sometimes used to recreate an actor’s sound for dubbed content so that audiences in other languages still experience a consistent audio identity across all versions.

What are the alternatives?

Voice cloning isn’t the only way to produce human-sounding speech using AI. Synthetic voices – created from large datasets that draw on multiple speakers – are another powerful option. 
 
Because they don’t replicate one specific voice, synthetic voices can be tailored in style, tone and emotion without the same level of consent or licensing complexity. This flexibility makes them ideal for many large-scale or fast-turnaround projects. 
 
At RWS, we often pair synthetic voices with human-in-the-loop processes, ensuring tone, pronunciation and delivery are spot-on for the target market. 

How are synthetic voices created?

Unlike voice cloning, synthetic voices are built by training AI models on a wide variety of recorded voices. These datasets capture different accents, tones and speaking styles. The AI blends these features to create entirely new, lifelike voices that are expressive and versatile. 
 
These voices can then be applied to translated scripts, narration or interactive media at scale. They’re especially effective when you need speed, consistency and adaptability in multiple languages. 

When to choose each approach

Deciding between voice cloning and synthetic voices often comes down to the role voice identity plays in your content. If your audience needs to hear a particular voice – perhaps a trusted leader, a celebrity or a beloved fictional character – then cloning may be the most effective choice. It keeps the sound instantly recognizable and emotionally impactful. 
 
Synthetic voices are the better fit when scale and efficiency are priorities. If you need to roll out eLearning across dozens of countries, produce multilingual product explainers or deliver high volumes of localized media quickly, synthetic voices offer the speed and flexibility to make it happen without compromising clarity or quality. 

Benefits of synthetic voices

Synthetic voices bring tangible advantages. They tend to be more cost-effective than voice cloning since they don’t require extensive recording sessions with a single speaker or custom model training. Their scalability allows for rapid production across multiple languages and formats, making them ideal for large localization projects. 
 
They also give you more creative freedom. You can choose from a library of voices, adapt styles to suit different audiences or change voices entirely as your needs evolve. And because these voices aren’t tied to a real individual’s identity, they avoid many of the legal and ethical issues around consent, licensing and long-term usage rights. 

Voice cloning vs synthetic voices at a glance

| Feature / Factor | Voice Cloning | Synthetic Voices |
| --- | --- | --- |
| Core concept | Replicates the sound of a specific, real person | Creates a new, lifelike voice from multiple speaker datasets |
| Best suited for | Projects where the audience must hear that voice for recognition or emotional impact | Large-scale, multilingual content where speed, variety and flexibility matter |
| Data needed | High-quality recordings of a single voice | Recordings from many different speakers |
| Scalability | Limited – each new voice requires new training | High – reusable across many projects and languages |
| Ethical considerations | High – requires explicit consent and clear usage agreements | Lower – not tied to an identifiable individual |
| Cost and time | Higher – bespoke training for each voice | Lower – ready-to-use voices, quick turnaround |
| Creative flexibility | Fixed – tied to one voice identity | Flexible – change voices, tone and style as needed |
| Risk perception | Often associated with deepfake concerns | Generally seen as less risky, especially with disclosure |

Voice cloning vs deepfake voices

Technically, not all voice clones are deepfakes. A deepfake is usually created to deceive, while a voice clone can be developed transparently and with permission. 
 
But public perception doesn’t always draw that distinction. For many listeners, a cloned voice feels like a deepfake. That’s why consent, disclosure and clear ethical boundaries are critical – especially for brands that rely on trust. 

Is voice cloning ethical?

Voice cloning itself isn’t inherently unethical, but it does demand careful handling. You need to think about consent – has the speaker agreed to have their voice cloned, and do they retain control over how it’s used? You also need to consider brand risk – what happens if the cloned voice is used in a context the original speaker would reject? 
 
Audience trust is another key factor. If listeners aren’t aware they’re hearing a clone, will they feel misled? High-profile examples like the cloning of Anthony Bourdain’s voice after his death show just how quickly trust can become the focus of the conversation. 

The RWS perspective: balancing possibility with responsibility

At RWS, we see voice cloning as one tool among many – powerful in the right context but not a one-size-fits-all solution. When we do use it, we build the process around three pillars. 
 
First, we secure explicit consent, documenting rights and agreed usage before any cloning takes place. Second, we run an ethical review, weighing audience expectations and brand implications. Finally, we combine AI precision with human expertise in quality assurance, ensuring the cloned output is accurate, natural and appropriate for the context. 
 
For many clients, expressive synthetic voices – backed by our global network of linguists and our Genuine Intelligence approach – offer a better blend of scalability, quality and trust. They allow you to connect with audiences worldwide without the added complexity that voice cloning can introduce. 

Looking ahead: where voice cloning fits in a multilingual world

Voice cloning is remarkable technology. It can preserve voices we love, extend creative possibilities and enable new forms of storytelling. But like all powerful tools, it needs to be used with care, transparency and a clear sense of purpose. 
 
The question isn’t just whether voice cloning works – it’s whether it’s the right choice for your audience, your brand and your goals. Sometimes the best solution is the one that builds trust and keeps the human voice – and human values – at the heart of your message. 
 
Ready to explore how voice technology can transform your content? Whether you’re looking to preserve a beloved voice, scale your messaging globally, or create entirely new audio experiences, RWS can help you navigate the possibilities with confidence. Our approach blends cutting-edge AI with human expertise, ensuring every voice – cloned or synthetic – meets the highest standards of quality, ethics and impact.  
 
Connect with us today to find the right solution for your audience, your brand and your goals. 
Author

Matt Hardy

SVP, Linguistic AI
With 18 years at RWS, Matt Hardy has a rich background in Language Technology. As SVP of Products for Linguistic AI, Matt is responsible for developing our portfolio of AI-enabled technologies and services for clients. Matt's mission is to help translators and organizations navigate, and excel in, the ever-evolving landscape of language services, now and in the future.