AI Audio – 11ElevenLabs Voices

Note: this is more of a useful-to-know-about tool than one you’re likely to use every day.

Of all the apps producing AI voices, including clones of existing voices, ElevenLabs is generally considered the best, to the point that Findaway Voices, Spotify’s audiobook platform, allows books narrated by ElevenLabs voices onto its system.

Being able to use an AI voice – or create a replica of your own voice – and then have it narrate an entire book is disruptive enough. Add this tool’s ability to use the same voice across multiple languages, and audiobooks just got a whole lot more accessible.

To see (hear) what I mean, go to the text to speech option in ElevenLabs and choose a voice from the Voice dropdown menu at the side.

ElevenLabs Text to Speech Prompt screen

Then paste in some text. If you’re stuck, here’s a piece from an old blog post (yes, it’s mine, you can find the whole thing at The DNA of Stories):

When you’re driving out of Market Dingley, towards Wiggleswick, and take the next left after the Horseman’s Head, you’ll find yourself on a winding country lane. Just the type you see in any crime show set in rural England, just to remind you that, yes, you are in rural England, not downtown Los Angeles, or something.

Apparently people get confused if you don’t throw in the occasional shot of a wiggling road with high banks each side and barely enough room for the police officer’s sensibly-priced car. I would have thought the accents and constant supply of tea would be a dead giveaway, but what would I know? I’m not some television mogul or fancy director after all, I’m just a witch.

And here are translations of that text, courtesy of Mistral (the French LLM)…

In French:

Lorsque vous quittez Market Dingley en direction de Wiggleswick et que vous prenez la prochaine à gauche après le Horseman’s Head, vous vous retrouverez sur une route de campagne sinueuse. Exactement le genre que l’on voit dans n’importe quelle série policière se déroulant dans la campagne anglaise, juste pour vous rappeler que, oui, vous êtes dans la campagne anglaise, pas dans le centre-ville de Los Angeles, ou quelque chose comme ça.

Apparemment, les gens sont confus si vous ne montrez pas de temps en temps une route sinueuse avec des talus de chaque côté et à peine assez de place pour la voiture au prix raisonnable de l’agent de police. J’aurais pensé que les accents et l’approvisionnement constant en thé auraient été un indice évident, mais que sais-je ? Je ne suis pas un magnat de la télévision ou un réalisateur chic, après tout, je ne suis qu’une sorcière.

And Traditional Chinese (which means it’s probably going to be read in Mandarin):

當你駕車離開 Market Dingley,往 Wiggleswick 方向,然後在 Horseman’s Head 之後左轉,你會發現自己身處一條蜿蜒的鄉間小路。這正是你在任何一部以英國鄉村為背景的犯罪劇中看到的那種小路,只是為了提醒你,是的,你在英國鄉村,而不是在洛杉磯市中心,或者其他什麼地方。

顯然,如果你不時不時地展示一條兩邊有高坡的蜿蜒小路,勉強能容納警察那輛價格合理的車,人們會感到困惑。我本以為那些口音和不斷供應的茶應該是明顯的線索,但我又知道什麼呢?我又不是什麼電視大亨或高級導演,我只是個女巫。

The voice should be recognisably the same.

I agree with those who say this won’t lead to a complete wipe-out of human audio narrators, simply because AI can’t do nuance, especially when reading fiction. Their view is that in future we’re likely to see a choice between ‘bog standard’ audio narration done by AI and a ‘deluxe’ version done by a human, priced accordingly (think paperback vs hardback special edition). As with translators and other language-focused roles, the human side of this profession is going to pare back to the very best. How future entrants become the very best is uncertain.

One element of this capability we all need to be careful of, though, is phone scams.

It’s now even easier for nasty players to target vulnerable people for money. Imagine someone you know who isn’t too tech savvy getting a panicked phone call from what sounds like a loved one, begging for money as they’re in trouble. These scams have been around for a while, but if the scammers have got the right voice (or even something close), too many people are going to be taken in.

I have two suggestions to combat this.

  1. Have a family code word. If someone in your family group phones for help, they need to give the person they’re calling a simple code word (random to most people but memorable for your family). If they can’t supply the code word, hang up, and check with the person they’re claiming to be.
  2. More generally (and this advice comes from the business world, where companies have been scammed out of stupid amounts of money when the voice of the CFO demands urgent action on a payment), if it’s urgent, double check. If they tell you there’s no time to verify, DEFINITELY double check.

The usual caveat: This area of tech is moving insanely fast, and while I’m aiming to stick to the foundations here, if you’re reading this post more than a month after it was published, check the details; things could be (are probably) out of date.
