How to Record an Audiobook with AI in 6 Steps
As discussed in this guide, there are two main ways to produce an audiobook: hiring a professional voice actor or recording it yourself. Both require a substantial investment of time or money. However, advances in AI technology have opened up a third path: recording audiobooks simply and inexpensively with human-like synthetic voices.
While AI-narrated audiobooks don't match the quality of professionally produced ones yet (and perhaps never will), they do offer new authors the opportunity to publish their work in the fastest-growing format on the market. In this article, we’ll show you how to use AI technology to narrate your book, using both ready-made voice avatars and a digital replica of your own unique voice.
How to record an audiobook with AI:
1. Choose an AI voice platform
Before we begin, should the narrating voice be based on your own, or should it be that of a digital avatar with a different vocal quality, accent, and gender? Generally, fiction writers prefer the latter, whereas nonfiction writers — who often write in their own voice — may lean towards the former (in which case, skip to Step 4 of this post).
If you’re casting a digital avatar, you’ll be semi-spoiled for choice. Many platforms already offer this service and the selection will only increase over time. Here are some of the most popular options on the market, and the number of distinct voices and languages they offer:
50+ voices, 9 languages
20 voices, English only
130+ voices, 30 languages
120+ voices, 23 languages
50+ voices, 5 languages
Aside from Google (which is already free), each platform offers a free trial, so play around with them to find which one best suits your budget and needs.
For this guide, we chose Speechify, which had the most voice options, and we used Ricardo Fayet’s How to Market a Book as a guinea pig. However, you’ll go through a similar process if you create an audiobook with one of the other tools.
Start a new project and import your book
To get things going, log into your platform of choice and create a new project. Then, you can either import your full manuscript (in .txt or .docx format), or copy-paste it into the editor.
✂️ In Speechify, the text will be separated into blocks with a limit of 1000 characters each, so when you copy-paste your text, double-check that no blocks overlap and that no words or letters are missing. In general, we suggest working on one chapter at a time.
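If you'd rather prepare the blocks yourself before pasting them in, the splitting can be scripted. The sketch below is only an illustration: the function name and the sentence-boundary rule are our own assumptions, not something Speechify provides. It cuts at sentence ends where possible and lets you confirm that nothing was lost or duplicated.

```python
import re

MAX_CHARS = 1000  # block limit mentioned above; adjust for your platform


def split_into_blocks(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into blocks of at most max_chars, preferring sentence ends."""
    # Each piece is one sentence plus the whitespace that follows it,
    # so joining the pieces reproduces the original text exactly.
    pieces = re.findall(r".+?(?:[.!?]+[\"')\u201d\u2019]*\s+|\Z)", text, flags=re.S)

    blocks: list[str] = []
    current = ""
    for piece in pieces:
        # Fallback: a single sentence longer than the limit is cut mid-sentence.
        while len(piece) > max_chars:
            if current:
                blocks.append(current)
                current = ""
            blocks.append(piece[:max_chars])
            piece = piece[max_chars:]
        # Start a new block when the next sentence would not fit.
        if len(current) + len(piece) > max_chars:
            blocks.append(current)
            current = ""
        current += piece
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    chapter = "This is a sentence from the chapter. " * 60
    blocks = split_into_blocks(chapter)
    # Sanity check: no overlaps and no missing characters.
    print(len(blocks), "".join(blocks) == chapter)
```

Joining the blocks back together and comparing the result with the original chapter is a quick way to catch the overlaps and dropped letters mentioned above.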
The next step is where the fun begins: auditioning the digital voice actors to find a perfect match for your book.
2. Cast the perfect digital voice
In general, you want an AI narrator who matches your novel's narrative voice (which is often the same as your main character’s). Consider the tone and mood of your book: is it suspenseful, cheery, romantic, or dripping with menace? While digital voices tend to have neutral tones, there are subtle differences, so choose one that reflects the book’s mood as closely as possible.
With nonfiction, you might pick a voice that resembles your own or one that you think will resonate better with your target audience. For example, the main target audience for How to Market a Book is North American authors between 25 and 50 years old, so we chose ‘Guy’ as our narrator: a young, male American voice with a confident but friendly tone. While some might see this as slightly disingenuous (the author actually has a European accent), it might help the audiobook better appeal to its core audience.
Take your time and explore the voice options offered by your platform. If you can’t find one that suits the tone of your book, you can always look at what the other services have on offer. Listeners may have to spend over 10 hours with this voice in their ears, so it better be good!
Choose one or more narrators
If you’re a fiction writer and your book includes lots of dialogue or multiple narrators telling the story from different perspectives, you may be tempted to use multiple voices to differentiate the characters and add realism to the listening experience.
Although multi-narration for audiobooks is not uncommon, it’s usually reserved for big-budget productions that can afford to hire talented mimics to play multiple parts or an entire cast of professionals (like the 21 actors who perform Taylor Jenkins Reid's Daisy Jones & The Six).
AI technology has changed that: now you can use as many digital voices as you like to tell your story without any added costs. But although that's an exciting possibility, don’t make things too hard for yourself. If this is your first attempt at recording an audiobook with AI, we’d recommend sticking to one voice to avoid adding complexity to your production.
Also, it’s worth bearing in mind that the listener will get used to an AI voice after a few minutes and start accepting it as a real voice. By consistently changing the avatar, you risk ‘breaking the spell’ and alienating your audience. That said, if you think multiple narrators could better bring your audiobook to life (and you don’t mind the extra challenge) then go for it!
But before you hit the button and set an AI narrator loose on your book, you may want to ensure that your manuscript is as ‘robot-readable’ as possible.
3. Edit the text for optimal readability
While human narrators can read your manuscript and naturally understand its tempo, style, and tone, AI tools still require some assistance. Simply copy-pasting your text in the editor won’t cut it. Thankfully, there are ways to adjust your text to achieve a more natural-sounding performance.
🖼️ Nonfiction books often include images and tables to support the text that are hard to fit into the audiobook version, which is why many authors simply don’t make one. If that's your case, you’ll have to weigh if and how much you can work around it.
‘Direct’ the narrator
Most platforms allow you to tweak the AI narration, changing aspects like its pitch, speed, and volume. A few of them (Speechify included) will let you give the voice an expression, such as hopeful, sad, and whispering, to make the narration more emotionally impactful.
So put on your director's hat and add some nuance into your narrator’s performance, so that when a doomed astronaut bids his wife farewell, it doesn't sound like he’s chirpily ordering pizza.
At the moment, the results aren’t mind-bending, but they do go a long way toward making your characters come (a little bit more) to life. Here is an example of how it sounds:
To enrich specific lines of dialogue, we split them into distinct paragraphs and assigned them the desired emotional tone.
The narrator in this example has a female voice, so when she has to read dialogue from a male character, we also pitched her voice down by 5% to emulate how a female voice actor would perform it. In the same way, male voices can be pitched up slightly for lines of dialogue when they play female characters.
Now, as you can imagine, manually changing pitch and choosing an emotion for every single line of dialogue could drive you crazy. To keep things simple, this may be something you reserve for only the most critical scenes in your book.
Edit pause duration
Another way to make your narration sound more natural is to adjust pauses where needed. For example, if your sentence includes an em-dash, the AI might “move past it” too quickly. To compensate, you can add a custom pause to the manuscript.
In Speechify, the menu on the right-hand side lets you add weak, medium, and strong pauses (or manually edit the pause duration) to give your narrator’s line-reading a rhythm that better approximates human speech.
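Outside of a point-and-click editor, some text-to-speech services accept SSML, a W3C markup standard whose `<break>` element sets an explicit pause. Whether your platform of choice takes SSML input is an assumption you'd need to verify, and the function name and default duration below are ours. As a rough sketch, this is how a pause could be added wherever an em-dash appears:

```python
import re
from xml.sax.saxutils import escape


def add_dash_pauses(text: str, pause_ms: int = 300) -> str:
    """Return SSML in which every em-dash is replaced by an explicit pause."""
    # Escape &, < and > first so the result stays valid XML.
    safe = escape(text)
    pause = f'<break time="{pause_ms}ms"/>'
    # Swap each em-dash (U+2014), plus any spaces around it, for the pause tag.
    with_pauses = re.sub(r"\s*\u2014\s*", f" {pause} ", safe)
    return f"<speak>{with_pauses}</speak>"


if __name__ == "__main__":
    line = "She reached for the door\u2014then froze."
    print(add_dash_pauses(line, pause_ms=400))
```

The same `<break>` element also accepts a `strength` attribute (for example `weak`, `medium`, or `strong`) instead of a fixed duration, which mirrors the three presets described above.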
Next, you may also want to give a helping hand when it comes to pronunciation.
Your novel might include unique words, acronyms, or terms from a different language that the AI can’t read properly (like saudade, the Portuguese word for melancholy). You can manually correct the pronunciation by entering it in International Phonetic Alphabet (IPA) format (e.g. sˌa͡ʊdˈadɨ).
📢 If you skipped phonetics class in high school, tools like unalengua can help you translate a word to IPA. Or, alternatively, you can search for it on Google or Wikipedia.
If the word is made up (e.g. Wingardium Leviosa), there’s a certain amount of trial-and-error involved (wɪŋˈɡɑrdiəm ˌlɛviˈoʊsə). From our experience, this can be a minor headache as Speechify often marks IPA-formatted words as invalid. In general, the simpler your words are, the better the voice performance.
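If your book has many such words, it can help to keep them in one list rather than fixing each occurrence by hand. The sketch below assumes a platform that accepts SSML, whose `<phoneme>` element carries an IPA transcription; that is not confirmed for Speechify, and the lexicon and function names are hypothetical. The IPA strings are the two examples from this section.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation lexicon: word or phrase -> IPA transcription.
PRONUNCIATIONS = {
    "saudade": "sˌa͡ʊdˈadɨ",
    "Wingardium Leviosa": "wɪŋˈɡɑrdiəm ˌlɛviˈoʊsə",
}


def tag_pronunciations(text: str, lexicon: dict[str, str] = PRONUNCIATIONS) -> str:
    """Wrap every lexicon entry found in text in an SSML <phoneme> tag."""
    result = escape(text)  # keep &, < and > valid inside XML
    for word, ipa in lexicon.items():
        tag = f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
        # Match whole words only, so "saudade" is not tagged inside a longer word.
        pattern = r"\b" + re.escape(escape(word)) + r"\b"
        # A function replacement avoids backslash handling in the tag text.
        result = re.sub(pattern, lambda _match, tag=tag: tag, result)
    return result


if __name__ == "__main__":
    print(tag_pronunciations("A wave of saudade washed over her."))
```

This simple version would misfire if a lexicon entry happened to match text inside an earlier tag (the word "phoneme" itself, say), so it is best kept to distinctive names and terms.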
Consider adding background music
Some publishers include music over the introduction or specific parts of the book. On Speechify, you can add a music track by selecting a file from your laptop, while some other tools offer in-app, royalty-free music options.
Generally speaking, though, music isn’t needed in an audiobook: many listeners don’t like the extra fuss, especially those who play back at faster speeds. But if you do go for it, make sure to secure the rights to use the music track and to regulate its volume so it doesn’t drown out the spoken words.
Before we wrap up your AI production, let’s take a detour to see how you can narrate your own audiobook (without actually narrating your own audiobook).
4. Try cloning your own voice instead
If you’re authoring a memoir or a nonfiction book, you may want to add a personal touch by using your own voice to record the audiobook. Of course, this isn’t limited to nonfiction writers, and it might be preferable if your readers already know your voice or if your author brand is quite strong.
AI technology can clone your voice based on a sample recording, leveraging advanced techniques such as fine-tuning large language models (LLMs). The results are not perfect, but they’re “good enough” to be considered a valid option. Currently, there aren’t too many tools to clone your voice, but here are some popular options.
Voice Cloning plan
Voice Cloning free trial
We chose to test Descript Overdub, as it’s one of the most highly regarded (and cheapest) tools to create high-fidelity digital voice clones.
Record 30 minutes of good-quality audio
To make a copy of your voice that sounds as close as possible to the real thing, you’ll need to follow some best practices. First of all, upload a 10- to 30-minute high-quality recording of yourself speaking English. Ideally, record yourself narrating your own book, using the pitch, tone, and reading style you’d like your digital voice to take on.
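Before uploading, you can check that your sample actually falls within that 10- to 30-minute window. Here is a minimal sketch using Python's standard `wave` module; it assumes an uncompressed WAV file, and the function names and file name are our own placeholders.

```python
import wave

MIN_SECONDS = 10 * 60  # lower bound of the recommended sample length
MAX_SECONDS = 30 * 60  # upper bound of the recommended sample length


def recording_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_suitable_length(seconds: float) -> bool:
    """True if the duration is inside the 10- to 30-minute window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


if __name__ == "__main__":
    path = "voice_sample.wav"  # hypothetical file name
    seconds = recording_seconds(path)
    verdict = "OK" if is_suitable_length(seconds) else "outside the 10-30 minute range"
    print(f"{seconds / 60:.1f} minutes: {verdict}")
```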
The quality of your audio recording matters: a portable USB microphone like the Blue Yeti (which costs less than $100) is much better than your computer's default mic. You should also try to record in a quiet environment, ideally packed with sound-absorbing material (which is why some authors record their audiobooks in their closet!).
But if you don’t have a fancy setup, you can still turn a simple recording on your phone into a suitable source file by using tools like Adobe Enhance or Descript’s own Studio Sound, which use AI to iron out the imperfections. In the audio sample below, you can hear how we turned a low-quality recording into an adequate sample for AI cloning, courtesy of Ricardo Fayet:
Once you have a good-quality recording to feed the machine, you’ll have to submit it as training data to create your digital soundalike, then wait for the software to do its magic.
Edit word gaps or add pauses
Once the voice is ready, you can copy-paste your manuscript into the editor. Then play the voiceover to hear how it sounds, and edit the text (as explained in Step 3) to make the narration read more naturally.
In Descript, you can manually add pauses in the timeline editor. For our sample chapter, we added approximately 0.40 seconds of pause before each new paragraph so the narration didn’t feel rushed.
Play with styles to add nuance
You may find your voice replica to be flat and overly neutral. To access a wider emotional spectrum for “your” narration, you can create new recordings of yourself speaking with certain tones, like happy, sad, or angry. Then, in the editing process, you can select a sentence and assign it a style. Just as we saw with the off-the-shelf AI avatars, this will give you some room to bring the narration to life.
Finally, in Descript there are a few advanced options to equalize each and every sentence to the last decibel, so if you speak fluent audio engineering you can play around with it. Once you’ve harmonized the pace and sound of your manuscript to your liking, you’re pretty much ready to export the file.
5. Export and finalize your files
Now that you’ve embroidered every detail of how you want your book to sound, either with a digital narrator or your own (cloned) voice, it’s time to export it. Both Speechify and Descript give you the option to download it in formats like OGG and MP3, but we strongly recommend exporting in WAV to avoid quality loss. You can always convert (and compress) the file to MP3 later, with free tools for both Windows and Mac.
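One free option for that later conversion is the command-line tool FFmpeg. That is our suggestion rather than anything Speechify or Descript require, and the sketch below assumes FFmpeg is already installed and on your PATH. The function names, file names, and the 192k bitrate are placeholders.

```python
import subprocess


def mp3_command(wav_path: str, mp3_path: str, bitrate: str = "192k") -> list[str]:
    """Build the FFmpeg command that encodes a WAV file as MP3."""
    return [
        "ffmpeg",
        "-i", wav_path,            # input file
        "-codec:a", "libmp3lame",  # encode audio with the LAME MP3 encoder
        "-b:a", bitrate,           # target audio bitrate
        mp3_path,                  # output file
    ]


def convert_to_mp3(wav_path: str, mp3_path: str, bitrate: str = "192k") -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(mp3_command(wav_path, mp3_path, bitrate), check=True)


if __name__ == "__main__":
    # Hypothetical file names for one exported chapter.
    convert_to_mp3("chapter_01.wav", "chapter_01.mp3")
```

Keeping the WAV files as your masters means you can re-encode at a different bitrate later if a retailer asks for one.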
So, what does the audiobook sound like? You’ll be the judge.
Here’s the introduction of How to Market a Book narrated by Guy, our AI voice actor.
And here’s the same bit, narrated by Ricardo's digital clone!
Overall, both narrators fall into the uncanny valley: they sound human, but not quite. They lack some warmth and nuance, but the voice is natural enough to do the job without setting off the creepiness alarm.
Ultimately, the real upside to digital narrators is that they’re cheap, highly customizable, and have infinite digital stamina to narrate as many books as you’d like. Moreover, AI-generated audiobooks can be produced in days instead of months, enabling authors to launch their audiobooks sooner. And speaking of launches…
📈 Producing an audiobook can be an important component of your book's marketing plan. Take a look at seven additional proven methods to enhance your book sales.
6. Set your audiobook up for sale
It's time to distribute your audiobook. As you'll learn in the next part of this guide, to reach retailers and libraries worldwide we would normally suggest ACX or Findaway Voices, which distribute to Amazon, Audible, Apple Books, and many others. However, neither platform accepts AI-narrated audiobooks at the moment.
With the big players out of the game, you won't be able to reach a large chunk of the market, but that could change in the future. In the meantime, here are some alternative avenues you can consider:
Google Play Books. The easiest way to publish an AI audiobook on Google Play is to create it with their own digital narration service.
Rakuten Kobo. Kobo happily accepts AI-generated audiobooks: you’ll only need to flag in the metadata that the narrator is a synthesized voice.
🍏 Apple Books has recently launched its own digital narration service for authors to narrate their books with genre-optimized synthetic voices. Currently, it's only available for women's fiction and romance, and it's by invitation only, but the company plans to expand the service to other genres soon.
The debate over whether AI-generated content should be distributed alongside human work in bookstores (and elsewhere) continues, as many legal and ethical implications remain to be resolved. In the meantime, authors can experiment with the tools available to better prepare for whatever opportunities may arise in the future.
☝️ As a last piece of advice, remember that to sell your audiobook you’ll need:
The times are a-changin’! As AI technology keeps getting better, closing the gap between human and digital narration, we may see more indie authors entering the audiobook market to promote their work (and make some extra bucks). At the same time, despite the advancements, AI narration still struggles to replicate the emotion and nuances that human narrators can infuse into their voice acting. So if your budget allows it, nothing beats producing your audiobook with a professional.
Now, if you want to learn more about getting your work into the earholes of eager readers, head on over to our next post on audiobook distribution.