Artificial intelligence puts the smart into your smartphone and helps choose which movies you see, and even who you date. It will soon be driving our cars, it already helps to perform complex medical diagnoses, and it makes most of the trades on Wall Street. And now, new AI voice technologies have the potential to transform the audiobook industry.
These new technologies can replicate the human voice to create a listening experience that is virtually indistinguishable from the real thing, and they will enable publishers to create new, high-quality audio content without the cost and time constraints of traditional production methods.
The audiobook market is enjoying its eighth year of rapid growth across the globe and, according to the latest research from the Audio Publishers Association (APA), it’s new content that’s driving this growth. Last year listeners consumed an average of eight audiobooks, up from six the previous year. Yet barriers to traditional audiobook production mean that only 6% of all books are currently available in audiobook format.
Advanced text-to-speech (TTS) technologies are breaking down these barriers, cutting both the cost of production and the time to market by approximately half compared with traditional methods. That makes audiobook production more cost-effective for everyone, and much more accessible for small and mid-size publishers.
TTS technology is particularly suited to narrative non-fiction, business, IT, academic, education and social sciences titles, where the cost of studio production may not be justifiable. By removing the constraints of traditional production methods, publishers can produce more audiobooks quickly and cost-effectively.
DeepZen, a UK-based AI company, is a leader in the TTS field and produced the world’s first digitally narrated audiobook in 2019 (the darkly humorous The Reluctant Cannibals by Ian Flitcroft). In association with Ingram, it is launching a new Publisher Portal that is designed to make things simple and lets publishers conveniently manage all their audiobook projects in one place. Exclusive discounts are available to Ingram partner publishers.
So how does it work? And how realistic do AI voices really sound? Even writers sometimes struggle to express the full range and experience of human emotions.
DeepZen’s technology was developed specifically for audiobooks and long-form content, and it’s benchmarked against human narration. Its AI voices are a world away from the robotic, monotone voice assistants we are all familiar with.
The technology combines AI voice, natural language processing and next-generation algorithms. To put it simply, DeepZen has built a system for creating a library of AI voices, based on recordings of the voices of actors and narrators, who are fully paid for their work. This voice data is fed into machines, enabling the machines to ‘learn’ to speak, so that when they are given a new text they can ‘read’ it in the actor’s voice.
The digital voice ‘learns’ to express a wide range of emotions by processing examples of the narrator speaking, for example, in a ‘happy’ voice or an ‘angry’ voice. In the same way, it also ‘learns’ how to express elements of the human voice, such as pacing and intonation, that produce more realistic speech patterns.
Although it’s new, this technology has been tried and tested with very positive results.
In the last 12 months, DeepZen has signed three co-publishing deals in the UK, signed worldwide distribution deals, and produced dozens of audiobooks, with hundreds more in the pipeline.
Audiobooks produced by DeepZen are accepted by over 50 vendors globally, including Apple Books, Google Play, Kobo, Scribd and Spotify.
The new Publisher Portal opens up a range of benefits to publishers by providing a high-quality, convenient and cost-effective production service that converts text into audio format in approximately half the time it takes with traditional studio production, and at approximately half the cost.
DeepZen is providing a managed service that combines AI technology with human editing to ensure the high quality that listeners expect. The process couldn’t be easier:
1) You upload your manuscript to the Portal and select a narrator, and you will receive an instant estimate. Once you have confirmed the project, DeepZen starts work on the TTS conversion.
2) DeepZen’s conversion process involves identifying uncommon words whose pronunciation its TTS tool cannot predict, and you will receive an email asking you to record these words using a tool on the Portal.
3) Once DeepZen has received these pronunciations, they are added to the lexicon, the audio is proofed and corrections are made. Pacing, expression and intonation are all checked to ensure they reflect the content.
4) DeepZen will then email you a link so that you can access the Portal and review the audio it has created. You will have the option to request a further set of corrections. Once you have approved the corrections, DeepZen finalizes the audio and formats the audiobook ready for distribution to vendors. You will also be sent a link to download your audio files.
It’s as simple as that. The whole process takes approximately three weeks, if you adhere to the deadlines for pronunciations and quality control review.
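For readers curious about what step 2 might look like under the hood, here is a minimal sketch in Python of a lexicon check. It is an illustration only and not DeepZen’s actual implementation: the lexicon contents, the phoneme strings, the sample sentence and the function names `find_unknown_words` and `add_pronunciations` are all assumptions made for the example. The idea is simply that any word missing from a pronunciation lexicon gets flagged for the publisher, and the supplied pronunciations are then merged back in.

```python
import re

# Hypothetical pronunciation lexicon: word -> phoneme string.
# The entries and phoneme spellings are illustrative assumptions.
LEXICON = {
    "the": "DH AH",
    "reluctant": "R IH L AH K T AH N T",
    "dined": "D AY N D",
    "at": "AE T",
    "club": "K L AH B",
}


def find_unknown_words(text, lexicon):
    """Return words in `text` that are missing from `lexicon`,
    lowercased, in order of first appearance and without repeats."""
    seen = set()
    unknown = []
    for token in re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text):
        word = token.lower()
        if word not in lexicon and word not in seen:
            seen.add(word)
            unknown.append(word)
    return unknown


def add_pronunciations(lexicon, recorded):
    """Return a new lexicon that also contains the supplied pronunciations.
    The original lexicon is left unchanged."""
    updated = dict(lexicon)
    updated.update(recorded)
    return updated


# Made-up sample sentence containing two words the lexicon does not know.
manuscript = "The reluctant gourmand dined at the Zythum club."
pending = find_unknown_words(manuscript, LEXICON)
```

In this sketch, `pending` holds the words a publisher would be asked about. Passing their pronunciations to `add_pronunciations` and running the check again should leave nothing flagged.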
Whether you’re excited or nervous about AI, there is no doubt that it offers real business benefits to publishers.
Ingram offers automated audiobook distribution to an extensive network of worldwide audio retailers.