Microsoft and Project Gutenberg have used AI technologies to create more than 5,000 free audiobooks with high-quality synthetic voices.
For the project, the researchers combined advances in machine learning, automatic text selection (deciding which parts of a text are read aloud and which are not), and natural-sounding speech synthesis.
First, they developed an algorithm that understands the structure of an HTML-based e-book and distinguishes between the main text and unimportant elements such as footnotes, page numbers, or tables.
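To give a rough idea of what such a filtering step involves, here is a minimal sketch using Python's standard `html.parser`. It is not the team's algorithm: the tag and class names (`footnote`, `pagenum`, and so on) are assumptions chosen for illustration, and real e-books use far less consistent markup. The sketch also assumes well-formed HTML in which every non-void element is closed.

```python
from html.parser import HTMLParser

# Assumed markers for non-narrated content; real e-books vary widely.
SKIP_TAGS = {"table", "sup"}
SKIP_CLASSES = {"footnote", "pagenum"}
# Elements without a closing tag must not affect the nesting depth.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class MainTextExtractor(HTMLParser):
    """Collects text that lies outside of skipped elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self._skip_depth or tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the kept chunks and collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `extract_main_text` on a chapter's HTML returns only the running text, which can then be handed to the speech synthesis stage.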
This parsing step is followed by the actual conversion of text into speech (text-to-speech, TTS). The project used WaveNet, Tacotron, and FastSpeech in particular, which are capable of producing natural, human-like speech.
In addition, the team developed a system that distinguishes between narration and dialogue, and within dialogue even between individual characters and their emotions, and adapts the generated voice accordingly.
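The simplest version of the narrator/dialogue split can be illustrated with a quotation-mark heuristic. This is only a sketch of the idea, not the team's system, which also identifies speakers and emotions; the function name and the `narrator`/`dialogue` labels are made up for this example.

```python
import re

# Matches text in curly or straight double quotes (no nesting handled).
QUOTED = re.compile(r'“[^”]*”|"[^"]*"')


def split_narration_and_dialogue(text):
    """Return a list of (label, segment) pairs in reading order."""
    segments = []
    pos = 0
    for match in QUOTED.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("narrator", before))
        # Drop the surrounding quotation marks from the spoken part.
        segments.append(("dialogue", match.group()[1:-1].strip()))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(("narrator", rest))
    return segments
```

Each labeled segment could then be routed to a different voice. A heuristic like this breaks on nested quotes, single-quote dialogue, or unbalanced quotation marks.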
The entire process chain runs on the machine learning framework SynapseML, which is designed to break down the various tasks and process them in parallel.
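The idea of splitting the work and processing the pieces in parallel can be sketched with Python's standard library. This does not use SynapseML or its API; `concurrent.futures` stands in for the distributed framework, and `synthesize` is a hypothetical placeholder for a real TTS call.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize(chunk: str) -> bytes:
    """Hypothetical stand-in for a TTS request; returns fake 'audio' bytes."""
    return chunk.encode("utf-8")


def narrate_in_parallel(chunks, max_workers=4):
    """Synthesize each text chunk concurrently, keeping the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the audio
        # segments can be concatenated directly afterwards.
        return list(pool.map(synthesize, chunks))
```

Threads suit this sketch because a real TTS call would mostly wait on a network service. A cluster framework applies the same pattern across many machines and many books at once.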
“We believe that this work has the potential to greatly improve the accessibility and availability of audiobooks,” the team writes. Hear for yourself how “How to Tell a Story, and Other Essays” by Mark Twain sounds.
Have your voice narrate an audiobook
For the conference presentation, the team also developed a zero-shot text-to-speech approach that can capture the character of a user’s own voice from a few recorded sentences and transfer it to the narration of the audiobook.
This allows users to select a book from the digital library and have it read to them in their own voice, or in another voice of their choice, provided they have audio recordings of it. It’s not yet clear whether this service will be available beyond the conference, but that seems unlikely given the potential costs.
In total, the project has collected more than 35,000 hours of audio data on classical literature, plays, biographies, and more, read “in a clear and consistent voice.”
This dataset alone could be useful for further AI projects. The research team intends to make all audio data available as open source without restrictions.
Project Gutenberg is a free digital library created by volunteers and accessible via the Internet. More than 70,000 e-books are available to read and download for free on the project’s website.