Audiobooks of tomorrow: development of the cutting-edge platform for low-resource languages

AAI Labs has been working hard on its latest product, Colibris. We sat down with the project lead, Arnas, to find out more about it.

Hi, Arnas! Could you please start with introducing us to the new product?

Yes, of course. Our new platform, Colibris, is the product of our efforts to innovate and simplify the audiobook production process. The workflow with Colibris is very straightforward: a user uploads any book they want to turn into audio, and the platform transforms it into a high-quality AI-synthesized audiobook. On top of that, we offer a suite of customisation tools that allow users to personalise the end result. For instance, it is possible to fine-tune each sentence, select a voice from an array of options, and control the emotional depth of the narration. Our aim is to make audiobook production accessible, versatile, and expressive, catering to both professional publishers and individual enthusiasts. With Colibris, users can cut production costs by up to 80% and save a lot of time on audiobook production, all while staying in full control of the end result.

What was the starting point that led you to develop Colibris?

We started by analysing existing Text-to-Speech (TTS) systems, with the goal of developing a TTS model capable of matching human speech in naturalness and expressiveness. The ideal outcome would be a synthesizer that produces audio so convincingly human that listeners cannot tell whether the audiobook was read by a professional narrator or synthesized by Colibris. While existing synthesizers perform very well on isolated speech samples, they often fall short of delivering a consistent and natural listening experience over the span of an entire book. We decided to address this issue by adopting innovative approaches both in the training of our neural networks and in the processing of text and synthesized speech.

Sounds exciting! And what is your personal view on the technology? How, in your opinion, will it affect publishers?

From my perspective, the traditional method of audiobook production is an exhausting and complex process. First, publishers must secure the rights for an audiobook version of a text. Then comes the task of finding and hiring professional narrators whose voices fit the book's tone, style and content. This is followed by the logistical necessity of recording in a professional studio. During post-production, audio specialists spend a lot of time editing the recordings, removing superfluous elements, and ensuring that the final product is fully polished.

Colibris will fundamentally transform this workflow. By offering high-quality, AI-synthesized audiobooks indistinguishable from those read by human narrators, we drastically streamline production. No more hours spent searching for the right voice, no studio time, no extensive post-production. The result? Publishers can produce more audiobooks, more efficiently, and, eventually, sell them at competitive prices.

Not only does this benefit publishers in terms of cost and time, it also broadens their scope to include a wider variety of titles, potentially including niche or less commercially viable works. The impact is significant: publishers can reach new markets and audiences, fostering a more diverse and rich literary landscape in the audio format.


Do you see the potential for Colibris to be used outside of Lithuania?

Absolutely. Even though Colibris currently offers high-quality audiobook production in Lithuanian only, we are already expanding our capabilities to other languages, starting with Polish and German audiobooks. The main objective is to extend what Colibris offers and enable the synthesis of high-quality audiobooks in a multitude of low-resource languages, reaching publishers outside of Lithuania. We are actively working on ensuring that the quality of synthesis remains consistently high across all languages. Achieving this would demonstrate not just the versatility of our technology but also its potential to streamline audiobook production on a global scale.


Even though Colibris is still in its early stage, what would you highlight to be the team's greatest achievement(s) with the system as of now?

We have successfully collected and refined speech datasets in various languages, which are absolutely critical to the development of a robust and versatile TTS system. It was a long and meticulous process, but it has given us a strong foundation for extending the language options for audiobook synthesis. Additionally, we have established an infrastructure that simplifies model training and research. This foundation is crucial for speeding up our progress and making our work more effective. We have also been working hard to expand our TTS architecture. It already supports features like multi-speaker capabilities, multilingual synthesis, and the conveyance of emotional nuances in speech. Although these features are still in the early stages of development, we believe the progress so far clearly shows the long-term potential of our team to improve the offering.


And did you identify any risks associated with the project? If so, how did you mitigate them?

One of our main concerns was whether we could make our text-to-speech (TTS) system sound just like a human, especially in different languages. To tackle this, we have been focusing on:

  • Staying informed and applying the latest research - we regularly monitor the newest studies and findings about TTS and neural networks. This helps us understand what is working well in the field and what could be applied to Colibris.

  • Experimenting with different models - we try out various ways of building TTS systems. Some are well-known methods, and some are new ideas we come up with. This experimentation is important to find what works best for our specific needs.

  • Being ambitious with our tech - we are not just using existing methods but are also developing our own and refining them as needed. Our goal is not merely to make Colibris good; we want it to be impeccable.

  • Gathering and preparing high-quality speech data - as I have mentioned before, we put a lot of effort into collecting and refining speech recordings. This high-quality data is crucial because it is what we use to teach our system how to sound natural in different languages.


Despite Colibris still being in development, have you already managed to get some interest from publishers in Lithuania or abroad?

Yes! Colibris is already becoming known in the audiobook market. We have secured collaborations with publishers such as Alma littera, Aukso Žuvys and Quickfox Publishing, who are already experimenting with the technology and using it to create audiobooks. In the months to come, we are also looking forward to securing more partnerships in the DACH region and Poland. Our long-term ambition is, of course, to make Colibris a competitive player on the global scale, with the potential to become a leader in audiobook production technology.

Interested to know more about Colibris? Contact us or visit the website.
