Module 1 Challenge Eleven Labs Inc
- Eleven Labs
- January 4th, 2022 [1]
- Mateusz Jakub Staniszewski
- Piotr Dabkowski
- Childhood experiences: Staniszewski and Dabkowski grew up together in Poland, where they frequently watched American movies that were poorly dubbed into Polish. [2]
- Frustration with dubbing quality: They were often frustrated by the inadequate dubbing of foreign films, particularly American movies, into their native language. [3]
- Breaking language barriers: Their goal became to make quality content available across all languages, essentially aiming to break down linguistic barriers in content consumption. [4]
- Lead Investors: The Series B round was co-led by Andreessen Horowitz [5], Nat Friedman (former GitHub CEO), and Daniel Gross
- Other Investors: Additional participants in the latest round include Sequoia Capital, Smash Capital, SV Angel, BroadLight Capital, and Credo Ventures
- January 2022: $2 million pre-seed funding [6]
ElevenLabs is primarily trying to solve the problem of language barriers in content accessibility [9]. Their main goal is to make content universally accessible in any language and voice. Specifically, they are addressing several key issues:
- Poor quality dubbing and voiceovers: The founders were inspired by their experiences with poorly dubbed foreign films in Poland. They aim to improve the quality of translated audio content. [10]
- Limited accessibility of content across languages: ElevenLabs is developing AI dubbing tools that can automatically re-voice audio or video in different languages while preserving the original speaker's voice. [11]
- Inefficient content creation processes: Their platform aims to simplify and streamline the creation of audio content for various purposes, including audiobooks, news narration, and video game character voicing. [12]
- Communication barriers for individuals with disabilities: Their technology has applications in helping those who have lost their voices or have accessibility needs in daily life. [13]
- Limited options for content creators: ElevenLabs provides tools for voice cloning and designing synthetic voices, offering new creative outlets for content creators. [14]
- Content Creators
- People with Disabilities
- Non-Native Speaking Workers
- Enterprise Businesses
- AI-powered voice synthesis: ElevenLabs specializes in developing natural-sounding speech synthesis software using deep learning. Their core technology revolves around cutting-edge voice AI for real-time voice cloning and speech synthesis. [17]
- Text-to-Speech (TTS): They offer advanced TTS technology that creates realistic, emotional voices in many languages. This allows users to convert written text into natural-sounding speech. [18]
- Voice cloning: ElevenLabs has developed technology that can clone voices with just a few minutes of audio input. This allows users to create digital replicas of specific voices. [19]
- Multilingual capabilities: Their platform supports voice synthesis and translation across 32 [20] languages. [21]
- AI dubbing: They have developed an AI Dubbing Studio, which likely uses their voice synthesis and translation capabilities to create dubbed content. [22]
- Voice customization: Users can fine-tune voice outputs, adjusting tone, pitch, speed, and emotion of synthesized voices.
- Voice Marketplace: This is a new product announced in January 2024, likely allowing users to access or share various AI-generated voices.
- Mobile app: ElevenLabs has developed a mobile application to make their technology more accessible. [23]
- AI Speech Classifier: They have created a tool to detect AI-generated audio, specifically audio created using ElevenLabs technology.
- Digital watermarking: ElevenLabs is working on technology to digitally watermark synthetic voices, allowing users to distinguish between AI-generated and human voices.
- Generative AI
- Audio Engineering
- Integration of deep learning models for more natural-sounding synthesis
- Development of voice cloning technologies
- Expansion of multilingual capabilities
- Incorporation of emotional inflection in synthetic voices
- Increased focus on low-latency, real-time voice generation
Major companies in this field: [24]
- Google Cloud
- Google (NotebookLM) [25]
- Amazon Polly
- Microsoft (with Azure Cognitive Services)
- IBM (with Watson Text to Speech)
- Replica Studios
- PlayHT
- WellSaid Labs
- Speechify
- Murf AI
- Synthesia
- Resemble
- Descript
- Turbo v2.5 Model:
- An improved AI engine that enhances both the speed and quality of text-to-speech conversion.
- Text-to-music model:
- Launched in May 2024, expanding their capabilities beyond voice synthesis.
- AI Research: ElevenLabs is investing heavily in voice AI research and upgrading their infrastructure to handle growth and support new product development. [28]
- Deepfake Prevention: The company is developing digital watermarking technology for synthetic voices to help users distinguish between AI-generated and human voices. This is part of a broader effort to combat deepfakes and potential misuse of their technology. [29]
- Partnerships: ElevenLabs has signed an accord with other major AI companies, including OpenAI, Anthropic, Google, and Meta, to combat deepfakes during the 2024 elections.
- Given their funding backers, ElevenLabs has accelerated its push into the field. They have been the most successful of the ‘new’ audio generation and TTS brands directly challenging the legacy cloud providers. They cater to a younger, more diverse audience, and their success is documented in subscriber growth, daily users on the platform, and web traffic.
- They have also been a leader in addressing the deepfake epidemic. [30]
Making everyone a content creator, and consumer, on their terms
- During the exploratory phase of this project, my goal was to find a company building a product that directly addresses a problem I have felt for a while: cross-platform content customization. Seeing ElevenLabs branch into this field honed my focus onto their path. Seeing them, and Google with NotebookLM [31], finally begin to make it easier for non-technical people to interface with content in the ways that resonate most directly with them is game-changing. I am very excited about the direction of this field and hope to be part of it soon enough. As their technology and platform become more user-friendly, more people will be drawn to finally get their voice out there (no pun intended). This cuts both ways but, in my opinion, will lean towards being a net benefit. Any tool that enables people to creatively engage with the world will always be a net positive in the long run.
ElevenLabs: The YouTube of Audio
- ElevenLabs is uniquely positioned to become a market leader in the audio space, for better and for worse. YouTube brought the world to us, and us to the world. The globalization of video has been hugely beneficial in many ways, but it has also made us more wary, more skeptical, and sadly more susceptible to deception. ElevenLabs may come to be viewed through the same lens. Although they are currently available only to paying subscribers, and therefore will not compete on scale, their technology could be just as consequential. Audio cloning sits on the same cultural fault lines as YouTube. In a few years, it’s plausible that we will doubt, or at least question, every piece of content, news, and phone call with a new, never-before-reckoned-with skepticism.
- Right now, it is hard to take a position on ElevenLabs' refusal to open-source their voice cloning algorithms: are they hindering broader innovation or protecting us from potential misuse? Given their connections to Palantir, and Venture Partners, I err on the skeptical side of this debate thus far.
- One only has to take a small step to see how this could have momentous ramifications for the open source AI debate. Depending on which way the winds blow, ElevenLabs will be able to shape the market. They can be the permission slip for those selfishly inclined businesses that are looking for a reason to pick their tech up and go home. Why should they open-source their tech when a market giant isn’t reciprocating? Even well-to-do companies that want to keep AI open source will think twice if their competitors start taking everything in-house. This is why knowing their founders, and their funding, is pivotal. If this scenario sounds familiar, that is no surprise. This is the same line that forever fractured digital media when Zuckerberg declared a Pivot to Video.