OpenAI’s decision to delay the release of ChatGPT’s Voice Mode has left many of its fans disappointed. French AI developer Kyutai has taken the opportunity to introduce its own real-time voice AI assistant, called Moshi. Like popular voice assistants such as Alexa and Google Assistant, Moshi aims to give users lifelike spoken conversations. What sets Moshi apart is that, like ChatGPT, it is built on a large language model: specifically Helium, a 7-billion-parameter model developed by Kyutai itself. Kyutai claims that Moshi can speak in different accents and has 70 distinct emotional and speaking styles. Furthermore, the AI can handle two audio streams simultaneously, enabling it to listen and talk at the same time.
To develop Moshi, Kyutai fine-tuned the model on over 100,000 synthetic dialogues generated using Text-to-Speech (TTS) technology. The goal of this process was to teach Moshi the nuances and tones of human communication. To enhance the voice quality, Kyutai also collaborated with a professional voice artist. Moshi is trained on both text and audio, and Kyutai says it can run on devices like laptops without internet connectivity. Running locally supports user privacy and security, because sensitive data does not need to be transmitted over the internet.
One notable aspect of Kyutai’s approach to Moshi is its commitment to open-source development. The company plans to release the model’s code and framework, providing a foundation for further innovation. This open-source approach not only encourages collaboration but also helps address the safety and ethics concerns that larger AI companies often face with their closed models. Kyutai is backed by French billionaire Xavier Niel and other supporters, and its open-source strategy is gaining traction.
In addition to the voice capabilities, Kyutai is also working on incorporating AI audio identification, watermarking, and signature tracking systems into Moshi. These features will help identify AI-generated audio, ensuring accountability and traceability. By monitoring and verifying AI-generated content, Kyutai aims to promote trust and reliability in the AI space.
Although Moshi is still in development, the voice mode demonstrated in the presentation is impressive. If Moshi gains popularity, it could act as a catalyst for ChatGPT’s competitors to develop voice-enabled versions of their own. It might also encourage the integration of large language models into existing voice assistants like Alexa. The possibilities are exciting, and anticipation for the full release of Moshi is growing.
For those interested in trying out Moshi, a demo is currently available online. Early access to the complete chatbot can also be obtained by signing up on Kyutai’s website. This gives users an opportunity to experience firsthand the capabilities of this promising voice AI assistant.
In conclusion, Kyutai’s introduction of Moshi represents a significant advancement in the field of voice AI assistants. By leveraging the power of large language models and incorporating unique features such as multi-stream audio processing, Kyutai has developed a compelling alternative to existing voice assistants. With its commitment to open-source development and plans for implementing AI audio identification systems, Kyutai is positioning Moshi as a trustworthy and innovative solution. As the development of Moshi continues, it will be interesting to see how it shapes the future of voice AI technology and its integration into various platforms and devices.