Alibaba Qwen 2.5 Omni AI Model With Real-Time Speech Generation Released

Alibaba’s Qwen team released a new artificial intelligence (AI) model in the Qwen 2.5 family on Wednesday. Dubbed Qwen 2.5 Omni, it is a flagship-tier end-to-end multimodal model. The company claims it can process a wide range of inputs, including text, images, audio, and video, while generating real-time text and natural speech responses. It is said to enable the building and deployment of cost-effective AI agents due to its diverse skill set. Alibaba has also employed a new “Thinker-Talker” architecture for the Qwen 2.5 Omni AI model.

Qwen 2.5 Omni AI Model Released

In a blog post, the Qwen team detailed the new Qwen 2.5 Omni AI model, which is a seven-billion-parameter system. The most notable capability of this omnimodal model is real-time speech generation and video chat, which will allow the large language model (LLM) to answer queries and interact with users verbally in a humanlike manner. So far, this capability has only been available with Google’s and OpenAI’s models, which are closed-source. Alibaba, on the other hand, has open-sourced the technology.

Coming to the features, it accepts text, images, audio, and video as input, and produces text and speech as output. The model is also capable of real-time voice interactions and video chats. The Qwen team also highlights that the model will offer real-time streaming of speech in a natural manner. Additionally, it is claimed to come with enhanced performance in end-to-end speech instruction following.

The Qwen team highlighted that the Omni model is built on a novel “Thinker-Talker” architecture. The Thinker component functions like a brain and is responsible for processing and understanding input across modalities, and for generating text output. It is essentially a Transformer decoder accompanied by audio and image encoders that assist with information extraction.

Qwen 2.5 Omni benchmark
Photo Credit: Alibaba

On the other hand, the Talker component operates like a human mouth, the researchers stated. It receives the information produced by the Thinker component as a stream and generates speech output in a streaming manner for fluidity. It is designed as a dual-track autoregressive Transformer decoder. This overall architecture operates as a single model, allowing real-time text and speech generation and enabling end-to-end training and inference.
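
The division of labour between the two components can be illustrated with a toy sketch. This is not Alibaba’s implementation: the names `ThinkerStep`, `thinker`, `talker`, and `respond` are hypothetical, and the "hidden state" and "speech tokens" are simple placeholders. It only shows the streaming idea, in which the Talker starts producing speech units as soon as the Thinker emits each piece of text, rather than waiting for the full response.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class ThinkerStep:
    """One step of Thinker output: a text token plus a stand-in hidden state."""
    text_token: str
    hidden_state: List[float]


def thinker(prompt: str) -> Iterator[ThinkerStep]:
    """Toy Thinker: yields text tokens one at a time, as a decoder would."""
    for word in prompt.split():
        token = word.upper()  # placeholder for real text generation
        # Placeholder hidden state: the token length as a 1-element vector.
        yield ThinkerStep(text_token=token, hidden_state=[float(len(token))])


def talker(steps: Iterator[ThinkerStep]) -> Iterator[Tuple[str, List[int]]]:
    """Toy Talker: consumes Thinker steps as they arrive and emits
    hypothetical speech-token IDs for each, without waiting for the
    complete text response."""
    for step in steps:
        # Placeholder speech tokens: one integer per character.
        speech_tokens = [ord(ch) % 100 for ch in step.text_token]
        yield step.text_token, speech_tokens


def respond(prompt: str) -> List[Tuple[str, List[int]]]:
    """Run the streaming pipeline and collect the (text, speech) pairs."""
    return list(talker(thinker(prompt)))


if __name__ == "__main__":
    for text, speech in respond("hello there"):
        print(text, speech)
```

Because both functions are generators, each text token flows through to the Talker as soon as it is produced, which loosely mirrors how the article describes text and speech being generated together in real time.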

Based on internal testing, the Qwen 2.5 Omni AI model is said to outperform the Gemini 1.5 Pro model on OmniBench. It is also said to outperform Qwen 2.5-VL-7B and Qwen2-Audio on single-modality tasks.

The AI model is now available on Alibaba’s Hugging Face listing and GitHub listing. Additionally, users can try out the new model via Qwen Chat as well as the company’s community platform, ModelScope.
