Ant Group open-sources and releases the omni-modal large model Ming-Flash-Omni 2.0
Ant Group has open-sourced and released the omni-modal large model Ming-Flash-Omni 2.0, the industry's first all-scenario unified audio generation model, capable of simultaneously generating speech, environmental sound effects, and music on the same audio track. Users can control attributes such as tone, speaking speed, intonation, volume, emotion, and dialect through natural-language instructions. At inference time, the model runs at an inference frame rate as low as 3.1 Hz, enabling real-time, high-fidelity generation of minutes-long audio.
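A rough sense of what the reported 3.1 Hz inference frame rate implies: the number of inference frames grows only slowly with clip length, which is what makes minutes-long real-time generation plausible. The sketch below is a back-of-the-envelope calculation assuming the frame rate is measured in latent frames per second of generated audio; the clip durations are illustrative, not figures from the announcement.

```python
# Back-of-the-envelope estimate of inference frames for minutes-long audio.
# The 3.1 Hz figure comes from the announcement; the interpretation as
# "latent frames decoded per second of audio" and the clip lengths below
# are assumptions for illustration only.

INFERENCE_FRAME_RATE_HZ = 3.1


def frames_for_clip(duration_seconds: float) -> int:
    """Approximate number of inference frames needed for a clip of this length."""
    return round(INFERENCE_FRAME_RATE_HZ * duration_seconds)


for minutes in (1, 3, 5):
    seconds = minutes * 60
    print(f"{minutes}-minute clip -> ~{frames_for_clip(seconds)} inference frames")
```

For a 3-minute track this works out to roughly 560 inference frames, a small enough budget that high-fidelity generation can keep pace with playback.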