Jianghai Securities: Volcano Engine Releases Multiple Doubao Models, Remains Bullish on AI Application Investment Opportunities
18/04/2025
GMT Eight
Jianghai Securities released a research report stating that on April 17, 2025, Volcano Engine released the Doubao 1.5 Deep Thinking model and upgraded the Doubao text-to-image model to version 3.0 along with the Doubao visual understanding model. For Agent services, it released an OS Agent solution and the GUI Agent large model Doubao 1.5 UI-TARS; for large-scale inference, it released the AI cloud-native ServingKit inference suite. The firm remains optimistic about AI application investment opportunities, highlighting Hand Enterprise Solutions (300170.SZ), Dark Horse Technology Group (300688.SZ), Intsig Information (688615.SH), and others.
The main points of Jianghai Securities are as follows:
- Average daily token calls to the Doubao large model continue to rise sharply, benefiting the data elements and computing power sectors.
- As of the end of March 2025, average daily token calls to the Doubao large model exceeded 12.7 trillion, three times the December 2024 level and 106 times the level at its launch a year earlier. An IDC report shows that calls to large models on China's public cloud surged in 2024, with Volcano Engine ranking first in the Chinese market at a 46.4% share. The firm believes the continued growth in Doubao token calls benefits the data elements and computing power sectors.
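As a rough back-of-the-envelope check, the earlier volumes implied by these multiples can be derived from the figures above; only the 12.7 trillion figure and the 3x / 106x multiples come from the report, and the derived values below are illustrative, not separately disclosed:

```python
# Back-of-the-envelope on the reported token-call multiples.
# Only the 12.7 trillion figure and the 3x / 106x multiples are reported;
# the implied earlier volumes are derived here, not separately disclosed.
daily_tokens_mar_2025 = 12.7e12                    # > 12.7 trillion tokens per day

implied_dec_2024 = daily_tokens_mar_2025 / 3       # ~4.2 trillion tokens per day
implied_at_launch = daily_tokens_mar_2025 / 106    # ~0.12 trillion tokens per day

print(f"Implied Dec 2024 volume: {implied_dec_2024 / 1e12:.1f} trillion tokens/day")
print(f"Implied launch volume:   {implied_at_launch / 1e12:.2f} trillion tokens/day")
```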
- The newly released Doubao 1.5 Deep Thinking model uses an MoE architecture and a dual-track reward mechanism.
- The newly released Doubao 1.5 Deep Thinking model performs well on reasoning tasks in fields such as mathematics, coding, and science, reaching or approaching the world's top level; on non-reasoning tasks such as creative writing it also shows strong generalization and can handle a wider range of complex scenarios. To strengthen its general abilities, the model team optimized its data processing strategy, integrating verifiable data with creative data to cover a variety of task requirements. Large-scale reinforcement learning is a key technology for training reasoning models, and an innovative dual-track reward mechanism, which balances "right and wrong" with "differing perspectives," effectively optimizes the algorithm. The model uses an MoE architecture with 200B total parameters and only 20B activated parameters, giving it a significant cost advantage in training and inference. Built on efficient algorithms, it offers industry-leading concurrency and achieves ultra-low latency of 20 milliseconds. To solve concrete problems, a large model must be able to query information on the internet and carry out multiple rounds of search and thinking. Unlike other reasoning models that follow a "search first, then reason" pattern, the Doubao app, specifically trained on the Doubao 1.5 Deep Thinking model, can "search while thinking"; the model also has visual understanding capabilities and can reason over visual scenes. The firm believes the innovation of the Doubao 1.5 Deep Thinking model lies in its MoE architecture (200B total parameters, only 20B activated) and its dual-track reward mechanism.
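To make the 200B-total / 20B-activated relationship concrete, below is a minimal, generic sketch of Mixture-of-Experts (MoE) routing: each token is sent to only a few experts, so only a fraction of the model's parameters is used per token. The expert count, dimensions, and top-k value are illustrative assumptions, not ByteDance's actual architecture.

```python
# Generic Mixture-of-Experts (MoE) routing sketch (illustrative sizes only):
# each token activates only its top-k experts, so only a fraction of the
# total parameters participates in any single forward pass.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2               # assumed toy sizes

experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts; the other experts stay inactive."""
    logits = x @ router                             # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # indices of the selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = top[t]
        w = np.exp(logits[t, sel])
        w /= w.sum()                                # softmax gate over selected experts
        for gate, e in zip(w, sel):
            out[t] += gate * (x[t] @ experts[e])    # only top_k experts run per token
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)                      # (4, 64)
print(f"active expert fraction per token: {top_k / n_experts:.0%}")  # 25%
```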
- The Doubao text-to-image model 3.0 is newly upgraded, offering better text layout and image generation effects.
- The newly upgraded Doubao text-to-image model 3.0 achieves better text layout, more lifelike image generation, and 2K high-definition output; it can be widely used in marketing, e-commerce, and design scenarios such as film and television, posters, painting, and doll design. In the latest Artificial Analysis text-to-image arena, the Doubao text-to-image 3.0 model has surpassed many mainstream models in the industry and ranks in the top tier globally. The firm believes the upgraded Doubao text-to-image model 3.0 is expected to be applied in more scenarios.
- The Doubao visual understanding model is newly upgraded, with more accurate visual localization and smarter video comprehension.
- The newly upgraded Doubao visual understanding model has stronger visual localization capabilities, supporting the localization and counting of multiple targets, small targets, and general targets, as well as descriptions of the localized content and 3D localization. It can be applied to offline store inspection, GUI agents, robot training, and autonomous driving training. The new version also significantly improves video comprehension, including memory, summarization, speed perception, and long-video understanding. Combined with vector search, the Doubao visual understanding model can perform semantic search directly over video, which is broadly applicable in commercial scenarios such as security and home care. The firm believes the upgraded Doubao visual understanding model will continue to empower industries such as robotics, smart vehicles, and security.
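As an illustration of the "visual understanding plus vector search" pattern described above, here is a minimal sketch of semantic video retrieval: clip embeddings are indexed and queried by cosine similarity. The embed_clip and embed_text functions are hypothetical placeholders (random unit vectors) standing in for whatever multimodal model produces the embeddings; this is not Volcano Engine's actual API.

```python
# Minimal sketch of semantic video search via vector retrieval.
# embed_clip / embed_text are hypothetical placeholders that return random
# unit vectors so the example runs end to end; in practice they would call a
# multimodal embedding model. This is not Volcano Engine's API.
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # assumed embedding size

def _unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

def embed_clip(clip_id: str) -> np.ndarray:
    return _unit(rng.standard_normal(DIM))          # placeholder embedding

def embed_text(query: str) -> np.ndarray:
    return _unit(rng.standard_normal(DIM))          # placeholder embedding

# Build a small in-memory index of video clips.
clips = [f"camera1_clip_{i:03d}" for i in range(100)]
index = np.stack([embed_clip(c) for c in clips])    # (n_clips, DIM)

def search(query: str, top_k: int = 5):
    """Return the clips whose embeddings are closest to the query embedding."""
    q = embed_text(query)
    scores = index @ q                              # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:top_k]
    return [(clips[i], float(scores[i])) for i in best]

print(search("person falling down in the living room"))
```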
- Targeting Agent services, Volcano Engine released an OS Agent solution and the GUI Agent large model Doubao 1.5 UI-TARS; targeting large-scale inference, it released the AI cloud-native ServingKit inference suite.
- The ServingKit suite helps enterprises achieve rapid model deployment, inference optimization, and observable operations and maintenance. It can complete the download and preheating of the 671B DeepSeek-R1 model in 2 minutes and load the inference engine in 13 seconds. The firm believes that both application Agents and OS Agents will develop rapidly in the future.
Risk warning: risk of changes in industrial policy, risk that AI application development falls short of expectations, and risk that target companies' performance underperforms.