Zheshang: General large-model and product-type AI application vendors are expected to benefit deeply from the "large model + MCP + A2A" ecosystem.

Date: 18/04/2025
Source: GMT Eight
Zheshang Securities released a research report noting that OpenAI has recently launched the o4-mini and full-version o3 models, whose stronger tool-use capabilities should allow the models to complete tasks in more complex scenarios, while Google has released a series of multimodal model updates whose improved cross-modal capabilities are expected to significantly extend AI application scenarios. On the ecosystem side, many domestic and international vendors have recently announced support for the MCP protocol, and Google has introduced the A2A protocol with the aim of building a multi-agent collaborative application ecosystem; general large-model and product-type AI application vendors are expected to benefit deeply.

Zheshang's main points are as follows:

OpenAI launched the o4-mini and full-version o3 models, with greatly improved multimodal capability and intelligence. o4-mini and o3 are multimodal models that can process text, images, and audio simultaneously, and they can autonomously invoke tools when acting as agents: searching the web, generating images, parsing code, and engaging a deep-thinking mode that reasons over images within the chain of thought. These tool-use capabilities let the models handle more complex task scenarios rather than being limited to simple text generation. In terms of availability, besides ChatGPT Plus, Pro, and Team users, full-version o3 and o4-mini are also open to developers through the Chat Completions API and the Responses API. The Responses API supports reasoning summaries and can retain reasoning tokens across function calls to improve performance, and it will soon support built-in tools, including web search, file search, and the code interpreter, to further strengthen the models' reasoning capabilities.

Google released a series of multimodal model updates, significantly improving cross-modal capabilities. At the Google Cloud Next 25 conference, Google announced a series of major AI updates that mark a significant iteration in cross-modal capability, which is expected to greatly expand AI application scenarios and serve different user needs:
(1) Video generation model Veo 2: now supports editing existing videos, keyframe generation, frame expansion, and camera control;
(2) Audio understanding and generation model Chirp 3: provides natural, realistic speech in more than 35 languages (including Chinese), can generate highly realistic custom voices from a 10-second recording, and can distinguish speakers within an audio clip, improving the usability of audio-to-text;
(3) Music generation model Lyria: produces high-fidelity audio that captures subtle nuances, delivering rich and detailed musical works across genres, which can help enterprises improve brand experience and simplify content creation;
(4) Image generation model Imagen 3: improved editing and inpainting, able to quickly remove or repaint unwanted objects and flaws in images.

MCP + A2A protocols are expected to drive a flourishing agent application ecosystem; focus on investment opportunities along the value chain. MCP allows AI models to retrieve data from sources such as business tools, software, databases, and application development environments to complete tasks. Since Anthropic open-sourced the MCP protocol last November, more than 1,000 community-built MCP servers were already available by February this year. OpenAI recently announced that its Agents SDK supports MCP (with the ChatGPT desktop application and the Responses API to follow). Domestically, Alibaba Cloud launched the industry's first full-lifecycle MCP service on its Bailian platform, enabling users to build an agent connected to MCP services in about five minutes without managing resources, deployment, or operations; Tencent Cloud quickly followed by officially launching its "AI Development Kit," which supports MCP plug-in hosting to help developers build business-oriented AI agents in as little as five minutes.
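To make the MCP pattern concrete, the sketch below shows a minimal MCP server that exposes a single data-retrieval tool to a model, written with the FastMCP helper from the open-source MCP Python SDK. The server name, tool, and order data are hypothetical illustrations, not part of any vendor's offering.

```python
# Minimal sketch of an MCP server exposing one data-retrieval tool.
# Assumes the open-source MCP Python SDK (package "mcp"); the server name,
# tool, and order data below are hypothetical illustrations.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")  # hypothetical server name

# Stand-in for a real business data source (database, SaaS API, etc.).
_FAKE_ORDERS = {
    "A-1001": {"status": "shipped", "amount": 259.00},
    "A-1002": {"status": "processing", "amount": 89.50},
}

@mcp.tool()
def get_order_status(order_id: str) -> dict:
    """Return the status of an order so an agent can answer customer queries."""
    return _FAKE_ORDERS.get(order_id, {"error": "order not found"})

if __name__ == "__main__":
    # Runs the server over stdio; an MCP-capable client (for example an
    # agent runtime) can then list and call the tool.
    mcp.run()
```

Managed offerings such as the ones described above essentially package the hosting, deployment, and operation of servers like this so that developers only write the tool logic.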
On April 10, at the Google Cloud Next 25 conference, Google open-sourced the first standard agent interaction protocol, Agent2Agent (A2A). A2A is expected to break down isolation between systems and bring a qualitative change to agents' capabilities, cross-platform reach, and execution efficiency, with support from mainstream enterprise application platforms such as Intuit, MongoDB, Salesforce, SAP, ServiceNow, and Workday. In practical applications, the client agent is responsible for formulating and conveying tasks, while the remote agent acts on those tasks to provide the correct information or perform the corresponding operations; agents can send messages to each other (carrying context, replies, or user instructions), enabling them to cooperate on complex tasks.
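The following schematic Python sketch illustrates the client-agent/remote-agent division of labor described above. All class and field names are hypothetical and do not reproduce the actual A2A wire format, which is defined by Google's open-source specification.

```python
# Schematic sketch of the client-agent / remote-agent pattern described above.
# All names are hypothetical illustrations, not the actual A2A wire format.
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str      # "user" or "agent"
    content: str   # context, a reply, or a user instruction

@dataclass
class Task:
    task_id: str
    instruction: str
    history: list[Message] = field(default_factory=list)

class RemoteAgent:
    """Acts on tasks it receives and returns information or results."""
    def handle(self, task: Task) -> Message:
        # A real remote agent would call its own model and tools here.
        reply = f"Completed task {task.task_id}: {task.instruction!r}"
        return Message(role="agent", content=reply)

class ClientAgent:
    """Formulates tasks and conveys them to a remote agent."""
    def __init__(self, remote: RemoteAgent):
        self.remote = remote

    def delegate(self, task_id: str, instruction: str) -> Message:
        task = Task(task_id, instruction,
                    history=[Message(role="user", content=instruction)])
        return self.remote.handle(task)

if __name__ == "__main__":
    client = ClientAgent(RemoteAgent())
    print(client.delegate("t-1", "pull last quarter's expense report").content)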
Suggested focus targets: with large-model performance iterating faster and data- and execution-layer protocols such as MCP and A2A maturing and seeing adoption, the AI agent application ecosystem is expected to accelerate its build-out. The report recommends attention to investment opportunities in general large models and product-type application vendors. Agent applications: Iflytek Co., Ltd., Focus Technology, Hangzhou Raycloud Technology Co., Ltd., Servyou Software Group, Digiwin Co., Ltd., Jiangsu Eazytec Co., Ltd., Richinfo Technology, MARKETINGFORCE, Weaver Network Technology. AI vertical applications: Beijing Kingsoft Office Software, Inc., Fujian Foxit Software Development Joint Stock, Wondershare Technology Group, Intsig Information, Kunlun Tech, ArcSoft Corporation, Shanghai Runda Medical Technology, MEITU, SENSETIME-W.

Risk warning: 1. the risk that AI technology iteration falls short of expectations; 2. the risk that AI commercial products are not released as expected; 3. policy uncertainty; 4. uncertainty in the downstream market.

Contact: [email protected]