Sealand: Large model technology is driving a reshaping of AI valuations; "recommended" rating maintained for the computer industry
21/04/2025
GMT Eight
Sealand has released a research report stating that large model technology is changing at an accelerating pace, from architectural innovation to upgraded training paradigms, hastening the arrival of the AGI era. The fusion of MoE and Transformer architectures has become mainstream, and synthetic data has become the "new oil". In the post-training phase, the compute devoted to RL and the time allowed for inference-time thinking are becoming critical, and DeepSeek is driving a new reinforcement learning paradigm. Thanks to low-rank decomposition techniques such as MLA, 32B-class models can now be deployed locally on consumer-grade graphics cards alone, a genuine breakthrough in large model deployment. The steady advance of large model technology is accelerating the arrival of the AGI era, and continued technological iteration around large models may keep reshaping domestic AI valuations; the firm maintains a "recommended" rating for the computer industry.
Key points from Sealand:
Review of large model development: built on the Transformer, with the Scaling Law running throughout
In 2017, the Google team proposed the Transformer architecture, creatively combining attention layers with feedforward neural network layers and sharply improving model performance. The period from 2018 to 2020 was the era of pre-trained Transformer models, with GPT-3's 175 billion parameters pushing the limits of large-scale pre-training, while techniques such as SFT and RLHF helped models align more quickly with human values. Subsequently, as the returns described by the Scaling Law diminished on the training side and high-quality text data was gradually exhausted by AI, reasoning models came into view: with OpenAI's release of o1-preview, accuracy on AIME 2024 rose from GPT-4o's 13.4% to 56.7%, and these models continue to iterate.
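To make the architecture concrete, the following is a minimal, illustrative PyTorch sketch of a single Transformer block: a self-attention layer followed by a feedforward network, each wrapped in a residual connection and layer normalization. All dimensions are example values chosen for illustration, not the configuration of any model mentioned in the report.

# Minimal, illustrative Transformer block: self-attention + feedforward,
# each with a residual connection and layer normalization.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention sub-layer with residual connection
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)
        # Position-wise feedforward sub-layer with residual connection
        return self.norm2(x + self.ffn(x))

# Example: a batch of 2 sequences, 16 tokens each, 512-dimensional embeddings
tokens = torch.randn(2, 16, 512)
print(TransformerBlock()(tokens).shape)  # torch.Size([2, 16, 512])

Stacking dozens of such blocks and then scaling parameters and data is the recipe the Scaling Law describes.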
Domestic large model progress: cost reduction and efficiency gains are the main theme of industry competition
With limited resources, matching overseas SOTA performance at low cost is expected to be the main theme for domestic large models in 2025. Taking DeepSeek, Doubao, and Alibaba's Qwen as examples: 1) DeepSeek-R1/V3 relies on innovative cost-reduction and efficiency measures aimed at greatly increasing GPU compute and communication utilization under limited resources; 2) Doubao's large models gained strong momentum in the second half of 2024, with monthly active users ranking second globally and first domestically, likewise relying on a sparse MoE architecture to deliver high performance from small active parameter counts under the cost-reduction paradigm; 3) Alibaba's Qwen leads domestic open-source benchmarks, and its QwQ-32B model, built on a reinforcement learning paradigm, has become the world's strongest open-source model, with its 32B parameters performing on par with the full-size DeepSeek-R1. Small-parameter, high-performance models remain the main theme.
Overseas progress of large models: Resources are concentrated on AGI
With abundant computing power, resources are being tilted toward the bet on AGI. 1) OpenAI: reasoning models such as o1 and multimodal models such as Sora lead the industry, and CEO Altman has said several times that OpenAI's first Agent will launch in 2025, which is also expected to be a breakout year for Agents; 2) Google: a forward-looking bet on the natively multimodal Gemini, with multiple Agent products released at the end of 2024, while also positioning the lightweight Gemma models to capture the edge ecosystem; 3) Meta: in December 2024, Llama 3.3 matched the performance of Llama 3.1 405B with only 70B parameters; building on Meta Live, real-time voice interaction and cross-device collaboration have been realized, with a focus on general intelligent agents; 4) Anthropic: in October 2024, Claude 3.5 Sonnet was upgraded with computer use capabilities, allowing Claude to operate a computer the way a human does; in addition, the hybrid reasoning model Claude 3.7 Sonnet was launched in early 2025.
Outlook for models: betting on post-training + algorithm optimization and low-cost deployment, with AGI as the ultimate goal
Models are undergoing accelerated change across architecture, pre-training, post-training, and deployment. 1) At the architecture level, the fusion of MoE and Transformer is becoming the mainstream design, with the number of MoE large models worldwide exploding in 2024 (a minimal sketch of a sparse MoE layer follows below); 2) At the pre-training level, with high-quality data gradually being exhausted, synthetic data has become the "new oil" of the digital economy, continuing to fuel model training iterations; 3) In post-training, the key to the performance leap of reasoning models is shifting toward the compute devoted to RL at this stage and the thinking time allowed at test/inference time, with DeepSeek driving a new paradigm of pure reinforcement learning; 4) In deployment, DeepSeek is accelerating the trend toward low-cost deployment: low-rank decomposition techniques such as MLA sharply reduce memory consumption, so models at the DeepSeek-R1-32B level and below can be deployed locally on consumer-grade graphics cards, marking a genuine breakthrough in large model deployment.
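To illustrate the MoE-Transformer fusion referenced above, the following is a minimal sketch of a sparse Mixture-of-Experts feedforward layer of the kind that replaces a Transformer block's dense FFN: a router sends each token to only a few experts, so the parameters activated per token stay small even as total parameters grow. Expert counts and dimensions are assumed example values, not the configurations of DeepSeek, Doubao, or Qwen.

# Illustrative sparse MoE feedforward layer: each token is routed to only
# top_k of n_experts, keeping active parameters per token small.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        b, s, d = x.shape
        flat = x.reshape(-1, d)                          # route token by token
        weights, idx = self.gate(flat).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)             # mixing weights over chosen experts
        out = torch.zeros_like(flat)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens sent to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(flat[mask])
        return out.reshape(b, s, d)

# Drop-in replacement for the dense FFN inside a Transformer block: same input/output shape
print(MoEFeedForward()(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])

Production-scale MoE systems add load-balancing objectives and expert-parallel communication across GPUs, which is where much of the compute/communication utilization engineering mentioned above takes place.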