Memory Tensor x SenseTime Big Compute: Domestic GPGPU Inference Cost-Effectiveness Surpasses A100

04/12/2025

Memory Tensor Technology Limited and the Big Compute Team of SenseTime jointly announced that they have deployed the industry's first commercial inference cluster with prefill-decode (PD) separation on domestic GPGPUs, built around the integration of memory, computation, and scheduling, and that it is running stably in a real production environment. Test data shows that the solution's overall inference cost-effectiveness reaches 150% of that of the same-generation NVIDIA A100, indicating that the domestic computing power stack has, for the first time, achieved system-level competitiveness in the commercial deployment of large models. The result opens a differentiated path for the domestic computing power ecosystem: PD separation has evolved from a hardware-level optimization into a memory-centric design paradigm. In the MemOS system, the separated architecture can extend to higher-level dimensions such as behavior prediction, context planning, and memory layout, becoming an organic part of the overall architecture. It also heralds the formal entry of consumer-facing (C-end) scenarios into the era of "memory inference."