Tencent Hunyuan proposes the Stem sparse attention algorithm: time-to-first-token latency cut by 3.6 times, setting a new SOTA for long-text inference acceleration.
Tencent Hunyuan has announced the Stem sparse attention algorithm, which has been accepted to the machine learning conference ICML 2026. The work is a full-stack acceleration solution that pairs the Stem algorithm with an HPC operator. At the algorithm level, Stem uses token position decay and an output-aware metric to achieve nearly lossless accuracy with a 25% budget reduction. At the operator level, the open-source Stem+BSA operator from HPC converts the sparsity gains into real hardware acceleration, reducing time-to-first-token latency by 3.7 times at a 128K context.

