Tencent Hunyuan open-sources a 0.3B-equivalent edge model with a memory footprint of just 600MB.

15:15 10/02/2026
GMT Eight
On February 10, Tencent Hunyuan officially launched HY-1.8B-2Bit, an "ultra-small" model aimed at consumer-grade hardware. It has an equivalent parameter count of only 0.3B and a memory footprint of only 600MB, smaller than many common mobile phone apps.

The model is derived from Tencent's earlier small language model, HY-1.8B-Instruct, through 2-bit quantization-aware training (QAT). This shrinks the equivalent parameter count of the original full-precision model to roughly one-sixth and speeds up generation on real edge devices by 2-3x, while preserving the original model's full reasoning ability and markedly improving the user experience. The release makes HY-1.8B-2Bit easy to deploy on edge devices and, according to Tencent, marks the first practical, industrial-grade application of 2-bit quantization in an edge model.

HY-1.8B-2Bit also retains the full reasoning capability of Hunyuan-1.8B-Instruct, offering a concise chain of thought for simple queries and a detailed long chain of thought for complex tasks. Users can switch between the two modes according to task complexity and resource constraints. To maximize the model's overall capability, the Hunyuan team combined data optimization, an elastic-stretching quantization scheme, and innovations in training strategy.

For deployment, Tencent Hunyuan provides HY-1.8B-2Bit weights in gguf-int2 format along with bf16 pseudo-quantization weights. Compared with the full-precision model, the actual model size shrinks by a factor of six to only 300MB, allowing flexible deployment on edge devices. The model has also been adapted for platforms such as Arm, where Arm's SME2 technology enables efficient operation on mobile devices.

On a MacBook with the M4 chip, HY-1.8B-2Bit was benchmarked with the thread count fixed at 2, measuring first-token latency and generation speed at different context-window sizes. Three gguf-format models were compared: fp16, Q4, and HY-1.8B-2Bit. First-token latency showed a 3-8x speedup for input lengths up to 1024 tokens, and generation speed achieved at least a stable 2x speedup over the original-precision model at common window sizes. Tests were also run on the Dimensity 9500, where, relative to the HY-1.8B-Q4 format, first-token latency improved by roughly 1.5-2x and generation speed by roughly 1.5x.

To make large language models practical to deploy on edge devices, HY-1.8B-2Bit relies on ultra-low-bit quantization, maintaining performance comparable to the INT4-PTQ method while delivering efficient, stable inference on device.

At present, the capability of HY-1.8B-2Bit is still bounded by the supervised fine-tuning (SFT) process and by the capacity and robustness of the base model itself. In response, the Hunyuan team plans to focus on reinforcement learning and model distillation, aiming to further narrow the performance gap between low-bit quantized models and full-precision models and to broaden the prospects for deploying large language models on edge devices.
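Tencent has not disclosed the details of its QAT recipe, which reportedly also involves an elastic-stretching quantization scheme. For readers curious about the underlying idea, here is a minimal, hypothetical PyTorch sketch of generic 2-bit quantization-aware training: weights are snapped to four levels in the forward pass, while a straight-through estimator keeps the full-precision copy trainable. Real schemes use per-group scales and more sophisticated range learning; everything here is a simplification, not Hunyuan's method.

```python
import torch

class FakeQuant2Bit(torch.autograd.Function):
    """Fake 2-bit quantization with a straight-through estimator (STE).

    Forward: snap weights to one of 2**2 = 4 symmetric levels.
    Backward: pass gradients through unchanged, so the full-precision
    "shadow" weights keep training.
    """

    @staticmethod
    def forward(ctx, w, n_bits=2):
        qmax = 2 ** (n_bits - 1) - 1                      # 1  -> levels {-2,-1,0,1}
        qmin = -(2 ** (n_bits - 1))                       # -2
        scale = w.abs().max().clamp(min=1e-8) / qmax      # per-tensor scale (per-group in practice)
        q = torch.clamp(torch.round(w / scale), qmin, qmax)
        return q * scale                                  # dequantized weights for the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                          # STE: identity gradient

class QATLinear(torch.nn.Linear):
    """Linear layer that trains full-precision weights but computes
    with their 2-bit fake-quantized version."""
    def forward(self, x):
        w_q = FakeQuant2Bit.apply(self.weight)
        return torch.nn.functional.linear(x, w_q, self.bias)

layer = QATLinear(16, 16)
y = layer(torch.randn(4, 16))   # forward uses 2-bit weights; backward updates fp weights
```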
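Because the weights are published in gguf-int2 format, they should in principle load in any GGUF-compatible runtime. A short illustrative example using the llama-cpp-python bindings; the file name hy-1.8b-2bit.gguf is hypothetical and stands in for the actual published checkpoint:

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# "hy-1.8b-2bit.gguf" is a hypothetical file name; n_threads=2 mirrors
# the fixed thread count used in the article's MacBook M4 test.
llm = Llama(model_path="hy-1.8b-2bit.gguf", n_ctx=1024, n_threads=2, verbose=False)

out = llm("Summarize what quantization-aware training is.", max_tokens=128)
print(out["choices"][0]["text"])
```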
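The first-token latency and generation-speed figures quoted above can be reproduced in spirit with a simple timing loop. A sketch under the same assumptions (streaming completion, 2 threads, hypothetical file name); it approximates token count by counting streamed chunks:

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="hy-1.8b-2bit.gguf", n_ctx=1024, n_threads=2, verbose=False)

t0 = time.perf_counter()
ttft = None
n_tokens = 0
for _ in llm("Explain edge-device inference in one paragraph.",
             max_tokens=128, stream=True):
    if ttft is None:
        ttft = time.perf_counter() - t0   # first-token latency
    n_tokens += 1
elapsed = time.perf_counter() - t0

print(f"first-token latency: {ttft:.3f}s")
if n_tokens > 1:
    print(f"generation speed: {(n_tokens - 1) / (elapsed - ttft):.1f} tok/s")
```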