Huawei to release UCM, an AI inference technology aimed at high throughput and low latency
On August 12, at the 2025 Financial AI Inference Application Implementation and Development Forum, Huawei will release UCM, a new AI inference technology. UCM is an inference acceleration suite centered on the KV Cache. It integrates several types of cache acceleration algorithms and tools, and manages the KV Cache data generated during inference in tiers. According to the announcement, this expands the inference context window, delivers high-throughput, low-latency inference, and lowers the inference cost per token.
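To make the idea of tiered KV Cache management concrete, here is a minimal sketch of a two-tier cache in Python. It is only an illustration of the general technique, not Huawei's design: the class name `TieredKVCache`, the tier names, the least-recently-used demotion policy, and the string placeholders for KV blocks are all assumptions, since the announcement gives no implementation details.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy two-tier KV cache (hypothetical; not UCM's actual API)."""

    def __init__(self, fast_capacity: int):
        if fast_capacity < 1:
            raise ValueError("fast_capacity must be at least 1")
        self.fast_capacity = fast_capacity
        # Fast tier: stands in for scarce accelerator memory, kept in LRU order.
        self.fast = OrderedDict()
        # Slow tier: stands in for larger, cheaper storage such as host memory.
        self.slow = {}

    def put(self, prefix_key, kv_block):
        """Store a block in the fast tier, demoting the LRU block if full."""
        self.slow.pop(prefix_key, None)  # avoid keeping a stale copy
        self.fast[prefix_key] = kv_block
        self.fast.move_to_end(prefix_key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_block = self.fast.popitem(last=False)
            self.slow[old_key] = old_block  # demote instead of discarding

    def get(self, prefix_key):
        """Return a cached block, promoting it from the slow tier on a hit."""
        if prefix_key in self.fast:
            self.fast.move_to_end(prefix_key)
            return self.fast[prefix_key]
        if prefix_key in self.slow:
            block = self.slow.pop(prefix_key)
            self.put(prefix_key, block)  # promote back to the fast tier
            return block
        return None  # miss: the caller would have to recompute the KV data


cache = TieredKVCache(fast_capacity=2)
cache.put("prompt-a", "KV-A")
cache.put("prompt-b", "KV-B")
cache.put("prompt-c", "KV-C")        # "prompt-a" is demoted to the slow tier
reused = cache.get("prompt-a")       # hit in the slow tier, promoted again
```

In this sketch a block that falls out of the fast tier is kept in the slow tier rather than dropped, so a later request with the same prefix can reuse it instead of recomputing it. That reuse is the general mechanism by which cache tiering can trade cheaper storage for lower latency and longer effective context.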