UCloud completes integration of DeepSeek-OCR-2

28/01/2026

On January 28, UCloud completed its integration of DeepSeek-OCR-2. According to reports, the newly open-sourced DeepSeek-OCR-2 upgrades its architecture to DeepEncoder V2, which drops the classic CLIP visual branch and uses an LLM as the visual encoder. It also proposes a "visual causal flow" paradigm to address the semantic and ordering mismatches that multimodal large models often run into with complex tables or non-linear text. Specifically, traditional vision-language models carry built-in inductive biases: they scan the image in raster order and apply fixed absolute position encodings. This runs counter to the way humans read, which is a jumping scan guided by semantic logic. When people read a document, their gaze follows the logical flow of the content: in a table they scan by row or by column, and in a multi-column layout they automatically jump from the end of one column to the top of the next.
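
The gap between raster scanning and a layout-aware reading order can be illustrated with a toy example. The sketch below is not DeepSeek-OCR-2's method and uses no model at all; the `blocks` data and the two helper functions are hypothetical, chosen only to show how a fixed scan interleaves the lines of a two-column page while a column-by-column read keeps each column intact.

```python
# Toy two-column page with three lines per column.
# Each block is (row, column, text). This layout is an illustrative assumption.
blocks = [
    (0, 0, "A1"), (0, 1, "B1"),
    (1, 0, "A2"), (1, 1, "B2"),
    (2, 0, "A3"), (2, 1, "B3"),
]

def raster_order(blocks):
    """Fixed top-to-bottom, left-to-right scan that ignores the layout."""
    return [text for _, _, text in sorted(blocks, key=lambda b: (b[0], b[1]))]

def column_order(blocks):
    """Read each column top to bottom, then jump to the next column."""
    return [text for _, _, text in sorted(blocks, key=lambda b: (b[1], b[0]))]

raster = raster_order(blocks)    # lines from columns A and B are interleaved
columns = column_order(blocks)   # column A is read in full before column B

print("raster :", raster)
print("columns:", columns)
```

In the raster result, consecutive tokens alternate between the two columns, which is the kind of sequence mismatch the paragraph above describes. A real system would have to infer the reading order from the content rather than from hand-labeled column indices.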