AI computing power demand enters the "2.0" era! OpenAI plans to spend $30 billion on inference computing power, which could convert into a 10% stake in Cerebras.

11:36 17/04/2026
GMT Eight
According to people familiar with the matter, OpenAI has agreed to pay Cerebras, a chip startup, more than 20 billion dollars over the next three years to use servers built on the company's chips. Under the agreement, the maker of ChatGPT may also receive equity in Cerebras.

The deal comes as OpenAI seeks to maintain its lead in the artificial intelligence race and keep up with surging demand. In January of this year, the company agreed to purchase up to 750 megawatts of computing power from Cerebras over three years, in a transaction valued at more than 10 billion dollars. The commitment disclosed by sources exceeds that previously reported agreement between OpenAI and the chip maker.

The deal underscores the industry's growing demand for the computing power needed to run "inference", the process by which AI models generate responses. Companies are now competing to build inference models and applications to drive wider adoption of AI.

Reports suggest that Cerebras, headquartered in Sunnyvale, California, may disclose parts of the previously undisclosed agreement with OpenAI as soon as Friday. Under the agreement, OpenAI will receive a minority stake in Cerebras through stock warrants, and its ownership percentage could rise as its spending increases. OpenAI has also reportedly agreed to provide around 1 billion dollars to Cerebras to help fund the construction of data centers that will run its AI products. According to the report, OpenAI's total expenditure over the next three years could reach 30 billion dollars, which could translate into warrants representing up to 10% of Cerebras' shares.

The explosion of inference demand

In the early development of artificial intelligence, the industry's focus was mostly on "training".
However, OpenAI's agreement with Cerebras for up to 30 billion dollars of computing power sends a clear signal: the AI industry's competitive focus is shifting from "how to make models smarter" to "how to make intelligence cheaper", and the main battleground for computing power is migrating to "inference" at scale.

Industry data show that by 2026, inference will account for two-thirds of incremental computing power demand, a share that will eventually exceed 80%. Growth is equally striking: according to the latest data from OpenRouter, in a single week in early April, global calls to large AI models totaled 2.7 trillion tokens, up 18.9% from the previous week, with China's large AI models handling 1.296 trillion tokens that week, surpassing the United States for the fifth consecutive week.

At the same time, the cost of inference is falling rapidly. According to the 2025 Stanford AI Index report, the cost of inference at GPT-3.5-level performance has fallen by a factor of 280 in two years. Exploding demand and rapidly falling costs are together paving the way for AI applications at scale.

Training costs are fixed, predictable capital expenditures. But once the user base exceeds hundreds of millions, every ChatGPT response and every generated video incurs a real operating expense, and that cost grows linearly with the number of users. Experts predict that more than 90% of the AI industry's future computing costs will be incurred at the inference stage. For companies like OpenAI, failing to drive the cost of each inference as low as possible would leave the moat around their business model extremely fragile. Training demands strong general performance, while inference prizes energy efficiency and low latency.
This opens significant room for startups such as Cerebras and Groq, as well as for major cloud vendors developing their own chips (such as Google TPU and AWS Inferentia).