OpenAI released its Operator AI agent for ChatGPT on Thursday, which should have been a major milestone for the company and AI development in general. While I wouldn’t pay $200/month to test this early version of Operator, what I saw in the OpenAI demos blew my mind. Operator is miles ahead of Google’s AI agents, at least when it comes to demos. I can’t wait to get my hands on it once OpenAI brings it to other ChatGPT paid tiers, and, more crucially for me personally, to the EU.
However, the real AI story taking over the world isn’t ChatGPT, Operator, or the massive Stargate project that was announced last week. It’s DeepSeek, the Chinese startup that grabbed global attention when it released its R1 reasoning model, which can match OpenAI’s ChatGPT o1.
There’s nothing surprising about rivals matching o1; we expect other AI firms to catch up. After all, OpenAI has already unveiled o3, which should be released in the coming days or weeks. What’s unusual about DeepSeek is that the Chinese company made its models open-source, so any company or developer can access and inspect them. More interesting still is the R1 research paper DeepSeek released, which claims the highly sophisticated model was trained at a fraction of the cost of OpenAI’s o1.
News that DeepSeek reportedly trained R1 with only 3% to 5% of the resources OpenAI needs for similar results with ChatGPT made waves around the world. AI-related stocks tanked during Monday’s early trading, just as the DeepSeek app surged past ChatGPT to become #1 in the App Store.
One of the big problems with current AI software is the cost of developing and running it. Advanced models like o1 can cost tens of millions of dollars to develop, and the process requires high-end graphics cards (GPUs) for computing power, along with massive energy expenditures.
That’s why finished products like ChatGPT o1 can’t be offered for free without limitations; companies like OpenAI need to cover costs and turn a profit. It’s also why the massive $500 billion Stargate program is such a monumental decision for AI development, especially considering the inevitable AI arms race between the US and China.
Add the US sanctions that block China’s access to the high-end chips and GPUs that make products like ChatGPT o1 possible, and you’d think ChatGPT, Gemini, Meta AI, and Claude couldn’t face serious competition from China.
That’s where DeepSeek stunned the world. The Chinese startup knew it could not outgun OpenAI on raw power; it simply did not have access to GPUs in the quantities companies like OpenAI hoard. So the DeepSeek researchers took another approach for R1, finding ways to train an advanced reasoning model without the same hardware.
It’s not just that: DeepSeek also made access to R1 much cheaper than what OpenAI charges for ChatGPT, which is a significant development on its own. Add in the open-source nature of DeepSeek’s models, and you can see why developers would flock to test the Chinese firm’s AI and why the DeepSeek app would surge in the App Store.
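For developers, trying R1 is trivial because DeepSeek exposes an OpenAI-compatible API. Here’s a rough sketch of what a call looks like; the base URL and model name (“deepseek-reasoner”) match DeepSeek’s documentation at launch, but treat them as assumptions and verify against the current docs:

```python
# Calling DeepSeek R1 through its OpenAI-compatible API.
# Base URL and model name reflect DeepSeek's docs at launch;
# double-check both before relying on them.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # issued at platform.deepseek.com
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",          # the R1 reasoning model
    messages=[{"role": "user", "content": "How many primes are below 30?"}],
)
print(response.choices[0].message.content)
```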
According to the research paper, the Chinese startup largely replaced the Supervised Fine-Tuning (SFT) approach that OpenAI uses to train ChatGPT with Reinforcement Learning (RL) to produce faster, cheaper results. SFT teaches a model how to solve problems by feeding it curated examples of prompts paired with good responses, so the AI learns what sort of answers to give to various prompts.
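If you’re wondering what SFT looks like in practice, here’s a minimal, illustrative sketch using Hugging Face’s transformers library. To be clear, the base model and the two toy training examples are placeholders made up for illustration, not anything from DeepSeek’s or OpenAI’s actual pipelines:

```python
# Minimal supervised fine-tuning (SFT) sketch: the model learns to
# imitate curated prompt/answer pairs. Model and data are placeholders.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for any base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# SFT data: demonstrations of the answers we want the model to produce.
examples = [
    {"prompt": "What is 7 * 8?", "answer": "7 * 8 = 56"},
    {"prompt": "Name a prime number above 10.", "answer": "11 is prime."},
]

class SFTDataset(Dataset):
    def __init__(self, examples):
        self.encodings = [
            tokenizer(e["prompt"] + "\n" + e["answer"], truncation=True,
                      max_length=64, padding="max_length", return_tensors="pt")
            for e in examples
        ]
    def __len__(self):
        return len(self.encodings)
    def __getitem__(self, i):
        item = {k: v.squeeze(0) for k, v in self.encodings[i].items()}
        item["labels"] = item["input_ids"].clone()  # imitate the target text
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-demo", num_train_epochs=1,
                           per_device_train_batch_size=2, report_to="none"),
    train_dataset=SFTDataset(examples),
)
trainer.train()
```

The key point: SFT needs humans (or another model) to write all those demonstrations first, which is a big part of where the cost comes from.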
RL instead has the model attempt answers on its own, with a reward system scoring those attempts and feeding the result back to the AI as a training signal. RL allowed DeepSeek to improve R1’s reasoning capabilities and overcome its lack of compute. However, as VentureBeat explains, some SFT training, where humans supervise the AI, was still needed in the early phases of R1 before the team switched to RL.
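The reward idea is easier to grasp with a toy example. The sketch below shows the intuition: sample several answers, score each with a simple rule-based check, and reinforce the ones that beat the group average. That group-relative scoring loosely mirrors the GRPO method described in DeepSeek’s paper, but this is illustrative pseudologic with a fake model, not the actual implementation:

```python
# Toy illustration of an RL-style training signal: generate answers,
# score them with a rule-based reward, and prefer above-average ones.
# This mirrors the intuition behind group-relative methods like GRPO,
# not DeepSeek's real training code.
import random

def reward(answer: str, expected: str) -> float:
    """Rule-based reward: 1.0 if the final answer matches, else 0.0.
    Math and code tasks make this check cheap and automatic, which is
    why RL on reasoning can skip most human-labeled data."""
    return 1.0 if answer.strip() == expected else 0.0

def fake_model_generate(question: str) -> str:
    # Stand-in for a real model sampling an answer.
    return random.choice(["56", "54", "58"])

# One RL "step": sample a group of answers, score them, and compute
# each answer's advantage over the group average.
question, expected = "What is 7 * 8?", "56"
samples = [fake_model_generate(question) for _ in range(8)]
scores = [reward(s, expected) for s in samples]
baseline = sum(scores) / len(scores)
advantages = [s - baseline for s in scores]  # reinforce better-than-average answers
print(list(zip(samples, advantages)))
```

Because the reward is computed automatically, the model can generate its own training signal at scale instead of waiting on expensive human-written demonstrations.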
While I pointed out the obvious drawbacks of relying on a ChatGPT rival from China right now, there’s no question that DeepSeek deserves attention.
At the very least, the innovations DeepSeek’s researchers developed can be replicated elsewhere to achieve similar breakthroughs. After all, early versions of DeepSeek suggested the Chinese startup might have leaned on ChatGPT’s output for its own development work. Whether in AI or anything else, tech innovations will always be stolen and adapted.
Think about it: DeepSeek came up with a more efficient way to train AI using only about 50,000 GPUs, 10,000 of which were NVIDIA GPUs purchased before the US export restrictions. Comparatively, the likes of OpenAI, Google, and Anthropic operate with more than 500,000 GPUs each, per VentureBeat.
I’d imagine researchers at these companies are now racing to see whether and how they can replicate the success of DeepSeek R1. I’d also imagine they will find ways to do it.
With so much compute and so many resources at the disposal of OpenAI, Google, Meta, and Anthropic, R1-like efficiency breakthroughs should soon stack on top of what their AI models can already do.
Also, while the market took a hit on the DeepSeek news out of China, don’t conclude that hardware, compute power, and energy won’t matter for the future of AI development. Again, combine the innovations from DeepSeek with, say, a $500 billion fund and access to NVIDIA’s high-end graphics cards, and you might get the early phases of AGI.
Once DeepSeek R1-like methods are employed for ChatGPT and Gemini development, the costs of advanced AI access will probably go down for premium users. This would be a key win for consumers.
Western AI firms won’t be able to keep prices high and still compete with DeepSeek R1 and its successors. Some developers will always choose the cheaper model regardless of the AI’s country of origin and training bias. As a reminder, DeepSeek models will show bias toward China; this is still software that has to abide by local censorship laws.
I’ll also point out that China won’t just sit idly by; these are early victories. DeepSeek isn’t alone, as ByteDance has also released an o1-grade chatbot, and billions of dollars will be poured into AI development in the country for compute and energy. Remember, too, that not everything coming from China can be taken at face value. It’s unclear whether DeepSeek’s stated training costs are accurate; transparency works only up to a point.
Thankfully, because DeepSeek’s models are open-source, others will soon be able to see whether R1-like training can be replicated elsewhere.
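In fact, anyone can already download one of the distilled R1 checkpoints DeepSeek published and poke at it locally. Here’s a quick sketch using Hugging Face transformers; the checkpoint name is one of the smaller distills listed on the hub, but verify it before running, and note that even the small models benefit from a decent GPU:

```python
# Loading a distilled R1 variant from Hugging Face to inspect it locally.
# The checkpoint name is one of DeepSeek's published distills; confirm it
# on the hub before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "What is 7 * 8? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```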
VentureBeat does a great job explaining the intricacies of DeepSeek R1’s development. DeepSeek’s technical paper accompanying Monday’s release of R1 can be found on GitHub.