Chinese startup DeepSeek tanked tech stocks on Monday after releasing its DeepSeek R1 reasoning AI model. In a research paper, the company revealed that it trained R1 using software innovations rather than the massive hardware resources available to OpenAI and other US companies.
US sanctions block Chinese companies from purchasing the AI chips they need to match OpenAI, Google, and other AI firms in the West. The claim that software optimizations can substitute for raw hardware is what rattled the stock market, hitting NVIDIA and other AI hardware companies especially hard.
In the aftermath of Monday’s bloodbath, I said the worries were overblown. Software optimization can’t fully replace hardware capabilities. I also pointed out that US companies like OpenAI will likely test and deploy some of the software innovations DeepSeek came up with. But they will also have a massively superior infrastructure. In turn, OpenAI and others can deliver similar breakthroughs, further reduce the cost of access, and outperform the likes of DeepSeek.
While that was speculation from yours truly, someone who doesn’t develop artificial intelligence models for a living, you’ll also want to read Dario Amodei’s detailed explanation of what DeepSeek achieved, what it means for the current moment in the AI war between the US and China, and how it impacts the road to AGI.
As the CEO of Anthropic and a former VP of Research at OpenAI, Amodei is among the most qualified AI experts to dissect the DeepSeek breakthrough.
Amodei does a great job explaining how AI development works, why DeepSeek’s innovations are important, why the training costs the Chinese startup proposed are misleading, and why the US has a big advantage over China thanks to access to superior hardware.
The cost of DeepSeek isn’t that big of a surprise
Amodei makes the case for continued controls on key hardware exports to China, arguing that the current measures work despite what DeepSeek has achieved. Yes, the Chinese startup came up with software ideas to improve the efficiency of AI development and cut costs. But in the grander scheme of things, DeepSeek's development and spending are on par with those of some US AI firms:
DeepSeek does not “do for [$6 million] what cost US AI companies billions”. I can only speak for Anthropic, but Claude 3.5 Sonnet is a mid-sized model that cost a few $10 M’s to train (I won’t give an exact number). Also, 3.5 Sonnet was not trained in any way that involved a larger or more expensive model (contrary to some rumors).
Sonnet’s training was conducted 9-12 months ago, and DeepSeek’s model was trained in November/December, while Sonnet remains notably ahead in many internal and external evals. Thus, I think a fair statement is “DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)”.
Amodei also singled out DeepSeek V3 as the real breakthrough from the Chinese startup, the one that made the development of R1 possible. However, the Anthropic CEO said that training V3 several months after similarly powerful US AI models, and at a lower cost, is "totally normal, totally 'on trend,'" considering what's happening in the industry:
All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM’s; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese.
This has never happened before and is geopolitically significant. However, US companies will soon follow suit — and they won’t do this by copying DeepSeek, but because they too are achieving the usual trend in cost reduction.
Amodei also looked at the rumored hardware capabilities of DeepSeek, which reportedly has 50,000 Hopper-generation chips at its disposal for AI training. While the figures can't be confirmed, and some of these GPUs might have been smuggled into the country due to the US sanctions, Amodei concluded that DeepSeek must have spent about $1 billion on hardware:
Thus, DeepSeek’s total spend as a company (as distinct from spend to train an individual model) is not vastly different from US AI labs.
Amodei noted that it's also normal for several AI companies to come up with AI models similar to OpenAI's o1. That's what DeepSeek did. But there will soon come a time when developing more advanced models will not be as easy for every player in the field:
However, because we are on the early part of the scaling curve, it’s possible for several companies to produce models of this type, as long as they’re starting from a strong pretrained model. Producing R1, given V3, was probably very cheap. We’re, therefore, at an interesting “crossover point,” where it is temporarily the case that several companies can produce good reasoning models. This will rapidly cease to be true as everyone moves further up the scaling curve on these models.
Who will reach AGI first, the US or China?
Amodei's main point is that spending on hardware to train more advanced models will not stop just because DeepSeek stunned the world with software efficiencies. Billions of dollars will continue to be poured into making better AI models, and that involves both securing millions of high-end chips and coming up with DeepSeek-like efficiencies:
To the extent that US labs haven’t already discovered them, the efficiency innovations DeepSeek developed will soon be applied by both US and Chinese labs to train multi-billion dollar models. These will perform better than the multi-billion models they were previously planning to train — but they’ll still spend multi-billions. That number will continue going up, until we reach AI that is smarter than almost all humans at almost all things.
That "AI that is smarter than almost all humans at almost all things" sounds like Amodei's version of AGI, or artificial general intelligence: AI that can handle any task with the same creativity as a human, with the added advantage of possessing nearly limitless knowledge.
Amodei predicts that AGI will happen in 2026-2027 and will require “millions of chips, [and] tens of billions of dollars (at least).”
“DeepSeek’s releases don’t change this, because they’re roughly on the expected cost reduction curve that has always been factored into these calculations,” he said.
Amodei sees two possible scenarios. In a bipolar world, the US and China develop AGI around the same time. The "powerful AI models that will cause extremely rapid advances in science and technology" would be available to both nations. But that happens only if China gets the millions of high-end chips needed for AGI development. Eventually, China could even gain an edge over the US in this scenario.
The other scenario is a unipolar world, with the US and its Western allies alone at the top of AI innovation:
It’s unclear whether the unipolar world will last, but there’s at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.
Amodei argues that for the latter to happen, sanctions must continue so that China can’t easily buy or smuggle the millions of chips needed for AGI development.
"[We] shouldn't hand the Chinese Communist Party technological advantages when we don't have to," Amodei says early in the blog, explaining later that DeepSeek itself is not the adversary and that its researchers are smart scientists looking to develop useful tech:
Given my focus on export controls and US national security, I want to be clear on one thing. I don’t see DeepSeek themselves as adversaries, and the point isn’t to target them in particular. In interviews they’ve done, they seem like smart, curious researchers who just want to make useful technology.
But they’re beholden to an authoritarian government that has committed human rights violations, has behaved aggressively on the world stage, and will be far more unfettered in these actions if they’re able to match the US in AI. Export controls are one of our most powerful tools for preventing this, and the idea that the technology getting more powerful, having more bang for the buck, is a reason to lift our export controls makes no sense at all.
Amodei's full blog post is available at this link, and it's worth reading in full to better understand DeepSeek's breakthroughs in the context of the looming AI war between the US and China.
The post DeepSeek isn’t as great as it seems and won’t help China beat the US to AGI: Dario Amodei appeared first on BGR.