On DeepSeek and Export Controls
Quoting Dario Amodei:
People are naturally attracted to the idea that “first something is expensive, then it gets cheaper” — as if AI is a single thing of constant quality, and when it gets cheaper, we’ll use fewer chips to train it. But what’s important is the scaling curve: when it shifts, we simply traverse it faster, because the value of what’s at the end of the curve is so high.
because this type of RL is new, we are still very early on the scaling curve: the amount being spent on the second, RL stage is small for all players. Spending $1M instead of $0.1M is enough to get huge gains. Companies are now working very quickly to scale up the second stage to hundreds of millions and billions, but it’s crucial to understand that we’re at a unique “crossover point” where there is a powerful new paradigm that is early on the scaling curve and therefore can make big gains quickly.
DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs; it’s an expected point on an ongoing cost reduction curve. What’s different this time is that the company that was first to demonstrate the expected cost reductions was Chinese.
If the historical trend of the cost curve decrease is ~4x per year, that means that in the ordinary course of business — in the normal trends of historical cost decreases like those that happened in 2023 and 2024 — we’d expect a model 3-4x cheaper than 3.5 Sonnet/GPT-4o around now. Since DeepSeek-V3 is worse than those US frontier models — let’s say by ~2x on the scaling curve, which I think is quite generous to DeepSeek-V3 — that means it would be totally normal, totally “on trend”, if DeepSeek-V3 training cost ~8x less than the current US models developed a year ago.
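A minimal sketch of the arithmetic behind the “on trend” claim above, taking the ~4x/year cost decline and the ~2x capability gap at face value (both are Amodei’s rough estimates; the variable names are illustrative, not from the essay):

```python
# Rough arithmetic behind the quoted "~8x cheaper" estimate.
yearly_cost_decline = 4.0    # assumed historical cost-curve decrease: ~4x per year (quoted estimate)
elapsed_years = 1.0          # DeepSeek-V3 arrived roughly a year after 3.5 Sonnet / GPT-4o
capability_discount = 2.0    # DeepSeek-V3 assumed ~2x lower on the scaling curve (quoted estimate)

# The trend alone predicts a model of equal quality costing ~4x less after one year.
trend_factor = yearly_cost_decline ** elapsed_years

# A model ~2x lower on the scaling curve should be cheaper again by that factor,
# so the two effects compound to roughly the quoted ~8x.
expected_cost_ratio = trend_factor * capability_discount
print(f"Expected training-cost ratio vs. year-old US frontier models: ~{expected_cost_ratio:.0f}x cheaper")
```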
the economic value of training more and more intelligent models is so great that any cost gains are more than eaten up almost immediately — they’re poured back into making even smarter models for the same huge cost we were originally planning to spend.
To the extent that US labs haven’t already discovered them, the efficiency innovations DeepSeek developed will soon be applied by both US and Chinese labs to train multi-billion dollar models. These will perform better than the multi-billion dollar models they were previously planning to train — but they’ll still spend multi-billions. That number will continue going up, until we reach AI that is smarter than almost all humans at almost all things.