Deepseek is More Wall Street Than Silicon Valley
[Opinion] It's a Chinese company. But chip curbs and big-power rivalry are less relevant than people think.
Good Morning from Taipei,
I get the sense that many people have spent more time opining about Deepseek than actually reading and trying to understand the company and its recent developments.
Rather than weigh in with yet another quick take, I decided to step away until the noise died down. Now seems as good a time as any to explain what I think most people have missed.
First, I think the obsession with Deepseek’s nationality misses the most important characteristic of the company. I understand the knee-jerk reaction to paint Deepseek as some poster child of US-China rivalry. That framing fits, if you trim some facts and squeeze the truth, but it’s actually kind of lazy.
There’s also the two-part narrative about how Deepseek’s lightweight approach to AI modeling relates to US export policy. Part 1: Chip curbs will backfire. Part 2: See, Deepseek is proof. This asinine argument popped up a lot in the past two weeks, and it is a good example of confirmation bias. It’s also wrong.
Spend a little time trying to understand the company and you learn that Deepseek’s country of origin and the US-China tech Cold War are the least interesting facets of the story.
In fact, I think Deepseek’s genesis as the skunkworks project of High-Flyer, a hedge fund focused on quantitative analysis, tells us a lot more about the startup’s approach to artificial intelligence than its location or nationality.
Paul Mozur at the NY Times gives us a hint: “Having spent more time with his interviews, it is striking how much he does NOT sound like a typical Chinese tech boss,” Mozur wrote on LinkedIn. Mozur, who is fluent in Chinese, drew from a couple of rare interviews founder Liang Wenfeng gave to Chinese technology site 36Kr in July to write a profile.
But, to date, I think no one has done a better job of collating and translating interviews and information about Liang and Deepseek than Jordan Schneider and his team at ChinaTalk. Schneider’s own interviews and guest posts have further strengthened his team’s reportage.
Distilling Deepseek’s Essence
Deepseek’s breakthrough is generally summarized as: Comparable with OpenAI, but at a fraction of the price.
Hmm, er. Kind of true. But, not really. Basically, Liang and his team cut corners to come up with a model that gets the job done with a lot less grunt work. That’s not a diss of Deepseek. In fact, it’s in the DNA of hedge funds to find and exploit inefficiencies. Get it right, and they make money. But efficiency is key because the costs of computing, infrastructure, and staff eat into profits. In financial markets we call this arbitrage. In computer science, we call it optimization.
The essence of a quantitative hedge fund’s business model is to draw on publicly available information, such as trading data, financial reports, and economic statistics. This might also be supplemented by proprietary data, such as satellite images or industry reports. The quants then distill all of this down to only what’s relevant. Then, using their own algorithms, they draw inferences about the state of the world, or financial markets.
Hopefully you’re noticing some common key words here: data, algorithms, distill, inference. Let me spell it out more clearly: Deepseek’s breakthrough was having methodologies and algorithms in place which allowed it to discard unnecessary data and use only what’s needed. That it went the open-source route only underscores its savvy because (among other things): a) many hands make light work, and b) Deepseek’s team is more motivated when its members feel part of a global community.
Seeds of Innovation
To say that Deepseek was driven by chip curbs to find more efficient solutions misses the mark. Take the invention of the Multi-head Latent Attention architecture (MLA), which cuts the amount of memory needed and is an important part of the models Deepseek released. According to Liang himself, it was one of his young researchers who happened upon the idea.
“After summarizing some of the mainstream changes in the attention architecture, he suddenly came up with an idea to design an alternative solution. However, it was a long process from idea to implementation. We formed a team for this and it took several months to get it working,” Liang told 36Kr.
And far from claiming to have sidestepped the shortage of GPUs, Liang himself noted that access to computing power was in fact a constraint. “The problem we face has never been money, but the ban on high-end chips,” he said in the same interview.
“For researchers, the desire for computing power is endless. After doing small-scale experiments, we always want to do larger-scale experiments,” Liang said in an earlier interview with 36Kr. “After that, we will also consciously deploy as much computing power as possible.”
If chip curbs truly were the inspiration for a Chinese company deploying a more efficient approach to AI, then surely Baidu, Alibaba, Tencent and ByteDance would have done the same, and earlier. But that’s not the case.
In reality, those Chinese tech companies have more in common with foreign counterparts such as OpenAI, Microsoft, Meta and Google than they do with a domestic quantitative hedge fund. That is, they’re all massive, bloated, hierarchical corporations.
The upside of this bloat is having the resources to build massive models from scratch. The downside of this bloat is spending resources to build massive models from scratch.
Deepseek, where efficiency and innovation are deeply ingrained in its parent company’s culture, seems to have “borrowed” these models by tapping into the tokens generated by other companies’ models and training on them, a technique known as distillation. Instead of reinventing the wheel, Deepseek built a bicycle. Oh, and it cost a lot more than the $6 million figure that’s widely cited. Anyone who reports this number without at least couching it with a qualifier like “claimed” deserves to be ignored.
The big-model approach by OpenAI and its peers isn’t wrong, however, and Deepseek’s success shouldn’t be taken as a repudiation of their strategies. Big Tech’s foundational models are more comprehensive, more robust against hacks and jailbreaking, and more adaptable across media types (i.e., multimodal). That the established players didn’t get onto distillation earlier could be seen as a failure on their part, sure, but there’s no fundamental moat around Deepseek’s work that prevents others from embarking down the simplification route. AI has a long journey ahead, and we’re only in the early stages.
Deepseek’s Unsolvable Weakness
There is a massive flaw in Deepseek’s models, one which will be hard to rectify: censorship. A favorite sport these days is to grab a Chinese service and try to trick it into admitting that Taiwan is a country or that people were killed at Tiananmen Square in 1989. And when the inevitable response comes, the protagonist can exclaim with glee, “See! Censorship.” That’s cute, but not particularly useful.
The real concern goes much deeper. Mankind’s march toward artificial intelligence will be impossible if swathes of information are removed from the corpus of data. An AI can’t correctly explain the significance of Deng Xiaoping’s 1992 Southern Tour without making reference to the 1989 Tiananmen Square massacre. It would be like explaining the US Civil War without mentioning slavery and instead summarizing it as merely a conflict over states’ rights. Deepseek can skirt the censorship issue by becoming an expert in maths and science; solving trigonometry equations or folding proteins. But leaving out the humanities, and fudging history, is a curb on progress that cannot be overcome if the goal is to achieve general intelligence.
Financial Markets and Innovation
This whole story reminds me of a theory put forth by Morris Chang, the founder of TSMC, who believes financial deregulation contributed to the US falling behind in semiconductors.
According to his telling, before the shackles were removed from banks in the 1980s all the smartest talent went to the chip sector. At that time, software was not really a thing and semiconductors truly were at the forefront of technological progress. But when Wall Street was given free rein, financial institutions suddenly had the money and incentive to hire the best STEM grads they could find. Financial-industry innovation took off to the detriment of technological innovation.
I’m not sure if I fully subscribe to this theory. But it cannot be denied that even today, in the age of exciting advances in AI, Wall Street remains a huge draw for thousands of the world’s best and brightest. Sure, smart people still go to Silicon Valley, but not all of them. The FANGs are competing with the Wolves of Wall Street.
The way I see it, Deepseek not only built on the cultural foundation of its parent, a quantitative hedge fund, but tapped into a zeitgeist in China. Young graduates are rejecting the old 996 approach to work (9 a.m. to 9 p.m., six days a week), and are hungry both for meaning and for the chance to be part of something exciting.
A Challenge for Wall Street
So now I lay down this challenge to Wall Street and I ask you, my loyal reader, to pass it on to anyone and everyone in the hallowed halls of finance:
To the hedge funds and quant teams.
To the engineers building high-frequency trading systems, and the computer scientists hard at work tweaking their market-beating algos.
Put your heads down and build something better, smarter, more efficient and more powerful than anything a Chinese hedge fund or the entirety of Silicon Valley can manage.
AI is here, show us what you can do.
Thanks for reading.
We've been experimenting with the Chinese open-source models. One interesting thing we've found is that, when installed and run on a local server, none of them are censored. Asked about Tiananmen 1989, they all return info about the student protests and crackdown. This implies that the censorship is happening downstream of the models, like a black mirror of Western AI systems' safety modules that filter hallucinations. It also makes sense if you consider that everyone is essentially using the same training data and all the foundation models, Chinese or US, are inevitably heading towards commoditisation.