Why China’s DeepSeek AI Is Blowing Everyone’s Minds—And Blowing Up the Market

by shayaan

A Chinese artificial intelligence lab has done more than just build a cheaper AI model: it has exposed the inefficiency of the entire industry's approach.

DeepSeek's breakthrough showed how a small team, in an effort to save money, was able to rethink how AI models are built. While tech giants such as OpenAI and Anthropic spend billions of dollars on computing power alone, DeepSeek reportedly achieved comparable results for just over $5 million.

The company's model matches GPT-4o (OpenAI's flagship LLM), OpenAI o1 (its best currently available reasoning model), and Anthropic's Claude 3.5 Sonnet on many benchmarks, using approximately 2.788 million H800 GPU hours for the full training run. That is a small fraction of the hardware traditionally required.

The model is so good and so efficient that within days it climbed to the top of Apple's iOS productivity category, challenging OpenAI's dominance.

Necessity is the mother of invention. The team achieved this using techniques that American developers never had to consider, and still haven't mastered today. Perhaps most important, instead of using full precision for its calculations, DeepSeek implemented 8-bit training, which cut memory requirements by 75%.

“They figured out floating-point 8-bit training, at least for some of the numerics,” Perplexity CEO Aravind Srinivas told CNBC. “As far as I know, floating-point 8 training is not that well understood. Most of the training in America is still running in FP16.”


FP8 uses half the memory bandwidth and storage of FP16. For large AI models with billions of parameters, that reduction is substantial. DeepSeek had to master this because it was working with weaker hardware; OpenAI never faced that constraint.
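To see why that matters, here is a rough back-of-the-envelope sketch of how weight precision alone changes a model's memory footprint. The byte sizes are standard for each format; the 671-billion-parameter count is DeepSeek's reported total, used here purely for illustration.

```python
# Rough illustration of how weight precision affects memory footprint.
# This only counts the weights themselves, not activations or optimizer state.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed just to store model weights, in gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

params = 671e9  # DeepSeek's reported total parameter count, for illustration
for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {weight_memory_gb(params, dtype):,.0f} GB")

# fp32: 2,684 GB
# fp16: 1,342 GB
# fp8:    671 GB  (half of FP16, a quarter of FP32)
```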

DeepSeek also developed a “multi-token” prediction system that processes entire phrases at once rather than one word at a time, making the system roughly twice as fast while retaining about 90% of the accuracy.
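A minimal sketch of the general multi-token idea, not DeepSeek's actual architecture, looks something like the following: several heads predict the next few tokens from the same hidden state, so each position gets more training signal and decoding can draft tokens in parallel. All names and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Several output heads predict the next k tokens from one hidden state."""

    def __init__(self, hidden_dim: int, vocab_size: int, num_future_tokens: int = 4):
        super().__init__()
        # one linear head per future position
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in range(num_future_tokens)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim)
        # returns logits of shape (batch, seq_len, num_future_tokens, vocab_size)
        return torch.stack([head(hidden_states) for head in self.heads], dim=2)

heads = MultiTokenHead(hidden_dim=1024, vocab_size=32000)
logits = heads(torch.randn(2, 16, 1024))
print(logits.shape)  # torch.Size([2, 16, 4, 32000])
```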

Another technique the company used is called “distillation”: getting a small model to replicate the outputs of a larger one without having to train it on the same knowledge base. This made it possible to release smaller models that are remarkably efficient, accurate, and competitive.
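As a rough illustration, classic distillation trains the student to match the teacher's softened output distribution; the sketch below shows that loss. DeepSeek's R1 work, quoted further down, instead fine-tunes small models on teacher-generated samples, but the student-imitates-teacher idea is the same.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between softened teacher and student distributions."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Random logits stand in for real model outputs here.
student = torch.randn(8, 32000)
teacher = torch.randn(8, 32000)
print(distillation_loss(student, teacher).item())
```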

The company also used a technique called “mixture of experts,” which contributed to the model's efficiency. While traditional models keep all of their parameters active at all times, DeepSeek's system uses 671 billion total parameters but activates only 37 billion at a time. It is like having a large team of specialists but only calling in the experts needed for a given task.
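Here is a minimal sketch of the routing idea, assuming a simple top-k gate and toy sizes far smaller than DeepSeek's: each token is sent to only a couple of experts, so only a fraction of the layer's parameters does any work for a given token.

```python
import torch
import torch.nn as nn

class TinyMoELayer(nn.Module):
    """Toy mixture-of-experts layer: a gate routes each token to its top-k experts."""

    def __init__(self, dim: int = 256, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim)
        scores = torch.softmax(self.gate(x), dim=-1)        # (tokens, experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for idx, expert in enumerate(self.experts):
                mask = chosen[:, slot] == idx
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])         # only routed tokens hit this expert
        return out

layer = TinyMoELayer()
print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```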

“We use DeepSeek-R1 as the teacher model to generate 800K training samples, and fine-tune several small dense models. The results are promising: DeepSeek-R1-Distill-Qwen-1.5B outperforms GPT-4o and Claude-3.5-Sonnet on math benchmarks with 28.9% on AIME and 83.9% on MATH,” the company wrote in its paper.

For context, 1.5 billion parameters is so small that the model isn't considered an LLM, or large language model, but rather an SLM, or small language model. SLMs require so little compute and VRAM that users can run them on weak machines such as their smartphones.

The cost implications are striking. Beyond the roughly 95% reduction in training costs, DeepSeek's API charges just 10 cents per million tokens, compared to $4.40 for comparable services. One developer reported processing 200,000 API requests for about 50 cents, with no rate limiting.
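As a quick sanity check on those prices, the arithmetic below assumes a hypothetical five-million-token workload; the token count is an illustrative assumption, not the developer's actual usage.

```python
# Quick cost comparison using the per-million-token prices cited above.
PRICE_PER_M_TOKENS = {"deepseek": 0.10, "comparable_service": 4.40}  # USD

tokens_processed = 5_000_000  # hypothetical workload
for provider, price in PRICE_PER_M_TOKENS.items():
    cost = tokens_processed / 1_000_000 * price
    print(f"{provider}: ${cost:.2f}")

# deepseek: $0.50
# comparable_service: $22.00
```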


The “DeepSeek effect” is already noticeable. “Let me say the quiet part out loud: AI model building is a money trap,” said investor Chamath Palihapitiya. And despite the jabs thrown at DeepSeek, OpenAI CEO Sam Altman quickly pumped the brakes on his push to squeeze users for money, after all the raving on social media about people getting for free from DeepSeek what OpenAI charges $200 a month to do.

Meanwhile, the DeepSeek app has topped the download charts, and the six top trending repositories on GitHub are all related to DeepSeek.

Most AI stocks are down as investors wonder whether the hype has reached bubble levels. Both AI hardware stocks (Nvidia, AMD) and software stocks (Microsoft, Meta, and Google) are suffering the consequences of the apparent paradigm shift triggered by DeepSeek's announcement and the results shared by users and developers.

Even AI crypto tokens took a hit, with scads of DeepSeek-themed AI tokens popping up in an attempt to ride the wave.

Beyond the financial carnage, the key takeaway is that DeepSeek's breakthrough suggests AI development may not require massive data centers and specialized hardware. That could fundamentally change the competitive landscape, turning what many considered permanent advantages of big tech companies into temporary leads.

The timing is almost comical. Just days before DeepSeek's announcement, President Trump, OpenAI's Sam Altman, and Oracle founder Larry Ellison unveiled Stargate, a $500 billion investment in American AI infrastructure. Meanwhile, Mark Zuckerberg doubled down on Meta's commitment to pour billions into AI development, and Microsoft's $13 billion investment in OpenAI suddenly looks less like strategic genius and more like expensive FOMO fueled by wasted resources.


“Whatever you did to prevent them from catching up didn’t even matter,” Srinivas told CNBC. “They caught up anyway.”

Published by Andrew Hayward
