Quote:
Originally Posted by SmartCat: Did some testing on DeepSeek. Its way of answering questions (style/tone etc) is suspiciously similar to ChatGPT
I'm not surprised, because without significant fine-tuning all models tend to sound similar when answering generic questions. They're all trained on similar sets of training data and more or less copy one another's architectures, so it's to be expected at this stage. OpenAI making a fuss about this, when huge amounts of their own training data (allegedly) breach copyright and ignore web-scraping guidelines, is hilarious to me though.
DeepSeek R1 seems to be a fascinating combination of serious engineering (to get model training and inference this efficient) and the trend of reasoner foundation models (o1, o3, R1) being much smaller in parameter count than their "chat" counterparts (non-o GPT-4, Claude Opus, etc.). I haven't read their paper yet myself, but summaries like this one (https://x.com/wordgrammer/status/1883712727073607), plus their price per token being roughly 1/30th to 1/50th of OpenAI's even before discounts, indicate just how efficient this model is.
DeepSeek R1 is, to me, the most exciting moment since the launch of GPT-4 or Claude 3 Opus. I tried the other model, DeepSeek v3, and it was good enough on the few questions I asked it; the results were similar to what I'd get if I'd asked Sonnet 3.5. The difference is that I pay $32 AUD a month for Sonnet and still get rate limited on it frequently, while this was free on the DeepSeek website. And given how cost effective the base model seems to be, it would be offered close to free on model-aggregator websites that host the weights outside of China as well.
Additionally, because of the lower parameter count of the base R1 model and the publicly available weights plus published paper, many people already have it hosted and running inference on a variety of hardware setups, all of which cost less than a single H100 card.
Exo Labs are running it on 8 M4 Macs. It's great to see it running on consumer machines, though this is still a cost-ineffective way of doing it:
https://x.com/alexocheema/status/1884017524368720044
Sentdex (popular YouTuber with good Python tutorials) is running it at Q6_K quantization (not much precision loss) on a 64-core Threadripper with 1TB of RAM, getting 3.5 tokens/second:
https://x.com/Sentdex/status/1883596161778696247
This guy has a guide for a $6000 setup that can run R1 at Q8 quantization (close to full quality) on a CPU server, getting 6-8 tokens/second:
https://x.com/carrigmat/status/1884244369907278106
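The memory math behind these setups is easy to sketch. As a rough check (my assumptions: R1 has ~671B total parameters, and the bits-per-weight figures below are approximate effective sizes for the common llama.cpp quantization formats, including per-block scale overhead):

```python
# Rough weight-file size estimate for hosting DeepSeek R1 locally.
# Assumptions: ~671B total parameters (MoE, only ~37B active per token),
# and approximate effective bits per weight for each quantization format.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,   # 8-bit weights plus per-block scale overhead
    "Q6_K": 6.56,  # ~6.56 effective bits/weight
    "Q4_K": 4.5,
}

PARAMS = 671e9  # total parameter count

def weights_gb(fmt: str) -> float:
    """Approximate size of the weight file alone, ignoring KV cache."""
    return PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{weights_gb(fmt):.0f} GB")
```

Q8 comes out a bit over 700 GB and Q6_K around 550 GB, which is why a 1TB-RAM server is enough for these builds while FP16 (over 1.3 TB for weights alone) isn't.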
Even 6-8 tokens/second isn't fast for a reasoner model by any means. In complex cases, it might take 5-10+ minutes just to write out its chain-of-thought before providing an answer. But running a state-of-the-art model on a CPU server at home was unimaginable before: partly because of the amount of RAM you'd need to run a 2-trillion-parameter model without all these optimizations, but mostly because the weights and architecture simply weren't publicly available.
Bringing inference costs down this drastically makes the model far more accessible to the community, which means people will keep tuning the deployment options and making it more efficient, like Tinygrad did with the 7900 XTX drivers (that was for training, though), all the llama.cpp improvements over the years, etc. It also opens the door for many more, smaller businesses to provide inference services.
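To put those speeds in perspective, here's the back-of-envelope arithmetic (the 3,000-token chain-of-thought length is my illustrative assumption, not a figure from any benchmark):

```python
# Back-of-envelope: how long a reasoning model's answer takes at
# CPU-server generation speeds.
def generation_minutes(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes to generate `tokens` at a given throughput."""
    return tokens / tokens_per_second / 60

# Assumption: a hard problem might emit a few thousand chain-of-thought
# tokens before the final answer appears.
cot_tokens = 3000
for tps in (3.5, 6, 8):
    print(f"{tps} tok/s -> {generation_minutes(cot_tokens, tps):.1f} min")
```

At 3.5 tok/s that's around 14 minutes; at 6-8 tok/s it's roughly 6-8 minutes, which lines up with the 5-10+ minute waits mentioned above.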
And as a consumer: 1 million tokens is roughly 4,000 pages of text (not exact, since sometimes a single special character is its own token, but a fair approximation). GPT-4o currently costs $2.50 per million input tokens (those ~4,000 pages) and $10 per million output tokens. I wouldn't use GPT-4o even if I were paid (nominally) to use it. I value my time more than having to interact with that abomination of a model, and I've been vocal about my disdain for it since it first launched to all that hype. Maybe it's better now, but I don't feel it's worth the time to find out.
DeepSeek R1, which is an o1/o1-Pro competitor and arguably state of the art among reasoning models (so two tiers above GPT-4o, a weak chat model, if you treat reasoning as one tier above chat), costs $0.50 per million input tokens and $2.20 per million output tokens.
That's roughly one-fifth the cost of GPT-4o.
I love to see OpenAI swallow a bitter pill and stew in their own hubris. Who wants to pay $200 (or the speculated $2,000) a month for individual, rate-limited access to o1-pro, when you can pay $6,000 or a bit more once and get R1 inference forever, get better models as and when they come out, and keep your data within your company? Why are investors burning tens of billions on OpenAI ($5 billion lost in 2024, losses projected to reach $14 billion in 2026) when open source is barely 2-3 months behind anyway? Does AI really need a $500 billion Project Stargate, with OpenAI at the helm, to advance to the next step?
ChatGPT itself was an engineering feat (GPT-3 was built by scaling the transformer architecture, training infrastructure, and data to extremes that hadn't been seen before, and ChatGPT added a useful web app as a wrapper around already-existing APIs), so I have no doubt they can drastically cut development costs if they want to. But someone had to bring OpenAI back down to earth. Now that they're being chased, I expect the Americans will step up their game and build AI that's closer to what it's currently being hyped up to be.