Quote:
Originally Posted by SmartCat: Did some testing on DeepSeek. Its way of answering questions (style/tone etc) is suspiciously similar to ChatGPT
I'm not surprised, because without significant fine-tuning all models tend to sound similar when answering generic questions. They're all trained on similar sets of training data and more or less copy one another's architectures, so it's to be expected at this stage. OpenAI making a fuss about this, when huge amounts of their own training data (allegedly) breach copyright and ignore web-scraping guidelines, is hilarious to me though.
DeepSeek R1 seems to be a fascinating combination of serious engineering (to get model training and inference this efficient) and the trend of reasoner foundation models (o1, o3, R1) being much smaller in parameter count than their "chat" counterparts (non-o GPT-4, Claude Opus, etc.). I haven't read their paper yet myself, but summaries like this one (https://x.com/wordgrammer/status/1883712727073607), plus their price per token being roughly 1/30th to 1/50th of OpenAI's even before discounts, indicate just how efficient this model is.
DeepSeek R1 is, to me, the most exciting moment since the launch of GPT-4 or Claude 3 Opus. I tried the other model, DeepSeek v3, and it was good enough on the few questions I asked it; the results were similar to what I'd get if I'd asked Sonnet 3.5. The difference is that I pay $32 AUD a month for Sonnet and still get rate limited on it frequently, while this was free on the DeepSeek website. And given how cost effective the base model seems to be, it would be offered close to free on model-aggregator websites that host the weights outside of China as well.
Additionally, because of the lower parameter count of the base R1 model and the publicly available weights plus published paper, many people already have it hosted and running inference on a variety of hardware setups, all of which cost less than a single H100 card.
Exo Labs are running it on 8 M4 Macs. It's great to see it running on consumer machines, though this is still a cost-ineffective way of doing it:
https://x.com/alexocheema/status/1884017524368720044
Sentdex (popular YouTuber with good Python tutorials) is running it at Q6_K quantization (not much precision loss) on a 64-core Threadripper with 1TB of RAM, getting 3.5 tokens/second:
https://x.com/Sentdex/status/1883596161778696247
This guy has a guide for a $6000 setup that can run R1 at Q8 quantization (close to full quality) on a CPU server, getting 6-8 tokens/second:
https://x.com/carrigmat/status/1884244369907278106
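The memory math behind these setups is easy to sketch. As a rough check (my assumptions: R1 has ~671B total parameters, and the bits-per-weight figures below are approximate effective sizes for the common llama.cpp quantization formats, including per-block scale overhead):

```python
# Rough weight-file size estimate for hosting DeepSeek R1 locally.
# Assumptions: ~671B total parameters (MoE, only ~37B active per token),
# and approximate effective bits per weight for each quantization format.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8_0": 8.5,   # 8-bit weights plus per-block scale overhead
    "Q6_K": 6.56,  # ~6.56 effective bits/weight
    "Q4_K": 4.5,
}

PARAMS = 671e9  # total parameter count

def weights_gb(fmt: str) -> float:
    """Approximate size of the weight file alone, ignoring KV cache."""
    return PARAMS * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{weights_gb(fmt):.0f} GB")
```

Q8 comes out a bit over 700 GB and Q6_K around 550 GB, which is why a 1TB-RAM server is enough for these builds while FP16 (over 1.3 TB for weights alone) isn't.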
Even 6-8 tokens/second isn't fast for a reasoner model by any means. In complex cases, it might take 5-10+ minutes just to write out its chain-of-thought before providing an answer. But running a state-of-the-art model on a CPU server at home was unimaginable before: partly because of the amount of RAM you'd need to run a 2-trillion-parameter model without all these optimizations, but mostly because the weights and architecture simply weren't publicly available.
Bringing inference costs down this drastically makes the model far more accessible to the community, which means people will keep tuning the deployment options and making it more efficient, like Tinygrad did with the 7900 XTX drivers (that was for training, though), all the llama.cpp improvements over the years, etc. It also opens the door for many more, smaller businesses to provide inference services.
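To put those speeds in perspective, here's the back-of-envelope arithmetic (the 3,000-token chain-of-thought length is my illustrative assumption, not a figure from any benchmark):

```python
# Back-of-envelope: how long a reasoning model's answer takes at
# CPU-server generation speeds.
def generation_minutes(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes to generate `tokens` at a given throughput."""
    return tokens / tokens_per_second / 60

# Assumption: a hard problem might emit a few thousand chain-of-thought
# tokens before the final answer appears.
cot_tokens = 3000
for tps in (3.5, 6, 8):
    print(f"{tps} tok/s -> {generation_minutes(cot_tokens, tps):.1f} min")
```

At 3.5 tok/s that's around 14 minutes; at 6-8 tok/s it's roughly 6-8 minutes, which lines up with the 5-10+ minute waits mentioned above.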
And as a consumer: 1 million tokens is roughly 4,000 pages of text (not exact, since sometimes a single special character is its own token, but a fair approximation). GPT-4o currently costs $2.50 per million input tokens (those ~4,000 pages) and $10 per million output tokens. I wouldn't use GPT-4o even if I were paid (nominally) to use it. I value my time more than having to interact with that abomination of a model, and I've been vocal about my disdain for it since it first launched to all that hype. Maybe it's better now, but I don't feel it's worth the time to find out.
DeepSeek R1, which is an o1/o1-Pro competitor and arguably state of the art among reasoning models (so two tiers above GPT-4o, a weak chat model, if you treat reasoning as one tier above chat), costs $0.50 per million input tokens and $2.20 per million output tokens.
That's roughly one-fifth the cost of GPT-4o.
I love to see OpenAI swallow a bitter pill and stew in their own hubris. Who wants to pay $200 (or the speculated $2,000) a month for individual, rate-limited access to o1-pro, when you can pay $6,000 or a bit more once and get R1 inference forever, get better models as and when they come out, and keep your data within your company? Why are investors burning tens of billions on OpenAI ($5 billion lost in 2024, losses projected to reach $14 billion in 2026) when open source is barely 2-3 months behind anyway? Does AI really need a $500 billion Project Stargate, with OpenAI at the helm, to advance to the next step?
ChatGPT itself was an engineering feat (GPT-3 was built by scaling the transformer architecture, training infrastructure, and data to extremes that hadn't been seen before, and ChatGPT added a useful web app as a wrapper around already-existing APIs), so I have no doubt they can drastically cut development costs if they want to. But someone had to bring OpenAI back down to earth. Now that they're being chased, I expect the Americans will step up their game and build AI that's closer to what it's currently being hyped up to be.