dansquizsoft · 2026-06-13T04:16:28 1781324188

Hard agree

dansquizsoft · 2026-06-13T01:59:40 1781315980

Thinking that on-prem models will be a halfway decent solution against what can be served out of a data center is a fool's take... One that is more common than it should be on here...

wolttam · 2026-06-13T02:09:09 1781316549

The point is not to be as good as the multi-trillion parameter model you can host across 72 GPUs (or whatever).

I'm running a 248B model on a paltry amount of hardware and getting plenty of good use out of it.

Sure, the most demanding tasks will demand the best models (and always will). There are still less demanding tasks for other models.

I think some people are fooling themselves that coding, of all tasks, is always going to require the biggest models ever. Again, maybe some coding tasks will, but the majority of business CRUD apps probably don't. Same goes for virtually any other type of task. The biggest models are really only useful for the most complex tasks.

sgc · 2026-06-13T03:18:19 1781320699

If you wouldn't mind, could you explain a bit what the 248B model is good for, and where it breaks down and you need something better? I hear this take often, but it is always a fleeting remark so I have no idea what the 'useful' looks like - at all.

wolttam · 2026-06-13T04:37:41 1781325461

To answer this and my sibling, it's DeepSeek V4 Flash at native FP4 quantization, on two Nvidia DGX Sparks. Which is a bit of kit but still paltry relative to the data centre. ~40 TPS generation, ~2000 TPS prompt processing, which makes it feel approximately as fast as typical APIs.
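Back-of-envelope, to put those two throughput figures in context: a rough sketch of how they turn into wall-clock time per response. The 8,000-token prompt and 400-token reply are made-up illustrative sizes, and the function name is hypothetical; it ignores prompt caching, batching, and network overhead.

```python
def response_seconds(prompt_tokens, output_tokens,
                     prefill_tps=2000.0, decode_tps=40.0):
    """Estimate response time as prompt processing plus generation.

    The default rates are the approximate figures quoted above;
    the token counts passed in are assumptions for illustration.
    """
    prefill = prompt_tokens / prefill_tps   # time to read the prompt
    decode = output_tokens / decode_tps     # time to write the reply
    return prefill + decode


# Hypothetical coding turn: 8,000 tokens of context in, 400 tokens out.
print(response_seconds(8000, 400))  # 4.0 s prefill + 10.0 s decode = 14.0
```

At these rates, generation dominates for short prompts, and prompt processing only starts to matter once the context gets long.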

I primarily use it with my own harness for coding. I'm not going to say it will compete with Opus in the most challenging domains, because it won't, but I will say that there's a reasonable likelihood that Opus is used for tasks that a model like Flash could comfortably handle at 1/100th the cost.

So far I've only seen it struggle at tasks that I myself would struggle with. Tasks that I can describe the shape of the solution for, it has a high success rate at implementing.

Useful is going to be different for everyone. I'm not working on the hardest problems, I don't need the best models.

ihateolives · 2026-06-13T05:57:08 1781330228

In my experience they require much more hand-holding and more specific directions, with fewer possibilities to interpret a command in several ways. You do the planning, keep an eye on what they're producing, and they do the legwork. It's not that their knowledge of Java or PHP or what have you is lacking, it's the long-horizon planning that you have to do yourself. Technically they're good. You just have to do more thinking and more reviewing yourself. YMMV.

rhipitr · 2026-06-13T03:32:34 1781321554

Depending on quantization I figure they need at least a p4 and likely a p5 EC2 (or similar instance in another provider) for a model with that many parameters. Maybe they are hosting on bare metal but I imagine not. Those instance types (assuming not using spot) are quite expensive to run.
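For a rough sense of how much quantization moves the hardware requirement, here is a minimal sketch of weight memory alone for a 248B-parameter model. It assumes every weight is stored at the stated bit width and ignores KV cache, activations, and quantization scale overhead, so real requirements are higher. The function name is hypothetical.

```python
def weight_gigabytes(params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumes uniform precision across all weights; this is a
    lower bound on memory, not a full serving estimate.
    """
    return params_billions * bits_per_weight / 8


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_gigabytes(248, bits)} GB")
# 16-bit: 496.0 GB
# 8-bit: 248.0 GB
# 4-bit: 124.0 GB
```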

upbeat_general · 2026-06-13T02:34:51 1781318091

If we’re defining on-prem as fitting in a rack - then every frontier model can be hosted on-prem.

Now this might not be the most cost-effective (and may require a bit of extra power), but you only need a datacenter for training or cost optimization.

aerhardt · 2026-06-13T12:10:40 1781352640

It’s perfectly reasonable to believe that the law of diminishing marginal returns will kick in at some point (if it hasn’t already), and that what at one point looked like an exponential may start looking like an s-curve.

I do not see how being experienced in engineering, or having higher studies in computer science and economics should make that view less common.

johndough · 2026-06-13T08:17:12 1781338632

The recent MiMo-V2.5-Pro-UltraSpeed can be served from 8 GPUs, which is certainly within the reach of sophisticated on-prem setups. https://mimo.xiaomi.com/blog/mimo-tilert-1000tps

dansquizsoft · 2026-06-09T07:40:03 1780990803

2 top labs...

NitpickLawyer · 2026-06-09T10:15:44 1781000144

Huh? I don't think there's much doubt out there that there are 3 top labs that are mostly at the same level - oAI, Anthropic & Goog (not necessarily in this order, depending on the month, but they've been trading SotA status on various verticals for a while now).

There are also 2-3 other trailing labs in MS, xAI and Meta. All of them are blundering behind, but at one point or another they've been up there for some verticals as well.

I think this is good. Having one clear winner would be worse than this SotA of the week rotating thing they've got going on. For us as consumers anyway.

dansquizsoft · 2026-06-09T13:44:36 1781012676

> Huh? I don't think there's much doubt ...

I am doubting. I will be very surprised if Google ends up top or second place (again?) at any point in the next few years.

> I think this is good. Having one clear winner would be worse than this ...

I agree that it would be better to have 3+ top labs as well.

dansquizsoft · 2026-06-09T07:38:44 1780990724

> owns its own models, hardware, large corpus of data, talent and network effects

How's that talent been working out for them the last few years?

benterix · 2026-06-09T07:48:39 1780991319

As happens in large orgs, with mixed results. The biggest irony is that the whole Transformer architecture was actually conceived at Google, only to be implemented as a product/service by another company.

icepush · 2026-06-09T10:21:12 1781000472

This is relatively common historically. Two examples I can call to mind without doing any research are Xerox/Apple and IBM/Oracle. I can only imagine there must be millions of other instances.

dansquizsoft · 2026-06-09T06:30:04 1780986604

"I got the top tier of Gemini AI." - that's your issue, get a subscription to a lab offering actual frontier models like Anthropic and OpenAI.

Mistletoe · 2026-06-09T06:45:26 1780987526

Nonsense.

https://livebench.ai/#/?highunseenbias=true

tmoravec · 2026-06-09T10:22:15 1781000535

Model != AI tool. Especially with Gemini, the model is fantastic, both in benchmarks and in the API. Sentiment is not that positive for Google AI.

dansquizsoft · 2026-06-06T22:07:50 1780783670

You only have to look at both the ridiculously terrible "Q&A chatbot" that is in Facebook under some posts (do they still have this?) and the fact that their system can't tell the difference between an inappropriate and a non-inappropriate comment most of the time to understand just how far behind Meta is in AI...

dansquizsoft · 2026-06-02T04:27:35 1780374455

It can be everywhere, but that doesn't mean users are paying or even value it.

brookst · 2026-06-02T04:58:48 1780376328

See also: Windows / Notepad / M365 / GitHub / Paint / Xbox / Azure / Solitaire / D365 / Security Copilot.

dansquizsoft · 2026-06-01T23:19:40 1780355980

While not quite what you are saying here, I have found that I like USB-C charging on my 15in M3 MacBook Air way more than its MagSafe. The MagSafe is always falling out as soon as I move the computer on and off my lap, and it is another cable that I can only use for one purpose. Nowhere near as good as it was back in the early 2010s MacBook days...

jazzyjackson · 2026-06-02T04:07:51 1780373271

They are saying they don’t own a MagSafe cable and have never experienced the above anecdote of a MacBook that was too dead to be charged by USB-C alone. Likewise, my M1 MacBook Pro has only ever been charged by USB-C and I let it die all the way often.

dansquizsoft · 2026-05-31T12:31:04 1780230664

I didn't know you could block class actions just by stating it in the TOS for a product - thanks for the tip!

dansquizsoft · 2026-05-25T05:30:06 1779687006

People have been trying to replace Excel for the last 40 years...