Thinking that on-prem models will be a halfway decent alternative to what can be served out of a data center is a fool's take... one that is more common than it should be on here...
The point is not to be as good as the multi-trillion parameter model you can host across 72 GPUs (or whatever).
I'm running a 248B model on a paltry amount of hardware and getting plenty of good use out of it.
Sure, the most demanding tasks will demand the best models (and always will). There are still less demanding tasks for other models.
I think some people are fooling themselves that coding, of all tasks, is always going to require the biggest models. Again, maybe some coding tasks will, but the majority of business CRUD apps probably don't. Same goes for virtually any other type of task. The biggest models are really only needed for the most complex tasks.
If you wouldn't mind, could you explain a bit what the 248B model is good for, and where it breaks down and you need something better? I hear this take often, but it is always a fleeting remark so I have no idea what the 'useful' looks like - at all.
To answer this and the sibling comment, it's DeepSeek V4 Flash at native FP4 quantization, on two Nvidia DGX Sparks. Which is a bit of kit but still paltry relative to the data centre. ~40 tokens/s generation, ~2000 tokens/s prompt processing, which makes it feel approximately as fast as typical APIs.
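For a rough sense of what those throughput figures mean per request, here is a back-of-the-envelope sketch. It assumes latency is simply prompt processing time plus generation time, and the function name and the example token counts are made up for illustration; real latency also depends on batching, caching, and network overhead.

```python
def estimated_latency_seconds(prompt_tokens, output_tokens,
                              prefill_tps=2000.0, decode_tps=40.0):
    """Rough request latency: time to process the prompt plus time to generate.

    The defaults are the approximate figures quoted above (~2000 tokens/s
    prompt processing, ~40 tokens/s generation). Everything else is ignored.
    """
    prefill_time = prompt_tokens / prefill_tps
    decode_time = output_tokens / decode_tps
    return prefill_time + decode_time


# Hypothetical request: 8000 tokens of context, 400 tokens of output.
latency = estimated_latency_seconds(8000, 400)
print(f"{latency:.1f} s")
```

Under these assumptions, generation time dominates once the output is more than a few hundred tokens.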
I primarily use it with my own harness for coding. I'm not going to say it will compete with Opus in the most challenging domains, because it won't, but I will say that there's a reasonable likelihood that Opus is used for tasks that a model like Flash could comfortably handle at 1/100th the cost.
So far I've only seen it struggle at tasks that I myself would struggle with. Tasks that I can describe the shape of the solution for, it has a high success rate at implementing.
Useful is going to be different for everyone. I'm not working on the hardest problems, so I don't need the best models.
In my experience they require much more hand-holding and more specific directions, with fewer possibilities to interpret a command in several ways. You do the planning, keep an eye on what they're producing, and they do the legwork. It's not that their knowledge of Java or PHP or what have you is lacking, it's the long-horizon planning that you have to do yourself. Technically they're good. You just have to do more thinking and more reviewing yourself. YMMV.
Depending on quantization, I figure they need at least a p4 and likely a p5 EC2 instance (or a similar instance from another provider) for a model with that many parameters. Maybe they are hosting on bare metal, but I imagine not. Those instance types (assuming no spot pricing) are quite expensive to run.
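The "depending on quantization" part can be made concrete with a quick estimate of the memory needed just to hold the weights. This is a minimal sketch: it counts parameters times bits per parameter and nothing else, so KV cache, activations, and quantization scale overhead are all left out, and the function name is only illustrative.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for model weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and any per-block scale factors that
    quantization formats add on top of the raw weights.
    """
    return params_billion * bits_per_param / 8


# The 248B model mentioned upthread, at a few common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(248, bits):.0f} GB")
```

At 4 bits the weights come to roughly a quarter of the 16-bit footprint, which is why the hardware requirement changes so much with quantization.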
It’s perfectly reasonable to believe that a law of diminishing marginal returns will kick in at some point (if it hasn’t already), and that what at one point looked like an exponential may start looking like an S-curve.
I do not see how being experienced in engineering, or having higher studies in computer science and economics should make that view less common.
Huh? I don't think there's much doubt out there that there are 3 top labs that are mostly at the same level - oAI, Anthropic & Goog (not necessarily in this order, depending on the month, but they've been trading SotA status on various verticals for a while now).
There are also 2-3 other trailing labs in MS, xAI and Meta. All of them are blundering behind, but at one point or another they've been up there for some verticals as well.
I think this is good. Having one clear winner would be worse than this SotA of the week rotating thing they've got going on. For us as consumers anyway.
As it happens in large orgs, with mixed results. The biggest irony being the whole Transformer architecture being actually conceived at Google, only to be implemented as a product/service by another company.
This is relatively common historically. Two examples I can call to mind without doing any research are Xerox/Apple and IBM/Oracle. I can only imagine there must be many other instances.
You only have to look at both the ridiculously terrible "Q&A chatbot" that is on Facebook under some posts (do they still have this?) and the fact that their system can't tell the difference between an inappropriate and an appropriate comment most of the time to understand just how far behind Meta is in AI...
While not quite what you are saying here, I have found that I like USB-C charging on my 15-inch M3 MacBook Air way more than its MagSafe. The MagSafe is always falling out as soon as I move the computer on and off my lap, and it is another cable that I can only use for one purpose. Nowhere near as good as it was back in the early 2010s MacBook days...
They are saying they don’t own a MagSafe cable and have never experienced the above anecdote of a MacBook that was too dead to be charged by USB-C alone. Likewise, my M1 MacBook Pro has only ever been charged by USB-C and I often let it die all the way.