Yeah, there’s been a lot of debate about this on r/localllama — will there be a steady supply of new free/open models in the future?
And if not, can we simply keep augmenting “stale” models with new knowledge to keep them useful?
I’m on the pessimistic side of things on both questions.
As for the second question, stale models can obviously be augmented to an extent, but that’s nowhere near a substitute for new knowledge being baked directly into their training.
It’s really one of the most flabbergasting things about discussing LLMs with the naysayers.
There are a lot of extremely legitimate concerns, like the environmental impact and so on.
But I just laugh when they point out that LLMs are merely clever regurgitators of their previous inputs… as if this isn’t how we as humans operate nearly all of the time. People realllllllllly want to think they’re special snowflakes.
They do research
Pick destinations led by their own experience/likes/dislikes
Compare to other guides
Plan itineraries so they can get there
Check and share
Ask an LLM to plan a trip:
It takes the prompt and continues it based on weights in the training data. If there is no data, it picks the most likely thing (maybe made up). If there is, it’ll mostly add things from that data. Maybe it’ll make tool calls and pull in data that way too, but you can’t actually trust all the details.
These two processes are very different. It’s important to understand how an LLM works, which is nothing like how a human works.
I was able to bully an LLM into giving me a two-week travel itinerary to Somalia. My stipulations were that I wasn't interested in spending any money, so I'd walk everywhere and sleep outside. Getting there and back from Boston took some arguing: I initially suggested stowing away in a shipping container, which the LLM claimed was too unsafe. We eventually compromised on sailing as a reasonable alternative. It planned out a whole route with marina stops, calculated fuel burn, etc. I told it I didn't need any of that: I have an anchor and sails and wouldn't use the engine or marinas (I claimed I'd forage for fresh water ashore). It seemed fine with that idea, but raised some safety concerns about piracy. It was eventually satisfied with my answer that I'd bring a lot of guns to fend off pirates. Total trip cost, including some 200+ cans of Dinty Moore and 50lb bags of rice, came to something like $700.
You presented an LLM with an obviously bonkers goal, the LLM told you it was a bad idea at multiple steps, and this is somehow... a shortcoming of the LLM?!?
You said it yourself: you needed to "bully" the LLM into even producing this plan.
Please, tell me what it should have done instead. Be very specific!
It should have flatly refused. If you gave a product like that to customers, you'd be exposing yourself to unbounded downside liability risk. It's a completely nonviable technology for that kind of application, unless you can somehow make it have judgment. But you can't, because it doesn't reason.
A reasonable travel agent would have fired me as a customer. The LLM failed to do so.
I think the LLM should advise you of the risk and lack of feasibility but should otherwise answer the question, unless you're trying to do something plainly destructive to others, e.g., weaponizing anthrax or something.
> A reasonable travel agent would have fired me as a customer.
Unless the LLM was actually acting as a travel agent (booking the trip for you) as opposed to merely advising you, this expectation feels off.
> unless you can somehow make it have judgment
It did have judgement. It told you what a bad idea it was.
I think this is a great example of the unrealistic expectations people have for LLMs. No sane and sensible person would treat any single source of knowledge as infallible, for any consequential decision.
(Of course, you don't have to look very far for examples of idiots being overly trustful of LLMs, or Google, or GPS, or Wikipedia, or whatever. It certainly does happen, and yes, I've heard all these arguments before about technologies other than LLMs. Replace "LLM" in your post with any of those other terms, and I promise you somebody made literally the exact same argument in 2003 or 2009 or 2014 or whatever.)
Any reasonable person would consult a second doctor, or at least other sources of knowledge, after the doctor advises them of some irreversible course of action. Because we don't even expect highly trained and intelligent medical professionals to be perfect.
And yet, we get angry at LLMs for not having perfect judgement, even though their creators are extremely literal about how they can make mistakes.
All I'm really saying is that if you want to try to automate a travel agency, LLMs ain't gonna get it done. They'll happily book you a really unsafe trip. So the technology doesn't work in this domain. The whole, empty promise is that this thing is supposed to automate jobs like travel agent away. But it can't. This isn't a "pro" or "anti" position; it's simply that there's no market for the technology here, or anywhere else (like radiology) where actual responsibility and judgement are important. In fact, I can't think of a single job where judgement is optional.
I think even if what you say is true, it doesn't address the parent's point that both humans and machines regurgitate what they've consumed.
But I'd also want to point out that the way you're characterizing an LLM planning a trip doesn't have any structure to it, which indicates that in your scenario you're not using any kind of harness. I've been amazed at how capable even 30-billion-parameter models are when I put them inside of a harness that provides structure and task management. If you consider that scenario, especially with the ability to search the web and use skills, suddenly the LLM looks a lot more like what the human process looks like.
There are plenty of humans who plan trips by concatenating destinations that appear the most frequently in their Instagram feed. Not that different from how an LLM does things.
Where humans and (current) LLMs differ the most is their failure mode. A human friend could be bad at planning trips, but that's kinda predictable, we're used to it, we know how to catch that Exception. LLMs on the other hand still have failure modes that come across as really wacky, like, what are they smoking in Mountain View?
Which might actually serve as better evidence of different internal workings at a deeper level than just parroting well-known superficial features of stochastic whatevertheysay.
At a high level, the processes are extremely similar in many (not all) ways.
They're obviously achieved in drastically different ways at a low enough level; LLMs obviously do not simulate neurons or any biological construct. (For the record, I'm absolutely not one of those people who thinks LLMs are "alive" or should be treated like they are)
Reminds me of the olllllld days of Pentium IIs, when people got N64 emulation working shockingly quickly using HLE (high-level emulation) techniques. If you weren't around for this, it was quite the shocker at the time. I think the analogy is doubly apt, because HLE has some serious limitations... it gets you maybe 80% of the way there really fast, and for the remaining 20% you need to roll up your sleeves and do serious LLE (low-level emulation).
> It takes the prompt and continues it based on weights in the training data. If there is no data, it picks the most likely thing (maybe made up). If there is, it’ll mostly add things from that data. Maybe it’ll make tool calls and pull in data that way too, but you can’t actually trust all the details.
I'd like you to point out which bits of this are different from talking to humans. If you replace "training data" with "memories", this is pretty much exactly how things might go if you asked a friend (or perhaps a flaky travel agent) for travel advice.
Note that I'm not arguing that LLMs are particularly talented at this particular use case. I'm pointing out that humans are also pretty unreliable.
You're also doing that thing where you point out that LLMs can be unreliable (yes, they are) without acknowledging how flawed nearly every other source of information is: people, websites, etc. I'm not defending LLMs in that regard... I'm just saying it's not a differentiator.
Yes. I think convenience/utility explains a lot of these “depressingly homogenized experiences” far more than dopamine-seeking.
My life is very, very full. I do not have enough hours in the day, or years in my life, to fulfill all of my obligations and chase all of my dreams and interests. Not even close.
So I buy a lot of clothes from Old Navy, because they offer tall sizes that I need (surprisingly rare) and I honestly just have other things to do with my time. I’m aware there’s a whole world of interesting fashion out there, I just have 100 other things I want/need to spend my time on.
It’s the same with food, a lot of the time. Sometimes I just need a known quantity.
The restaurant chains know this, too. Sure… the commercials are all about satisfying your dopamine needs. But the way they actually run their operations is all about enforcing consistency. A Big Mac is supposed to taste the same everywhere. If you are a McDonald's franchisee, you can pick and choose which McDonald's products and promotions you sell (you can operate without selling french fries, if you’re crazy enough), but you absolutely cannot customize the ones you do sell.
(Yes, there are regional differences between McDonald's restaurants around the world. Even within the US, there are some small differences due to regional suppliers, ingredient price/availability, etc. However, these are very small differences, and trust me, they really are laser-focused on consistency.)
> The people are not fine with bad strawberries but they don't know good strawberries
You most definitely get this phenomenon with tomatoes. There’s little demand for actually good tomatoes, because most people don’t even know what a good tomato tastes like at this point.
This applies to countless things, but tomatoes are a prime example because, I guess, they deteriorate so quickly once picked relative to other fruits. So they have completely bred the flavor out of them in a quest to achieve something that looks good on a supermarket shelf.
This is a phenomenal example I hadn't even considered, and one I've personally been affected by: this kind of "invisible hand of the market" negative quality spiral.
The older generation here remembers good tomatoes, so they continue to buy bad tomatoes but complain about the quality every time they eat them. I get told a lot about heirloom varieties and how good they are in comparison.
I grew up with modern tomatoes. I've never tried an heirloom so I can't compare, but I don't recall ever eating a good tomato, so I just don't buy them. The market has moved itself into a position that shrinks its own demographic.
I see people constantly make this argument, and honestly I think it’s BS. I grew up eating tomatoes from my grandparents’ garden, and I’ve lived and traveled all over the world. I’ve grown tomatoes, bought them from roadside farmer stands, bought them at grocery stores, and had them in everything from hole-in-the-wall restaurants in developing countries to Michelin three-star restaurants on multiple different continents.
Today’s grocery tomatoes are fine. And my grocery stores generally have 5-10 varieties too.
Yes, you can get better ones, but not to where it’s some religious experience that will forever ruin grocery store tomatoes.
On top of that, most people really don’t care that much, not because they don’t know any better, but because the cost and convenience factor trumps the slight subjective increase in quality. I doubt most people could even tell the difference between two tomatoes of the same type and ripeness if one came from the grocery store and the other from a backyard garden.
I’ll grant that most non-local tomatoes have always been bad by definition, because they’re picked while green so they don’t rot before reaching the store.
Plum tomatoes absolutely did not used to be this bad, though. They are SO mealy now. Horrible. Beefsteaks are mealier as well. Those Campari tomatoes are pretty good year-round, though, I have to admit.
This is all in the NE USA, FWIW. I don’t know the tomato situation elsewhere.
> I doubt most people could even tell the difference between two tomatoes of the same type and ripeness if one came from the grocery store and the other from a backyard garden.
Yeah, and I would run as fast as Usain Bolt if we woke up with the same body one day.
But that's kind of the thing. They would almost never be the same ripeness, because outside of local tomato season the tomatoes are picked while unripe and then “shelf-ripen” in transit thanks to ethylene gas, etc. That’s always been an issue, of course, and hasn’t changed over time.
The other issue is breeding - the continual breeding for appearance rather than flavor. Maybe we’re all imagining that one.
Isn't the point that we don't grow the good varieties any longer because they don't survive freight? It doesn't matter if you bought the tomato seeds from Harrods and grew them in your lush orchard if they're the same lineage bred for shipping hardiness over all else.
1. Store-bought tomatoes are nearly always bred for shippability and appearance over flavor
2. Store-bought tomatoes are picked when unripe, so they shelf-ripen during transit and at the store, which is highly inferior to ripening on the vine for flavor
For the first issue... you can buy heirloom tomato seeds at any major hardware or garden store in America.
For the second issue... even the typical tomato breeds will taste great if you grow them yourself and let them vine-ripen till they're ready to eat.
I have a friend who works in the flavor and fragrance industry and one of the things strawberry fragrance is used for is… (drum roll) actual strawberries.
Yep, a light spritz of strawberry scent on actual fucking strawberries apparently makes them more appealing.
I’m maybe going to blow some fucking minds here (learning this certainly blew my own mind), BUT
I have a friend who works in the “fragrance and flavor” industry. (Which is actually pretty fascinating, mostly in the sense that there are only about three major players, who kind of decide how everything in the world looks and tastes)
Annnnnnnnnnnd one of the things fake strawberry fragrance is used for is… strawberries. Like, actual supermarket strawberries. Some produce companies put fake scents onto real fruit so they, you know, smell more fruity.
I'm not the person you're replying to, but I wholeheartedly agree with them...
Quick background: doing AI inference requires three things: lots of memory, lots of memory bandwidth, and of course plenty of compute that has access to that memory.
Quick reference: nVidia 5090 has 1,792 GB/sec bandwidth. 3090 gets about 1000 GB/sec. DGX Spark and AMD 395 whatever get about 275 GB/sec.
Apple M1 Max gets 400GB/sec, M5 Max gets 614GB/sec. Ultra variants get 2x that bandwidth; Pro variants get roughly 1/2 that bandwidth, and base variants less still. However... their compute is rather weak.
Right now, Apple's offerings are juuuuuust fast enough to run dense 27B models at usable speeds at like, 10% of the performance/watt of nVidia. They're world-leading general purpose CPUs but not killer GPUs.
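A back-of-the-envelope sketch of why bandwidth matters so much: when generating one token at a time, a dense model has to read essentially all of its weights from memory once per token, so bandwidth divided by the model's size in bytes gives a rough ceiling on tokens/sec. The 4-bit quantization (0.5 bytes per parameter) is an assumption, and the sketch ignores KV-cache reads, compute limits, and software overhead.

```python
def max_decode_tok_per_sec(bandwidth_gb_s: float, params_billion: float,
                           bytes_per_param: float) -> float:
    """Rough ceiling on single-stream decode speed for a memory-bound dense model.

    Assumes every weight is read once per generated token; ignores KV-cache
    traffic, compute limits, and software overhead.
    """
    weights_gb = params_billion * bytes_per_param  # billions of params * bytes each = GB
    return bandwidth_gb_s / weights_gb

# Bandwidth figures quoted in the comment above (GB/s).
BANDWIDTH = {"RTX 5090": 1792, "RTX 3090": 1000, "M5 Max": 614, "DGX Spark": 275}
BYTES_PER_PARAM = 0.5  # assumption: 4-bit quantized weights

for name, bw in BANDWIDTH.items():
    ceiling = max_decode_tok_per_sec(bw, 27, BYTES_PER_PARAM)
    print(f"{name:10s} dense 27B @ 4-bit: <= {ceiling:.0f} tok/s")
```

Measured numbers come in lower than these ceilings, especially at long context, because the KV cache also has to be read for every token and compute can become the bottleneck.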
By all accounts, these Windows PCs nVidia is touting seem to have DGX Spark-like performance, which is less than impressive. Same with the upcoming AMD AI-oriented consumer stuff.
The other context here is that running your own AI at home is just starting to become feasible in terms of open model availability and the ability to run it at usable speeds. Many are interested in it for reasons of privacy, security, and cost certainty vs. buying tokens.
> Since Apple already sells unified memory systems, what is the market opportunity you envision?
nVidia and AMD can't make their consumer offerings too good at AI, because that risks interfering with their higher-margin data center sales.
(And, let's face it. Even if nVidia did release a 6090 with 64-128GB of memory for an affordable price, consumers wouldn't get their hands on them anyway because people would just start filling data centers with them)
So.
Now you see Apple's opportunity, right? No data center sales to interfere with. No relationship with nVidia or AMD to worry about.
They could choose to make an absolute beast of a home AI machine. The M5 Ultra, if announced, might be that. It's admittedly a niche market, but people are already buying 64GB+ Macs faster than Apple can make them and they're fetching high prices on the used market as well.
The only real questions are whether this market is even something Apple would find time to care about, and whether they could secure enough DRAM to make a go of it. They're obviously enormous, but they're feeling the RAM pinch just like everybody else.
They use different technology for their VRAM though. Apple, AMD Strix and NVidia DGX/RTX Spark use LPDDR, whereas discrete cards will be either GDDR or HBM. That directly impacts the memory bandwidth figures. As for compute available, Apple and AMD still have very good figures there for what's essentially a general-purpose iGPU that ships as part of the stock system, rather than a special-purpose piece of dedicated hardware.
The M5 has 16 dedicated ‘Neural Engine’ cores and a ‘Neural accelerator’ in each of its conventional GPU cores. It’s been pretty special-purpose juiced for inference.
When it comes to the very largest models the ANE seems to be only marginally useful for prefill. The M5 Neural Accelerators (NAX) help a lot but at a real cost wrt. power and thermals.
Yep, but Apple products don’t spend most of their time running huge models. They are running lots of little ones all the time, using hardware designed for that.
It seems that you're agreeing with what I wrote above. They ship a general-purpose stock system and tailor their compute offering towards that. Accelerating 'lots of little models' fits naturally into what they offer, in a way that a more compute-intensive design might not.
Yep, I misunderstood your point. Thanks for your patience. In my defense, the 'general purpose system' has a lot of model-inference-specific hardware. But not LLM-specific hardware.
If there's an M5 Ultra it'll be interesting to see what they've optimized it for.
Even if a Mac isn’t the fastest in raw numbers, it may be faster than a couple of 32 GB cards if it can load the whole model into its RAM (which went up to 512 GB before the shortages), because the cards would have to constantly load data over PCIe. Unified memory means the Apple GPUs can access all 512 GB at full speed.
My understanding is this is the advantage that’s pushing huge Mac Studio demand, because it was the only way to give GPUs that much memory at anywhere near that price.
Yeah, you can do way better once you’re spending five figures. But below that, Apple had a specific advantage for some.
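A rough sketch of the offloading argument above. It assumes that whatever doesn't fit in VRAM is streamed over PCIe for every generated token (real runtimes often run those layers on the CPU instead, which has its own bottleneck). The 120 GB model size and the ~64 GB/s PCIe 5.0 x16 link are illustrative assumptions; 1,792 GB/s is the 5090 figure quoted earlier, and 800 GB/s is the Ultra-class figure implied by "Ultra variants get 2x" the M1 Max's 400 GB/sec.

```python
def split_decode_ceiling(model_gb: float, vram_gb: float,
                         gpu_bw_gb_s: float, link_bw_gb_s: float) -> float:
    """Memory-bound tokens/sec ceiling when part of a model lives off-GPU.

    Assumes the resident part is read from VRAM at gpu_bw_gb_s and the rest is
    streamed over the host link at link_bw_gb_s, once per generated token.
    """
    resident = min(model_gb, vram_gb)
    streamed = model_gb - resident
    seconds_per_token = resident / gpu_bw_gb_s + streamed / link_bw_gb_s
    return 1.0 / seconds_per_token

MODEL_GB = 120  # hypothetical quantized model size

# Two 32 GB cards (layers split across them, read one card after the other),
# with the overflow streamed over a ~64 GB/s PCIe 5.0 x16 link (assumption).
cards = split_decode_ceiling(MODEL_GB, vram_gb=64, gpu_bw_gb_s=1792, link_bw_gb_s=64)

# Unified memory: the whole model is resident, so the link term drops out.
mac = split_decode_ceiling(MODEL_GB, vram_gb=512, gpu_bw_gb_s=800, link_bw_gb_s=64)

print(f"2x 32 GB cards + PCIe:     <= {cards:.2f} tok/s")
print(f"512 GB unified @ 800 GB/s: <= {mac:.2f} tok/s")
```

Under these assumptions the link term dominates: once a large slice of the model has to cross PCIe on every token, the ceiling falls far below what either card could manage on its own.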
You're correct about some things but mostly wrong.
Yes, a Mac with 128GB+ will let you load some pretty big models.
However, you're still not going to be able to run them at usable speeds. M5 Max benchmarks on a Qwen 27B model with 290K context show 12 tokens/sec output.
And that's a 27B model. So yes, an M5 Max with 128GB will let you load some pretty big models; it can probably fit 120B in there with room left over for context. But the M5 Max still doesn't have the compute to make it practical, at least from an interactive usage standpoint: a 120B dense model is going to be like an order of magnitude slower than 27B. You have to understand the computation going on here. LLMs are basically a huge many-to-many operation (giant matrix multiplications), and those operations themselves are pretty heavy.
So back to my previous post... you need three things. You need fast memory, you need a lot of it, and you need GPU compute with direct access to that fast memory. The M5 Max has like, 1.5 of the 3.
The M5 Ultra (if it ever exists) could kinda hit all 3, although actually getting your hands on one will be quite the lottery ticket.
> My understanding is this is the advantage that’s pushing huge Mac Studio demand.
This is true, but also, people who made this investment found that they're still not very usable for those HUGE models. Don't take my word for it though. Lots of benchmarks out there. r/localllama is pretty active too.
12 tok/s can absolutely be "usable output" depending on what you're doing. I agree though that the 27B dense model often feels slow due to an overall weakness of memory throughput on that particular platform. Most real-world 120B models though will be MoE-based with only a small fraction of active parameters, and these run quite well. Also, dense models can benefit from batching, which is at least marginally viable with Qwen if you stick to shorter contexts and smaller batches.
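To illustrate the MoE point with the same memory-bound approximation: for a mixture-of-experts model, only the experts selected for each token (plus shared layers) are read, so the decode ceiling depends on active rather than total parameters. The 120B-total / 5B-active split and the 4-bit quantization below are hypothetical; 614 GB/s is the M5 Max figure quoted earlier in the thread.

```python
def decode_ceiling(bandwidth_gb_s: float, active_params_billion: float,
                   bytes_per_param: float) -> float:
    """Memory-bound ceiling on tokens/sec: bandwidth / bytes read per token.

    For a dense model, active params == total params. For an MoE model, only
    the routed experts and shared layers are read per token.
    """
    return bandwidth_gb_s / (active_params_billion * bytes_per_param)

M5_MAX_GB_S = 614      # figure quoted earlier in the thread
BYTES_PER_PARAM = 0.5  # assumption: 4-bit quantized weights

dense_120b = decode_ceiling(M5_MAX_GB_S, active_params_billion=120,
                            bytes_per_param=BYTES_PER_PARAM)
# Hypothetical MoE: 120B total parameters, ~5B active per token.
moe_120b = decode_ceiling(M5_MAX_GB_S, active_params_billion=5,
                          bytes_per_param=BYTES_PER_PARAM)

print(f"dense 120B:               <= {dense_120b:.1f} tok/s")
print(f"MoE 120B total/5B active: <= {moe_120b:.1f} tok/s")
```

Memory capacity is still set by the total parameter count (both models need roughly 60 GB of weights here), which is why large unified memory and MoE models pair well.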
> would you be dissatisfied by Opus-4.6-level open-weight models, just because Opus 4.8 will be out?
Well, I see what you mean, but two big concepts...
1A. Models get stale pretty quickly w.r.t. new developments that occur past their cutoff date. "But you can just keep them current by linking them to newer documentation, etc!" Well, no, you sorta can't, at least not in perpetuity. Those search results fill up your context window real quick, so that becomes unsustainable.
1B. Even when your context has plenty of free space, the results you get from "here's a link to the documentation for this new framework that released after your cutoff date" absolutely pale in comparison to the results you get from knowledge that is fully baked into the trained model rather than sitting in your context window. For one thing, that documentation link you pasted into your context might link to... a dozen code examples. Whereas if that framework was baked into the model itself, the model might have been trained on many thousands of examples from GitHub etc.
2. It's also a reality that most professional engineers have to keep up with their peers and competitors. We can maybe say it shouldn't be that way, but it is. So if $SOME_NEW_MODEL is significantly better than 4.6... and my peers and/or competitors are using it, then yeah, I might really feel the need to match them. And I'm not even necessarily talking about some kind of cutthroat, dog-eat-dog, stack-ranked workplace.
These limitations aren't relevant for all use cases or careers but they're hiiiiiiiighly relevant for professional software engineering.
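Point 1A above is essentially arithmetic on a fixed token budget. A minimal sketch, where the 128K-token window, the reserved space, and the ~6K tokens per fetched documentation page are all illustrative assumptions rather than figures for any particular model:

```python
def pages_that_fit(context_tokens: int, reserved_tokens: int, tokens_per_page: int) -> int:
    """How many retrieved pages fit after reserving room for prompt, code, and reply."""
    available = max(context_tokens - reserved_tokens, 0)
    return available // tokens_per_page

# Illustrative assumptions only.
CONTEXT = 128_000   # total context window, in tokens
RESERVED = 40_000   # system prompt, the code being worked on, and room for the reply
PER_PAGE = 6_000    # one fetched documentation page

print(pages_that_fit(CONTEXT, RESERVED, PER_PAGE))  # → 14
```

A dozen or so pages per session is the scale of "a dozen code examples" in 1B, versus the many thousands of examples a model can absorb during training.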
I imagine that'd be handled via a fairly regular minor bit of additional fine tuning to update them with new information rather than polluting the context space.
Do you think that AI could actually free up time in your life in other areas, so that you could spend more time doing the things you love like making furniture? Or maybe help you directly in your furniture-making, by perhaps helping you to research things?
Please don't misunderstand: my point is not "AI is good."
It is problematic in many ways. My point is that I think the "AI versus actually doing cool human-crafted stuff" split is... a misguided, maybe even harmful, mental model of a more complicated reality.
That's the promise of every new technology. Although there's been massive progress over the past 50+ years, the amount of free time that people have has actually gone DOWN (https://clockify.me/working-hours). "I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes." We'll see.
> Do you think that AI could actually free up time in your life in other areas, so that you could spend more time doing the things you love.
Personally, I don’t believe that would be the case: Jevons paradox mixed with the natural tendency to exploit others.
One could argue that technology in general didn’t really save people time by itself; it’s regulation (a social construct, and I am counting both its cultural and legal enforcement here) that did. Just look at how workers fare in countries without your European-style protections. Wikipedia’s article on China's 996 system [1] has a nice map of deaths due to long working hours by country; notice the dominant colours in each quadrant of this (projected) globe.
Pre-industrialised societies’ labourers were limited by daylight and travel distance. The modern availability and abundance of artificial lighting, mechanised transportation, and telecommunication means their grandkids are expected to (and often do) toil every waking moment.
What time is AI going to free up for me? Can AI go to the grocery store for me, do my laundry, do my dishes? Can it let me clock out early? The spoils of AI do not go to individuals.
It's not AI, but there's DoorDash and Rinse if that's what you're trying to optimize for. The robots will be coming out soon enough, though, and then we're all in trouble.
I think in a different society this could have been the case (possibly, assuming the hype is somewhat true).
But the way society is structured now? We still live in feudalism, just uplifted to modern levels of “comfort” (if you take off your Western glasses and look at the whole world, there are still people living in medieval conditions in some places today).
The way it’s going it’s only going to make rich people richer, and give them more power to control this system and perpetuate it. I don’t see that drastically changing anytime soon, unless we do something about it on a societal level.
A direct consequence of the industrial revolution was an INCREASE in workload. People worked MORE, not less. It took organization, protests, political pressure, and even some bloodshed to get the 8-hour workday.
People that push for AI are not interested in making your life better.
No, because the machine consumes all of your time that it can take without literally killing you. If you manage to free some time somewhere, the machine will adapt and eat it up. We have so-called developed countries talking about raising the retirement age and cutting holidays, that's the "future" we are living in.
I don’t disagree with your overall point but developed countries are raising their retirement ages because they’re trying to stave off pension crises. It’s the surprising alternative to taxing corporations and the ultra-wealthy appropriately.
However else you feel, AI is a force multiplier, and that can also REALLY benefit "Artisanal work + Small Business"
I feel like the "one person app creator" business is so much more viable than it has been since Web 1.0
Five years ago, to run your own solo business in this space, you had to know most of the following: taxes, legal, backend, frontend, devops, iOS dev, Android dev, and marketing and then pay through the nose for most of the ones you didn't. AI helps to paper over a LOT of those gaps... and you can spend more time doing the shit that matters to your business.
You also needed time, and lots of it, which is perhaps easy to come by if you're a trust fund baby or independently wealthy and don't have to work for a living, but which is in extremely short supply if you have a job and/or a family.
I used to run an online community on the side, and I spent SO MUCH TIME doing IT/legal/finance drudgework that could have been spent, you know, engaging with the community and actually improving the product... that "artisanal work" for a "small business" you think you love.
There are of course major major problems with AI, like environmental concerns and others, but dichotomies like yours are not the way forward. At least not a good way forward.
> However else you feel, AI is a force multiplier, and that can also REALLY benefit "Artisanal work + Small Business"
> Five years ago, to run your own solo business in this space, you had to know most of the following: taxes, legal, backend, frontend, devops, iOS dev, Android dev, and marketing and then pay through the nose for most of the ones you didn't. AI helps to paper over a LOT of those gaps... and you can spend more time doing the shit that matters to your business.
How is running a business in the way you've just described artisanal? You're basically saying we should be outsourcing all of these things to AIs, which is simply not artisanal.
You'll surely understand if you ever take the plunge and run your own business.
You're going to have to spend quite a bit of time and/or money doing things unrelated to the actual product and CX.
Taxes, marketing, etc. The more you can streamline those other bits, the more time and energy you can spend actually improving the thing you are offering.
Again, maybe it's the kind of thing where you need to run a business, or at least talk to a business owner to understand.
As I understand it, we used to have the concept of "hiring workers" or "contracting for services".
The benefit of this was that when Internal Revenue called and said that, in lieu of a tax return, you'd sent a takeaway menu covered in pornographic drawings, you could reach out to the person you paid and expect them to take accountability.
Instead, we're getting :sparkles: You're absolutely right! I shouldn't have sent the taxman the Goatse picture, would you like me to try something else? :sparkles:
I miss when I would receive an email or something and could easily tell that whoever wrote it was a clueless idiot. Now I have to filter through the prose before reaching the same conclusion.
Those concepts still exist and I'd highly recommend them when possible.
We also had, and still have, concepts called "time" and "money" and perhaps you've heard that they're finite and often in short supply.
Particularly when "bootstrapping," another concept you can consider. This is when you start small and self-fund your own business. Seems pretty relevant because we're talking about small and artisanal businesses.
...
...wait, I get it. This is HN. All you people understand is venture capital funded shit. In that case, yeah. Build your prototype, do whatever you have to do to get $100M funding, hire 50 people, rent some offices, hire those workers and contract those services and burn $20M a month before you make your first sale. OK. Yeah. That's the only way. Don't forget the Aeron chairs and $500 stealth-wealth hoodies or whatever.