Hacker News | losvedir's comments

China will, but they'll only be useable by hackers torrenting it and running it on small GPU clusters you learn about on IRC. Everything old is new again.

I have no idea why people keep thinking this

Do you have more info about this? I can't tell if you're being misled by the unfortunate "Mixture of Experts" terminology (which doesn't work the way you're describing), or alluding to something different.

Or, maybe I'm wrong, but my understanding is: MoE is just an architecture to keep the activated weights smaller per token. The experts get routed basically token-by-token, and the "experts" themselves don't have a semantic domain so the "expert" word was maybe a poor choice.


No, this is an agent-level thing, not a feature of the model (ish, unsure for Fable).

You talk to a smart, heavy model to build a plan composed of smaller steps. Then you have the heavy model spin up smaller, cheaper LLMs to actually implement the tasks.

The heavy model is basically read-only in that mode. It can read files, execute tests, etc, but it can’t write code. It just tracks what needs to be done, offloads the work to dumber LLMs, validates the task is done, and moves on to the next step.

It sort of pushes humans up the stack. Instead of having a human sitting there prompting the LLM to start the next task, you have another LLM do that loop.

It’s been on my list to try out.
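
For anyone who wants to see the shape of that loop, here is a minimal sketch in Python. It is not how any particular harness implements it: `Step`, `orchestrate`, `fake_worker` and `fake_validate` are hypothetical names, and the two callables are stubs standing in for the cheap model (which writes) and the heavy model's read-only check (which validates).

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    done: bool = False
    attempts: int = 0


def orchestrate(plan, worker, validate, max_attempts=3):
    """Delegate each step to a worker, validate the result, then move on."""
    log = []
    for step in plan:
        while not step.done and step.attempts < max_attempts:
            step.attempts += 1
            result = worker(step.description)                # cheap model does the writing
            step.done = validate(step.description, result)   # orchestrator only checks
            log.append((step.description, step.attempts, step.done))
        if not step.done:
            break  # give up on the plan and hand control back to the human
    return log


# Stubs so the sketch runs without any model behind it (assumptions, not real APIs).
calls = {}


def fake_worker(task):
    calls[task] = calls.get(task, 0) + 1
    return f"patch for {task} (attempt {calls[task]})"


def fake_validate(task, result):
    # Pretend the first attempt at "add tests" fails validation.
    return not (task == "add tests" and result.endswith("(attempt 1)"))


plan = [Step("write parser"), Step("add tests"), Step("update docs")]
log = orchestrate(plan, fake_worker, fake_validate)
for entry in log:
    print(entry)
```

The orchestrator never produces a patch itself. It only tracks state, retries a failed step, and stops if a step keeps failing, which is the point where a human would step back in.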


https://en.wikipedia.org/wiki/Mixture_of_experts#Sparsely-ga...

"The sparsely-gated MoE layer,[21] published by researchers from Google Brain, uses feedforward networks as experts, and linear-softmax gating. Similar to the previously proposed hard MoE, they achieve sparsity by a weighted sum of only the top-k experts, instead of the weighted sum of all of them."

"Top-k experts"; in the case of some of DeepSeek's models, k=1.
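
A rough sketch of what the quoted description amounts to, in plain Python. This is a toy under stated assumptions, not any real model's code: the dimensions, the random weights, and the helper names (`top_k_route`, `moe_layer`, `make_expert`) are all made up for illustration, and the softmax is taken over the selected experts only.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def top_k_route(gate_weights, token, k):
    """Linear gate, then keep the k highest-scoring experts for this token."""
    logits = matvec(gate_weights, token)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])  # normalize over the chosen experts
    return list(zip(top, weights))


def moe_layer(experts, gate_weights, token, k):
    """Weighted sum of only the top-k experts; the rest are never evaluated."""
    routes = top_k_route(gate_weights, token, k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, [idx for idx, _ in routes]


def make_expert(rng, dim):
    # Each "expert" is just a small feedforward map (one ReLU layer here).
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [max(0.0, v) for v in matvec(w, x)]


rng = random.Random(0)
dim, num_experts, k = 4, 8, 2
experts = [make_expert(rng, dim) for _ in range(num_experts)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

for token in tokens:
    out, used = moe_layer(experts, gate_weights, token, k)
    print("experts used for this token:", used)
```

Routing is decided per token from the gate scores alone, so nothing ties an expert to a topic, and only k of the 8 expert networks run for any given token.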


See OpenRouter’s recent announcement on a model fusion setup, which they now support via API:

https://openrouter.ai/blog/announcements/fusion-beats-fronti...


What does "use agents" mean from your perspective? Just Claude Code with some MCPs? Or like a full on GasTown type setup?

The former. I would not expect any candidate - especially a new grad - to have actually used GasTown style setups unless they were coming from another company with money to burn on those sorts of multiple-agent scenarios. I barely run more than one or two Claude sessions at once because it slams into the usage limit so fast.

I could steer the conversation to explore whether the candidate has insight into why GasTown or other "swarm" models exist, though. Do they know what a system prompt is? How context is managed and how it affects output? How harnesses can delegate to simpler models, and when to use different classes of model? I wouldn't expect anything particularly deep here, but a solid understanding of (or experience with) the tooling and how to get the most out of token budgets would earn bonus points.

To the parent point: even if you refuse to use LLMs for whatever reason, I would expect any coding applicant to have at least tried them. I'd assume the person is living under a rock if they said no.


> If you think the solution here is going to be open source Chinese models and / or running on your own hardware, think again. Do you think China is going to allow

I think this also misses the point. The precedent here almost surely implies that it will be illegal to use these frontier models as well.

I can see a future where weights are distributed on the dark web or BitTorrent, or where people try to use small fly-by-night model hosts.

But if this says these models are dangerous and the companies and people can't be trusted with them, then I don't see why that wouldn't also apply to open weight models.


How old are the kids? Honestly two hours of screens and devices seems like a lot to me, so I'm surprised that's where you cut down to. How much were they using them before? But my kids are young, so it's probably different if you're talking about like teens or something.

I think maybe "finding themselves on the space station" could be like humans finding themselves on Earth? You're born onto the planet and are simply grateful to be here. But the more you learn about it and existence generally, the larger and more grand you find it to be. Ancient peoples looking up at the vastness of the stars is probably how all religions began.

LotR is an oldie but goodie! I finally dug it out and read it this past month and it was quite enjoyable. I had tried back in high school but was kind of bored with the frilly language and songs and gave up. But this time around in my 40s, having read lots more books since then and developed a stronger vocabulary and reading stamina (e.g. I've read and enjoyed Stormlight Archive twice, which is 4x the length), it was actually pretty quick and easy, and I regret not having done it earlier. I paid a lot of attention to the journey and all the cardinal directions and feel like my sense of direction improved actually. And I'd always liked the movies but the books are so much better! It feels like a book from a strange almost-on-the-spectrum nerd who also spent time on the front lines in World War 1. I think these days the nerdy authors I like and the people who are grunts in the military are almost distinct circles, so maybe unlikely to get quite the same book in terms of the lore but also the realistic emotional punches.

Beyond that, I just started Captive's War, written by the people behind The Expanse, which I adore, and it's looking similarly good (similar to The Expanse; not LotR. I think it will be hard for things to match LotR for me).


Break between training runs?

No, the article has an Alaska focus because it's some Alaskan news agency, but I believe it's a nation-wide block. The judge in question is based in Massachusetts.

Side note, but I'm sort of surprised that this "level" of judge (I think there's almost 700 of them in the country) is able to block these orders. It seems like almost no executive order is possible if you need a unanimous agreement of 700 people.


Recently the Supreme Court has curtailed these nationwide injunctions: https://www.scotusblog.com/2025/06/supreme-court-sides-with-...

> It seems like almost no executive order is possible if you need a unanimous agreement of 700 people.

No "executive order" of the type commonly thrown around these days, sure. It's quite easy for "700 people" to agree on settled law when those people are federal court judges. Where the law is not settled, the President should not be making up dictatorial "executive orders" that are intended to be applied with the force of settled law. Legal ambiguity bears a cost, and that cost should fall onto the government itself rather than people being illegally harmed.

With the political mandate for immigration reform, all of these things could have been straightforwardly accomplished through the appropriate avenue of Congress - in fact there already was a bipartisan immigration bill well on its way which was killed! The sensible way to see these brash unilateral executive dictats is as creating a mere appearance of addressing the problems, while ultimately just setting the stage for deliberate failure.


I was wondering the same. I have to imagine it's mostly Gemini, unless Apple has a big, secret, SotA foundation model no one has heard of? But if it is Gemini, how does that work with their Private Cloud thing? Are they able to load the Gemini weights into it?

IIRC Apple cut a deal to have their own version of Gemini that is hosted just for them.
