I've been contemplating a decentralized model training system for some time using volunteer machines that we all contribute. But, it is astronomically difficult. The communication speeds are untenable.
And, there is the issue of data poisoning from untrusted nodes. I've almost cracked that last issue with a self-healing checkpointed rollback system that doesn't have to throw out anything that follows the corrupt datum.
But, I'm just one person with an idea and I don't have infinite funds to make this happen. This isn't a small project.
Maybe there would be interest in something like this, now that entire frontier labs are being banned from making further progress.
The total power of all GPUs on the planet dwarfs their capabilities, if we had a way to harness them in a distributed way efficiently. We wouldn't be able to train a Fable as fast as them, but eventually having access is better than never having access.
As I replied to a child comment - this is a nice idea that just isn't tenable in reality. AI hardware isn't just hilariously faster than consumer GPUs, it's also hilariously more power-efficient and has hilariously better connectivity. Every one of these dimensions kills the idea.
The far, FAR superior power efficiency means that even if you did harness every public GPU or GPU-like device on earth, you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.
And even if electricity were free, having those GPUs spread over the world with internet-level latency will slow everything down by factors of thousands to millions - if it's feasible at all. Regardless, you're not getting fable-oss this decade, maybe not even this century.
It would be better for governments to buy and own their own datacenters, maybe as a coalition, and dedicate their operation to the public good. I believe that is what we actually have to do.
AI hardware is for inference, not training. Training uses normal HPC crap. Superpods aren't really power efficient, it's kind of a meme, and it stems from limiting the power draw of other components by having fewer of them. It's more of a rounding error.
> you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.
With costs spread over a large population, it really doesn't matter. You're not getting hundreds of thousands of people to pitch half their monthly electric bill to pay for someone else's datacenter. They will pay the electricity themselves quite happily though, if all they need to do is give you compute. This isn't new.
Interconnect is the bottleneck for distributed training, nothing else really.
You got it wrong. Inference can use crap GPUs. Training needs the 100x more expensive big guns. Our training machine is 100x more expensive than our inference machine.
How is the result of training stored? How big is that? It seems reasonable to assume we’ll eventually plateau and all we’ll need is relatively infrequent training.
Not so often. The GPUs are running at 100% for 3 weeks for a training run. We do images only, but it's the same process. And then we can use the costly GPUs for inference, local model coding agents.
Training is about 4x a year. But it depends on what ideas the PM or the customers have. If they have more, there are more training tasks, e.g. more viruses to detect.
Not sure what you are referring to, unless you don't think h100/h200/b200 are "AI hardware"
> Superpods aren't really power efficient
Maybe not compared to a specialized rig with multiple 4090s, but that is the best case for consumer hardware - the vast majority will be dramatically less efficient than that
Anyway, I agree the interconnect is by far the biggest obstacle and seems insurmountable, I should probably have led with that.
I recall getting really excited over Hinton's FF foray, right before he bailed on AI as a societal direction (which, if anyone ever had the right, I suppose he does). If one squints, one can see a backprop-free base being much easier to train on geographically distributed and heterogeneous hardware.
Efficiency difference between training on GPUs and TPUs is 2x at best. You can get very efficient with tensor cores, converging to TPU efficiency. In the end math is math, you can't make a multiplication more efficient than it already is on GPU.
If you were to take 500 computers with older 1080 GPUs, you might have enough compute/ram equivalent to an H200 GPU for training such a model. Maybe take 10000.
But if those machines are spread over 10000 homes, wired with residential internet service, training a large model will not get anywhere.
You go from "data in the same HBM memory chip" at 4.8 TB/s or "data in adjacent GPU" with NVLink at 1.2 TB/s down to 25 Mbit/s upload speed. Accessing the next piece of data is going to be about a million times slower.
At the same time you will heat a thousand times more, for a million times longer.
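For anyone who wants to check the "about a million times slower" figure, here is the arithmetic as a quick sketch. The three bandwidth numbers are the ones quoted above; the decimal units (1 TB = 1e12 bytes) and the 1-billion-parameter, 16-bit example model are my own assumptions for illustration.

```python
# Bandwidth figures as quoted in the comment above.
# Assumption: decimal units, i.e. 1 TB = 1e12 bytes.
HBM_BYTES_PER_S = 4.8e12       # data in the same HBM stack
NVLINK_BYTES_PER_S = 1.2e12    # data in an adjacent GPU over NVLink
UPLOAD_BITS_PER_S = 25e6       # residential upload, 25 Mbit/s

upload_bytes_per_s = UPLOAD_BITS_PER_S / 8  # bits -> bytes

hbm_ratio = HBM_BYTES_PER_S / upload_bytes_per_s
nvlink_ratio = NVLINK_BYTES_PER_S / upload_bytes_per_s

# Hypothetical example: pushing one full copy of a 1B-parameter model
# stored as 2-byte values through that residential uplink.
model_bytes = 1e9 * 2
upload_seconds = model_bytes / upload_bytes_per_s

print(f"HBM vs upload:    {hbm_ratio:,.0f}x")
print(f"NVLink vs upload: {nvlink_ratio:,.0f}x")
print(f"One full weight upload: {upload_seconds:,.0f} s")
```

So the HBM comparison comes out around 1.5 million and the NVLink one a few hundred thousand, which is in line with the rough "million times" claim. This only covers bandwidth; latency is a separate penalty.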
You need to train independently and merge rarely. The problem is the merge step. Weights are too entangled, you are not going to get an improvement commensurate to the effort. Otherwise, everyone would do it. It is an open research problem.
The power-constrained part of compute is data movement, not the elementary arithmetic per se. Anyway, it's very possible to tweak the underlying design to increase throughput a lot for any given power budget at the cost of high latency. This seems especially useful for training workloads where we don't really care about latency as much.
Could you put some numbers and examples behind the efficiency gap between data center and consumer-grade AI hardware? Did you include examples like the RTX Spark on the consumer side? I was always amazed at the low power consumption of unified-memory-style architectures, in absolute terms and even more so compared to consumer-grade GPUs. I'd be genuinely interested in a comparison with data-center-grade hardware.
DGX Spark is effectively prosumer hardware, better than most consumer stuff but still not comparable to actual datacenter gear. You can't just look at TDP in isolation without also comparing performance.
It's more than the raw hardware, it's the interconnect and communication between the hardware at scale. These models are trained on hundreds of thousands of GPUs today. You _will_ start to see cross-datacenter training runs, but these need to decide efficiently when and how to communicate across datacenters, which bears a very high cost compared to intra-datacenter communication.
Dunno, in a sense, torrents came about under similar restrictions. Everything at consumer level was just plain awful and at dial up level, mebbe ISDN if you were very lucky, with fiber only available to ridiculously rich people and corps. But with restrictions came approaches on how to mitigate them.
Yes but not violations of the laws of physics. You need extremely fast communications, memory bandwidth, etc; you cannot get that with distributed training. You're up against the speed of light and the interconnect that powers the internet. You will always have horrifically slow latency compared to if you pack the servers together in the same place with specialized networking.
<< You will always have horrifically slow latency compared to if you pack the servers together in the same place with specialized networking.
Agree about the physics; disagree about the larger point.
I am not questioning that servers packed together may achieve an optimal result in how we are currently doing things, but, and this is my point, what if we didn't.
<< you cannot get that with distributed training
This is entirely the wrong question to ask. The question to ask is: how could it be adapted to distributed training?
You know what, I'm surprised to find out this is far more feasible than I assumed. DiLoCo and the INTELLECT models demonstrate how feasible decentralized training already is; it is very surprising to me that you can get that far with so much less communication bandwidth. Not only that, but distributed training becomes _more_ feasible as you scale, since the compute needed scales as the square of parameter count while communication scales linearly, so the overhead penalty goes down.
I think the most important problem is that you have to marshal enough compute to be meaningful, and that is going to be more and more difficult as frontier compute requirements grow.
It is a genuinely interesting problem (above my mental abilities, but there are people smarter than me that could make it work). I agree that compute could end up being an issue as things progress. Still, it seems that portions of what would be necessary kinda exist.
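To put rough numbers on the "square vs. linear" scaling argument, here is a back-of-the-envelope sketch. None of the constants come from the thread: I am assuming the common rule of thumb that training compute is about 6 * N * D FLOPs, that the token count D is about 20 per parameter, that weights are exchanged as 2-byte values, and (the big assumption) that the number of sync rounds stays fixed as the model grows.

```python
# Back-of-the-envelope scaling sketch under the assumptions stated above.
TOKENS_PER_PARAM = 20   # assumed data-to-parameter ratio
BYTES_PER_PARAM = 2     # assumed 16-bit values on the wire
SYNC_ROUNDS = 1_000     # hypothetical, held constant across model sizes

def flops_per_synced_byte(num_params: float) -> float:
    """Training FLOPs available per byte each worker has to exchange."""
    train_flops = 6 * num_params * (TOKENS_PER_PARAM * num_params)  # ~N^2
    synced_bytes = SYNC_ROUNDS * BYTES_PER_PARAM * num_params       # ~N
    return train_flops / synced_bytes

for n in (1e9, 1e10, 1e11):
    print(f"{n:.0e} params: {flops_per_synced_byte(n):.1e} FLOPs per synced byte")
```

Under these assumptions every 10x in parameters gives 10x more compute per byte communicated. If the number of sync rounds has to grow with the token count, that advantage shrinks, so treat this as the optimistic end of the argument.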
But, and it is not a small but, there is no money in it. In fact, big orgs are bound to lose money should something like that succeed.
I used it as an example. I understand the problem is hard. My larger point was that this is exactly how actual progress tends to take place. Well, that and porn.
> It would be better for governments to buy and own their own datacenters, maybe as a coalition, and dedicate their operation to the public good. I believe that is what we actually have to do.
100% agree. The US government basically has to nationalize AI and capture an outsize portion of the revenue from it in order to fix the economy, as the combination of debt burden and interest rate pressure from de-dollarization/global realignment is going to push us into a death spiral, and even if AI is a smash hit, the ~19% federal capture of corporate revenue isn't nearly enough to pull us out of it. The people owning the compute infrastructure and capturing more profit from AI at that layer is the safest, cleanest way to increase revenue capture. A sovereign wealth fund is a mediocre idea because it's possible to play a shell game with stocks and redirect profit/debt (venture capital is quite good at this!).
>> The US government basically has to nationalize AI and capture an outsize portion of the revenue from it
Currently AI has generated no profit. And as it sits, it is a non-viable business.
I refuse to include the sellers of shovels as AI revenue.
If the companies buying the shovels are still losing money, then the tool suppliers' fortunes have nothing to do with the economics of the AI application layer, which is losing money on every prompt.
It's the most naive opinion that keeps getting shoveled around. You have a product that is viewed as essential by businesses, with revenue growing by 10x a year and geopolitical ramifications that have continued to rear their heads, and your opinion is "this is all an unprofitable shill". It is extraordinary to me that people really believe this. Whether or not labs run at a loss today is absolutely irrelevant. There are of course steady-state economics that make sense, and it's currently not well known what the profitability picture is right now, so to say "Currently AI has generated no profit" is also just speculation and not a very insightful one at that.
That businesses view it as essential...is not a profitability argument.
Businesses also bought dot com infrastructure, telecom fiber, crypto platforms, metaverse tools, and overbuilt SaaS. The question is whether the AI application layer can charge more than its full cost, and those costs are inference, infrastructure, depreciation, R&D, customer acquisition, support, compliance, security, and error remediation.
The numbers so far do not inspire confidence. OpenAI reportedly did $4.3B in revenue in the first half of 2025 while burning $2.5B, and Microsoft said OpenAI related losses reduced its own quarterly net income by $3.1B. An MIT 2025 enterprise AI study found $30 to 40B spent on GenAI with 95% of organizations seeing zero return.
One of the core technical reasons is that hallucinations destroy enterprise economics. If SAP hallucinated 2% of invoices, or Oracle returned fake rows 2% of the time, nobody would call that early stage friction. They would call it unusable for core operations.
In legal AI, even specialized tools have been measured hallucinating 30% of the time. The problem is that as AI gets better it is confidently, plausibly wrong. That forces humans to verify it.
So the cost does not disappear. It moves from doing the work to checking the work. AI coding has the same issue. If an autopilot got you there faster but one flight in ten became unstable unless the pilot constantly supervised it, that is not productivity.
For the bull case to work, usage must explode, quality must improve, prices must fall, reliability must rise, legal risk must shrink, and margins must expand, all at once. I would say that instead of a business model, this is six miracles stacked on top of each other.
I've heard that the API calls by themselves are ~60% profit if you ignore capital expenditures. The labs haven't generated profit because they're constantly sinking money into the next generation of larger models to stay relevant. Dario has talked about the economics of this a lot, and I do believe him there.
There's clearly also a lot of pent-up demand in the corporate world for inference; the problem is that it's currently expensive enough that enterprises are balking at the cost before they've had a chance to refine processes and see projects through to fruition. That's a tractable problem to solve though.
That's true, but if the frontier doesn't advance there's no depreciation or ongoing capital expenditure. If all the frontier labs agreed to stop making stronger AI and just try to sell what they've already trained today, their books would turn green in a hurry.
> The US government basically has to nationalize AI and capture an outsize portion of the revenue from it in order to fix the economy, as the combination of debt burden and interest rate pressure from de-dollarization/global realignment is going to push us into a death spiral, and even if AI is a smash hit, the ~19% federal capture of corporate revenue isn't nearly enough to pull us out of it.
Any actual numbers to back this up? I don't see how nationalizing a very cutting edge technology outside of wartime is going to go super well. The leverage that these companies have is the same leverage that TSMC has: you can't just take over and expect things to rocket at the pace it's going.
WRT government data centers, there is certainly precedent for independent researchers getting HPC time on systems owned by US national labs, research institutions, universities, and then publishing their results as part of the public good.
One would question why this hasn't already happened as the rule, as opposed to the proliferation of private data centers. However, I am sure the answers are plain and perhaps saddening to us all.
DeepSeek and GLM (plus Kimi) are at or above Sonnet level wrt. favorable workloads like coding. They're not close to Opus or the latest GPT yet, and Fable is even higher than that. Other workloads relying more on real-world knowledge have them even further behind, and this can't be mitigated without making the model itself bigger and harder to host locally.
Not true. Big models buy you baked in knowledge and long context cohesion. A model can be trained to use search and knowledge base tools more efficiently to mitigate the former, and harnesses/workflows can be designed to push models into small parallel threads to mitigate the latter.
The thing that big models will always bring to the table is the ability to YOLO weak/under-specified prompts, and spend less time in the loop making sure work gets partitioned correctly. For smaller/simpler tasks the P(success) difference isn't that big.
Knowledge-base access is not very useful in general because a model doesn't have well-defined "known unknowns" that might trigger an agentic search of the outside knowledge base. Plus surfacing knowledge you don't know much about is itself hard.
These things sound plausible, but have they actually been demonstrated? Wouldn't anyone who succeeded in making such a small but useful LLM be raking in the money now?
Cursor's composer 2.5 is a perfect example. It's right on the heels of the frontier (for coding only) for an order of magnitude cheaper. As much as I've shit on Cursor in the past, I do think the company is well positioned to pick up people getting sticker shock on Anthropic tokens, if they can get their marketing down.
It is, but the US labs have been pushing parameters heavily. There was a pullback from big models after GPT4.5 in particular, but with a shift towards emphasis on post training and the good results Google got with scaling Gemini 3, all the labs started to push scaling again, which is the reason the frontier is getting more expensive. So that 1T isn't as big as it sounds, the American frontier is probably sitting at 3-5T at least.
Disagreed. GLM-5.1 is easily as good as Opus 4.5 for all the coding purposes I could throw at it, which is the model that kicked this entire hype cycle into overdrive in the first place.
Writing does not rely on real-world knowledge all that much, other than knowledge of language itself. Even tiny models can achieve that, it's even easier than coding.
The challenge with writing is the lab collapsing the distribution around "tasteful" writing, when the people making decisions about training data aren't able to effectively discriminate it.
The key thing here is that effective intelligence = model capability / cost. If you drive down the cost of inference you can have higher effective capability even with a technically less capable model. There is nothing in Anthropic/OpenAI's general reasoning capabilities that can't be easily done much better with a purpose built harness for a domain specific task.
One being that extrapolating from like 3 data points is hardly science. All trends break at some point.
The other is that the measures to prevent distillation of their models (if it was a secret sauce of Chinese models) could work if nobody is allowed to use them.
> It would be better for governments to buy and own their own datacenters,
I mean that's good, but they'd also have to build their own dataset. Which involves either paying people, or breaking the law.
Plus if they do manage to make it work, they will not get any tax revenue from it, as it'll remove the need for labour, which is where a huge amount of tax revenues come from.
It's a deeply hard problem with lots of second/third order effects.
> As I replied to a child comment - this is a nice idea that just isn't tenable in reality. AI hardware isn't just hilariously faster than consumer GPUs, it's also hilariously more power-efficient and has hilariously better connectivity. Every one of these dimensions kills the idea.
The first part is not really true though, the chips are not that much faster, the DRAM is not that much faster, and in aggregate it does not matter because there is just so much more consumer hardware out there (although perhaps that is changing as supply shifts toward datacenters).
The interconnect and data locality are the problem. If you could train it like e.g. you can render a scene with Monte Carlo ray tracing, any result from any node could be merged with any other and the combined result would have converged closer to the limit. I am sure research in that direction exists, it just has not proven effective within the scales it has been attempted.
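To make the Monte Carlo analogy concrete, here is a minimal sketch of the "any result merges with any other" property. It uses a pi estimate rather than ray tracing to stay short; the function names, seeds, and sample counts are all made up for illustration.

```python
import random

def estimate_pi(num_samples: int, seed: int) -> tuple[float, int]:
    """One node's independent Monte Carlo estimate of pi, plus its sample count."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(num_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * hits / num_samples, num_samples

def merge(results):
    """Combine node results with a sample-weighted average; order is irrelevant."""
    total = sum(n for _, n in results)
    return sum(est * n for est, n in results) / total, total

# Three hypothetical nodes contributing different amounts of work.
nodes = [estimate_pi(n, seed) for seed, n in enumerate([1_000, 5_000, 20_000])]
merged, total = merge(nodes)
print(f"merged estimate from {total} samples: {merged:.4f}")
```

Each node only has to ship two numbers, and no node needs to know what the others did. Gradient-based training does not have this property out of the box, because every update depends on the weights produced by all previous updates.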
>But when people think of decentralized training, they don’t first think of gigantic datacenters, owned by the same company, training models across large distances. Instead, they imagine thousands of small datacenters, or individual consumers, pooling their spare compute over the internet to orchestrate a training run larger than any single actor could manage alone.
Many companies are pursuing this vision: Pluralis Research, Prime Intellect and Nous Research have already successfully decentrally trained models at scale. But in practice, training decentrally over the internet has lagged far behind more centralized training. Even their largest models (Pluralis’ 8B Protocol Model, Prime Intellect’s INTELLECT-1, and Nous’ Consilience 40B) have been trained with 1,000x less compute than today’s frontier models (such as xAI’s Grok 4).
https://epoch.ai/gradient-updates/how-far-can-decentralized-...
I think it's fundamentally not useful as long as there are other open source model releases. E.g. suppose you make SotA model at a particular size via decentralized training. Amazing. In a month Qwen/Deepseek/etc release a new model which is better. So why would you use the "decentralized one"?
Models have limited shelf life while things are improving rapidly, and decentralized training is just more wasteful.
However, things might change if we get to what Karpathy calls "cognitive core" - a stable model backbone which can be extended via skills/adapters/etc. Development of extensions to the core can be a lot more decentralized.
But for now these decentralized training attempts function largely as a deterrent to anti-open-source collusion
> The total power of all GPUs on the planet dwarf their capabilities
That just isn't true. It misunderstands exactly how much silicon has gone directly to those companies, and exactly how much more powerful said silicon is compared to consumer grade gear.
If folding@home is a useful yardstick by which we might estimate the amount of GPU-ish capability that civilians might be coaxed into donating to a shared enterprise, yeah, it doesn't look pretty. This is extremely rough napkin math but comparing to xAI's Colossus 2 for example, for training workflows you're probably looking at 4-5 orders of magnitude the capability of all of folding@home combined. That's up to 100,000 times faster.
Very rough math like I said but I doubt it's directionally wrong.
And even if you did force literally everyone on earth with some sort of GPU to max it out 24/7 in service of an open source AI training enterprise - you would waste so much power trying to use that inefficient consumer hardware with the worst latency imaginable that it would be cheaper and faster to get everyone to chip in some cash to buy a datacenter with Blackwell chips instead! So the idea has no legs whatsoever.
Plus a scientific project to benefit all of humanity doesn’t have quite the same ring as the thing that's stealing your job, from the volunteer’s perspective.
it's down 99% since that peak. But let's compare to it anyway.
It's pretty useless to compare raw FLOPS, but as a general hand-waving guesstimate, F@H is currently doing about 25 petaflops in a mix of FP16 and 32. AI usually trains at FP8, but to keep things fair the H100 is quoted at 60 FP64 teraflops per unit, so that's 12 FP64 exaflops given its 200k count.
So F@H at its peak did 2.43 exaflops@FP16/32. Colossus 1 does 12@FP64. These numbers are very hand-wavy, but I think the point is made.
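Spelling out the arithmetic behind those figures, as a sketch: every input below is a number quoted in this comment, and the precisions are mixed (FP64 vs FP16/32), so the ratios are hand-wavy in exactly the way the comment says.

```python
# All inputs are the figures quoted in the comment above, not measurements.
H100_FP64_TFLOPS = 60          # per-GPU figure as quoted
COLOSSUS_GPU_COUNT = 200_000   # GPU count as quoted
FAH_NOW_PFLOPS = 25            # current F@H, mixed FP16/FP32
FAH_PEAK_EFLOPS = 2.43         # F@H at its peak, mixed FP16/FP32

colossus_eflops = H100_FP64_TFLOPS * COLOSSUS_GPU_COUNT / 1e6  # TFLOPS -> EFLOPS
fah_now_eflops = FAH_NOW_PFLOPS / 1e3                          # PFLOPS -> EFLOPS

ratio_vs_now = colossus_eflops / fah_now_eflops
ratio_vs_peak = colossus_eflops / FAH_PEAK_EFLOPS
drop_since_peak = 1 - fah_now_eflops / FAH_PEAK_EFLOPS

print(f"Colossus 1: {colossus_eflops:.0f} EFLOPS")
print(f"vs F@H today: {ratio_vs_now:.0f}x, vs F@H peak: {ratio_vs_peak:.1f}x")
print(f"F@H drop since peak: {drop_since_peak:.0%}")
```

On these inputs the single cluster is a few hundred times F@H today and roughly five times F@H at its peak, and the "down 99%" figure checks out.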
By the way, I'm not trying to crap on F@H - I think it's an outstanding project and I've run it in the past. But a volunteer group simply cannot compete with well-funded, concentrated effort like what's going into AI.
Is the total compute capacity outside of Meta, Google, Amazon, Anthropic, OpenAI, and X even higher than the capacity of any one of them? In any case, there's no chance a public collaboration gets to Anthropic levels of compute even if communication were no issue.
Is the issue that training with less compute takes more time? Or is it just not possible? I think a collective using distributed training could tolerate the idea that it takes 10x as long as Anthropic to train a model, or whatever.
It's possible but it's not linear. A modern AI training cluster is a supercomputer that uses very different architectures and hardware to a bunch of small PCs connected via normal networking. The networking advantage alone kills any chance of decentralized training.
Maybe the training approaches taken to date are wrong for decentralized systems. Set up a virtual subnet you can trust and do training on that. Create an AI model island in a trusted/federated model system -- definitely slower than the typical 'one big model' approach, but scalable to world size modeling.
Also, it wouldn't be able to use a transformer architecture. For inspiration, take a look at Google Maps and how it uses a much more efficient A* divide/conquer hill-climbing architecture. Think minimized matrix math.
Other comments also hint at this idea: a distributed training solution is currently an open research problem. Solving it is not easy, yet. But 10 years ago what we have today for LLMs would have looked similarly impossible, so have hope, and apply yourself to the problem if you find it interesting!
It seems this project is serious and very promising. They have the Psyche network which seems real and operational. They're able to produce ~50B-class models, this will only grow over time of course. Very cool.
Man, that project is such bait for my particular sensibilities but just looking at the copy about not sharing your data and only sharing weights has me feeling very disappointed in the project already. I would want a project like this to not elide the fact that sharing your weight updates probably effectively means sharing your data too.
Have you checked out [petals](https://petals.dev/)?
It’s doing the same thing, however the project is written in Python and there can be some optimizations to make it much faster.
If you take a model, make two copies, and fine-tune each one on different data, what happens when you merge them? Does it work if you freeze different layers?
I think this works if the steps are small enough. And the transfer should become tenable if the steps are big enough. Where's the cutoff?
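Here is a toy sketch of the "works if the steps are small enough" intuition. It is a linear model with plain gradient descent, nothing like a real network, and the shard shapes, learning rate, and seed are arbitrary choices of mine. With a single local step, averaging the two copies is exactly one step on the combined data; with more local steps the copies drift and the average no longer matches joint training.

```python
import numpy as np

def grad(w, X, y):
    """Gradient of mean squared error for a linear model y ~ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def local_steps(w, X, y, lr, steps):
    """Run plain gradient descent on one data shard, starting from w."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * grad(w, X, y)
    return w

def merge_gap(steps, lr=0.01, seed=0):
    """Distance between the averaged local copies and joint training."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=4)
    # Two equally sized shards with different input distributions (made up).
    Xa = rng.normal(size=(64, 4))
    Xb = rng.normal(size=(64, 4)) * np.array([3.0, 1.0, 0.3, 1.0])
    ya, yb = Xa @ w_true, Xb @ w_true
    w0 = np.zeros(4)

    merged = 0.5 * (local_steps(w0, Xa, ya, lr, steps)
                    + local_steps(w0, Xb, yb, lr, steps))
    joint = local_steps(w0, np.vstack([Xa, Xb]),
                        np.concatenate([ya, yb]), lr, steps)
    return float(np.linalg.norm(merged - joint))

for k in (1, 10, 50):
    print(f"{k:>2} local steps -> gap {merge_gap(k):.2e}")
```

This does not answer where the cutoff is for real models. The toy loss is convex and both shards share the same optimum, so it understates how badly non-convex networks can diverge between merges.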
Yes, it can be parallelized; it already is in real AI datacenters, and no, it doesn't help you. Like everyone else is saying, an AI datacenter is not just a bunch of gaming GPUs connected via normal Ethernet and hasn't been for years.
At most a decentralized effort could contribute a little bit to some bigger centralized effort by doing inference and sandboxed CPU work. Modern model training isn't just backprop, it's got a huge and growing CPU and inferencing component too, which doesn't require intense inter-node communication. For instance, doing RL rollouts for agentic coding requires a lot of plain old inferencing and sandboxed containers for the models to practice in. The final results are just a set of rollouts and scores that can be uploaded back to a central datacenter for GRPO to adjust the weights (relatively cheap). But then, of course, you'd have to stick to models small enough to fit on people's computers so it'd never be competitive.
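As a sketch of why the upload from volunteers is so small in that setup: the central side only needs each rollout's score to compute group-relative advantages, which is the normalization step GRPO is usually described with. The function name and the example scores are hypothetical, and this leaves out the actual policy update.

```python
import statistics

def group_relative_advantages(scores, eps=1e-8):
    """Normalize rollout scores within one prompt's group (GRPO-style)."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    # eps avoids division by zero when every rollout got the same score.
    return [(s - mean) / (std + eps) for s in scores]

# Hypothetical scores for four rollouts of the same coding task,
# e.g. 1.0 if the sandboxed tests passed and 0.0 otherwise.
scores = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(scores)
print(advantages)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one; a group where every rollout scored the same contributes nothing.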
Ya that'd be an awesome project, the only issue is how do you verify it's not being poisoned? To actually validate it would require more analysis than the training took to run. It would require a trusted network, not an open one, unless that can get solved somehow.
Could it be done by making a sparse MoE of thousands, or tens of thousands, of smaller experts in very niche domains? Maybe a tree-like structure of experts which can delegate from relatively general but inaccurate to extremely niche but accurate? Also these experts might be plug-and-play, easily swap out an inferior expert with a stronger one in the future without having to redo the whole pile?
That's not really how the experts in an MoE work. They activate on token probabilities, and routing happens on every token. You don't necessarily have a discrete math expert and a discrete physics expert. And even if you did, you would still need a router that is trained on all of those domains.
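A minimal sketch of per-token top-k routing, to show what "activated on every token" means in practice. This is the generic textbook form with made-up shapes and random weights, not any particular model's router, and it skips load balancing entirely.

```python
import numpy as np

def route_tokens(hidden, router_weights, top_k=2):
    """Pick top_k experts for every token and return their mixing weights."""
    logits = hidden @ router_weights                    # (tokens, experts)
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]   # best experts per token
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    top_logits -= top_logits.max(axis=-1, keepdims=True)  # numerical stability
    gates = np.exp(top_logits)
    gates /= gates.sum(axis=-1, keepdims=True)          # softmax over chosen experts
    return top_idx, gates

rng = np.random.default_rng(0)
hidden = rng.normal(size=(5, 16))          # 5 tokens, hidden size 16 (hypothetical)
router_weights = rng.normal(size=(16, 8))  # 8 experts (hypothetical)
expert_ids, gates = route_tokens(hidden, router_weights)
print(expert_ids)
```

Adjacent tokens in the same sentence can land on different experts, and the single router matrix has to see every kind of input, which is why swapping experts in and out independently is harder than it sounds.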
MoE models are typically designed for datacenter deployment, where per-token load-balancing is more important, but it's also possible to use a different training objective that encourages domain-specialization of experts: https://allenai.org/blog/emo But yes, this isn't really useful for distributed training as such because of the router.
Since SAGI can't be practically distributed, and it puts so many people out of work, how about moving all of the unhoused people into the nice warm data centers, and call it home@SAGI.
>I've been contemplating a decentralized model training system for some time using volunteer machines that we all contribute. But, it is astronomically difficult. The communication speeds are untenable.
It is already possible: https://arxiv.org/abs/2603.08163 . You don't need to sync so frequently, so it can be done over normal internet, it's just less efficient (takes longer to converge).
Someone with AI psychosis would say it was easy. I'm saying the opposite. I'm stating that it'd be cool, but at the moment I don't see how it is feasible. And, for fun I tried to solve one small aspect of the problem.
I also didn't bring up the concept out of nowhere, this is in response to an article about open source AI. The premise of the post is releasing control to the public. What is more open than a decentralized system? And, why wouldn't you brainstorm in a comment on such a thread?
I also didn't ask an AI for the idea, it's just an idea I have. There's a difference.
You're mistaken, this is a cratering of the userbase inside and outside of the US. The ban is on any foreigner whether abroad or living in the USA, so Anthropic has no choice but to completely shut down access to the model for the whole world including the US.
Their IPO is well and truly fucked now. This also means no other frontier lab in the US is allowed to exceed Opus 4.8 capabilities.
If you're a luddite or a decel you should literally be dancing in the streets right now. And, if you're a tankie you'll be dancing right next to them. And, if you were hoping for a Star Trek-like future, you just adjusted your timeline for the worse.
>this is a cratering of the userbase inside and outside of the US.
Is it really? It was limited release anyway (like hypebeast merch!). Everything people are gonna talk about for a week is gonna be about how Fable was so cool that it got banned by the feds. If it's just the Trump admin being the Trump admin, Amodei is just gonna have to pay up as a racket / marketing expense. Or it is like I'm suspecting and this was pre-bribed and the ban is kabuki theater.
>And, if you were hoping for a Star Trek-like future, you just adjusted your timeline for the worse.
The funny thing is that solar and battery advancements are actually this, not LLMs, but your framing kinda fits anyways.
The main error of the AI bubble is expressed in The Jetsons cartoon from the early 60s.
In the future, everyone obviously would be running nuclear powered cars. It was just an engineering problem to be solved. Ford made the Ford Nucleon prototype in 1958.
The nuclear optimism completely blinded people to the ridiculous idea of an individual handling nuclear material for personal use.
The AI bubble error is this idea that everyone is going to have "AGI" in their pocket. It is just a completely absurd idea that is not going to happen.
Fable was interesting from what I tried but nothing close to AGI yet here we are. The models don't get smarter and LESS restricted from here.
To me, right away it seemed that the "Mythos moment" was extraordinarily bearish for the assumptions the AI bubble is built on.
I'm verbose, and no one was ever interested in anything I ever had to say before. And, they certainly wouldn't be now if I used AI to do it.
These days I only bother commenting if I really feel I need to get something off my chest, but I don't assume anyone gives a shit. I frequently just delete my responses. I don't generally feel anyone needs what I'm saying.
It takes a bit of self-awareness to realize that no one is interested in what you have to say, so don't bother. And, I think it takes twice as much self-awareness to realize that they certainly don't care about what a bot has to say, unless they're the ones asking.
I have a friend/coworker that is verbose af (14min audios discussing something that could be compressed in 100 words) and it's not that we’re not interested, but it gets annoying after years and years of text walls.
Being concise is an art by itself and it’s a really good way to express your thinking and convince people, it just needs practice.
Well, wasn't that like the first real attempt at this, and to a degree a non-optimal one at that?
It's not like sustainable extraterrestrial habitats won't be needed elsewhere anyway, so hopefully it should be long-established technology (thanks to the lava tube settlers of the Moon or the O'Neill cylinder bunch at L3) by the time you build the first interstellar ship.
LLM agents don't know how to shut up and always think they're right about everything. They also lack the ability to be brief. Sometimes things can be solved with a single character or line, but no they write a full page. And, they write paragraphs of comments for even the most minuscule of changes.
I expect a lot of the things people don't like ("output too long, too many comments in code") are side effects of making the LLM good in other areas.
Long output correlates with less laziness when writing code, and higher performance on benchmarks due to the monotone relationship between number of output tokens and scores. Comment spam correlates with better performance because it's locally-specific reasoning it can attend to when writing the next line of code, leading to reduced errors.
You can, but I would expect code correctness to be reduced, since you're removing one mechanism the model uses to dump local reasoning immediately prior to where it's needed.
With that logic, I should ask the AI to _increase_ the amount of comments. I highly doubt the comments it generates are useful, they're usually very superficial.
Perhaps not useful to you, but they are the only way the LLM has to know what it is doing.
It has to reason about the problem in its output, since its output comprises almost the entirety of its "awareness". Unlike you, the LLM doesn't "know" anything, even superficial things.
In some sense it's like us when we are working on a problem with lots of novel parts. We usually have to write down notes to refer to in the process of solving the problem, except for the LLM the problem is always a novel problem.
I usually use huge context/prompt documents (10-100K tokens) before doing anything, I suppose that helps.
I’ll experiment with comments, I can always delete them later. My strategy is to have self-documenting code (and my prompts include a how-to on self-documenting code)
But that information is scattered. It's helpful for the LLM to cluster and isolate local reasoning that it can then "forget" about when it moves on to the next thing. Attending to nearby recent tokens is easy for it; looking up relevant information as a needle in a haystack every single time is more error prone. I'm not saying asking it to remove comments will lead to a catastrophic drop-off in performance, maybe something like a few percent or even less. Just that it's not useless for pure benchmaxxing.
I have added it in the guidelines doc for Junie and that won't stop it. It can't help itself - it needs to write a comment every three lines, no matter the language it's writing in.
I was trying out Sonnet 4 yesterday and it spent 15 minutes changing, testing, changing, etc. just to get one config item changed. It ended up changing 40 files for no reason. It also kept trying to open a debugger that didn’t exist and load a webpage that requires auth.
Does this mean when they grow up, their own offspring will also have this defect and require a correction? And, if so, does this mean it is now introducing this defective gene into our gene pool?
I know this is an issue with caesarean section. It is becoming more prevalent because those who require it are surviving, making it more likely to happen in their offspring.
We get half of our genes from each of our parents. So unless this person has the extremely unlikely misfortune of partnering with someone else with the same rare mutation, their offspring would only have a 50/50 chance of inheriting their copy of this gene. There are also medical procedures (PGD) to bring that chance to virtually 0%.
We don't get 50/50 of distinct genes from our parents - it is more like 30/70 and can be even 10/90. The whole DNA ratio in this equation is irrelevant, as we all have 99% of the same DNA. Also, in the real world, one parent will consistently give more of their distinct genes than the other parent, and most likely that consistent gene part will have the single mutation that they would hope to avoid, but also contain the best genes that the parent can offer. Children from multiple partners could be a solution, as it is a different math...
>>>There are also medical procedures (PGD) to bring that chance to virtually 0%.
For that one gene only.
DNA is a math of the sum of genes, and from what I have read humans are not better than nature (which is not perfect, but very basic) at selecting the best specimens of eggs and sperm. But yes - whatever they have picked - PGD might be able to root out that one single mutation, and introduce a variety of other mutations or miss good genes from other combinations. So, it all depends...
Also, parents who are both carriers have a 25% chance of making a sick child, a 25% chance of making a non-carrier and non-sick child, and a 25%+25% chance of making a non-sick yet carrier child. So they already have a 50% chance of making children who'll survive and yet be carriers of the disease. I guess this treatment will increase that to 75%, since the sick children would now survive too. But you have to evaluate this in connection with the rapid increases in genetic treatment options, which decrease the issues.
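For anyone who wants to check the arithmetic above, here is a minimal Python sketch that enumerates the Punnett square. It assumes simple single-gene Mendelian inheritance, with `A` as the working allele and `a` as the mutated one. The function name `offspring_distribution` and the allele letters are my own illustration, not anything from the article.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a: str, parent_b: str) -> dict[str, Fraction]:
    """Return genotype probabilities for one gene, given each parent's two alleles."""
    # A child gets one allele from each parent; all four pairings are equally likely.
    # Sorting each pair makes "aA" and "Aa" count as the same genotype.
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Hypothetical case 1: both parents are carriers of a recessive mutation.
both_carriers = offspring_distribution("Aa", "Aa")
# Children with at least one mutated copy: carriers plus affected.
carries_a_copy = both_carriers["Aa"] + both_carriers["aa"]

# Hypothetical case 2: one carrier, one non-carrier partner.
one_carrier = offspring_distribution("Aa", "AA")

print(both_carriers)
print(carries_a_copy)
print(one_carrier)
```

Under those assumptions, two carriers give 1/4 non-carriers, 1/2 carriers and 1/4 affected, so 3/4 of the children carry at least one copy once the affected ones survive. A single carrier with a non-carrier partner gives the 50/50 split mentioned upthread.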
"The cesarean delivery rate increased from 5% in 1970 to 31.9% in 2016. This sharp increase can be attributed to various factors, including changes in maternal age, medical advancements allowing more complicated pregnancies to proceed, and evolving obstetric practices. In 2022, the United States recorded more than 3.66 million births, most of which resulted from spontaneous or induced labor. Labor dystocia remains the most common indication for primary cesarean delivery. Globally, cesarean delivery rates continue to rise, and reducing unnecessary cesarean procedures remains a priority in the United States, where 32.2% of all births in 2022 were cesarean deliveries."
"If this trend continues, by 2030 the highest rates are likely to be in Eastern Asia (63%), Latin America and the Caribbean (54%), Western Asia (50%), Northern Africa (48%) Southern Europe (47%) and Australia and New Zealand (45%), the research suggests."
Note: Coincidentally, the WHO article I've linked is lamenting that Sub-Saharan Africa only had a 5% cesarean rate due to less availability of the procedure. It is their perspective that the increase in percentages is a good thing and indicates progress, instead of being concerning. And, they find Sub-Saharan Africa's low numbers concerning, instead.
Side Note: I also found lots of interesting articles which I haven't posted here, about epigenetic side effects caused by caesarean deliveries like leukemia, illnesses and other genetic issues. But, that seems out of scope for your question. You can make a quick search and find these, though.
"A female-to-female familial predisposition to caesarean section was observed. It could be caused by biologic inheritance, primarily working through maternal alleles and/or environmental factors. The results imply that both mechanisms could be important."
"Large-scale epidemiological studies indeed evidence that women born by C-section are more likely to deliver by Caesarean than women born vaginally, owing primarily to genetic rather than social factors."
> Another Note: Also, ironically WHO's article I've linked is lamenting that Sub-saharan Africa only had 5% cesarean due to less availability of the procedure. It is their perspective that the increase in percentages is a good thing and indicates progress, instead of being concerning. And, they find Sub-saharan Africa's low numbers concerning, instead.
Pretty sure their perspective is that "saving the lives of mothers and babies" indicates progress.
> While a caesarean section can be an essential and lifesaving surgery, it can put women and babies at unnecessary risk of short- and long-term health problems if performed when there is not medical need.
> Rather than recommending specific target rates, WHO underscores the importance of focusing on each woman’s unique needs in pregnancy and childbirth.
> WHO recommends some non-clinical actions that can reduce medically unnecessary use of caesarean sections, within the overall context of high quality and respectful care:
Yes, that's what they're indicating. And, it is saving lives. I myself was born by cesarean section, as was my mother. I wouldn't be here without it.
That's the potential conundrum: what if it turns out to be vastly increasing the need to save those lives, compared with the past, through an evolutionary pressure on the gene pool? If the WHO is right and we're going to start seeing rates of 50 - 63% in some regions by 2030, what's in store for the human race if this rate of expansion keeps up?
Will we reach a time when no one can be naturally born and almost our entire race has to be gestated in external devices or cease to exist? And, when we reach that point, will we look with concern towards Africa and wonder at how sad it is that they're still born naturally?
Edit: I don't have the answers. I'm not sure what we should do to course correct or if we need to. But, it is definitely something that should be looked into before it is too late, if it isn't already. And, that is why I brought it up in the context of this breakthrough, to ask if we've considered similar consequences. And, if we have a way to mitigate them if that turns out to be the case.
> Edit Edit: I can't reply to your comment below; I think we've hit the leaf end of this post. But, to reply to your question (are c-sections replacing natural births or are they just becoming more common?): the research I've cited has indicated this is a genetic transfer among female-to-female births of a need for more cesareans.
To reply after a certain number of child comments, you have to open the comment by clicking the timestamp thing
I'm also afraid I don't understand your response. Can you elaborate?
The research I've cited has indicated this is a genetic transfer among female-to-female births of a need for more cesareans.
"A female-to-female familial predisposition to caesarean section was observed. It could be caused by biologic inheritance, primarily working through maternal alleles and/or environmental factors. The results imply that both mechanisms could be important."
"Large-scale epidemiological studies indeed evidence that women born by C-section are more likely to deliver by Caesarean than women born vaginally, owing primarily to genetic rather than social factors."
> "Large-scale epidemiological studies indeed evidence that women born by C-section are more likely to deliver by Caesarean than women born vaginally, owing primarily to genetic rather than social factors."
Interesting. That makes sense. I wonder if the type of research being pursued in TFA might be helpful.
In any case, I also have to wonder whether it's necessarily a bad thing. I quoted 'natural' births earlier because... what is natural? The amount of medical knowledge and technology that goes into births doesn't seem very "natural" to me, and this has advanced through the ages to where we are now - where we, rightfully so, look sadly on areas where lack of such technology and knowledge results in more preventable deaths of babies, even if their methods are more "natural".
Of course, to be honest, I'm not very familiar with the pros and cons of c-sections vs natural births - particularly when the question is whether to have a child. I suppose that, given the choice between a c-section and the alternatives, most women will opt for a c-section, and as you point out, that means their daughters likely will have to as well.
So what might the solution even look like, apart from exploring the aforementioned gene-editing technology - or other technology - to prevent the genetic factor of c-sections? I would hope that "don't offer c-sections" is not a serious option. "Stop having kids" is one I'd personally suggest, but that's obviously not a sane global solution either.
It's an interesting problem I'd be curious to hear more about - as I said, I'm not very familiar with this.
Research is inconclusive regarding what exactly causes this increase.
We know that infants are generally larger than 50 years ago and one of the factors which trigger birth is the inability of the mother's metabolism to support further growth of the fetus.
That, combined with the fact that all over the world availability of nutrition is much better than half a century ago points to this being the culprit.
You can't compare that to gene-editing treatments; those are two completely different levels.
Self-driving cars were always almost feasible. 20 years ago Top Gear made cars you could drive with a controller, like kids do with toy cars. We already had cameras and computers; it was just a matter of raw CPU performance and software development.
We have all the CPU performance we need and all the software development we need, and self-driving cars are driving around in roughly 0.00785% of the world’s cities :)