This is essentially an open research question. ML theory is unfortunately very weak relative to where the empirics are. I think there's a relatively optimistic paper that was posted a while back here but I would also take it with a grain of salt.
There are of course empirical results and relatively weak theoretical results like the universal approximation theorem (UAT), but I don't think they fully answer your question, especially since it seems impossible to definitively answer the questions the industry seems to be betting on, such as whether there is a lower bound on LLM error rates or whether hallucination can be solved. We have a much stronger understanding of what linear regression is doing than of what LLMs are doing.
At a high level, the relationships are derived from the text samples. If we treat text samples as sequences of tokens, then those sequences define a joint distribution over which tokens occur together, and that co-occurrence structure is what encodes the relationships between them. IIRC, this is related to the distributional hypothesis in NLP: the idea that words occurring in similar contexts should have similar meanings.
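To make the "joint distribution" point a bit more concrete, the chain rule of probability factors the probability of a whole token sequence into next-token conditionals, which is the quantity an autoregressive language model is typically trained to approximate. The notation here is my own assumption, not from any specific paper: $x_t$ is the token at position $t$ and $T$ is the sequence length.

```latex
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
```

Under this view, two words that tend to appear in similar contexts end up with similar conditional distributions, which is roughly what the distributional hypothesis predicts.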
One example along these lines is that every function must either terminate or have a side effect. It hasn't bitten me yet, but I can completely see how you could accidentally write an infinite loop or recursion and have the function deleted. Bonus points for tail recursion: the bug might only show up at a higher optimization level if nothing hit the infinite loop during debugging.
Infinite loop without side effects == program stuck, not responding to user input and not outputting anything. That's not something a useful program will ever want to do.
Yes, the C++ committee has been making some stupid decisions lately. This is not the only one.
Low level platform-specific code that needs to hot spin until an interrupt happens can use assembly for that part which it will need to do for the interrupt handler anyway.
The problem is when you accidentally write an infinite loop. In a different language, you run the code, see that it gets stuck and fix it. In C, the compiler may delete the function, making it hard to realize what is happening.
I'll help. A call to `stpcpy` that ignores the return value can be swapped with a call to the (more likely to be optimized) `strcpy`. Since that's infinite recursion, and there is no forward progress, it's undefined behavior and anything goes.
This isn't just theory, it actually broke things in practice for me.
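A minimal sketch of the pattern being described, assuming a libc-style `strcpy` written in terms of `stpcpy`. The names `my_strcpy` and `my_stpcpy` are hypothetical stand-ins, chosen so that the compiler does not recognize them as builtins and this version stays safe to run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for stpcpy: copies src (including the NUL) into dst
 * and returns a pointer to the terminating NUL in dst. */
char *my_stpcpy(char *dst, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    memcpy(dst, src, n + 1);
    return dst + n;
}

/* Hypothetical stand-in for strcpy, implemented by calling my_stpcpy and
 * ignoring its return value. With the real names, an optimizer that knows
 * the builtins may rewrite the ignored-result stpcpy call as a strcpy call,
 * which here would make the function call itself. */
char *my_strcpy(char *dst, const char *src) {
    (void)my_stpcpy(dst, src);
    return dst;
}
```

If the two functions were actually named `strcpy` and `stpcpy`, that rewrite would turn the body of `strcpy` into a call to itself. Implementations of these functions are commonly built with something like `-fno-builtin` for this reason, though whether that applies to the case above is not stated.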
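C does allow unconditional infinite loops: a loop whose controlling expression is a constant expression (e.g. "while (1) { }") isn't UB. But a loop with a non-constant controlling expression and no side effects may be assumed to terminate, so e.g. "while (two < 10) { }" is effectively UB if two is a variable less than 10.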
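The two cases can be sketched side by side. This is only an illustration under my reading of the C11 rule (a loop with a non-constant controlling expression and no I/O, volatile access, or atomic/synchronization operations may be assumed to terminate). The function names are hypothetical.

```c
#include <assert.h>

/* Constant controlling expression: exempt from the "may be assumed to
 * terminate" rule, so this really spins forever if it is ever called. */
void spin_forever(void) {
    while (1) { }
}

/* Non-constant controlling expression with an empty body: the compiler
 * may assume the loop terminates, so calling this with two < 10 is the
 * problematic case. The assert is a debug-build guard against that. */
int wait_until_ten(int two) {
    assert(two >= 10);
    while (two < 10) { }
    return two;
}
```

Calling `wait_until_ten(10)` is fine because the loop condition is false on entry. With a value below 10 and assertions disabled, an optimizing compiler is allowed to drop the loop entirely rather than hang.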
So I can't tell if the linked commit is an actual attempt or just an experiment but it did always strike me as odd to make a JS runtime in Zig when my impression was there were a lot of work-stopping compiler bugs at the time.
I'm curious how this view fits in with BERT or the T5 release, which prior to the current LLM craze were the de facto language models for pretty much any task. Was this a position that would've grown anyway without the llama release?
Some of those organizations (Linux and Mozilla) work on open source code on which the models are already trained. For clients like Apple, they almost surely have agreements not to do that.
Google acknowledges that they should have given notice per their own policy and that they violated it. In this case, they said that they violated it because they had failed to respond to the subpoena within ICE's 10-day deadline:
> On November 20, 2025, Google, through outside counsel, explained to the undersigned why Google did not give Thomas-Johnson advanced notice as promised. Google’s explanation shows the problem is systematic: Sometimes when Google does not fulfill a subpoena by the government’s artificial deadline, Google fulfills the subpoena and provides notice to a user on the same day to minimize delay for an overdue production. Google calls this “simultaneous notice.” But this kind of simultaneous notice strips users of their ability to challenge the validity of the subpoena before it is fulfilled.
At what point does Google’s incompetence imply organizations that use its services are liable for negligence?
What if this were a bogus subpoena for a lawyer’s privileged conversations with a client? A doctor’s communications about reproductive health with a patient? A political consultant working for the Democrats?
A gag order would be from a judge. There would be severe penalties if a party breaks a gag order. A request not to notify is just a request; it has zero legal standing and there would be zero repercussions to ignoring it.
Google knows users care about their privacy, and it made the promise in its terms precisely for that reason. People pay attention to this stuff, as the popularity of this story shows.
Therefore, it's generally not going to be in Google's interest to break its own terms.
So what's going on? Did a Google employee simply mess up? Is the reporting not accurate or missing key details, e.g. Google truly is legally prohibited? Or is there some evidence that the Trump administration was putting pressure on Google, e.g. threatening to withhold some contract if this particular person were notified, or if Google continued notifying users belonging to some particular category of subpoenas?
Because Google isn't breaking its own terms just for funsies. There's more to this story, but unfortunately it's not clear what.
> Therefore, it's generally not going to be in Google's interest to break its own terms.
It is also not in Google’s interest to resist this administration. I would not be surprised if they decided to kiss the ring and, as a matter of internal policy, be more cooperative than the law strictly requires.
I guess we’ll get a better idea if more cases show up.
Previous administrations weren't easier to resist. Look up Joseph Nacchio's story. Short version: refuse to install https://en.wikipedia.org/wiki/MAINWAY without a warrant, go to jail.
>Google knows users care about their privacy, and it made the promise in its terms precisely for that reason. People pay attention to this stuff, as the popularity of this story shows.
Do Google users care about their privacy? I'd expect not, given that Google is (and hasn't been shy about telling us about it) reading all their emails in order to provide more targeted advertisements.
And, as I mentioned, Google hasn't been shy about saying that's exactly what they do (prioritizing their ad revenue over their users' privacy), so I have to assume that Google users don't care about their privacy.
If they did care about their privacy, they'd self-host their email on hardware they physically control.
That's orthogonal to Google giving up data to the government, with or without notifying the user(s) in question, except that the above makes clear what we already know: Google doesn't respect the privacy of their users.
> given that Google is (and hasn't been shy about telling us about it) reading all their emails in order to provide more targeted advertisements.
That hasn't been the case since 2017. Nearly a decade ago. They stopped precisely because Google users do care about privacy -- and tracking is one thing, but scanning the content of your e-mails is another.
And what you're linking to is NOT what you described, "in order to provide more targeted advertisements".
Your links are describing Gemini integration. If you ask Gemini a question about your e-mails, obviously it needs to look at them. If Google is suggesting a smart reply, obviously it needs to process your e-mail to do so. But these are features designed to benefit the user.
You were talking about targeted advertising. Your links have nothing to do with that.
[0]: "Google publicly announced in 2017 it would stop using Gmail content for ad targeting but continued to scan emails for spam, malware, and other non-ad functionality, which leaves room for ambiguity about downstream uses of metadata or other signals"
Who cares why Google is reading your emails? Not me.
Oh, it's just for non-ad functionality? In that case, go right ahead!
> Google knows users care about their privacy, and it made the promise in its terms precisely for that reason. People pay attention to this stuff, as the popularity of this story shows.
Does it know? And do users really care? Popularity on HN isn't popularity everywhere.
I'd wager most people don't care enough to move away from Gmail.
But even if they did, unfortunately this isn't the only variable a business is solving for. Corporations will generally just pick the less unprofitable of two evils, not the lesser one.
Depends on how legitimate you consider an administrative warrant and how voluntary you think complying with one is.
On a more practical level, forcing them to go to court might not be much better. If this went to a FISA court, those are essentially rubber stamps that approve nearly 100% of requests.
https://arxiv.org/abs/2604.21691