inkysigma · 2026-06-06T09:22:30 1780737750

This is essentially an open research question. ML theory is unfortunately very weak relative to where the empirics are. I think there's a relatively optimistic paper that was posted a while back here but I would also take it with a grain of salt.

https://arxiv.org/abs/2604.21691

There are of course empirical results and relatively weak theoretical results like the UAT, but I don't think those answer your question fully, especially since it seems impossible to definitively answer the questions the industry seems to be betting on, like whether there is a lower bound on the error rate or whether hallucination can be solved. We have a much stronger idea of what linear regression is doing than of what LLMs are doing.

inkysigma · 2026-06-06T09:20:41 1780737641

At a high level, the text samples are how the relationships are derived. If we treat text samples as sequences of tokens, then those sequences describe the joint distribution of tokens that occur together, and that distribution is what conveys the relationships between them. IIRC, this is related to the distributional hypothesis in NLP: the idea that words should have similar semantics if they occur in similar contexts.
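To make that concrete, here is a toy sketch in C of the counting idea behind the distributional hypothesis. It is not how an LLM is trained; it only assumes tokens are small integer IDs and that "context" means the immediate neighbors. `VOCAB`, `count_neighbors`, and `context_similarity` are names made up for this example.

```c
#include <stddef.h>

#define VOCAB 8  /* hypothetical toy vocabulary size */

/* Count how often each pair of token IDs appears next to each other.
   counts[a][b] is the number of times b is an immediate neighbor of a. */
void count_neighbors(const int *tokens, size_t n,
                     unsigned counts[VOCAB][VOCAB]) {
    for (size_t i = 0; i < VOCAB; i++)
        for (size_t j = 0; j < VOCAB; j++)
            counts[i][j] = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        counts[tokens[i]][tokens[i + 1]]++;
        counts[tokens[i + 1]][tokens[i]]++;
    }
}

/* Squared cosine similarity between the neighbor counts of tokens a and b.
   Squaring avoids sqrt and keeps the ordering, since counts are non-negative. */
double context_similarity(unsigned counts[VOCAB][VOCAB], int a, int b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t k = 0; k < VOCAB; k++) {
        dot += (double)counts[a][k] * counts[b][k];
        na  += (double)counts[a][k] * counts[a][k];
        nb  += (double)counts[b][k] * counts[b][k];
    }
    return (na == 0.0 || nb == 0.0) ? 0.0 : (dot * dot) / (na * nb);
}
```

On a tiny corpus like "cat drinks milk dog drinks water cat eats fish dog eats meat", "cat" and "dog" both sit next to "drinks" and "eats", so they score as more similar to each other than "cat" does to "milk", even though nothing told the program what any of the words mean.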

inkysigma · 2026-05-20T07:27:15 1779262035

One example along these lines is that every function must either terminate or have a side effect. I don't think this one has bitten me yet, but I can completely see how you could accidentally write some kind of infinite loop or recursion and have the function get deleted. Also, bonus points for tail recursion: the bug might only show up at a higher optimization level if nothing hit the infinite loop during debugging.

marcosdumay · 2026-05-20T17:04:32 1779296672

There is that famous example where, when you write an infinite loop as the last thing in your main, a function that you never called runs instead.

account42 · 2026-05-20T09:09:59 1779268199

Infinite loop without side effects == program stuck, not responding to user input and not outputting anything. That's not something a useful program will ever want to do.

Certhas · 2026-05-20T09:30:34 1779269434

Not true, C++ made it so trivial infinite loops are not UB because it turns out they do have legitimate uses.

https://lists.isocpp.org/std-proposals/2020/05/1322.php

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p28...

account42 · 2026-05-20T09:37:37 1779269857

Yes, the C++ committee has been making some stupid decisions lately. This is not the only one.

Low level platform-specific code that needs to hot spin until an interrupt happens can use assembly for that part which it will need to do for the interrupt handler anyway.

TuxSH · 2026-05-20T19:25:38 1779305138

You don't even need to use assembly for this; the wait for interrupt typically involves side effects.

xigoi · 2026-05-20T09:28:48 1779269328

The problem is when you accidentally write an infinite loop. In a different language, you run the code, see that it gets stuck and fix it. In C, the compiler may delete the function, making it hard to realize what is happening.

account42 · 2026-05-20T09:38:56 1779269936

This is not a problem that C or C++ programmers actually encounter, ever.

ncruces · 2026-05-20T14:51:05 1779288665

I actually encountered it a couple weeks ago.

Can you spot the infinite loop in this function?

  char* strcpy(char* restrict d, const char* restrict s) {
    stpcpy(d, s);
    return d;
  }

I'll help. A call to `stpcpy` that ignores the return value can be swapped with a call to the (more likely to be optimized) `strcpy`. Since that's infinite recursion, and there is no forward progress, it's undefined behavior and anything goes.

This isn't just theory, it actually broke things in practice for me.
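For illustration, a minimal sketch of one way to sidestep the rewrite, assuming the goal is just a working copy routine. The `my_` names are hypothetical. As far as I know, the compiler only applies its knowledge of library semantics to the standard names, so with non-reserved names it has no basis for swapping one call for the other. When you really do need to define `strcpy` itself, compiling freestanding (e.g. `-ffreestanding` or `-fno-builtin` on GCC and Clang) is the usual route.

```c
#include <stddef.h>

/* Copy s (including its terminator) into d and return a pointer to the
   terminating '\0' in d, like stpcpy but under a non-reserved name. */
char *my_stpcpy(char *restrict d, const char *restrict s) {
    while ((*d = *s) != '\0') {
        d++;
        s++;
    }
    return d;
}

/* Same copy, but return the start of d, like strcpy. The call below
   cannot be rewritten into a call to my_strcpy itself, because the
   compiler knows nothing special about either name. */
char *my_strcpy(char *restrict d, const char *restrict s) {
    my_stpcpy(d, s);
    return d;
}
```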

rssoconnor · 2026-05-21T13:35:23 1779370523

Naming an externally linked function with the prefix "str" is by itself UB since that prefix is reserved for <string.h>.

ncruces · 2026-05-21T14:04:54 1779372294

Right. I am doing a standard library replacement, and forgot I needed to compile freestanding. Oops. My bad.

So `str` and `mem` are reserved. But then so are `to` and `is` (by <ctype.h>). Just forget about having a function named `is_valid_user`.

And so are `mtx_`, `cnd_`, `thrd_`, `atomic_`, `memory_`...

Which is why... everything in C is UB.

1718627440 · 2026-05-20T10:04:31 1779271471

Note that this is not true for C.

zarzavat · 2026-05-20T09:33:18 1779269598

https://9p.io/sources/plan9/sys/src/libc/9sys/abort.c

account42 · 2026-05-20T09:40:05 1779270005

This is already UB without an infinite loop.

1718627440 · 2026-05-20T10:04:59 1779271499

That's only true in C++ though, not in C.

dzaima · 2026-05-20T10:21:36 1779272496

C does allow unconditional infinite loops (e.g. "while (1) { }" isn't UB), but it is still UB if the controlling expression isn't constant (e.g. "while (two < 10) { }" is UB if two is a variable less than 10).
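A rough sketch of the three cases, going by my reading of C11 6.8.5p6 (a loop with a non-constant controlling expression and no I/O, volatile access, atomics, or synchronization "may be assumed by the implementation to terminate"). The function names are made up for the example.

```c
#include <stddef.h>

/* Constant controlling expression: the compiler may not assume this
   terminates, so it is a legitimate "hang here" idiom in C. */
void hang_forever(void) {
    while (1) { }
}

/* Non-constant controlling expression, but every iteration reads a
   volatile object, so the termination assumption does not apply. */
void wait_for_flag(const volatile int *flag) {
    while (*flag == 0) { }
}

/* Non-constant controlling expression and no volatile access or I/O:
   the compiler may assume this loop terminates. That is harmless when
   it really does, e.g. steps_to_reach(0, 10, 2). With (1, 10, 2) the
   unsigned value wraps around and stays odd forever, so the loop never
   exits and the optimizer is allowed to act as if it did. */
unsigned steps_to_reach(unsigned start, unsigned target, unsigned step) {
    unsigned n = 0;
    while (start != target) {
        start += step;
        n++;
    }
    return n;
}
```

The volatile read in `wait_for_flag` is the same reason a typical wait-for-interrupt spin is fine without dropping to assembly.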

inkysigma · 2026-05-05T01:26:43 1777944403

So I can't tell if the linked commit is an actual attempt or just an experiment but it did always strike me as odd to make a JS runtime in Zig when my impression was there were a lot of work-stopping compiler bugs at the time.

ivanjermakov · 2026-05-05T07:14:13 1777965253

Considering there was no public announcement, this is just an experiment, possibly leaked.

inkysigma · 2026-04-27T21:33:47 1777325627

I'm curious how this view fits in with BERT or the T5 release, which prior to the current LLM craze were the de facto language models for pretty much any task. Was this a position that would've otherwise grown without the Llama release?

inkysigma · 2026-04-23T08:41:52 1776933712

Some of those organizations (Linux and Mozilla) work on open source code on which they are already trained. For clients like Apple, they almost surely have agreements to not do that.

inkysigma · 2026-04-15T23:00:48 1776294048

Well it did contain a request to not notify according to that same letter. I suppose that brings up several questions.

1. Does that mean the same thing in the ToS?

2. How valid are these requests?

nightpool · 2026-04-16T03:53:16 1776311596

Google acknowledges that they should have given notice per their own policy and that they violated it. In this case, they said that they violated it because they had failed to respond to the subpoena within ICE's 10-day deadline:

> On November 20, 2025, Google, through outside counsel, explained to the undersigned why Google did not give Thomas-Johnson advanced notice as promised. Google’s explanation shows the problem is systematic: Sometimes when Google does not fulfill a subpoena by the government’s artificial deadline, Google fulfills the subpoena and provides notice to a user on the same day to minimize delay for an overdue production. Google calls this “simultaneous notice.” But this kind of simultaneous notice strips users of their ability to challenge the validity of the subpoena before it is fulfilled.

hedora · 2026-04-16T04:03:03 1776312183

At what point does Google’s incompetence imply organizations that use its services are liable for negligence?

What if this were a bogus subpoena for a lawyer’s privileged conversations with a client? A doctor’s communications about reproductive health with a patient? A political consultant working for the democrats?

hluska · 2026-04-15T23:27:47 1776295667

A gag order would be from a judge. There would be severe penalties if a party breaks a gag order. A request not to notify is just a request; it has zero legal standing and there would be zero repercussions to ignoring it.

crazygringo · 2026-04-15T23:36:12 1776296172

I'm very curious about this.

Google knows users care about their privacy, and it made the promise in its terms precisely for that reason. People pay attention to this stuff, as the popularity of this story shows.

Therefore, it's generally not going to be in Google's interest to break its own terms.

So what's going on? Did a Google employee simply mess up? Is the reporting not accurate or missing key details, e.g. Google truly is legally prohibited? Or is there some evidence that the Trump administration was putting pressure on Google, e.g. threatening to withhold some contract if this particular person were notified, or if Google continued notifying users belonging to some particular category of subpoenas?

Because Google isn't breaking its own terms just for funsies. There's more to this story, but unfortunately it's not clear what.

pyrale · 2026-04-16T02:55:08 1776308108

> Therefore, it's generally not going to be in Google's interest to break its own terms.

It is also not in Google’s interest to resist this administration. I would not be surprised if they decided to kiss the ring and be, by internal policy, more cooperative than the law strictly requires.

I guess we’ll get a better idea if more cases show up.

timschmidt · 2026-04-16T03:16:39 1776309399

Previous administrations weren't easier to resist. Look up Joseph Nacchio's story. Short version: refuse to install https://en.wikipedia.org/wiki/MAINWAY without a warrant, go to jail.

nobody9999 · 2026-04-16T02:43:30 1776307410

>Google knows users care about their privacy, and it made the promise in its terms precisely for that reason. People pay attention to this stuff, as the popularity of this story shows.

Do Google users care about their privacy? I'd expect not, given that Google is (and hasn't been shy about telling us about it) reading all their emails in order to provide more targeted advertisements.

And, as I mentioned, Google hasn't been shy about saying that's exactly what they do (prioritizing their ad revenue over their users' privacy), so I have to assume that Google users don't care about their privacy.

If they did care about their privacy, they'd self-host their email on hardware they physically control.

That's orthogonal to Google giving up data to the government, with or without notifying the user(s) in question, except that the above makes clear what we already know: Google doesn't respect the privacy of their users.

crazygringo · 2026-04-16T15:05:55 1776351955

> given that Google is (and hasn't been shy about telling us about it) reading all their emails in order to provide more targeted advertisements.

That hasn't been the case since 2017. Nearly a decade ago. They stopped precisely because Google users do care about privacy -- and tracking is one thing, but scanning the content of your e-mails is another.

nobody9999 · 2026-04-16T18:26:48 1776364008

Yeah, sure. Whatever you say.

https://www.forbes.com/sites/zakdoffman/2025/05/07/googles-g...

https://www.malwarebytes.com/blog/news/2025/11/gmail-is-read...

https://android.gadgethacks.com/how-to/gmails-ai-secretly-re...

https://www.howtogeek.com/gmail-might-be-harvesting-your-ema...

https://cyberguy.com/privacy/stop-google-ai-scanning-gmail/

crazygringo · 2026-04-16T19:11:16 1776366676

Please don't be rude.

And what you're linking to is NOT what you described, "in order to provide more targeted advertisements".

Your links are describing Gemini integration. If you ask Gemini a question about your e-mails, obviously it needs to look at them. If Google is suggesting a smart reply, obviously it needs to process your e-mail to do so. But these are features designed to benefit the user.

You were talking about targeted advertising. Your links have nothing to do with that.

nobody9999 · 2026-04-17T01:31:19 1776389479

>Please don't be rude.

[0]: "Google publicly announced in 2017 it would stop using Gmail content for ad targeting but continued to scan emails for spam, malware, and other non-ad functionality, which leaves room for ambiguity about downstream uses of metadata or other signals"

Who cares why Google is reading your emails? Not me.

Oh, it's just for non-ad functionality? In that case, go right ahead!

Ugh!

[0] https://factually.co/fact-checks/technology/email-scanning-f...

crazygringo · 2026-04-17T02:20:48 1776392448

You're upset Gmail blocks spam and malware?

nobody9999 · 2026-04-17T17:51:00 1776448260

>Please don't be rude.

thayne · 2026-04-16T00:59:14 1776301154

It's quite possible that Google is more afraid of what will happen if they resist ICE than they are of bad publicity like this.

nextaccountic · 2026-04-16T02:57:53 1776308273

It's not just bad publicity. They may be sued.

But yeah, no matter the amount they lose in court, it's inconsequential compared to angering this federal administration even a little bit.

airstrike · 2026-04-16T00:57:14 1776301034

> Google knows users care about their privacy, and it made the promise in its terms precisely for that reason. People pay attention to this stuff, as the popularity of this story shows.

Does it know? And do users really care? Popularity on HN isn't popularity everywhere.

I'd wager most people don't care enough to move away from Gmail.

But even if they did, unfortunately this isn't the only variable a business is solving for. Corporations will generally just pick the less unprofitable of two evils, not the lesser.

inkysigma · 2026-04-15T22:40:54 1776292854

Depends on how legitimate you consider an administrative warrant and how voluntary you think complying with one is.

On a more practical level, forcing them to go to court might not be much better. If this went to a FISA court, those are essentially rubber stamps and give nearly 100% approval.

inkysigma · 2026-04-09T18:20:22 1775758822

His anti-censorship stance isn't necessarily borne out by the data:

https://www.washingtonpost.com/technology/2024/09/25/elon-mu...

inkysigma · 2026-04-09T18:18:59 1775758739

X under Musk has sustained more government takedown requests.

https://www.washingtonpost.com/technology/2024/09/25/elon-mu...