Agreed. Moreover, the authors of copyright law could never have anticipated this type and scale of abuse. Maybe the companies are legally in the right, maybe not, but that's irrelevant for the question of whether it's ethical. The EFF's post definitely goes against their mission to "ensure that technology supports freedom, justice, and innovation for all people of the world."
I expect any well-informed corporate lawyer who has thought about this carefully is strongly advising that these tools not be used. When the LLM [0] barfs up some nontrivial code that's covered by the AGPL and your company's devs, entirely unaware of its provenance, put it into the company's "all rights reserved" codebase, it's going to be a nightmare to come back from that.
[0] ...that Nvidia's CEO says they should be spending 50% of a senior dev's salary per seat per year on...
Oh definitely not. We're not yet solidly out of the "extremely exuberant hype" phase, so the folks that matter tend to not ask questions that dampen the mood.
Sorry to tell you friend, but LLMs have touched the vast majority of active codebases out there, whether you like it or not. You can tell yourself that you’re one of “the folks that matter” (lol) all you want, but we’re never going back.
That's what people told Ignaz Semmelweis, too, I assume. "Nothing you can do, the powers that be decided, you are a minority, you don't matter, lol!" Snickering in the shadow of what they won't confront at those who do.
Not a great analogy. A better analogy is to longbows and muskets/rifles. Longbows in the hands of a skilled user were much better weapons than early muskets, but muskets brought consistency, a lower skill floor and reduced ammunition cost. Fast forward a few hundred years and the modern incarnations of muskets make longbows look silly, and nobody would ever argue that you should go to war with longbows.
You don’t even know what we’re talking about in this thread, do you?
We’re talking about whether corporations are going to risk using LLMs in their codebase because of the theoretical legal risk that they might produce something that would fall under open source licenses, and be difficult to untangle later.
Regardless of what you think the morality is here, or what the legal situation turns out to be, this is already happening. The vast majority of corporate codebases are already “infected” by LLM outputs. Even at corporations where that’s not allowed, I promise there are devs using LLMs anyway.
It's not just about collective agreement, there's a prisoner's dilemma in there.
If some segment of engineers uses agents and outperforms engineers who don't use agents, market forces will push all other engineers to use it over time. The only way we're going back is if we get concrete evidence that engineers using agents perform worse than engineers that don't, and that evidence isn't invalidated by improved models.
> You can tell yourself that you’re one of “the folks that matter” (lol)...
kek. I'm a frequent commenter on HN. I'm definitely not one of the folks that matter.
> ...LLMs have touched the vast majority of active codebases out there...
I agree that LLM use is widespread. I disagree that LLMs have "touched the vast majority of active codebases".
Regardless, the courts are slow and Open Source license-violation cases are even slower. You seem unaware of how terrified so many businesses are of having AGPL code deployed in their systems. In my professional experience, a great many businesses will refuse to deploy systems that contain AGPL-licensed utilities... even if those utilities are used only for internal housekeeping, and their only remote communications method is a UNIX socket used to talk to a CLI control utility that can only be used when you're SSHed into the system. If they're aware of any AGPL'd code anywhere, they will not touch it.
No amount of LLM-provider-provided indemnification can save you from license obligations you've become bound to by creating and distributing a derivative work. People who are in the know know that these tools occasionally regurgitate nontrivial portions of their input data, verbatim. Such people also know that AGPL-licensed code is absolutely in their input data. I'd wager that getting a nontrivial amount of *GPL'd code plopped into your company's "all-rights-reserved" codebase by one of these tools is more likely than the typical US driver personally being in a nontrivial automobile collision.
In the US, people go their entire lives without getting in nontrivial automobile collisions, but they usually wear their seatbelts... even prior to widely-deployed surveillance cameras. I wonder why. It seems like an awful lot of boring, repetitive work for a thing that's really never going to happen to you in your lifetime.
I mean people expect a model to give a working solution. They also expect it to provide it in as few tokens as possible (input/output). They might expect it to come up with an original solution, but I don't think most people would compromise on the first two points.
For cyberattacks especially, where things are often roughly interchangeable, I wonder if one could construct a harness where a "weaker" model asks questions that obfuscate the end purpose, but whose answers are still useful, and still show that this setup enables autonomous exploitation. If it were successful, that would force them to be even more sensitive with their detection.
It's utterly bonkers. Hopefully the model weights get leaked. Then we can claim it's public domain or, at the very least, distill it and then release it for free.
Not impressed so far, to be honest. I'm having it try to optimize Stockfish in a loop (on xhigh mode) with a benchmarking oracle; even after giving it specific hints ("consider whether we're prefetching Y optimally, can we make function X branchless"), it's been so far unable to recover any of the recent optimizations we've implemented – let alone novel ones. Opus 4.8 felt a bit more creative to me ... but a small sample size so far. I'm next going to try it on some less open-ended problems.
Edit: It did correctly identify that transparent huge pages were off in its sandboxed environment and that enabling them was helpful, so that's nice. It also noticed that we skip THP on a certain less-used path.
More importantly, I'm finding that the code that it produces for its experiments is a lot cleaner than what I'd expect out of Opus; there are fewer useless comments and it's more surgical and readable. I wonder if that explains the increased scores on benchmarks measuring mergeability.
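For anyone who wants to check the THP point on their own box, here is a minimal sketch. It assumes a Linux system that exposes the usual sysfs file, where the active mode is the bracketed token; the helper names `parse_thp_mode` and `current_thp_mode` are made up for illustration, and this is not a claim about how the model itself detected the setting.

```python
# Sketch: report the transparent huge pages (THP) mode on Linux.
# Assumption: the kernel exposes the standard sysfs file below, whose
# content looks like "always [madvise] never" with the active mode bracketed.
# Helper names are hypothetical, chosen for this example only.
from pathlib import Path
from typing import Optional

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode from a line like 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode marked in {text!r}")


def current_thp_mode() -> Optional[str]:
    """Return the active THP mode, or None if the sysfs file is absent."""
    if not THP_ENABLED.exists():
        return None
    return parse_thp_mode(THP_ENABLED.read_text())
```

Calling `current_thp_mode()` returns `"never"` when THP is off, and `None` on systems (or sandboxes) that don't expose the file at all.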
Stockfish is a machine learning system, so it seems quite plausible you might be getting slapped with the silent performance degradation (https://news.ycombinator.com/item?id=48467896).
Well, they're not fully charging you. You get Opus 4.8 pricing when it falls back to Opus 4.8. Also, you can disable it (and it seems like it's off by default in the API).
They don't fall back to Opus if their classifier thinks you might be working on anything that might be a competitor's product. It silently injects instructions into the prompt to sabotage your work. Read the policy above; it's insane to me that they're publicly admitting to this.
Doesn't this "silent degradation" prevent any actual evaluation of the model? If the model fails at something, this allows anyone to claim that it failed due to degradation.
Who cares if it can be evaluated independently? The majority of commenters on HN were happy to vibe code and ship products with the models we had 1-2 years ago. It continues to be laughable.
I understand that moving the goalpost every release is unfair, but it's similarly concerning to consider that people were letting GPT 4.X vibe code and ship entire products.
No, since it's a silent failure, it's not plausible. We have to assume all results we get are the actual model performance, because it's the actual model performance as we understand it.
Someone trying to solve similar problems will have similar results if the "silent failure" applies consistently in aggregate. So, this is the model's performance.
It’s possible this is happening at a technical level, but I have a hard time believing this is in the spirit of what Anthropic intends to throttle. It isn’t chip design or building out a competitor to Claude.
Stockfish does use neural nets but they are tiny, on the order of 10M params. Frontier LLMs are probably 100k or 1M times larger than that.
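To put rough numbers on that gap, here is a back-of-the-envelope sketch. The 10M figure and the 100k to 1M multipliers are the estimates from the comment above; the frontier-model sizes that fall out are only what those multipliers imply, not published parameter counts.

```python
# Back-of-the-envelope arithmetic for the size gap described above.
# Assumption: the ~10M parameter count and the 100k-1M multipliers are
# rough estimates from the comment, not measured or published figures.

STOCKFISH_NET_PARAMS = 10_000_000  # "on the order of 10M params"


def implied_params(multiplier: int) -> int:
    """Parameter count implied by scaling the Stockfish net by `multiplier`."""
    return STOCKFISH_NET_PARAMS * multiplier


low = implied_params(100_000)     # 10**12, about 1 trillion
high = implied_params(1_000_000)  # 10**13, about 10 trillion

print(f"{low:.0e} to {high:.0e} parameters")
```

So the estimate works out to somewhere between roughly 1 trillion and 10 trillion parameters, which is the scale difference the comment is pointing at.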
Yeah I agree this is probably outside of the intended scope of the silent sabotage mechanism, but there are plenty of reports of the "loud" safety classifier misfiring on innocuous requests and I'm not going to assume the silent failure mode is _less_ prone to false positives.
Edit: Another developer seems to have found a legitimate speedup with Fable in an optimization loop. It's a nice idea, actually, and I'm duly impressed.