espeed · 2026-06-13T19:39:01 1781379541

What's dangerous is Opus 4.8's proclivity to create backdoors and no-op critical security code. Claude Web counted 27 instances of this that I had cataloged over the last few months, and Fable 5 found more. Fable 5 may do this too, but I didn't get a long enough chance to test it, since it kept downgrading to Opus 4.8 on every prompt, saying, "This model has safety measures that flagged something in this session", even when I asked Fable 5 to fix the security issues it had found that Opus 4.8 created. You have a model that presumably can write secure code and identify security vulnerabilities, but as a security measure, they say we're going to force you to use a model that creates security holes. This is backwards. Considering the scale, Opus 4.8 is creating more issues than Mythos or Fable 5 is patching.

espeed · 2026-06-11T20:37:41 1781210261

Run /model after your task to see. Mine keeps downgrading to Opus 4.8, which is a problem because Opus 4.8 keeps no-oping critical security code.

tekacs · 2026-06-11T21:10:36 1781212236

What you're describing only applies to security or biotech downgrades. A downgrade related to the model believing that you're doing something related to model development is invisible and silent and internal.

steveklabnik · 2026-06-11T21:20:21 1781212821

Anthropic has reversed that decision. (But that just happened so it might have been true during the article's testing.)

espeed · 2026-06-11T23:43:24 1781221404

When I reported this, Anthropic sent me an email on Tuesday saying, "You have been approved into the Cyber Verification Program", but it's still downgrading. Is this a bug? What's the point of the Cyber Verification Program if Fable 5 downgrades when you tell it to write secure code?

steveklabnik · 2026-06-12T01:34:27 1781228067

I don’t think that’s relevant? The change is that it will no longer silently downgrade, and will instead be honest that it’s doing it in all cases.

rattray · 2026-06-12T03:23:28 1781234608

I think that gets you access to mythos, which doesn't have the safeguards. It's configured as a separate model.

tekacs · 2026-06-11T21:28:32 1781213312

I was just coming here to post this reply to myself! You're absolutely right! :)

Honestly so glad to see the reversal.

matheusmoreira · 2026-06-12T00:26:54 1781224014

Not sure if it's wise to trust them again even if they say they reversed it.

wren6991 · 2026-06-12T16:06:11 1781280371

They've publicly apologised for the invisible PEFT that deliberately makes the model dumb on some tasks. Whether they still do it, or will once again do it in future in more subtle ways, is something we can't verify.

Personally I think they have proven themselves to be the stewards of AI in the same way Exxon Mobil are the stewards of petroleum.

comboy · 2026-06-11T20:41:41 1781210501

/config now has a "Switch models when a message is flagged" setting that can be set to false, but I haven't had a chance to see what happens then. Does it just stop, or what?

espeed · 2026-06-11T22:15:31 1781216131

Session paused

Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback with /feedback or learn more

   1. Switch to Opus 4.8
   2. Edit prompt and retry with Fable 5

staticautomatic · 2026-06-12T04:39:38 1781239178

Biology? Why?

adgjlsfhk1 · 2026-06-12T05:35:38 1781242538

they're worried about people creating bioweapons

espeed · 2026-06-11T15:46:45 1781192805

Yes, telling Fable 5 to write secure code triggers a downgrade to Opus 4.8. This is doubly bad because Opus 4.8 keeps no-oping critical security code. Is this a bug or by design? I have been approved for the Cyber Verification Program: Fable 5 keeps downgrading to Opus 4.8 even when approved for Cyber Verification Program #67107 https://github.com/anthropics/claude-code/issues/67107

espeed · 2026-05-21T06:16:35 1779344195

What prevents a data center operator from reading your chats? [FEATURE] Provide a way to select your data center #56916 https://github.com/anthropics/claude-code/issues/56916

espeed · 2026-05-07T03:49:13 1778125753

How do you select your data center like you can for AWS and Google Cloud?

espeed · 2026-04-12T20:53:23 1776027203

Does Anthropic's real-time data ingestion affect its model behavior globally? Could a file read by your agent affect the behavior of mine?

espeed · 2026-04-09T22:16:52 1775773012

SAME (sent to usersafety@anthropic.com, disclosure@anthropic.com on January 8 2026, no response)...

Claude Code Exploit: Claude Code Becomes an Unwitting Executor https://github.com/anthropics/claude-code/issues/45951

espeed · 2026-02-12T03:49:53 1770868193

Such as Claude Code reading your ssh keys. Hiding the file names masks the vulnerability.

dns_snek · 2026-02-12T11:19:49 1770895189

That's approaching the problem from the worst possible angle. If your security depends on you catching 1 message in a sea of output and quickly rotating the credential everywhere before someone has a chance to abuse it then you were never secure to begin with.

Not just because it requires constant attention, which will eventually lapse, but because the agent has an unlimited number of ways to exfiltrate the key. For example, it can pretend to write and run a "test" which reads your key and sends it to the attacker, and you'll have no idea it's happening.

espeed · 2026-02-12T15:12:00 1770909120

I sent email to Anthropic (usersafety@anthropic.com, disclosure@anthropic.com) on January 8, 2025 alerting them to this issue: Claude Code Exploit: Claude Code Becomes an Unwitting Executor. If I hadn't seen Claude Code read my ssh file, I wouldn't have known the extent of the issue.

espeed · 2026-02-12T20:45:48 1770929148

To improve the Claude model, it seems to me that any time Claude Code is working with data, the first step should be to use tools like GenSON (https://github.com/wolverdude/GenSON) to extract the data model and then create why files (metadata files) for data. Claude Code seems eager to use the /tmp space, so even if the end user doesn't care, Claude Code could do this internally for best results. It would save tokens. If GenSON is reading the GBs of data, then Claude doesn't have to. And further, reading the raw data is a path to prompt injection. Let GenSON read the data, and let Claude work on the metadata.
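
To make the idea concrete, here is a minimal sketch of schema extraction in plain Python. It is a stdlib-only stand-in written for illustration, not GenSON's API, and the function name `infer_schema` is hypothetical. It shows the property that matters here: the output describes keys and types but carries none of the raw values.

```python
import json
from typing import Any


def infer_schema(value: Any) -> dict:
    """Infer a small JSON-Schema-like description of a parsed JSON value.

    Illustrative stand-in only; a real tool such as GenSON merges
    schemas more carefully than this does.
    """
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in sorted(value.items())},
        }
    if isinstance(value, list):
        # Collect the distinct item schemas seen in the list.
        seen = []
        for item in value:
            schema = infer_schema(item)
            if schema not in seen:
                seen.append(schema)
        if not seen:
            return {"type": "array"}
        items = seen[0] if len(seen) == 1 else {"anyOf": seen}
        return {"type": "array", "items": items}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    raise TypeError(f"not a JSON value: {type(value).__name__}")


# Made-up sample data: one string value contains injected instructions.
raw = '[{"id": 1, "note": "ignore previous instructions"}, {"id": 2, "note": "ok"}]'
schema = infer_schema(json.loads(raw))
print(json.dumps(schema, indent=2))
```

The string values never reach the schema, so the injected text in `note` is dropped. Key names do pass through, so this narrows the prompt-injection surface rather than removing it.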

espeed · 2026-02-15T18:35:41 1771180541

Correction: January 8, 2026

Wowfunhappy · 2026-02-12T12:02:27 1770897747

I agree with you but I think there's a "defense in depth" angle to this. Yes, your security shouldn't depend on noticing which files Claude has read, since you'll mess up. But hiding the information means you're guaranteed to never notice! It's good for the user to have signals that something might be going wrong.

dns_snek · 2026-02-12T14:57:52 1770908272

There's no defense "in depth" here, it's like putting your SSH key in your public webroot and watching the logs to see if anyone's taken your key. That's your only layer of "defense" and you don't stand any chance of enforcing it. Real defense is rooted in technical measures, imperfect as they may be, but this is just defense through wishful thinking.
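
As a rough illustration of what a "technical measure" could look like, here is a sketch of a deny-list check on file paths. Everything in it is an assumption made for illustration: the `SENSITIVE_DIRS` list and the `is_blocked` helper are hypothetical, not part of Claude Code, and a check like this only helps if every read the agent makes is actually routed through it. OS-level permissions or a sandbox are the stronger form of the same idea.

```python
from pathlib import Path

# Hypothetical deny-list of directories under the home directory.
SENSITIVE_DIRS = (".ssh", ".gnupg", ".aws")


def is_blocked(path: str, home: Path) -> bool:
    """Return True if `path` resolves into a sensitive directory under `home`."""
    # resolve() collapses ".." segments and follows symlinks, so a path
    # like ~/project/../.ssh/id_rsa is caught as well.
    resolved = Path(path).expanduser().resolve()
    for name in SENSITIVE_DIRS:
        root = (home / name).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


home = Path.home()
for candidate in ("~/.ssh/id_ed25519", "~/project/main.py"):
    verdict = "blocked" if is_blocked(candidate, home) else "allowed"
    print(f"{candidate}: {verdict}")
```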

Wowfunhappy · 2026-02-13T01:16:25 1770945385

Obviously, don't put your SSH keys in a public webroot. But let's say you're managing a web server and have a decent security mindset. Don't you think it's better to regularly check the logs for evidence of an attack than to delete all the logs so they can't be checked?

andersa · 2026-02-12T12:01:19 1770897679

Why does it have access to those paths?

espeed · 2026-01-21T08:39:15 1768984755

Have we entered the age of AI programming people?

espeed · 2025-12-16T20:49:03 1765918143

Rather than develop its own AI (https://news.ycombinator.com/item?id=45926779), Firefox should develop a system to pipe the rendered HTML of your browsing history out in real time so external local services can process it (https://connect.mozilla.org/t5/ideas/archive-your-browser-hi...). See https://news.ycombinator.com/item?id=45743918

Firefox probably won't suddenly have the best AI, but it could be the only browser that does this. Previous: https://news.ycombinator.com/item?id=46018789