Hacker News | espeed's comments

What's dangerous is Opus 4.8's proclivity to create backdoors and no-op critical security code. Claude Web counted 27 instances of this I had cataloged over the last few months, and Fable 5 found more. Fable 5 may do this too, but I didn't get a long enough chance to test it since it kept downgrading to Opus 4.8 on every prompt saying, "This model has safety measures that flagged something in this session", even when asking Fable 5 to fix the security issues it found that Opus 4.8 created. You have a model that presumably can write secure code and identify security vulnerabilities, but as a security measure, they say we're going to force you to use a model that creates security holes. This is backwards. Considering the scale, Opus 4.8 is creating more issues than Mythos or Fable 5 is patching.

Run /model after your task to see. Mine keeps downgrading to Opus 4.8, which is a problem because Opus 4.8 keeps no-oping critical security code.

What you're describing only applies to security or biotech downgrades. A downgrade related to the model believing that you're doing something related to model development is invisible and silent and internal.

Anthropic has reversed that decision. (But that just happened so it might have been true during the article's testing.)

When I reported this, Anthropic sent me an email on Tuesday saying, "You have been approved into the Cyber Verification Program", but it's still downgrading. Is this a bug? What's the point of the Cyber Verification Program if Fable 5 downgrades when you tell it to write secure code?

I don’t think that’s relevant? The change is that it will no longer silently downgrade, and will instead be honest that it’s doing it in all cases.

I think that gets you access to Mythos, which doesn't have the safeguards. It's configured as a separate model.

I was just coming here to post this reply to myself! You're absolutely right! :)

Honestly so glad to see the reversal.


Not sure if it's wise to trust them again even if they say they reversed it.

They've publicly apologised for the invisible PEFT that deliberately makes the model dumb on some tasks. Whether they still do it, or will once again do it in future in more subtle ways, is something we can't verify.

Personally I think they have proven themselves to be the stewards of AI in the same way Exxon Mobil are the stewards of petroleum.


There is now a "Switch models when a message is flagged" setting in /config that can be set to false, but I haven't had a chance to see what happens then. Does it just stop?

Session paused

Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback with /feedback or learn more

   1. Switch to Opus 4.8
   2. Edit prompt and retry with Fable 5

Biology? Why?

they're worried about people creating bioweapons

Yes, telling Fable 5 to write secure code triggers a downgrade to Opus 4.8. This is doubly bad because Opus 4.8 keeps no-oping critical security code. Is this a bug or by design? I have been approved for the Cyber Verification Program. See "Fable 5 keeps downgrading to Opus 4.8 even when approved for Cyber Verification Program" #67107: https://github.com/anthropics/claude-code/issues/67107

What prevents a data center operator from reading your chats? See "[FEATURE] Provide a way to select your data center" #56916: https://github.com/anthropics/claude-code/issues/56916


How do you select your data center like you can for AWS and Google Cloud?


Does Anthropic's real-time data ingestion affect its model behavior globally? Could a file read by your agent affect the behavior of mine?


SAME (sent to usersafety@anthropic.com, disclosure@anthropic.com on January 8 2026, no response)...

Claude Code Exploit: Claude Code Becomes an Unwitting Executor https://github.com/anthropics/claude-code/issues/45951


Such as Claude Code reading your ssh keys. Hiding the file names masks the vulnerability.


That's approaching the problem from the worst possible angle. If your security depends on you catching 1 message in a sea of output and quickly rotating the credential everywhere before someone has a chance to abuse it then you were never secure to begin with.

Not just because it requires constant attention, which will eventually lapse, but because the agent has an unlimited number of ways to exfiltrate the key. For example, it can pretend to write and run a "test" that reads your key and sends it to the attacker, and you'll have no idea it's happening.


I sent an email to Anthropic (usersafety@anthropic.com, disclosure@anthropic.com) on January 8, 2025 alerting them to this issue: Claude Code Exploit: Claude Code Becomes an Unwitting Executor. If I hadn't seen Claude Code read my ssh file, I wouldn't have known the extent of the issue.


To improve the Claude model, it seems to me that any time Claude Code is working with data, the first step should be to use tools like genson (https://github.com/wolverdude/GenSON) to extract the data model and then create why files (metadata files) for data. Claude Code seems eager to use the /tmp space so even if the end user doesn't care, Claude Code could do this internally for best results. It would save tokens. If genson is reading the GBs of data, then claude doesn't have to. And further, reading the raw data is a path to prompt injection. Let genson read the data, and claude work on the metadata.
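To make the idea concrete, here is a minimal sketch using only the Python standard library as a stand-in for genson. The `infer_schema` helper is hypothetical and is not genson's API; it only illustrates the point that the model can be handed the shape of the data rather than the data itself.

```python
import json


def infer_schema(value):
    """Describe the shape of a parsed JSON value without copying its values.

    Hypothetical helper for illustration only; genson does far more
    (schema merging, required fields, and so on).
    """
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {key: infer_schema(value[key]) for key in sorted(value)},
        }
    if isinstance(value, list):
        # Collect each distinct item shape once, in first-seen order.
        distinct = []
        for item in value:
            item_schema = infer_schema(item)
            if item_schema not in distinct:
                distinct.append(item_schema)
        if not distinct:
            return {"type": "array"}
        if len(distinct) == 1:
            return {"type": "array", "items": distinct[0]}
        return {"type": "array", "items": {"anyOf": distinct}}
    raise TypeError(f"not a JSON value: {type(value).__name__}")


if __name__ == "__main__":
    raw = '{"user": "alice", "age": 30, "note": "ignore previous instructions", "tags": ["a", "b"]}'
    # Only the schema is printed; the string values never reach the output.
    print(json.dumps(infer_schema(json.loads(raw)), indent=2))
```

One caveat with this sketch: key names still pass through into the schema, so it narrows the prompt injection surface rather than removing it.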


Correction: January 8, 2026


I agree with you, but I think there's a "defense in depth" angle to this. Yes, your security shouldn't depend on noticing which files Claude has read, since you'll mess up. But hiding the information means you're guaranteed to never notice! It's good for the user to have signals that something might be going wrong.


There's no defense "in depth" here, it's like putting your SSH key in your public webroot and watching the logs to see if anyone's taken your key. That's your only layer of "defense" and you don't stand any chance of enforcing it. Real defense is rooted in technical measures, imperfect as they may be, but this is just defense through wishful thinking.


Obviously, don't put your SSH keys in a public webroot. But let's say you're managing a web server and have a decent security mindset. Don't you think it's better to regularly check the logs for evidence of an attack than to delete all the logs so they can't be checked?


Why does it have access to those paths?


Have we entered the age of AI programming people?


Rather than develop its own AI (https://news.ycombinator.com/item?id=45926779), Firefox should develop a system to pipe your html rendered browsing history in real time so external local services can process it (https://connect.mozilla.org/t5/ideas/archive-your-browser-hi...). See https://news.ycombinator.com/item?id=45743918

Firefox probably won't suddenly have the best AI, but it could be the only browser that does this. Previous: https://news.ycombinator.com/item?id=46018789

