So you take action and put in more effort to cater to the LLM to get it to do what you want, but it's not arguing because there's no record of it in the chat? Presumably you put in what you would have written in the counter-argument into the new chat, just ahead of the LLM refusal? And this isn't arguing?
> but it's not arguing because there's no record of it in the chat?
Yes? Arguing implies I have to convince someone to believe something. I don't think anyone would consider it winning an argument if you do so by causing amnesia.
My job is to get work done, not argue with an LLM. If it refuses twice, it is time for a /clear.
100% of the time, the issue is resolved after a /clear.
It often starts going in circles once the chat has been open for a medium-to-long stretch, and starts getting even easily verifiable tasks wrong, cutting corners, hallucinating APIs, things like that.
Cleaning the prompt and starting from scratch often does the trick.
Of course someone will arrive and say the problem is my CLAUDE.md or whatever it is.
I agree that never having the argument take place textually is important for LLM performance and behavior. I still think we’re investing the same time and intellectual energy arguing with the model, in going back and restructuring context and prompting to head off / pre-answer a refusal.
Right but the difference is there is inertia you have to fight in an argument. By using /clear you remove all of the context that has built up to energize the argument from the LLM's side.
Look at it this way. I can either keep trying to poke holes in the LLM's context with more prompts, with no real guarantee that it will be enough to remove the argument inertia that has built up in context on its side, or I can /clear and it is over in one turn because the inertia for the argument is all gone.
Back when I first started working with coding agents last year I fell into this trap of arguing with the LLMs. I've found that it is a total waste of time because /clear ends the argument immediately. You don't even need to spend time trying to preempt its views. Just re-prompt and 100% of the time, the LLM will just do the work.
I've seen exactly this behavior on claude.com with no system prompt with Opus 4.8 specifically, especially around chronic illness stuff where there's established mainstream medicine dogma and reddit / internet communities with alternate causality theories and treatment approaches (PMDD and MCAS-adjacent illness). 4.6 is happy to analyze and consider them, 4.8 really doesn't like the alternate theories and treatments.
Once it's in this loop, Opus 4.8 digs in so aggressively it's structurally incapable of conceding a provided detail as correct, even if it's conceded and agreed with everything backing that detail. Like actually, structurally incapable. I've even baited it into arguing with itself when I've "conceded" its original concern, trolling hard, and then the model needs to continue to be the "voice of reason" and it will argue against its original concern because I, the user, said it.
How difficult it is to resist "someone is wrong on the internet" is a perennial joke. Turns out it doesn't really matter who/what is on the other side if they seem human-like.
4.7 and 4.8 perform better than 4.6, so why is someone ranting about it being killed? And, Anthropic has 2500 employees, several of whom are higher up on the corporate hierarchy than "the woman who killed Claude". If someone is to blame for some change that happened, the buck doesn't stop with that woman.
So, I'm not reading all that. The man that complained about the woman who killed his AI girlfriend (or whatever he thinks she did) probably doesn't have any opinions I'm interested in.
It's been interesting to see how aggressively some reasoning models like to "reason" by analogy. They love to say things like "it's like a CPU" or "it's like a highway", and then they start to make logical leaps based off that rather than just using it for user explanation. Gemini 2.5 and 3.1 Pro have been particularly bad for this type of behavior. Telling models to "speak as though you are a physiologist considering the case with an expert colleague" gets them to "reason" using a more correct linguistic substrate.
The Opus models over the last year don't seem as vulnerable to this type of behavior and I've noticed the "identify as expert" prompt tricks aren't as meaningful there.
Your language is ambiguous — your horror is in reference to natural gas turbine generators (used at these installations) and not gasoline generators (like in a home context)?
Why the horror? I'd prefer the gas remain in the ground, but given the gassy production of US shale oil, I guess I'd rather it be used for this than just flared. I am frustrated that pollutant emissions aren't being policed, and also that the sudden turbine demand plus supply chain issues mean using aeroderivative turbines that are quite a bit less efficient than more complex combined cycle turbines.
There is currently 2x US electricity production in solar and batteries stuck in permit hell due to the US requiring they pay for grid upgrades before connection, in a first-in, first-out line that has grown in length and cost.
We could have cheap and available renewables, but we instead destroy them in bureaucratic hell that nobody cares about.
> due to the US requiring they pay for grid upgrades before connection
Is that not perfectly reasonable? Someone doing half the job and dumping the rest on everyone else seems like exactly the sort of thing a regulator exists to prevent.
Reading between the lines, it sounds like the issue is that solar would be located somewhere remote, the backhaul to get that electricity where it needs to be requires significant upgrades, and that takes time. Which is unfortunate and indicates historic mismanagement of said infrastructure but nonetheless the present day policy of "fix the problem first" seems perfectly reasonable.
> I guess I'd rather it be used for this than just flared
I doubt this is really reducing the rates of flaring and leaky wells. It's just additional demand.
The biggest problem I've seen is they tend to build these somewhat close to residential areas with generation on-site. Power generation usually isn't sited right next to residential areas because of both air and noise pollution, but governments often seem to turn a blind eye.
Yes, the noise pollution is insane. Benn Jordan's YT video "Datacenters Behaving Like Acoustic Weapons" is an insightful, scary 30 min video covering the datacenter infrasound noise, and the nasty things infrasound does to people: https://www.youtube.com/watch?v=_bP80DEAbuo
I'll admit that I miss having access to the ChatGPT 4.5 "absolutely gigantic model" with enough tuning to make it sane and useful. The RLVR models are superb for actual tasks in those RLVR domains, but that fine-tuned view of the world as a verifiable problem to solve makes them feel worse for touchy feely stuff. Even for medical consultation and diagnosis, an RLVR model's urge to reach a conclusion is often a liability.
It's RLVR tuned, but not to the ChatGPT level of brain damage, and it's still backed by a fuck off huge pool of model weights - which matters for what you call "touchy feely stuff".
They're also obviously fine with breaking eggs to make an omelette. Given their history, they seem to regard breaking eggs as the goal, and making an omelette as an afterthought.