Could you provide some details, if possible, like what model & thinking effort, what kinds of tasks? I used to swap between Claude Code and Codex often, and these days use Codex more because of the usage limits. Wondering if I should go to Claude for a month, I get a strange FOMO when I read vague comments like this.
The one major difference I noticed is that the GPT models are more analytical (e.g. better at mathematical analysis and code review), whereas Claude models tend to write more straightforward code. Besides that I don't really see any significant differences.
There are a few gotchas with swapping, like being careful with AGENTS.md/CLAUDE.md naming (Claude Code only recognizes CLAUDE.md, and I think Codex only works with AGENTS.md), and updating skill files to match the tool.
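One possible workaround for the naming gotcha is to keep a single instruction file and symlink the other name to it. Here's a minimal sketch of that idea, assuming AGENTS.md is the file you maintain and that both tools follow symlinks when looking for their instruction file (I haven't verified that for either tool). The helper name link_agent_instructions is hypothetical, symlinks may need extra privileges on Windows, and this does nothing about skill files, which still need per-tool edits.

```python
import os
import tempfile
from pathlib import Path


def link_agent_instructions(repo: Path, source: str = "AGENTS.md",
                            alias: str = "CLAUDE.md") -> Path:
    """Make `alias` a relative symlink to `source` so both tools share one file.

    Hypothetical helper: assumes each tool follows symlinks when it looks
    for its instruction file.
    """
    src = repo / source
    dst = repo / alias
    if not src.is_file():
        raise FileNotFoundError(src)
    if dst.is_symlink() and dst.resolve() == src.resolve():
        return dst  # already linked; nothing to do
    if dst.exists() or dst.is_symlink():
        # Refuse to clobber a separately maintained file or a foreign link.
        raise FileExistsError(dst)
    # A relative target keeps the link valid if the repo directory is moved.
    dst.symlink_to(source)
    return dst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        repo = Path(d)
        (repo / "AGENTS.md").write_text("Run the test suite before committing.\n")
        link = link_agent_instructions(repo)
        print(link.name, "->", os.readlink(link))
        print(link.read_text(), end="")
```

The helper is idempotent (calling it again on an existing correct link is a no-op) and deliberately refuses to overwrite a real CLAUDE.md, since that file might contain instructions you'd otherwise lose.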
I was using gpt-5.5 high. Writing Terraform code for GCP, debugging app launch and Dockerfile issues, that sort of thing. It kept going in loops, hallucinating GCP features, looking things up in strange ways, running terraform apply after being explicitly told in the previous interaction not to, and overall not solving problems. These were very straightforward tasks and it couldn't be trusted for five minutes. It's the difference between what I would trust an early senior engineer to do and what I would trust an unreliable high school intern to do.
This isn't about training on the output tokens from Anthropic models; it's just about using their models to build things like pretraining pipelines, even if you train on your own data.
From the phrasing, it might as well be that any ML- or infra-related work that even incidentally looks like it could be used to train LLMs may trigger a silent nerf.
It's such an obviously bad policy that it's mind-boggling they thought it was a good idea. It just breeds paranoia and mistrust, especially when people are already a bit paranoid about silent model quantization for cost-cutting reasons.
They have enough info on you and your sessions to eventually catch you, label you as a bad-faith actor, and ban you automatically. I don't think many would risk it.
Proof search isn't new, but I don't think that captures the value of LLMs.
They act as a learned proposal mechanism on top of hard search. Things like suggesting relevant lemmas, tactics, turning intent into formal steps, and ranking branches based on trained knowledge.
Maybe a kind of learned "intuition engine", from a large corpus of mathematical text, that still has to pass a formal checker. This is not really something we've had to this extent before.
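To make the "learned proposal mechanism on top of hard search" idea concrete, here's a toy sketch: a greedy best-first search where a scoring function stands in for the learned model and decides which branch to expand next, and an independent checker replays the steps before any result is accepted. Everything in it is hypothetical and deliberately tiny: the states are integers rather than proof states, TACTICS is an invented three-entry table, and propose() just uses distance to the goal where a real system would query a trained model.

```python
import heapq
from typing import Callable, Optional

# Hypothetical toy "tactics": each rewrites an integer state.
TACTICS: dict[str, Callable[[int], int]] = {
    "inc": lambda n: n + 1,
    "dec": lambda n: n - 1,
    "double": lambda n: n * 2,
}


def propose(state: int, goal: int) -> list[tuple[int, str]]:
    """Stand-in for the learned model: rank tactics, lower score = more promising.

    A real system would ask a trained model here; this just uses distance to the goal.
    """
    return sorted((abs(goal - fn(state)), name) for name, fn in TACTICS.items())


def check(start: int, goal: int, steps: list[str]) -> bool:
    """Stand-in for the formal checker: independently replay every step."""
    state = start
    for name in steps:
        state = TACTICS[name](state)
    return state == goal


def best_first_search(start: int, goal: int,
                      max_expansions: int = 10_000) -> Optional[list[str]]:
    """Greedy best-first search whose branch ordering comes from propose()."""
    # Frontier entries: (proposer score, tie-breaker, state, steps so far).
    frontier: list[tuple[int, int, int, list[str]]] = [(0, 0, start, [])]
    seen = {start}
    counter = 1
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, state, steps = heapq.heappop(frontier)
        if state == goal and check(start, goal, steps):
            return steps  # accepted only after the checker agrees
        for score, name in propose(state, goal):
            nxt = TACTICS[name](state)
            if nxt in seen:
                continue
            seen.add(nxt)
            heapq.heappush(frontier, (score, counter, nxt, steps + [name]))
            counter += 1
    return None  # search budget exhausted


if __name__ == "__main__":
    print(best_first_search(3, 10))
```

The proposer only decides which branches get explored first. A bad ranking costs search time (and the path found isn't guaranteed to be the shortest), but not correctness, since nothing is returned until check() replays the steps and accepts them.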
> They do not think
That claim seems less useful, unless “think” is defined in a way that predicts some difference in capability. If the objection is that LLMs are not conscious, fine, but that doesn't say much about whether they can help produce correct formal proofs.
I've been using GPT-5.4, and more recently 5.5, with Codex CLI + Ghidra MCP for reverse engineering a game without many issues. Injecting code is where it usually balks, but I'm just trying to discover and parse structures from game memory.
I did get a refusal when trying to read in-game currency, even though modifying it would do nothing. It has some strange boundaries.
> a business that puts employees first and profits for owners last can often have a shit ton of profits for owners.
Owners can make 100x that shit ton if they put profits for owners first, so why wouldn’t they do that instead? Out of the goodness of their own hearts?
> The solution, if there is one, has to come from innovation from the private economy.
Why? The problems you described (offshoring, consolidation, automation) came from private-sector incentives, not to mention debt-driven consumption and turning basics like housing, healthcare, and education into profit centers.
Why would those same incentives magically fix the problem on their own?
> And there isn't too much the US government can do to revert this economic decline
This is ahistorical. The post-Great Depression economy that led to the “American Dream” was supported by huge public spending and government action [1]. Revitalization happened before; it can happen again.
So much came from FDR and the New Deal: Social Security, labor law, housing finance, banking regulation, securities regulation. Saying the US government can't really do that much is ridiculous.
These plots are terrible. Why is categorical data connected across categories with lines? Why not just use bar plots?
Like in the "Web Vulns in OSS" plot, white box data for Opus 4.7 is not available, but the absurd linear interpolation across categories implies it should be near 60.
It's a combination of factors. Anthropic implemented rate limiting where the 5hr usage limit would be burned through faster at peak hours. I was personally bitten by this multiple times before someone from Anthropic announced it publicly via Twitter, which was terrible communication. It wasn't small either: ~15 minutes of work ended up burning the entire 5hr limit. That annoyed me enough to switch to Codex for the month.
Now people are saying the model's response quality went down. I can't vouch for that since I wasn't using Claude Code, but I don't think this many people saying the same thing is total noise.