I'm perfectly happy with Claude Opus 4.6. None of the improvements since then have meaningfully changed my day-to-day work. If I can get 4.6 on my laptop for 5-10k, I'd gladly start shifting my ~1k/month Anthropic spend over.
Some harnesses even let you run a local model for most things and only pay for the latest frontier models when needed, which cuts costs drastically.
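A rough sketch of what that routing could look like. Everything here is made up for illustration: the `difficulty` score, the `0.7` threshold, and the per-task price are assumptions, not how any particular harness actually decides.

```python
def pick_model(difficulty: float, threshold: float = 0.7) -> str:
    """Send a task to the frontier model only when it looks hard enough.

    `difficulty` is a hypothetical 0..1 score; a real harness would have
    its own way of deciding (user choice, retries, task type, ...).
    """
    return "frontier" if difficulty >= threshold else "local"


def estimated_spend(difficulties, frontier_cost_per_task: float,
                    threshold: float = 0.7) -> float:
    """Total paid spend, assuming local runs cost nothing per task."""
    frontier_tasks = sum(
        1 for d in difficulties if pick_model(d, threshold) == "frontier"
    )
    return frontier_tasks * frontier_cost_per_task


# Four tasks, two of them hard enough to escalate.
tasks = [0.1, 0.5, 0.9, 0.8]
routes = [pick_model(d) for d in tasks]
spend = estimated_spend(tasks, frontier_cost_per_task=2.0)
```

With these made-up numbers only half the tasks are billed, so the saving depends entirely on how often you actually need the frontier model.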
Not necessarily. Inference speed matters too, because time is also a cost. For example, Anthropic's models take nearly twice as long as OpenAI's on my tasks, with both having similar success rates.
However, they have reported numbers along rather inconsistent dimensions. Historically they've focused on the number of repos and users, later on PRs and issues, and often on catch-all terms like "contributions", which include all of those plus comments and so on. The number of commits alone (which apparently is the main culprit now?) has been mentioned only sporadically. This has made it hard to get a consistent sense of historical growth.
Without any other information, however, it is reasonable to assume that a 14x increase in commits is the prime candidate for instability, especially since commits are write traffic, which is much harder to scale than read traffic. Plus, every 3-5x increase in scale can reveal bottlenecks in your distributed systems that you never knew existed, so they probably have like 2-3 "generations" of bottlenecks to figure out!
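Back-of-the-envelope check on the "generations" estimate, assuming each generation is one 3-5x step and the steps compound (that framing is my assumption, and `generations` is just a helper name I picked):

```python
import math


def generations(growth: float, step: float) -> float:
    """How many compounding `step`-fold increases make up `growth`."""
    return math.log(growth) / math.log(step)


fewest = generations(14, 5)  # about 1.6 if each generation is a 5x step
most = generations(14, 3)    # about 2.4 if each generation is a 3x step
```

So 14x works out to somewhere between roughly 1.6 and 2.4 steps, which rounds up to the 2-3 generations ballpark.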
It throws the prior assumption that tens of gigs of RAM are cheap out the window. It would likely make super fast SSDs such as Optane way more valuable.
> If you've ever built software, you'll know that regressions are all too common. Especially when AI/ML is involved.
"AI/ML" has delivered far more complete testing criteria than any "QA expert" has. It's absolutely crazy to me the number of people who defend the status quo in software testing when software quality has been on the decline for over a decade. But sure. "AI/ML" is the problem, not shit developers who never considered that angle in the first place.