I think Fable gave a bit of a sneak peek into the future.
My objective KPI: for the few days I was using Fable (18 hours a day), it frequently pushed back against my design ideas and proposed alternatives, and they almost always felt better to me. Back on Opus now, still 18-hour days, and I don't think it has meaningfully disagreed with me even once since Saturday. I consider myself an old hand, and I think Fable really didn't need me to be very specific in my prompts; it would have done a good job regardless, or even despite my prompting.
Of course, whether this is the future is anyone's guess. Maybe we will experience a Butlerian Jihad and there won't be any prompting whatsoever, for completely different reasons :-)
CC is mostly my default for large tasks/features (e.g. plan > execute plan). My biggest gripe with Claude Code is that it is painfully slow relative to the other two.
Cursor for small stuff like bug fixes, since it has a lot of models to choose from. I love the review/diff/checkpoint features. Its planning feature is on par with CC. I'd probably use Cursor as my primary driver if it had better cost efficiency. The next version or two of Composer may fill that gap in cost/quality/speed.
Codex isn't allowed at my work, but I use it for personal projects. It has the best balance of quality/cost/speed, even if its planner is poor, and quite frankly the Codex harness needs to catch up with the other two.
CC for quality / cost. Cursor for quality / speed. Codex for balance of the 3.
Composer feels roughly competitive with Opus 4.5?
It largely lags behind Opus 4.7/GPT-5.4, but it's respectable, and anecdotally it generally outperforms the GLM/Qwen equivalents despite the benchmarks.
It fails to follow instructions more often and is less critical of code, but it performs okay if you can decompose the task into smaller problem spaces, e.g. only do manual review, only do typechecking, only work on a specific component, etc.
I agree, Composer 2.5 is really good. I use it for all kinds of small tasks, and really for any kind of first pass at debugging, answering questions about the codebase, pulling data for reports, etc. It’s fast, pretty accurate, and basically free.
Yes, I still use it, although less than I would otherwise.
The Good:
- Composer 2.5 is pretty decent for the quality / price ratio.
- Easy to assign an issue to it in Linear (I know Linear just added this natively for the Linear agent, but it seems rubbish compared to Cursor)
- Bugbot actually finds some useful issues (things Claude and Codex will miss)
- Using @cursor in github usually works well, and better than @copilot.
- Working with Python monorepos with uv in their IDE. VSCode and Cursor work well here (Antigravity managed to screw it up somehow).
The Bad:
- Usage/billing dashboards - These are opaque, and you can't attribute which actions map to which spend.
- Cursor doesn't follow PRs as well as Claude Code does.
- Setting up environments is less good than Claude Code
- Their IDE fork is woefully out of date, it'd be nice if it had more of the codeium fixes.
The Ugly:
- Settings - Try to turn off Bugbot; there are multiple places you have to do it. Good luck figuring them all out.
- Support - they are polite, but they gaslight you and tell you it's your fault their product's settings are awful.
Linear employee here - if you have any specific feedback on our Claude/Codex integration, happy to hear it. It's definitely a v1, so expect a number of fast follow-ups adding some of the missing functionality, like env customization, secrets, and code signing.
I was speaking more about the Linear agent vs. the existing integrations. We love the Linear guided reviews and issue tracking, so we have high hopes of getting a good DX here from y'all.
The Claude integration works as well as Anthropic will let it work: you can either automate it (anything from 'claude -p xxxx' to the API), ignoring your subscription and paying by the token, or open a crapton of terminal tabs, or paste it into a bunch of sessions in their app. That works more or less, but it's cheaper than per-token costs.
The Linear agent doesn't seem to read the AGENTS.md file, follow along on a PR, or let you configure a sandbox. It told me this:
```
Note: I couldn't run ruff/pytest here (no uv/venv in the sandbox), so I verified syntax via AST parse only. The Postgres-backed tests will run in CI.
```
That was after I asked it to look at the PR check failures.
To be fair, Claude Code does it 70% of the time (the other 30% the sandbox is dead), and Cursor about 10% of the time.
Yes, it’s my daily driver for building the SaaS I run full-time. I’m not happy about this news.
I like the ability to switch between any model. Composer 2.5 is really solid, I like having my agents coworking in the IDE with me, plan mode is great, and Cloud Agents are great, especially with the Slack, Linear, web, etc. integrations. I routinely tag an error report in Slack and Cursor fires up a Composer 2.5 cloud agent that has read-only DB access, access to error reporting, etc., and it can triage the issue, open a PR, and tag me in Slack.
The only thing I’ve felt like I’m missing out on is the subsidies of the CC/Codex subscriptions, but it seems like that is rapidly eroding anyway.
It uses AI to replace a very niche human-powered workflow in a niche industry. That's all I'll say, but it's grown to about $30k MRR in the last year and is supporting my family, so the stakes feel pretty high to me.
I actually acquired this as a little side project from some guy online. I hadn’t even really heard of the niche, but I knew enough to know that it had some potential, and I could also see that it could be used for an adjacent market segment that I was pretty familiar with, because my dad worked in that segment when I was growing up. And the price was cheap because it just had a few users and was pretty new. All but one of those users cancelled in the couple months after I bought it, because it turned out the product just didn’t do what it claimed to. But I didn’t really care about that, because 1) I knew it could deliver if built correctly, and 2) I could just tell there was demand for this and it wouldn’t be that hard to sell.
So I turned off the signup form and started rebuilding it and improving it, but I kept getting people emailing me and wanting to sign up. No marketing, nothing. I held them off for almost 18 months while I rebuilt it on the side of my day job, and that gave me a ton of confidence that I was on the right track.
Then about a year ago I got laid off. I was really close to relaunching it at that point, so I used all my severance to go all in on it.
Not to my knowledge? But even if that's the case, Cursor seemed like they were doing fine without SpaceX, and I'd like to avoid giving a single cent to Elon Musk. You can do as you wish.
...and won't mind if you change your mind. And again. And again. And again for as long as you care to iterate on your design, experiment with a business user looking over your shoulder, etc. etc. etc. People routinely avoid throwing away work because they get emotionally attached to it, even if they get paid by the hour. LLMs just do as they are told, and that's worth a lot.
There is a large gap between people who have been using AI for coding for hundreds or thousands of hours and those who have not. People like Ed Zitron, who never managed or participated in dev projects, scream from the rooftops that AI coding is only relevant for small hobby projects. Meanwhile, in my own backyard, we have been happily shipping production stuff for a few months now, and new IT projects get launched with substantially smaller teams. And anyone who has ever had to work with mediocre developers will take Fable any day of the week.
He heavily leans on developers for his points on coding, and then spices it up.
> For example, major media outlets will gladly write that “AI can build software,” but said sentence suggests that you can just type “build me Slack 2” into Claude and have it fart out a fully-functional, production-ready piece of software, rather than a quasi-functional mound of code-slop that can do enough to trick a business idiot or lazy journalist, but little else.
Here is the latest point he made on development, and it seems accurate to me? If a non-technical person hands AI an under-specified prompt, you get quasi-functional slop.
Can you link the piece where he says it's only relevant for small hobby projects?
I'm not a huge fan of his or anything, but your comment is just... pulling stuff completely out of nowhere.
around 9:30
"this is a thing that has its use for the little things, but the moment you start expanding it"
This is just plain wrong. We work with models on codebases of tens of millions of LOC. It takes genuine skill and instrumentation to do right, but it does work. And I know devs who take this view (that AI is dumb, useless, a gimmick), and what they have in common is that they have not put in the hours to learn how to tame the beast.
Anyway, I am losing interest in debating the topic, the efficient markets will deal with this objectively. I can't see how a company employing the usual high-low mix of developers can compete with a company that has a small number of elite devs equipped with those tools and unencumbered by having to manage large development teams and associated bureaucracies. Time will tell.
You can't just take one sentence out of context. Huh?
"...you're just kicking the can. You're still going to have to read all this code to make sure it makes sense"
He's commenting that maybe it's not a huge productivity boost once you include the reviewing: if you want good results, you have to know what you're doing, direct it, and review it. If you skip this, you get aimless slop.
How on earth is this "plain wrong"?
> the efficient markets will deal with this objectively
> sentence suggests that you can just type “build me Slack 2” into Claude and have it fart out a fully-functional, production-ready piece of software, rather than a quasi-functional mound of code-slop
How's that different from a randomly selected team of human developers, other than price, time, and HR? Most software projects have always failed, for a reason.
That's normal in the sense of being a practical outcome of most software development projects.
The truth is, the software development process has always produced mostly garbage. Looking only at successful projects and saying "see? that's what humans do, completely unlike AI" is a bucket of survivorship bias.
I'm sorry, but this is borderline silly. Fable has been out for less than a week and you're already making grand pronouncements about its superiority? How much first-hand evidence could you possibly have for that claim?
If he is already enjoying what he claims with pre-Fable models, then it stands to reason Fable is already above that baseline, and therefore your nitpick is silly.
That's both fair and also charitable to the commenter. It's not a given that Fable is "above that baseline"; all we can go by is anecdotes and Anthropic's marketing materials. Both tend to be puffed up. And if they're speaking generally about LLM-assisted coding, they could have said it that way.
You sometimes meet someone and hit it off immediately. A fellow engineer, hobbyist, anyone. That is my experience after working with Fable since it launched and barely sleeping. I never had more nuanced, interesting and fruitful design discussions in my life. And I've been programming across the entire stack, from transistors to enterprise architecture, and spent 40 years around computers. Love is love, it does not need to be well researched and may well be misplaced :-)
AI is a cool technology but anthropomorphizing it and using the L-word to describe your relationship with it is a symptom of AI psychosis. Respectfully, I suggest that you take a break and touch grass.
Fable caught us in the middle of a crisis where we had to replace a supplier with a quickly put-together homemade solution. We have been working with it nonstop since it launched. And thanks to a lot of baseline experience with previous models, our small band of relatively old hands decided we are badly in love with it.
Now, remind yourself of the last time you had to work with a developer who went into CS because it pays well, has zero enthusiasm for what they are building, and is just phoning it in with minimum effort and low skills. AI models are coming for those people first. And those people number in the fricking millions. Strip IT teams down to people with a passion for either the product or the tech, give them such tools, and watch. Compare this with a normal IT shop with a bunch of great people, a metric ton of average people, a few toxic imbeciles, and the necessary HR/management bureaucracy to keep that bunch on a leash.
That's what I'm seeing across other teams as well: backlog sizes barely smaller and defects even rising. The low performers have only gotten lazier.
The difference now is that people are starting to openly question the extreme productivity gaps, and their usual excuses aren't working anymore.
AI doesn't need to replace everyone. Just dropping the bottom 1 or 2 people per team would already be catastrophic for the field.
Not my impression. I felt 4.7 was a regression, but I am again badly in love with 4.8, with the level of insight it produces in design discussions and how long it can go unattended while producing spec-adhering, quality code. There are problems it still can't solve well, at the edges of algorithmics and far from the mainstream, but for lots of stuff it is godlike.
Also, I don't think Boris C. is coming here for PR. He is a tech guy, and this is the best place for tech discussions. Why so cynical? The guy is an engineer.
Wonderful stuff - thanks! I was thinking about doing this as well.
I just completed recreating one of my childhood favourites, Thrust. Opus 4.7 almost one-shot the game (was perfect in 30 mins based on just the wikipedia entry describing the game), and then I spent 3 full nights trying to build perfect algorithmic AI to auto play it. (It thoroughly failed, sticking to a sub-par algo that was ok, but not nearly as good as possible).
My next stop is reimplementing Mercenary, the first open-world game I played, on an 8-bit Commodore Plus/4.
Garmin recently did something similar, resorting to TLS fingerprinting to prevent unofficial logins to their API (via the popular garth library).
They lost a lifetime customer in me. I think I have spent close to 20k on Garmin gear between my wife and myself: watches, GPS devices for cars and boats, and hiking gear. If they refuse to give me access to my data, I will (a) lobby for laws to make this mandatory and (b) never, ever buy anything Garmin until I see a reversal of this policy and an apology.
More broadly, though, it's yet another service that blocks API access. No doubt this is caused by the proliferation of amateurs armed with agentic tools building nice, personalized frontends for themselves. Companies seem to absolutely hate it when people don't go through their shitty websites with dark patterns, misleading search results, and analytics.
Huh, I completely missed that. I've been using python-garminconnect [0] for a few months without issues. I agree though that it's annoying, though not reason enough for me to switch away from Garmin yet.
Already-minted tokens still work; they broke the login process.
For now it's just TLS fingerprinting, not client attestation, so I managed to implement a working solution. But I am sure they will tighten the screws still further.
The only annoyance is that Garmin requires 2FA if you enable the ECG feature on your smart watch/fitness tracker, but I have a small program that reads the 2FA codes from my Gmail inbox and supplies them to the scraper without too much trouble.
Have you looked at the feasibility of making your own CIQ app to push data from the watch to your alternate internet endpoint?
I have the impression there are permissions and APIs to access sensor history and activity records, but I haven't had a need to dig in and learn what restrictions there might be...
The end game here is going back from a model where a bunch of product and tech management people sit in the U.S. or Europe, and try to manage thousands of mediocre talent sitting somewhere far away. The new model is you give those coding tools to good engineers colocated with your product people, and you ship good stuff much faster. If you can achieve such a setup, the token costs can be $50k per seat per month and you still run circles around the legacy IT models in terms of efficiency. Giving everyone the API keys and not changing the way products are managed is not going to work.
Good lord, what company would want to spend $600k per employee per year just to go maybe 20% faster (which is what the studies seem to show as a realistic estimate of productivity gains)?
I'm building a product right now with some AI coding (despite my negative sentiment about AI in general, the tools are useful). I am both the product person and the engineer, and I'm pretty decent at using it, so according to the hype I should be seeing something like a 10x speedup. I am not seeing that. It's definitely faster, but there are also days where I'm stuck cleaning things up after going too fast for too long, periods where I need to put the software in front of people to get real feedback, or even periods where I just need to use it extensively myself to find the pain points and bugs. I just don't see this "running circles" once you get past an MVP and actually need to build something secure and not embarrassingly broken.
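A quick back-of-the-envelope check of the cost argument in the last two comments: $50k per seat per month works out to $600k per seat per year. The sketch below assumes, hypothetically, a fully loaded engineer cost of $200k/year, that an engineer's output is worth exactly what they cost, and that output scales linearly with speed. None of those assumptions come from the thread, and the function names are illustrative only.

```python
MONTHS_PER_YEAR = 12


def annual_token_cost(monthly_cost_per_seat: float) -> float:
    """Annual token spend for one seat, given a monthly spend."""
    return monthly_cost_per_seat * MONTHS_PER_YEAR


def breakeven_speedup(annual_tool_cost: float, loaded_cost: float) -> float:
    """Fractional speedup needed for the tooling to pay for itself.

    Hypothetical model: an engineer's output is worth exactly their
    fully loaded cost, and output scales linearly with speed.
    """
    return annual_tool_cost / loaded_cost


HYPOTHETICAL_LOADED_COST = 200_000  # assumed, not from the thread

cost = annual_token_cost(50_000)
needed = breakeven_speedup(cost, HYPOTHETICAL_LOADED_COST)
print(f"annual cost per seat: ${cost:,.0f}")
print(f"break-even speedup: {needed:.0%}")
```

Under these made-up inputs the tooling only pays for itself at roughly a 4x output level (a 300% speedup), far above the 20% figure quoted; a higher loaded cost or a nonlinear value model would move that threshold.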
To me the question is, can the frontier labs make the variance of output lower + make the output of higher quality to justify their prices?
If not, lower-priced Chinese offerings will be better, since they're cheaper per token, giving you more attempts to offset the variance.
My feeling on the former is no... I believe they tried really hard, but they've now settled on pure marketing to try to fight off the Chinese with perceived superiority in quality.
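The "more attempts to offset the variance" argument is essentially best-of-k sampling: with k independent attempts that each succeed with probability p, the chance that at least one succeeds is 1 - (1 - p)^k. A minimal sketch, assuming attempts are independent and that you can reliably tell which attempt succeeded (e.g. via a test suite); the success rates and per-attempt costs below are made-up illustrative numbers, not measurements of any real model.

```python
def p_any_success(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k


def attempts_within_budget(budget_cents: int, cost_per_attempt_cents: int) -> int:
    """How many whole attempts fit in the budget (integer cents avoid float issues)."""
    return budget_cents // cost_per_attempt_cents


BUDGET_CENTS = 100  # hypothetical budget for one task

# Hypothetical frontier model: higher success rate, expensive attempts.
frontier = p_any_success(0.6, attempts_within_budget(BUDGET_CENTS, 100))
# Hypothetical cheaper model: lower success rate, 5x cheaper attempts.
cheap = p_any_success(0.3, attempts_within_budget(BUDGET_CENTS, 20))

print(f"frontier: {frontier:.2f}, cheaper model: {cheap:.2f}")
```

With these invented numbers, five cheap attempts (about 0.83) beat one expensive attempt (0.60). The conclusion flips if failures are correlated across attempts, or if checking each attempt costs real reviewer time.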