
nikcub · 2026-06-26T19:40:08 1782502808

I'm glad there are more attempts at solving model routing, as costs (at API rates) have really become an issue. Some feedback:

1. Reiterating the cache issue from other comments already here: there is a lot of optimisation in harnesses around caching, and a proxy model blows that up.

2. Coding agents are model aware - they already route code discovery to mini / flash models, planning to heavy models, workflow design to ultra, implementation to mid / high etc. They know when they're exploring, planning, implementing, reviewing etc. and which model class to select and when it fails.

With a proxy you're breaking this control loop and feedback. It doesn't know, for ex., that it just attempted with deepseek v4 and it failed, so let's try Opus.

3. How are you going to RL improvements and prevent the router becoming stale? You only have access to your own internal prompts and ~thousands of samples.

This is RL'd on one org's codebase. There are going to be a lot of prompts you haven't seen before and have no insight into how to route correctly, and you have no insight into users' HF to improve your own model. Orgs aren't going to share their traces with you, so you need other sources to train on and improve.

There are also new model releases every week that you need to keep up with - what's the story going to be here?

4. Publish evals by running terminalbench / deepswe bench. Show us the performance / cost / time chart vs the other agent and model sets. If you can show gains there, you have a very simple value prop to sell where you can charge for a % of the saved costs
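
To make point 2 concrete, here is a rough sketch of the kind of control loop an agent harness has and a stateless proxy does not: it remembers which model it just tried and escalates on failure. The tier names, `EscalatingAgent`, and `fake_run` are all hypothetical placeholders, not any real harness's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model tiers, cheapest first. Placeholder names only.
TIERS = ["small", "mid", "large"]


@dataclass
class Attempt:
    model: str
    ok: bool


@dataclass
class EscalatingAgent:
    # run(model, task) -> True if the attempt succeeded (assumed interface)
    run: Callable[[str, str], bool]
    history: list = field(default_factory=list)

    def solve(self, task: str, start: str = "small") -> Optional[str]:
        """Try tiers from `start` upward, recording each attempt.

        Returns the first model that succeeded, or None if all failed.
        """
        for model in TIERS[TIERS.index(start):]:
            ok = self.run(model, task)
            self.history.append(Attempt(model, ok))
            if ok:
                return model
        return None


def fake_run(model: str, task: str) -> bool:
    # Stand-in for a real model call: only the largest tier "succeeds".
    return model == "large"


agent = EscalatingAgent(run=fake_run)
winner = agent.solve("refactor the parser")
```

The point is the `history` list: the harness sees the failed attempt and the task phase, while a proxy that only sees individual requests has to infer both.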

adchurch · 2026-06-26T20:07:22 1782504442

Really appreciate the thoughtful feedback!

1. Agree it's important. Fwiw the proxy model doesn't blow this up though - it only incurs a one-time cost when switching models, and we're aware of that when making routing decisions

2. The agents are model aware yes but they are not incentivized to optimize too heavily here (in particular they don't use OS models even when they would be better). I think that's where this router comes in and brings genuine improvement.

3. Two parts here: 1 is continuing to grow our golden dataset over time, 2 is using reward signals from production traffic (on a per-customer basis or, if allowed, across all users)

4. Yes we have these internally, great callout that we should publish! Will do + will link from the repo soon. (Fwiw I think these benchmarks are useful but don't fully capture vibes - you should try it out yourself for that!)
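
A back-of-envelope sketch of the one-time switching cost from point 1, under loud assumptions: every price below is made up, output tokens are ignored, and the context prefix is treated as fixed in size. It only illustrates why the number of remaining turns matters when deciding whether to switch; it is not the router's actual logic.

```python
def stay_cost(prefix_tokens, turns, new_tokens, in_price, cached_price):
    """Input cost of staying on the current model (cache already warm).

    Each turn reads the prefix at the cached rate and the new tokens at
    the normal input rate. Prices are per token.
    """
    return turns * (prefix_tokens * cached_price + new_tokens * in_price)


def switch_cost(prefix_tokens, turns, new_tokens, in_price, cached_price):
    """Input cost of switching to another model (cache is cold).

    The first turn pays the full input rate for the whole prefix (the
    one-time cost); later turns read the prefix from the new cache.
    """
    first = (prefix_tokens + new_tokens) * in_price
    rest = (turns - 1) * (prefix_tokens * cached_price + new_tokens * in_price)
    return first + rest


# Hypothetical per-token prices: an expensive current model, a cheaper target.
CUR_IN, CUR_CACHED = 3e-6, 0.3e-6
NEW_IN, NEW_CACHED = 0.5e-6, 0.05e-6

long_stay = stay_cost(100_000, 10, 2_000, CUR_IN, CUR_CACHED)
long_switch = switch_cost(100_000, 10, 2_000, NEW_IN, NEW_CACHED)
short_stay = stay_cost(100_000, 1, 2_000, CUR_IN, CUR_CACHED)
short_switch = switch_cost(100_000, 1, 2_000, NEW_IN, NEW_CACHED)
```

With these invented numbers, switching wins over ten remaining turns but loses when only one turn is left, because the cold read of the prefix is never amortised.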

nikcub · 2026-06-25T23:41:52 1782430912

This is devastating. Om was the godfather of early tech blogging and lifted up so many people around him. He was kind, caring and compassionate.

When I first started blogging around 25 years ago, he would have been amongst the first 10 readers. He linked to me, emailed me privately with feedback, praised posts and would call bullshit when he saw it.

He was never competitive with other blogs or bloggers and was never tied up in drama. He was very often a mediator in behind the scenes conflicts and was obsessed with truth over getting the scoop.

He loved tech and startups and most of all loved seeing others succeed and didn't have a gram of resentment within himself.

Everybody from that post-dotcom crash era of tech owes Om a large debt of gratitude. He will be missed. RIP Om.

nikcub · 2026-06-08T22:38:06 1780958286

Google are the only one of the big three who can tick the boxes on being multimodal, price / performance and having Apple-level compute available

nikcub · 2026-06-08T10:29:18 1780914558

> "a well written agents.md is very good for the agent"

while even a mildly bad agents.md can be _very_ bad for the agent. they rot very quickly which is why human curation is essential.

same with memory - a lot of the self-learning tools that are becoming popular now degrade agents over time - which is why you end up being able to run an eval with no context and it performs better

> but that's why agentic coding can still be considered a "skill".

yes - far too many cases of throwing a kitchen sink of prompts, skills, tools etc. thinking the llm will sort it out. you need to constantly prune, eval, tweak, observe, update etc. in a loop

nikcub · 2026-06-07T01:26:55 1780795615

Bright Data is available as a product on AWS Marketplace

https://aws.amazon.com/marketplace/seller-profile?id=bf9b432...

nikcub · 2026-06-07T01:24:09 1780795449

Same - have accounts with 3 different services and have never been KYC'd even with heavy usage.

nikcub · 2026-06-05T21:16:13 1780694173

> which actually lead to a #1 spot on openrouter usage

that was only because it was free

infecto · 2026-06-05T21:32:17 1780695137

+1. The coding model was fine and it was fast but the fact that it was free was a massive boost.

nikcub · 2026-06-04T20:34:14 1780605254

It's becoming apparent that it requires more tokens to secure code than it does to write it

May even be an order of magnitude more

Mtinie · 2026-06-04T20:45:14 1780605914

In all seriousness, wasn’t that always the case? Writing bad code is relatively cheap.

Ensuring code isn’t bad is the expensive part.

chrisweekly · 2026-06-04T22:41:09 1780612869

Sort of?

The definition of "bad" from a security PoV is rapidly expanding, in light of relatively new capabilities and increasingly cheap access to exploitable vulnerabilities.

fny · 2026-06-04T23:48:52 1780616932

I don't think the definition of "bad" is expanding. Rather the ability to detect and exploit "bad" is.

chrisweekly · 2026-06-05T02:33:34 1780626814

fair point. another way of putting it might be to say that, for all extant software, much more of it is "bad" than we realized even a month or two ago -- and the cost to create and maintain "good" software is increasing (even as the naive / surface-level / apparent cost is plummeting)

kenjackson · 2026-06-05T03:56:59 1780631819

Same thing happened with the growth of the internet. There was a time when there was basically no consideration of buffer overflow.

tptacek · 2026-06-04T21:00:14 1780606814

For now, maybe, yes? But the most important targets of this kind of work aren't AI outputs; it's legacy code, particularly (but not exclusively) old memory-unsafe code. In those situations the figure of merit isn't the token cost of recreating the target code; it's the cost of finding the same bugs with humans or preexisting tools.

Those costs can be extremely high.

ath3nd · 2026-06-04T21:10:50 1780607450

Any newly produced AI code is immediately legacy and trash at the same time.

andai · 2026-06-05T06:12:52 1780639972

There's a parallel between looking for bugs and mining. As models get smarter, they'll find "deeper bugs".

I expect at some point formal verification will become more economical than red teaming. Writing it correctly is more expensive, but it may be cheaper than trying to secure incorrect software.

(Or rather, as hacking incorrect software becomes vastly cheaper, the amount of software worth writing properly will increase.)

I've been thinking, by Dijkstra's standards we have already been vibe coding for almost a century :)

sam-cop-vimes · 2026-06-05T08:15:11 1780647311

Are AI firms going to charge us to write code, and then charge us even more to secure it?!

smt88 · 2026-06-05T08:19:03 1780647543

Yes, obviously. Infosec has always been plagued by this. How many services make you pay for SSO?

XCSme · 2026-06-05T14:36:00 1780670160

Not if the original code is secure...

windexh8er · 2026-06-04T21:53:12 1780609992

Given the slop that's made its way to Github we can see that this is a great profit model. Ship slop and then "fix" slop. What an efficient use of our planet!

bflesch · 2026-06-04T20:56:11 1780606571

It's weird because why can't they train the AI to simply output secure code?

The basic security flaws with regards to input validation and overflows should never ever be output by an AI. For "security flaws due to bad design" I'll cut them slack until AGI is achieved.

simonw · 2026-06-04T21:18:49 1780607929

> It's weird because why can't they train the AI to simply output secure code?

The most interesting security bugs have causes that are spread across large codebases, or networks of dependencies.

Training the AI to "output secure code" won't work if it doesn't also have access to the source code of every dependency that it's using... and even then, given current model speeds and prices most developers won't want to wait for an hour on every edit they make while the LLM reasons through all of the dependencies.

tptacek · 2026-06-04T22:00:13 1780610413

What's destabilizing the industry right now isn't vulnerabilities AI introduces into new code; it's a flood of sev:hi vulnerabilities in existing code, not introduced by AI but discovered by it.

chrisweekly · 2026-06-04T22:43:25 1780613005

Agreed -- and, compounding the challenge, the flood of _reported_ high-sev CVEs is itself a kind of DDoS attack on maintainers.

froggit · 2026-06-05T10:03:09 1780653789

> What's destabilizing the industry right now isn't vulnerabilities AI introduces into new code; it's a flood of sev:hi vulnerabilities in existing code, not introduced by AI but discovered by it.

Vulnerability discovery has essentially moved to a "proof of work" computation model with AI that has some similarities to crypto like BTC or ethereum 1.0. I don't see any reason a well funded adversary couldn't use this same process on open-source code to develop exploits. I'm sure AI would be happy to try and create exploits from the results rather than fixes.

This sort of proof of work has a notable difference from crypto in the asymmetric nature of what each side is targeting. In crypto, each miner was attempting to find a solution to the same problem and they would all move on to a new one once a solution is found. However with AI vulnerability scanning, the non-deterministic nature means an adversary is likely to find different vulnerabilities. Even if it doesn't, the adversaries have a different post-discovery workflow (i.e. probably less compute intensive aka cheaper due to only needing one viable exploit to win) than the software maintainers do.

Considering it's possible that the adversary and their target could both do all this while running Claude, this puts Anthropic in a real "Merchant of Death" position.

tptacek · 2026-06-05T13:24:43 1780665883

This doesn't make sense. Claude isn't creating the vulnerabilities. They've been here the whole time. You just get to know about them now.

bflesch · 2026-06-05T09:25:39 1780651539

Even before that everybody was getting drowned in shitty reports from automated tools.

The goal of AI-generated code should not be that one needs an AI-based security review tool on top of it, but that the AI-generated code in itself is reasonably secure.

bobkb · 2026-06-04T22:08:09 1780610889

I think these audit tools can look beyond just security and can look for compliance audits as well. The ability to audit real targets in staging environments makes it easy to identify issues.

nikcub · 2026-06-03T20:33:18 1780518798

Cloudflare are more likely to be undercounting bots - they don't really pick up many of the modern browser-driven bots and crawlers.

yjftsjthsd-h · 2026-06-03T20:48:04 1780519684

I'm quite happy to believe that it's unreliable in both directions.

nikcub · 2026-06-03T20:30:59 1780518659

these old network security techniques don't really work anymore. the common bots are at known IP ranges, the problem bots are all on datacenter + residential proxies.

CyberDildonics · 2026-06-03T21:01:59 1780520519

Why would blocking those be a problem?

chadgpt3 · 2026-06-03T23:40:31 1780530031

because you are blocking all of Comcast, Verizon, T-Mobile, British Telecom, ....

at the end you have blocked every network with human visitors and only datacenter IPs can access your site.

The proxies rotate IP every day, so you either have ineffective blocking or you block the whole network.
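
A small sketch of that trade-off using Python's standard `ipaddress` module. The addresses come from a reserved documentation range standing in for a residential ISP; the specific hosts and the `blocked` helper are made up for illustration.

```python
import ipaddress

# 203.0.113.0/24 is a documentation range, used here as a pretend ISP network.
isp_network = ipaddress.ip_network("203.0.113.0/24")

proxy_day1 = ipaddress.ip_address("203.0.113.17")   # proxy exit today
proxy_day2 = ipaddress.ip_address("203.0.113.202")  # same proxy after rotating
human = ipaddress.ip_address("203.0.113.45")        # ordinary subscriber


def blocked(addr, blocklist):
    """Return True if addr falls inside any network on the blocklist."""
    return any(addr in net for net in blocklist)


# Option 1: block only the address that was seen misbehaving.
per_ip_blocklist = [ipaddress.ip_network("203.0.113.17/32")]

# Option 2: block the whole ISP range.
per_network_blocklist = [isp_network]

results = {
    "per_ip_catches_day1": blocked(proxy_day1, per_ip_blocklist),
    "per_ip_catches_day2": blocked(proxy_day2, per_ip_blocklist),
    "per_network_catches_day2": blocked(proxy_day2, per_network_blocklist),
    "per_network_blocks_human": blocked(human, per_network_blocklist),
}
```

The per-IP list stops catching the proxy once it rotates, and the per-network list catches it only by also blocking the human subscriber.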

efilife · 2026-06-04T02:37:41 1780540661

My site is not for americans so I don't care about blocking american isps

chadgpt3 · 2026-06-04T11:30:08 1780572608

You think they only use American networks?

nikcub · 2026-06-03T21:30:12 1780522212

there are 150M+ of them and you'll be taking out a lot of human users with it

modern blocking is behaviour / heuristic based

CyberDildonics · 2026-06-04T03:53:34 1780545214

There are 150 million bots all using residential IP addresses?