Eridrus · 2026-06-16T01:44:47 1781574287

We started our core product on ECS, which is a declarative way to run a containerized service. It has been nice and reliable, but it has limitations (slow scaling, weird AWS quotas if you have ephemeral tasks).

We're moving our non-critical components onto EKS (pipelines, tooling, etc). We had one outage from runaway IP allocation in a subnet, but otherwise it's been pretty stable.

I do hear vague horror stories so I'm really not excited about moving our prod stack to it, but it's actually been really good for installing 3rd party software so far.

Eridrus · 2026-06-14T14:05:35 1781445935

In the last week we have done a complete analytics dashboard overhaul with Fable/Opus. The baseline was really bad, since we have no front-end engineers, so we largely felt comfortable not reading anything but the auth code (where we did find one subtle edge case handled incorrectly).

The pipelines and data serving design was all human since it did have to deal with some data scale but the javascript/api layer was all slop, and it seems fine and good.

If you have a really high quality piece of code that needs to meet a high bar of quality/reliability, then I think the risk of letting the AI loose on it is very high and I wouldn't do it. If you have a pile of code you already know is a pile of garbage despite being human written, well, it can't get much worse :)

I also built an agent orchestration meta harness that runs on k8s and uses the k8s agents sandbox for running codex/claude code in the cloud. This was almost entirely just handed over to Fable and I have not asked a single architectural detail. The quality of this product is mediocre, but the fact that it largely works after I went through a few iterations of clicking around is impressive. I would have preferred to buy something off the shelf, but nothing even really came close (though maybe now I would have forked Omnigent)

Eridrus · 2026-06-14T11:42:42 1781437362

Codex is much more subscription-efficient than Claude.

Having said that, I think there is a question of how far we can push this and not collapse under the weight of tech debt created, e.g. https://openai.com/index/open-source-codex-orchestration-sym...

I think the dream is basically that you go and file a bunch of Linear tickets, and then you come back a day later to evidence of the tickets being resolved and the code merged. I don't think we're super there yet (See: Anthropic's regular bugs in everything), but this is the future that people are trying to get to and to some extent the question is: is there anywhere we can apply this to now sanely? How does this frontier evolve?

Eridrus · 2026-06-14T03:25:36 1781407536

The commentary sounds like AWS really pushed this.

If you're bringing this sort of stuff to the government, it's because you want the government to act...

Eridrus · 2026-06-14T03:22:37 1781407357

Releasing a model without benchmarks seems to say the model is probably bad...

Eridrus · 2026-06-13T16:43:38 1781369018

Tell me you haven't talked to a VC.

A better model for VCs is: companies are finding tons of budget to allocate to new AI spend. Besides the labs, who is going to be able to capture some of that spend while they're actively looking to spend it?

Nobody at the seed stage is investing in things they think are "safe". They are investing in things they think have huge upside.

jdw64 · 2026-06-13T17:01:27 1781370087

Sometimes people don't realize that 'professional' ideals and 'reality' are different.

What you're talking about seems like 'ideal' investing, not real world investing at all. Of course, the VCs in your country and the VCs in my country are different.

It's like in software, where everyone says you should write maintainable code within the norms, but in reality, most people don't do that.

Investing in 'potential' is the basic principle of VCs. They call it the power law. But when you look at actual investment portfolios, it seems quite rare for people to follow only that principle. I guess you don't think so. Of course, I agree that ideal venture investing follows the power law. But in real world investing, there are pragmatic investors who operate somewhere between the ideal and reality. We always project ourselves onto the 'ideal,' but I don't think there are only people who are immersed in that ideal. Of course, no VC would invest in someone like me. I've met with VCs three times in my career, but they all turned me down. Haha.

jdw64 · 2026-06-13T17:19:55 1781371195

Anyway, I wasn't trying to mock your profession. Here's what I think. Most VCs and investors have their own success formula. There will be VCs who succeeded by investing in infrastructure. But the question is whether that same success formula applies to AI startups right now. Of course, from your perspective, it might look like 'this clueless kid is just being cynical without knowing anything.' I partly agree. But that's not the core of my argument.

What I'm trying to say is that those success formulas themselves need to be reconsidered. An insider from up there came out and talked about the next 'Databricks,' believing that's the kind of potential they're looking for. All of them do. Everyone wants to be the first investor in a goldmine. I don't think this is just about greed.

The question is whether the traditional infrastructure investment logic holds here. I think most current AI infrastructure tools are closer to 'temporary patches' that exist before the functionality gets internalized.

Let's say infrastructure is like a concrete building. Traditional IT infrastructure basically has a standards committee, and once that committee sets things, changes are extremely rare. It's a kind of 'lake.' But AI infrastructure right now is different from one to another; even the ecosystems differ—the Chinese ecosystem is different from the US ecosystem. It's a flowing 'river.' I just think the question is whether the old grammar can be applied in this situation.

You probably have more money, more investment experience, and more success than I do. I only have a lot of failure. But apart from that, the issue is simply that 'potential' in growth potential ends up being data measured against past examples, and the question is whether that data still holds up now. Anyway, I might have been slightly sarcastic earlier, so I apologize for that. Someone as successful as you, please bear with it a little.

Eridrus · 2026-06-14T02:32:18 1781404338

I'm not a VC man, just someone who has raised funding recently. Nobody in the US is talking about the next Databricks atm, for better or worse, they are either trying to get as much allocation in OpenAI/Anthropic, funding random credentialed people to make neolabs, or funding people who are somehow selling into the massive AI coding/agent demand (or to the labs). Investors in the US currently do not care about safe bets, they want growth at all costs. Risk on.

Maybe in your country it's different, but this is what I see.

Eridrus · 2026-06-13T01:55:15 1781315715

The difference between OpenAI & Anthropic is that OpenAI didn't do multiple big media pushes about how their models are so scary and dangerous.

OpenAI's models are very good, they have refusals + a government ID verification story for cyber access (I don't think they prevent non-US nationals, but I don't know this). What they don't have is Project Glasswing and all the hand wringing about how they're going to end the world in public.

I hope Anthropic pulls their head out of their ass and just starts acting like a normal company.

jazzyjackson · 2026-06-13T03:20:56 1781320856

You’re telling me this testimony isn’t sincere marketing for how revolutionary and dangerous his product will become?

  "OpenAI CEO Sam Altman testifies at Senate artificial intelligence hearing | full video" (2023)

"My worst fears, are that we cause significant - we the field, the technology, the industry - cause significant harm to the world...If this technology goes wrong, it can go quite wrong and we want to be vocal about that."

https://youtu.be/Pn-W41hC764

datadrivenangel · 2026-06-13T02:11:40 1781316700

OpenAI did this back in 2024 several times.

Eridrus · 2026-06-14T02:33:52 1781404432

I think Anthropic went a lot further with the marketing around Mythos/Glasswing.

OpenAI talked a lot about potential future risks. Anthropic went around saying "this model is too dangerous to give to people at all".

Eridrus · 2026-06-12T23:58:50 1781308730

The problem is that there are a bunch of benchmarks, the model providers often don't even use the same benchmarks, a bunch of them have known problems, and it's expensive to do your own benchmarks.

I am a GPT 5.x booster since to me it just feels smarter, and I generally felt like the benchmarks backed me up, but it's not every benchmark, so sadly we're mostly arguing about vibes.

SWEBench-Pro was a big one, though apparently Claude was reading solutions out of the .git folder it wasn't meant to have access to among other problems.

smoe · 2026-06-13T00:10:06 1781309406

I find it fascinating that every time this kind of discussion comes up, people talk about night and day experiences between Claude and Codex, in both directions. I’m really wondering what people are doing to get such different outcomes.

I’m currently working on two projects/clients, one using Claude, one using Codex. I have a strong preference for the latter, but not because I think it is much more intelligent or writes much better code. It is simply because I find the way of interacting with it more pleasant: more literal, mechanical, makes fewer assumptions and/or double checks, and is less proactive in my experience. At least until some updates over the last few weeks.

Eridrus · 2026-06-13T01:47:49 1781315269

I think I like Codex for the same reason tbh. I think it's just general misanthropy or autism or something lol. Most people seem to prefer Claude.

For me, I think Codex was visibly smarter than Claude until 4.8 came out, it would regularly do better debugging and IMO write better code. 4.8 I think is close.

I think Claude is widely regarded to have a big lead in front-end, which I do not work on.

Claude's Ultrathink is pretty cool, though it eats up tokens like nothing else obviously.

AlphaSite · 2026-06-13T02:55:15 1781319315

It probably means they’re close enough that there’s no observable difference. Or better at very different things.

Eridrus · 2026-06-12T21:33:43 1781300023

The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.

Eridrus · 2026-06-12T18:18:13 1781288293

Why did we invent the term Data Scientist a decade or so ago to replace Business Analyst?

To try and give the position a rebrand with more prestige and hire better people for it.