
verdverm · 2026-06-16T22:10:59 1781647859

I'm using SearXNG, EXA, Tavily, and soon (tm) Cloudflare

They all give slightly different results, so you can dedup and fuse them with heuristics or with another agent
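A minimal sketch of the heuristic route, assuming each provider hands back a ranked list of URLs. It dedups on a normalized URL and merges the lists with reciprocal rank fusion. The provider keys, the example URLs, and the normalization rules are made up for illustration; none of this is any of those services' actual API.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Collapse trivial differences (scheme, www., trailing slash) into one key.

    Assumption: query strings and fragments are dropped, which can merge
    pages that are actually distinct. Loosen this if that matters to you.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return f"{host}{path}"


def fuse(result_lists: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(url) = sum over providers of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for urls in result_lists.values():
        seen: set[str] = set()
        for rank, url in enumerate(urls, start=1):
            key = normalize_url(url)
            if key in seen:  # same page twice from one provider counts once
                continue
            seen.add(key)
            scores[key] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical ranked results, one list per provider
    results = {
        "searxng": ["https://www.example.com/a/", "https://example.org/b"],
        "exa": ["https://example.com/a", "https://example.net/c"],
        "tavily": ["http://example.org/b", "https://example.com/a"],
    }
    for url, score in fuse(results):
        print(f"{score:.4f}  {url}")
```

Pages that several providers agree on float to the top, and anything ambiguous after that can still be handed to another agent to rerank.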

verdverm · 2026-06-16T22:09:05 1781647745

There is a lower bar (and it keeps getting lower over time), but ime the config you are describing is still too low.

qwen/gemma in the 27/35B range @fp8 are better than gemini-2.5 but worse than gemini-3.1. You can run DS4-flash @fp8 on two DGX Sparks, and things keep getting better. DiffusionGemma came out recently with 4x token gen speeds.

tl;dr - the models you appear to be trying with are too small or too quant'd

verdverm · 2026-06-16T20:44:05 1781642645

slow down and put more effort into vetting, trust your gut, make them share all screens during (coding) interviews

verdverm · 2026-06-16T20:11:00 1781640660

They explicitly do not want them. They will pay companies to abandon renewable energy projects that have already been planned / started.

verdverm · 2026-06-16T20:09:12 1781640552

Not just climate data. They are also changing how census data is aggregated so they can reverse-individualize for political goals; this was discussed 3 days ago.

https://news.ycombinator.com/item?id=48517377

https://www.npr.org/2026/06/12/nx-s1-5855734/census-bureau-d...

verdverm · 2026-06-16T17:10:34 1781629834

51° FOV, not much improvement here, but at least it is a much smaller form factor

verdverm · 2026-06-16T16:58:40 1781629120

Second this

verdverm · 2026-06-16T04:38:38 1781584718

related: Electrostates, Petrostates, and the New Cold War

https://www.youtube.com/watch?v=gLnxzkiB-GI

verdverm · 2026-06-16T03:43:00 1781581380

There is a bug in llama-cpp for qwen/gemma models, use vLLM instead

pdyc · 2026-06-16T05:06:46 1781586406

what bug, and what does it affect?

verdverm · 2026-06-16T15:19:16 1781623156

it's a prompt cache invalidation bug that causes all input to be reprocessed instead of getting preloaded

There are other reasons to prefer vllm to llama-cpp as well
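To make the cost of that kind of bug concrete, here is a toy sketch of prompt prefix reuse. It is not llama-cpp's or vLLM's actual implementation; the class name and the token IDs are invented. The idea is only that a working cache processes just the tokens after the longest shared prefix, while an invalidated cache behaves as if the shared prefix were always zero.

```python
def common_prefix_len(cached: list[int], prompt: list[int]) -> int:
    """Number of leading tokens the new prompt shares with the cached one."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


class ToyPromptCache:
    """Hypothetical single-slot cache holding the last prompt's tokens."""

    def __init__(self) -> None:
        self.tokens: list[int] = []

    def tokens_to_process(self, prompt: list[int]) -> list[int]:
        """Return only the suffix that still needs a forward pass."""
        reused = common_prefix_len(self.tokens, prompt)
        self.tokens = list(prompt)  # remember this prompt for the next turn
        return prompt[reused:]


if __name__ == "__main__":
    cache = ToyPromptCache()
    turn1 = [1, 2, 3, 4]        # made-up token IDs
    turn2 = turn1 + [5, 6]      # next turn appends to the same conversation
    print(len(cache.tokens_to_process(turn1)))  # first turn: nothing cached yet
    print(len(cache.tokens_to_process(turn2)))  # second turn: only the new tokens
```

With long agent contexts the shared prefix is most of the prompt, so losing it on every turn means paying full prompt processing each time.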

verdverm · 2026-06-14T16:24:18 1781454258

You can do this in opencode and pi (haven't used) by defining your own agents or overriding the built-in ones. In your primary agent you can disable all tools and give it good instructions for how to delegate.

I imagine most harnesses have a way to do this today; if they don't, get a new one. OpenCode, for example, is highly customizable, and Claude and VS Code both support a lot as well, including custom agents (though it's unclear if you can create custom top-level agents in claude-code)

https://opencode.ai/docs/agents/

https://code.claude.com/docs/en/sub-agents

https://code.visualstudio.com/docs/agent-customization/custo...

gbro3n · 2026-06-14T17:40:18 1781458818

Thanks. Those don't deterministically prevent the main loop from using tools though; unless I'm wrong, that's just prompting the main agent on when to use specialized sub agents

verdverm · 2026-06-14T17:45:27 1781459127

you can configure tools, thinking, permissions, etc. on a per-agent basis in the frontmatter or via config (which they use in the examples). Either location is valid, though I'm not sure about the merging order

the main agent would be very different, basically an orchestrator: you are "loop engineering" it and turning off everything for this main agent besides being able to run subagents

for opencode:

https://opencode.ai/docs/agents/#permissions (what tools, mcp, etc...)

https://opencode.ai/docs/agents/#task-permissions (what subagents it can call)

https://opencode.ai/docs/agents/#additional (thinking effort)
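As a harness-agnostic illustration of why this is deterministic rather than just prompting, here is a rough sketch where the allowlist is enforced in the dispatch code instead of in the instructions. Every name in it (`Agent`, `dispatch`, `ToolDenied`, the `task` and `bash` tools) is hypothetical and not opencode's API; check the links above for the real config keys.

```python
from dataclasses import dataclass
from typing import Callable


class ToolDenied(Exception):
    """Raised when an agent requests a tool or subagent it is not allowed to use."""


@dataclass(frozen=True)
class Agent:
    name: str
    allowed_tools: frozenset[str]
    allowed_subagents: frozenset[str] = frozenset()


def dispatch(agent: Agent, tool: str, args: dict, tools: dict[str, Callable[..., str]]) -> str:
    """Run a tool call only if the agent's allowlist permits it.

    The check lives in the harness, so the model cannot talk its way around it.
    """
    if tool not in agent.allowed_tools:
        raise ToolDenied(f"{agent.name} may not call {tool}")
    if tool == "task" and args.get("subagent") not in agent.allowed_subagents:
        raise ToolDenied(f"{agent.name} may not run subagent {args.get('subagent')}")
    return tools[tool](**args)


# Hypothetical tool implementations, stubbed out for the example
TOOLS: dict[str, Callable[..., str]] = {
    "task": lambda subagent, prompt: f"[{subagent}] {prompt}",
    "bash": lambda command: f"ran: {command}",
}

# The orchestrator can only delegate; it has no direct tools of its own
orchestrator = Agent(
    name="orchestrator",
    allowed_tools=frozenset({"task"}),
    allowed_subagents=frozenset({"researcher", "coder"}),
)

if __name__ == "__main__":
    print(dispatch(orchestrator, "task", {"subagent": "coder", "prompt": "fix the test"}, TOOLS))
    try:
        dispatch(orchestrator, "bash", {"command": "ls"}, TOOLS)
    except ToolDenied as err:
        print(f"denied: {err}")
```

The prompt still matters for telling the orchestrator how to delegate well, but the tool restriction itself does not depend on the model following it.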