Agreed - there was always a set of things I wanted to do that I knew the magic core for, but wanted a team of implementers for the cruft: the 100k of actual testing harnesses, hyperparameter exploration, etc. I now have that team of implementers. All the problems seem research-y though - optimal binary transport systems that are zero-copy and compatible across languages, fast physical simulation optimizers, etc. So, things that all had a _LOT_ of busywork around the magic core.
We are pretty much there already - if I understand correctly, your point (“How the models are wielded”) refers to the harness, which is already part of model training. Fable was trained to use Claude Code harnesses effectively: to keep plugging at a problem with a lot of working memory, world knowledge and reasoning capability, and to keep hitting that search space intelligently. And it is not just cybersecurity. I’ll give several examples from my own recent experience:
1. Cyber - already discussed - the issue is that the bigger models can actually do a full end-to-end exploration of an exploit - go from theory to an actual deployable payload.
2. CAD/CAM/Mechanics - CalculiX (ccx) - an open-source FEA and similar mechanical solver - think Siemens or ANSYS, but open-source. A team I was helping was trying to design a mount for a physical object that would need to reduce vibrations in a frequency band - think microphone mount basically. The usual loop would be design, analyze with Siemens, go back to the beginning. The AI loop is: have the AI design, then analyze using ANSYS, then analyze the result, change the design, iterate. That loop did not produce anything useful for elastic materials because ANSYS would take 12 hrs to do the acoustic analysis using a GPU. 1 week of autonomous work by a frontier model resulted in a modification and custom solver added to ccx that could simulate the acoustics (vibrations) _in that particular problem_ in about 20 seconds - mainly because it could try new mathematical ideas, then compare them against the ANSYS reference for quality of solution, and iterate. And 1 week _after that_ the frontier model - iterating on one design per minute - came up with multiple 3D-printable ground-breaking mounts - including sending one off to Xometry for printing and getting it shipped back. Existing designs had a 20 dB drop in the frequency ranges needed - this one had 60. For reference, 40 dB is basically infinity :) While this was for microphones, you can imagine that vibration reduction is a big thing in engine, suspension, and weapon mounting, and well, in general for things that move. 3 person team btw - unthinkable even 1 year ago.
3. Pharma. Different company - but given a known Density Functional Theory or Coupled Cluster molecule simulator, one can run nice agentic loops over frontier models to do chemical or pharmaceutical research - there’s a reason Anthropic is launching Claude Chemistry. Note that the limiting factor is the multi-week runtime of Coupled Cluster and similar molecular dynamics simulators. If one _could_ speed that up for a particular problem space or molecule type, one could very quickly have a high-end reasoning model iterate to a good molecular design - and frontier models are getting very good at precisely automating the ML research needed to do that autonomously - after all, there’s a reference there. 5 people - 2 years ago this would have been a research institute.
4. Physical AI - robotics - same principle.
5. This is basically the bet Bezos is making with his new company.
Please do not underestimate the effect these models will have on our ability to improve our ability to affect the world - this is just starting to hit now. I think we can all extrapolate the GDP and defense impact of this - or at least that there will be a very significant one.
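To make the "design, simulate, score, iterate" loop from example 2 concrete, here is a minimal Python sketch. It is not the ccx solver or the team's method: the "fast solver" is the textbook single-degree-of-freedom isolator transmissibility formula used as a toy stand-in, and the band, starting design and parameter bounds are all made-up assumptions. It also shows why 20 dB vs 60 dB matters: that is a 10x vs 1000x reduction in amplitude.

```python
import math
import random

def db_to_amplitude_ratio(db):
    """Convert an attenuation in dB to an amplitude reduction factor."""
    return 10.0 ** (db / 20.0)

def transmissibility(freq_hz, natural_hz, damping_ratio):
    """Toy stand-in for the fast solver: textbook 1-DOF isolator transmissibility."""
    r = freq_hz / natural_hz
    num = 1.0 + (2.0 * damping_ratio * r) ** 2
    den = (1.0 - r * r) ** 2 + (2.0 * damping_ratio * r) ** 2
    return math.sqrt(num / den)

def worst_case_attenuation_db(design, band_hz):
    """Smallest attenuation (dB) anywhere in the band; higher is better."""
    natural_hz, damping_ratio = design
    return min(-20.0 * math.log10(transmissibility(f, natural_hz, damping_ratio))
               for f in band_hz)

def iterate_design(start, band_hz, steps=2000, seed=0):
    """Perturb the best design, score it with the fast solver, keep improvements."""
    rng = random.Random(seed)
    best, best_score = start, worst_case_attenuation_db(start, band_hz)
    for _ in range(steps):
        candidate = (
            # Bounds below are hypothetical manufacturability limits.
            min(200.0, max(5.0, best[0] * rng.uniform(0.9, 1.1))),
            min(0.5, max(0.01, best[1] * rng.uniform(0.9, 1.1))),
        )
        score = worst_case_attenuation_db(candidate, band_hz)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

band = [100.0 + 10.0 * i for i in range(41)]  # hypothetical 100-500 Hz band
start = (60.0, 0.2)                           # hypothetical starting design
start_db = worst_case_attenuation_db(start, band)
best, best_db = iterate_design(start, band)
print(f"start: {start_db:.1f} dB, best: {best_db:.1f} dB")
print(f"20 dB = {db_to_amplitude_ratio(20):.0f}x, 60 dB = {db_to_amplitude_ratio(60):.0f}x amplitude reduction")
```

The point of the cheap scorer is that each iteration costs microseconds here (seconds in the real case) instead of 12 hours, so a dumb random search already gets somewhere. In the real workflow the fast solver would also be spot-checked against the slow reference solver.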
Well - there is a giant push to allow non-qualified investors to invest their 401k (and Roth and whatever) into the private equity market - pre-IPO companies and such.
I can't shake the feeling of a grand fleecing incoming - and honestly, most big financial companies I know are against it, because the blowback from inevitably bankrupting the firefighters' and nurses' pension funds will be congressional hearings and piercings of corporate veils.
Seems like the feds are pushing for it - for reasons I cannot fathom.
There is talk among the more conspiracy-minded folk that short exposure to companies that were supposed to go bankrupt, and then didn't, has been rolled into esoteric financial instruments that were sold to retirement funds. If these companies in particular weather a downturn while everything else is bleeding profusely, it'll be a double whammy: the short hedge turned into another knife.
A lot - and over the coming 2 years, even more. Utilization rates are under 50% across the board, and specialized and cheaper chips are coming out all the time for inference. And a truckload of research - TurboQuant, HC (DeepSeek), etc.
Imagine an agent shadowing all your terminals, providing ideas and asking to run commands that will let it verify the hypotheses it comes up with, while at the same time doing research on vendor docs, etc...
Quite safe, and already a force multiplier - this would be a harness. Maybe have it be able to write to a shadow system with similar (ideally the same) hardware to verify its hypotheses on how the system works, etc.
Minor nit re[2]: for agentic workloads that are actually worth money - i.e., Claude Code and similar - things are either prefill-bound, which this does not help, or more importantly tps/user bound (at 150k+ context windows) - you want your big magic model to emit 200 tps/user. This is why Nvidia bought Groq (now LPU) and what Cerebras is trying to do, etc. So for the stuff that makes money in the field, GPUs are not really compute bound once context lengths are large - they are still memory-transfer bound (may be KV-cache transfer, may be HBM->SRAM-on-chip, etc.)
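A back-of-the-envelope way to see the memory-transfer bound, as a rough Python sketch. It assumes a simple roofline-style model where every decode step has to stream the active weights once (shared across the batch) plus each user's KV cache. All the numbers are hypothetical placeholders, not measurements of any real model or chip, and it ignores compute, sharding, speculative decoding, etc.

```python
def decode_tps_upper_bound(weight_bytes, kv_bytes_per_token, context_tokens,
                           bandwidth_bytes_per_s, batch_size=1):
    """Upper bound on tokens/s per user when decoding is memory-bandwidth bound.

    Each decode step emits one token per user and must read the weights once
    plus batch_size KV caches of context_tokens entries each.
    """
    bytes_per_step = weight_bytes + batch_size * kv_bytes_per_token * context_tokens
    return bandwidth_bytes_per_s / bytes_per_step

# Hypothetical numbers, for illustration only.
WEIGHT_BYTES = 40e9         # active weights streamed per step
KV_BYTES_PER_TOKEN = 100e3  # KV cache footprint per context token
CONTEXT_TOKENS = 150_000    # the "150k+ context" regime
BANDWIDTH = 8e12            # aggregate memory bandwidth, bytes/s

for batch in (1, 8, 32):
    tps = decode_tps_upper_bound(WEIGHT_BYTES, KV_BYTES_PER_TOKEN,
                                 CONTEXT_TOKENS, BANDWIDTH, batch)
    print(f"batch={batch:>2}: at most {tps:.0f} tps/user")
```

Under these assumptions the per-user KV cache read quickly dominates as the batch grows, so tps/user drops even though the arithmetic units are mostly idle.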
> i.e., claude code and similar, things are either prefill-bound
Prefix caching greatly accelerates each turn. Barring large file reads, prefill still isn't the bottleneck vs. decoding reasoning tokens. Script-writing too.
This is especially true during exploration phases, when traversing directory trees and grepping files: you're talking about a few hundred tokens/turn.
Kind of, yeah - predictability is a question though for larger layers, when trying to scale this up. But yeah, this is a "95% predictor in latent space is a 7x improvement in speed if done right" approach.
Note that _passenger aviation_ is commercially non-competitive. The big 4 US airlines make money on credit cards, not airfare: they lose money on airfare. So, most people who are trying to make money will not use them as a model.
In general, safe businesses can only exist with government support or government prohibition of all other businesses globally - and that is a very hard bar to clear.
We are multiple orders of magnitude away from Landauer limits - so the next big thing in matmul could be photonic multipliers - there’s a bunch of them coming up in the next 3? years. So that’s a 2-4 order of magnitude improvement. Sigmoid?
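For a sense of scale on the Landauer point, a small Python sketch. The Landauer bound (k_B * T * ln 2 per bit erased) is standard physics; the energy-per-operation figure is a hypothetical placeholder and not a measured number for any chip. A real multiply also touches many bits, so treat the result as an order-of-magnitude illustration only.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI definition

def landauer_limit_joules(temperature_k=300.0):
    """Minimum energy to erase one bit at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)

def orders_of_magnitude_above_limit(energy_joules, temperature_k=300.0):
    """How many powers of ten a given energy sits above the Landauer bound."""
    return math.log10(energy_joules / landauer_limit_joules(temperature_k))

# Hypothetical: assume ~1 pJ per low-level operation on current hardware.
ASSUMED_ENERGY_PER_OP_J = 1e-12

print(f"Landauer limit at 300 K: {landauer_limit_joules():.3e} J/bit")
print(f"Headroom at 1 pJ/op: ~{orders_of_magnitude_above_limit(ASSUMED_ENERGY_PER_OP_J):.1f} orders of magnitude")
```

So even a 2-4 order of magnitude gain from a new multiplier technology would leave plenty of room before thermodynamics becomes the constraint, under that assumed energy figure.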
I think the one thing you are not taking into account is that the investors, on average, fundamentally don’t care. Scale arbitrage means that small companies are fundamentally about velocity - and if they get sued due to regulations that do not pierce the corporate veil, they just fold. And the ones that did not get sued make money for the VC, and figure out later how to be HIPAA etc. compliant. Basically, over the last 10 years I’ve been seeing that VCs do not care about insurance or corporate liability - the sink rate is so high that it is irrelevant.
For big corps - this is different. But modulo HIPAA - this is why they are gung ho about binding arbitration - they are trying to match velocity to some degree - and mostly failing…
VCs and investors are a massive issue, which is ironic to say here, but once you get into contracts with other businesses, it changes things for the business and for the leadership within, who do carry liability when things go wrong, especially when they have made attestations.
"who do carry liability when things go wrong" -> unless one pierces the corporate veil, it's just money. Not even their money. HIPAA - unless basically stealing data - will not generate personal liability. And even SOX will only generate liability in limited amounts for limited people - and executives will go a long way towards avoiding the entire thing.
From what I have seen - most executives would rather shut down the business and quit than accept the possibility of personal liability - and just avoid the regions of the world in which they do have it.