Hacker News | monkeydust's comments

I have been experimenting with multi-agent LLMs for the last month. As I put in the writeup for my repo and in the video, the biggest value I have found is when you run a bunch of different agentic strategies in parallel, then have a judge review the variance between them. So far that has uncovered interesting insights. The rest of it is so-so. Been fun but also expensive!

Repo with video: https://github.com/monkeydust/rightmind


Multi-agent setup to tackle complex problems using the diversity of multiple LLMs. All for personal use, but I'm finding it very useful, especially what I call 'all angles', where it runs multiple strategies in parallel and a judge agent then presents a summary, including a view on how the strategies agreed with and diverged from one another.

Repo with video: https://github.com/monkeydust/rightmind


Don't know, but could a BOINC setup be used for this? It has been around for ages, is mature, and has an incentive mechanism (Gridcoin).

AI has collapsed the cost of producing content while leaving the cost of reviewing and verifying it high, imho. This has inverted the economics of collaboration. Reviewer attention, not output volume, is now the scarce resource. This happened with my engineering teams (PR reviews) and is now happening in my world in Product.

In some cases there's also no preparation or verification happening at all, which massively inflates the productivity gains of AI. Lots of VCs and investors asking companies to move into "trust the AI" mode.

I once consulted for a company in the content marketing business that was one of the largest and fastest growing startups of its country. The content production in itself was "cheap", a dollar for 500 words. But it collapsed, due to the unbearable number of people required to review it.

Now virtually all content is generated by AI and the old customers don't have anyone to verify anymore.

Companies are made of people who are shitty to each other but trust machines blindly.


  > Now virtually all content is generated by AI and the old customers don't have anyone to verify anymore.
I see this first-hand at $company, where the PRs are so obtuse, and the descriptions so incomprehensible or just too long and gratuitous, that thorough code review is also falling by the wayside and becoming more of a "rubber stamp"; this is what management is calling a "productivity boost"...

Another productivity boost I'm seeing: $COMPANY is asking devs to let AI do the design, instead of having a designer do it.

I have a dirty secret: I do the design myself.

Productivity gains are still there. People are amazed at the "AI".


I have a multi-agent (MA) system set up for personal use.

You give it a problem, then refine it: a fast, cheaper model asks you questions, and your answers produce a better input prompt. You then choose an MA strategy. For example, break the problem into sections and have a final judge conclude, or run multi-turn, where agents debate and a judge then summarises the debate.

The best approach is what I call 'all angles', where all these strategies run in parallel and a final meta-judge synthesises the response. The most useful part, which I recently added, is a view of the variance between the strategies.
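
A rough sketch of how the 'all angles' flow could look, not the actual implementation from the repo. Everything named here is hypothetical: the three strategy functions are stand-ins that return canned text instead of calling models, and the "variance" view is approximated with a pairwise Jaccard distance over word sets, which is an assumption chosen only to keep the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Dict

Strategy = Callable[[str], str]


# Hypothetical stand-ins: real strategies would call one or more models here.
def decompose_then_judge(problem: str) -> str:
    return f"Split into sections, weigh cost first: {problem}"


def debate_then_judge(problem: str) -> str:
    return f"Agents debate, weigh risk first: {problem}"


def single_shot(problem: str) -> str:
    return f"Answer directly, weigh cost first: {problem}"


STRATEGIES: Dict[str, Strategy] = {
    "decompose": decompose_then_judge,
    "debate": debate_then_judge,
    "single_shot": single_shot,
}


def jaccard_distance(a: str, b: str) -> float:
    """Return 1 - |A & B| / |A | B| over lowercase word sets (0.0 = same words)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def all_angles(problem: str, strategies: Dict[str, Strategy]) -> dict:
    """Run every strategy in parallel, then measure how far the answers diverge."""
    with ThreadPoolExecutor(max_workers=max(1, len(strategies))) as pool:
        futures = {name: pool.submit(fn, problem) for name, fn in strategies.items()}
        answers = {name: future.result() for name, future in futures.items()}
    divergence = {
        (a, b): jaccard_distance(answers[a], answers[b])
        for a, b in combinations(sorted(answers), 2)
    }
    return {"answers": answers, "divergence": divergence}


def meta_judge(result: dict) -> str:
    """Stand-in judge: only reports which pair of strategies disagreed most."""
    divergence = result["divergence"]
    if not divergence:
        return "Only one strategy ran, nothing to compare."
    a, b = max(divergence, key=divergence.get)
    return (
        f"{len(result['answers'])} strategies ran; "
        f"most divergent pair: {a} vs {b} ({divergence[(a, b)]:.2f})"
    )


if __name__ == "__main__":
    print(meta_judge(all_angles("Which school should we pick?", STRATEGIES)))
```

In a real setup the judge would itself be a model that reads all the answers; the point of the sketch is only that the divergence is computed across strategies and handed to the judge alongside the answers.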

Been using this for life stuff - housing search, schools, family challenges!

Perhaps I should make a video of it in action. If people in the HN community are interested, let me know.


Right, here is the video demo of what I built - https://streamable.com/e49cgt

Details and repo post on ShowHN here - https://github.com/monkeydust/rightmind

I have also developed a similar system, focused not on exploratory refinement of prompts but on cybernetic-style feedback loops: it keeps prompt outputs stable through a growing library of deterministic checks and autofixes. Any "problem" not covered by that library is surfaced to the human driving the process.
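
For anyone picturing the loop, here is a minimal sketch under my own assumptions, not the real system. The `Check` type, the `stabilise` function, and the three example checks are all hypothetical, and a check with no autofix stands in for "a problem the library does not cover yet".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Check:
    name: str
    passes: Callable[[str], bool]                    # deterministic predicate on the output
    autofix: Optional[Callable[[str], str]] = None   # deterministic repair, if one is known


def stabilise(output: str, library: List[Check], max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Apply known autofixes until nothing fixable fails.

    Returns the (possibly repaired) output plus the names of checks that
    still fail, which are the ones to surface to the human.
    """
    for _ in range(max_rounds):
        fixable = [c for c in library if c.autofix is not None and not c.passes(output)]
        if not fixable:
            break
        for check in fixable:
            # Re-test first: an earlier fix in this round may already have resolved it.
            if not check.passes(output):
                output = check.autofix(output)
    unresolved = [c.name for c in library if not c.passes(output)]
    return output, unresolved


# Hypothetical example library.
LIBRARY = [
    Check("no surrounding whitespace", lambda s: s == s.strip(), str.strip),
    Check("ends with a period", lambda s: s.endswith("."), lambda s: s + "."),
    Check("mentions a price", lambda s: "$" in s),  # no autofix: needs a human
]

if __name__ == "__main__":
    fixed, needs_human = stabilise("  Rent is 1200 a month  ", LIBRARY)
    print(repr(fixed), needs_human)
```

The library grows by turning a recurring human intervention into a new `Check` with an autofix, so fewer problems reach the human over time.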

You mention cost in one of the replies. Can you elaborate on the cost profile (ballpark) for various problem types? I would also be curious to understand the strategies employed and what the costs look like across each.

Definitely interested, would love to see a video :)

Sure, let me do that. Can I post this as a ShowHN if it's just a video? The rules say people need to be able to try it out, but that would cost me a small fortune :) ...could perhaps post it on GitHub and people can set up the repo themselves with their own OpenRouter key, if that works. Have never done a ShowHN but it would be fun to try.

The cheap models may ask subpar questions, leading to subpar solutions.

So what harness are you using? And what LLMs?

Homebrew harness, and all the frontier ones plus DeepSeek, all via OpenRouter at the moment. Works well enough but can get expensive, so I use it for real high-value challenges. Interestingly, the refine feature has been the most useful to me and the people I have shown it to. Essentially, people are lazy when expressing the initial problem (me included!); refine asks relevant questions about the initial problem, then rewrites the initial statement, and the user can accept/reject/edit before submitting.
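
The refine step, as a hedged sketch rather than the actual harness code. The function names and the three stub callables are hypothetical; in practice `ask_questions` and `rewrite` would be calls to a fast, cheap model and `get_answer` would prompt the user.

```python
from typing import Callable, Dict, List, Optional


def refine(
    problem: str,
    ask_questions: Callable[[str], List[str]],
    get_answer: Callable[[str], str],
    rewrite: Callable[[str, Dict[str, str]], str],
) -> str:
    """Ask clarifying questions, collect answers, and propose a sharper problem statement."""
    questions = ask_questions(problem)
    answers = {question: get_answer(question) for question in questions}
    return rewrite(problem, answers)


def review(original: str, proposed: str, decision: str, edited: Optional[str] = None) -> str:
    """Apply the user's accept/reject/edit decision before anything is submitted."""
    if decision == "accept":
        return proposed
    if decision == "reject":
        return original
    if decision == "edit":
        if edited is None:
            raise ValueError("decision 'edit' needs the edited text")
        return edited
    raise ValueError(f"unknown decision: {decision!r}")


# Hypothetical stubs so the sketch runs without any model or user input.
def stub_questions(problem: str) -> List[str]:
    return ["What is your budget?", "Which area?"]


def stub_answer(question: str) -> str:
    return {"What is your budget?": "500k", "Which area?": "north"}[question]


def stub_rewrite(problem: str, answers: Dict[str, str]) -> str:
    details = "; ".join(f"{q} {a}" for q, a in answers.items())
    return f"{problem} | {details}"


if __name__ == "__main__":
    proposed = refine("Find a house", stub_questions, stub_answer, stub_rewrite)
    print(review("Find a house", proposed, "accept"))
```

Keeping `review` separate from `refine` means the user always gets the last word on the prompt, which matters when the cheap model asks weak questions.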

I came to a similar conclusion. I think the default options in many IDEs (Ask/Plan/Agent) are limited... 'Refine' feels like an improved 'Plan' in that it doesn't just jump right into building a list of tasks based on the initial prompt, because who knows what sort of flaws or deficiencies were present in the initial prompt! Can't always get everything right in the first try. XP

I don't think a specific harness is even necessary to get a boost from 'Refine'. Even a simple custom agent is portable enough... it's easy enough to take the existing 'Plan' agent definition present in VS Code and tweak it to be 'Refine' instead.


There is a 5-line skill I’ve been using for refinement called grill-me that works quite well.

The problem with these kinds of systems (they have been well studied) is that the overall output is ultimately anchored to the dumbest models used.

I.e. you cannot end up with a more intelligent output by adding more, dumber models (that is: dumber than the most intelligent model used).

It's generally best to refine your prompt and send it to (at most) the two smartest frontier models available, and then have the smartest model review the output from the second smartest.


Same, you can instruct it specifically to look non-standard and give it examples of the website styles you want. After some wrangling it feels a bit more creative, but it takes prompting.

Do you have any guides or workflows describing this process?

I don’t know what to prompt to get it away from the base model, and the non-standard web design styles are a little bit too harsh for me, if that makes sense. For example, I like brutalist design, but it sometimes feels heavy on the apps that I’m making.

I’ve tried to get the AI to describe a style based on the product name or the theme that I want to have, for example travel. But then it creates a skeuomorphic design where everything looks like a boarding pass or airplane ticket.


Does it matter though?

So if I have a problem with my telecom provider and I want to get it solved asap, I'd the AI can do this just as effectively as a human operator, isn't that OK?


That's a pretty big "I'd". In my experience the answer is no, the AI is not as effective as a human operator (as long as the human operator is paid enough to care)


The article pretty much plays out what's happening at our place: heavy use of AI in software development, but we don't see ourselves shipping faster; it's about the same or perhaps slower (for other reasons). It's a weird feeling, as we're waiting for this utopia to kick in but it hasn't, and we can't fully put our fingers on why.


The article and the AI skepticism crowd on HN read like the blind leading the blind to me.

I'm at a FAANG. My org is moving much more quickly, maybe between 3-10x more quickly than we were pre-AI. We aren't seeing a spike in reliability issues. Things just get done faster. An org as large as mine has no right to move as fast as it does.


I’ve been back through your post history (not entirely) - you mention multiple times you work at a FAANG - so you work at one of 5 very public companies.

You have been asked multiple times by multiple commenters to provide a single example of something that reflects this incredible boost achieved by <massive tech org>, you have ignored every request for this, and I suspect will ignore this one as well. HN is going to die unless we all start calling these constant deceptive practices out. I’ll leave others to parse your history and make their own judgements.

“Judge them by their fruits”


Not going to break NDA and give up our competitive advantage for HN, sorry! I can tell you it's been useful for us, but thinking about how to use it is an exercise left for the reader.


Perfect excuse for avoiding any substance (except no one asked you to give up the advantage or secret sauce of using LLMs, only the end examples of what has been achieved with it). It is also funny how you always leave out the name of the company you allegedly work at. It is perfectly clear why you do that, though: no matter what company you name, its actual employees on HN will quickly disprove all the ridiculous claims you have made regarding LLMs and AI. Keeping the name ambiguous lets you get away with it.

Anyways, solenoid0937 is an LLM-hype peddler and an Anthropic shill, not an actual FAANG-employee. See proof here: https://news.ycombinator.com/item?id=48269250


It's highly team dependent. In short, the more "code monkey" the work is, the more velocity you can get with AI. As soon as you need to interface with customers and extract requirements, that becomes the bottleneck.


These guys have been around for a while, anyone have practical experience/exposure to using their technology for heavy enterprise workloads?


This does feel like where things should be going for more natural human-AI interaction patterns. Nice write up and demos.

