bushido's comments

Under normal circumstances, I'd agree with most folks here, it'd be highly unlikely.

However, we're (I think officially) in an arms race.

I wouldn't want to bet against anyone in these unprecedented times (with plenty of historical parallels).


I had a very similar experience. I'd be keen to see how you went about it if you release it.

Here's what I use: https://github.com/DheerG/swarms


Very interesting read.

What jumps out at me is a lot of this is still very task oriented. And each to their own, but anecdotally, I haven't seen great results from task oriented behavior.

I don't mean that it does not produce what was asked for. I'm saying that tasks even when created by engineering and product teams are often wrong.

I lean very heavily towards outcome based prompting. Say exactly what you want achieved and then maybe give some constraints, i.e. what definitely not to do.

In my experiments, this has always produced much, much better results.

Interestingly, it's less engineering and more customer focus.


Say what you want, followed by how you want it done, which includes unit testing, research (Claude can do research for you) on best practices, etc. I usually also follow up with "Why would I NOT want to do this this way, be critical of this." Usually outlines sensible issues, you give guidance, then the plan is ready to go. Then you test, test, and when you're done testing, you test some more, and review all code. I find that despite how busy that all sounds, I save myself weeks of effort.

> I lean very heavily towards outcome based prompting. Say exactly what you want achieved and then maybe give some constraints, i.e. what definitely not to do.

I do this when writing stories/projects/issues/epics for humans. Works great.

If you read any management book published in the last ~70 or so years, you’ll find that “Make sure people understand the goal” is the ultimate hack. They even use this in militaries!

“go take that hill” works a lot better than “walk 50ft to the right and shoot at those bushes”. You always get what you ask for :)


Anecdotally, I think this is a much bigger cleanup than just talking to the administration.

I think there's a wider damage done which there is no coming back from. And this is for the USA, not Anthropic.

The chance that sovereignty and rules such as this could be applied to AI was a concern that a lot of people had, but the risk was unknown.

Speaking for myself, I had guessed this would happen at some point in time. I was expecting/hoping it'd be years away.

However, the events of the last few days greatly increase my concern about building any product that depends on an API which could go away for a number of my customers.

I've been experimenting with other open weight models hosted in favorable sovereign countries for a few months, but this accelerates something which was an experiment to now being a must-have.

I don't think it is going to be easy for any of the parties to repair this.


I see it exactly the same, self hosted LLMs will be the future. They may not be SOTA, but it is better to have a Trabant in your garage than to be denied use of a shared Ferrari because somebody outside of your control had a hissy fit.

I like a good car analogy, but it doesn't hold up in this case. What am I going to do with a Ferrari that's durable and makes me money, the same way that Mythos/Fable can output me better code than Opus 4.x? I mean I guess I could take a picture of me in a Ferrari, and then when I show off that picture but can't produce the Ferrari, I look dumb, but Mythos generated code/artifacts are still downloadable and runnable, if you got far enough before the cut off. Judging from the quality/availability of the few days we had it, it's a shared resource anyway. The usage limits were too low to give everything to Fable to do.

I think they’re going to have problems, because I’m sure they have a lot of foreign employees and it’s insane to think they now have to block them all from using their best models.

Back when encryption and PGP were the hot topic, similar things happened with sovereigns.

Interestingly I've had a similar experience with agent teams/swarms, albeit they can get much more expensive depending on the workflow.

I found that Fable didn't have as much of an impact when put in a team.

But it was/is a very pleasant model to work with 1:1. And it was the first time in months that I didn't use my primary team based workhorse, across tens of sessions last week.


Compute was constrained. There is a lot happening, especially with Chinese chips, which currently points to a massive upcoming increase in non-US capacity.

ex: https://www.youtube.com/watch?v=8ekndZwyOzo


They are export controlled in most cases as well.

Also, the EU, Japan, SK, ASEAN, and India are not supportive of using Chinese tech after China imposed export controls on rare earths last year [0].

Software supply chain regulations also make utilizing Chinese software risky for ExChina players and make using ExChina tech risky for Chinese players.

Expect to see RFCs now demanding visibility into what models are used and right of refusal - this is already the norm in F1000s. Similar ones are likely to arise in the EU as well with some of the upcoming industrial policy changes being proposed.

[0] - https://www.reuters.com/world/china/china-is-making-it-harde...


This feels more and more like a marketing/scarcity play for the largest global corps.

Will likely give them time to expand capacity as well. And make them harder to dislodge in these orgs.


To me this makes little sense. I can’t imagine the orgs they have limited this rollout to don’t already have Claude subscriptions and integrations. And sure, this may play nicely into branding and build a mystique around the model, but ultimately they are missing out on a ton of revenue and risking being totally front-run now that model performance parameters are out and people have firsthand experience. Feels more like a fairly genuine attempt to be responsible. They could have easily rolled out an update and done some PR to absolve themselves of responsibility.

Urgency x scarcity, unbeatable marketing move.

It is really good. Will also cut through the common procurement, legal and change management processes seen at these orgs.

Genius^2

I go back and forth on this one.

Yes, I'm with the author. I'm absolutely sick of constantly reading AI content.

But if I have to really dig into it deep, a lot of the people who send me AI content now weren't sending me anything meaningful to begin with (pre-AI).

The number of organizations I have been around where most people just copy paste each other's messages is no joke. This was happening long before AI came along. AI has just made it so much more obvious.

Previously they might have copied it from Joe in Product. Now it all sounds like Claude or GPT.


I was half expecting/hoping this would talk about the opportunity cost of owning a home. More precisely, owning a primary residence.

There are quite a few studies about this, but it is not discussed broadly enough: there is an inverse relationship between home ownership and income.

Because for most regions across the globe, once someone buys a home, they start looking for work that's in geographic proximity to their primary residence. And in most parts of the world, incomes generally tend to stagnate since higher paying jobs are almost always away from where people live.

Now a lot of people believe that they will pick the higher income, but the logistics of selling your home, or even renting it out, often dissuade people from looking for a job which pays significantly more.

Interestingly, some multinational companies that I know of facilitate the entire transaction for their executives and senior managers when they need to move cities or countries because of this effect.

Owning a home for the purpose of investment and not living is a different matter, and the same effect isn't seen there.


I think the main thing which a lot of these articles miss is it's not just your Agents.md which can give you a model upgrade or the inverse.

But everything your harness looks at could be this. So the skills in your code base, the commands that you've added, the memories that were auto created, they all work towards improving or completely destroying your productivity.

And most of it is hidden. You hear people talk about this all the time where they'll be like, Oh, I use GSD or I use Superpowers and my results have gotten worse.

Your results might have gotten worse precisely because you use them (along with your memories and other skills).


It's funny how often my brain gets redirected to German Shepherd Dogs when reading conversations about AI. And I still can't remember what the other meaning of GSD is supposed to be without an Internet search.


What do you mean that looking for SMB (Samba) related posts on HackerNews gives results about SuperMarioBros?!


Yes. I agree. It is not just AGENTS.md.

I got myself a Strix Halo system and a GLM coding plan. The question to myself was: what can I do when tokens are essentially unlimited? The opaqueness of what my harness is, and how it grows over time & use, when using projects out there, makes it hard to know what is helping and what isn't.

Clearly, the harness, together with LLMs, has utility. Yet, I can't help but feel that ... at times, I am struggling with the classic "explore-exploit" problem. Or one between having the system be deterministic and when it should be less so. When is my system in a local minimum (and needs a good kick out of it, automated if possible), and when is it at a good place in the "global" state-action space?
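
For anyone unfamiliar with the "explore-exploit" framing, here is a minimal sketch of one textbook way to handle it, epsilon-greedy, applied to choosing between harness configurations. Everything in it is an assumption for illustration: the function names, the idea that each configuration yields a 0/1 "task succeeded" reward, and the made-up success rates. It is not how any particular harness works.

```python
import random

def epsilon_greedy(scores, counts, epsilon, rng):
    """Return the index of the configuration to try next.

    With probability epsilon (or when nothing has been tried yet),
    explore a random configuration; otherwise exploit the one with
    the best average reward so far.
    """
    if rng.random() < epsilon or not any(counts):
        return rng.randrange(len(scores))
    # Untried configurations get -inf so they are only reached by exploring.
    averages = [s / c if c else float("-inf") for s, c in zip(scores, counts)]
    return max(range(len(averages)), key=averages.__getitem__)

def run(true_rates, steps=2000, epsilon=0.1, seed=0):
    """Simulate repeated runs; true_rates are hypothetical success rates."""
    rng = random.Random(seed)
    scores = [0.0] * len(true_rates)
    counts = [0] * len(true_rates)
    for _ in range(steps):
        arm = epsilon_greedy(scores, counts, epsilon, rng)
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        scores[arm] += reward
        counts[arm] += 1
    return counts

if __name__ == "__main__":
    # Three imaginary harness setups with made-up success rates.
    print(run([0.2, 0.5, 0.8]))
```

The epsilon knob is the automated "kick": a small fraction of runs deliberately tries something other than the current favorite, which is what lets the loop escape a local minimum at the cost of some wasted runs.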

