Hacker News | computerex's comments

Maybe Hacker News has an older crowd. Windows was the de facto developer OS for a very long time.

HN definitely has an older crowd, but additionally I've noticed HN is increasingly dominated by Europeans from 5am-2pm PST. "American" HN seems to kick off around 3pm-10pm PST now.

Even the HN dataset on HuggingFace shows that most engagement on HN is now during non-US hours [0] and drops off as Europe goes to sleep.

[0] - https://huggingface.co/datasets/open-index/hacker-news
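For anyone who doesn't think in Pacific time, here's a quick sketch converting those two windows to Central European Time. It assumes fixed winter offsets (PST = UTC-8, CET = UTC+1) and ignores daylight saving time, so treat it as a rough check rather than an exact one:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (assumption: winter time, DST ignored): PST = UTC-8, CET = UTC+1.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")

def pst_to_cet(hour: int) -> int:
    """Return the CET hour of day for a given PST hour of day."""
    t = datetime(2024, 1, 15, hour, tzinfo=PST)  # arbitrary winter date
    return t.astimezone(CET).hour

# "European" window from the comment: 5am-2pm PST
print(f"5am-2pm PST = {pst_to_cet(5):02d}:00-{pst_to_cet(14):02d}:00 CET")
# "American" window from the comment: 3pm-10pm PST
print(f"3pm-10pm PST = {pst_to_cet(15):02d}:00-{pst_to_cet(22):02d}:00 CET")
```

So the busy window lines up with roughly 14:00-23:00 in Central Europe, and the "American" window with midnight to 07:00 there, which fits engagement dropping off as Europe goes to sleep.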


Nothing compares to Opus when it comes to "taste" in web design, in my experience. Nothing compares to Opus in very difficult HPC/model inference development. I worked on this with Opus: https://github.com/computerex/dlgo

OpenAI was offering 2x usage at one point, and I still used Opus just because it's so much more effective.


Which Opus?

Anthropic has been releasing models named Opus since 2024 with Claude 3 Opus.

Opus has gotten vastly more capable since then.

Local models far surpass Opus 3. They even surpass Opus 4 on most benchmarks.

Sure, if you compare to the latest Opus 4.8 or even 4.6, they're not there yet. But there's a huge difference in performance between 4 and 4.8.


Can't speak for anyone else, but there was a step change in frontier models last November: Opus 4.5 and GPT 5.2, I think.

When I colloquially say "Opus level," I really mean Opus 4.5 or later.


Right. Local models haven't quite hit that level yet. The biggest open models, which you need tens of thousands of dollars of hardware to run at reasonable speed, have pretty much hit that level of capability, but most models you can reasonably run at home aren't quite there yet. But given the gap, if local models keep improving, you'd expect to maybe see that level by this November.

My understanding is that we could in fact run the largest models on "reasonable" home hardware by focusing on throughput rather than raw speed and having them do unattended inference in large batches. The big proprietary suppliers have no interest in this because their own incentive is to fill all the physical space available with top-performing hardware and doing huge amounts of inference as quickly as possible. A home user with limited hardware investment has very different constraints.

For HPC/AI work, Opus blows GPT away; it's no competition.

As someone who just spent the last three days implementing DiffusionGemma in Rust (I tried both and ended up using mostly Codex), I think they're more or less equal when it comes to machine learning and AI. They get stuck at different points, but I wouldn't say one is a clear winner over the other. HPC I have no idea about, so I'll take your word for it :)

The flip side is that benchmarks are gamed even by the top labs. Benchmark performance doesn't necessarily correlate with real world performance.

Again, correct, but it overstates the issue. I can say labs don't want this. It arguably happened unintentionally with Meta's Llama 4 release: it went horribly, heads rolled, several billion dollars were paid for new talent, and the org that built Llama 4 was destroyed.

Evals come from a million places, and new evals and robust perturbations of existing evals abound. They test a variety of tasks in a variety of ways. All of them individually are flawed. Taken together, though, the aggregate signal is highly useful, because you more or less marginalize over a lot of different things. Not to mention that these companies have plenty of proprietary internal measurements: they build benchmarks themselves to probe their models, and they also have flywheel traffic and A/B tests.

You are right to call out benchmarks but to dismiss them or not take them seriously is a mistake.


Listen, you can say "but benchmarks, the benchmarks!" all day long, but consumers know when we are being sold a lemon. Doing the most basic things at least as well as it used to is table stakes. Never mind that if you can't do the basic stuff, how on earth can you be trusted with more?

And you can say "If it can't do the most basic of things at least as good as it used to, this is table stakes" all day long while people point you to much better evidence to the contrary, but I'd rather be on the other side of that.

Listen. I don't care about evidence. I care about my lived experience with the product I paid for. I used the new product. It's actively terrible, to the point of not being usable. We're all anecdata, but what is "better evidence to the contrary"? The known and gameable benchmarks that they know they need to win at, so they train for them. It's all he said, she said, which is the only reason we keep having this conversation.

Yeah, but that's not right, is it? You or I or the myriad other institutions inside and outside of academia can probe these models with an evolving landscape of evaluation sets, even ones unavailable to the developers. It's just ignorance to claim benchmarks are somehow useless or all being gamed. Choose your tools however you want, but don't call your experience somehow better than a myriad of more carefully constructed setups and scaled evaluations.

Not true. Together AI, DeepInfra, and Fireworks AI offer a wide range of models, like gpt-oss, that are very capable and far cheaper than the models from the big three.

I'm referring to Chinese open source models hosted on American clouds vs Chinese clouds. You're talking about an old and non-agentic capable American produced model.

You are actually referring to open-weight models, not open source. gpt-oss is an example of an open-weight model. It's highly capable in agentic settings; people use it for coding all the time.

My greater point remains. Models like the Qwen variants, MiniMax, K2.5, and GLM are available from American providers like AWS at a much lower price than the API offerings from the big three LLM providers.

Your point about Chinese models being cheap only on Chinese hardware makes absolutely zero sense. You can check out a model catalog, like Together AI's Qwen 3.5 9B offering: it's 25 cents per 1M tokens vs. the ridiculous $5 per 1M tokens for Haiku.
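To put those two quoted prices side by side, here's a back-of-the-envelope sketch. It takes the per-1M-token figures from the comment at face value (not re-verified, and prices change), assumes a single flat price per token (real offerings usually price input and output separately), and uses a hypothetical 100M-token workload:

```python
# Prices quoted in the comment above, in USD per 1M tokens (assumed flat rates).
QWEN_35_9B_TOGETHER = 0.25
HAIKU = 5.00

def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-1M-token price."""
    return tokens / 1_000_000 * price_per_million

tokens = 100_000_000  # hypothetical workload: 100M tokens
qwen = cost_usd(tokens, QWEN_35_9B_TOGETHER)
haiku = cost_usd(tokens, HAIKU)
print(f"Qwen 3.5 9B: ${qwen:.2f}, Haiku: ${haiku:.2f}, ratio: {haiku / qwen:.0f}x")
```

Whatever the workload size, the ratio stays at 20x at these prices, so the gap only matters in absolute terms once volume gets large.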


Not a great example: Qwen 9b is a tiny model that outputs barely coherent text in a casual chat, nowhere near comparable to Haiku. But the broader point stands.

I am not sure you are testing Qwen 3.5 9B. I would also verify that you are running it correctly. Qwen 3.5 9B is actually a very capable coding model that can do agentic coding, although it's obviously not as good as Opus.

You can look up the benchmarks on that model as well. Your experience does not align with mine.


Are they better? Are they better than GPT5.5?

That depends on the use case. For a lot of business use cases they are good enough. They are certainly better than older models like GPT-4o.

I feel like that’s already becoming true. I sometimes work on problems/projects where the AI agent is definitely more qualified than me to call the shots.

For example, this deep learning library is 100% AI-generated and far beyond my technical capabilities:

https://github.com/computerex/dlgo


I find AI a great scaffolding for improving understanding and mental models. BUT! It's all in how you use it.


The real question is: Do you need to understand it fully for it to improve your life?

For example, if you're in fundamental science (or generally a fan of reductionism), it for sure would be nice to understand the universe instead of just having access to an AI that can comprehend it. But to the majority of the population it only matters that someone (or something) understands it enough to make it useful to others.


Understanding everything fully is futile. But there are many, many, many things that improve your life once you understand them. So I feel the question is... not useful, I would say. Yes, you need to search for things that, if you knew them, would improve your life. No, you cannot know them all beforehand. Yes, there are such things. There always are.


They only improve your life if you actually work on something that you yourself are trying to improve. Most people are fine with the status quo, so if something like LLMs can take over the understanding of complex tasks, they won't even notice, except for the fact that more of these tasks will get done.


There are clearly things to understand more than just the immediate stuff you do for work. I think most people are thirsty for understanding, it's just... many times it's in other domains than you expect.


Reminds me of a Carl Sagan quote, that our society is built on science and technology yet few understand it.


LLMs are a mirror of the user's input capabilities, like every other computer programme.


I have never had a good experience with any Google models in coding. Particularly for coding hard stuff, there is a night-and-day difference between Opus and Gemini, in my experience.


That’s a western perspective because we are spoiled and have no thought for sustainability.

Please take a look at poor countries of the world like Pakistan. They have a repair culture. They have vehicles from the 80s out on the road doing daily driving work instead of being used as vintage show pieces. It's a poor country, so this is a necessity. But nevertheless, seeing the repair culture there, in contrast to the disposable culture in the Western world, makes me pause.


This... I wonder why there isn't a market in Tijuana, Juarez, and other border towns for fixing broken electronics and similar appliances.

Here in Mexico there are plenty of "unofficial" laptop/mobile (Apple, Windows, Android) repair shops that even receive your device by DHL/UPS, fix it, and return it, because labor costs are low enough to make it worthwhile. The only downside is that most of the spare parts are imported from the US.


In Western countries, the time of skilled repairmen is better spent repairing things which are much more important and expensive than consumer goods.

And a consumer usually has a much higher return from working in his specialized field to earn money and buy a new product, than spending time with difficult repairs of a broken product.


Yeah, this is entirely a function of labor costs. If you want your stuff repaired, ship it to a low-labor cost economy or hire someone to whom it’s worth the time.


To add to that; labour should be expensive. And lower repairability of consumer goods is a side effect that is worth dealing with for that benefit.


Just to take one step further, labor costs are largely a function of local real estate costs.


> labor costs are largely a function of local real estate costs

Difficult to determine causality in that system. All we can say is places with expensive labour tend to have expensive real estate. (The confounding variable, I imagine, is immigration.)


The practice of planned obsolescence means that there is more to it than just that.


The site crashes on my 2020 iPhone SE.


The product is for Android users.


That's fine, wish the site worked so I didn't have to be told that.


Okay, now I will be supporting Azure products and will try to bring them into my workplace over AWS/Google Cloud.


Why? Microsoft probably just hasn't prioritized Nimbus participation over their other construction work. They probably haven't yet constructed the correct subsidiary structure or key-sharing agreements that would allow them to participate either.

Sooner or later they’ll participate. And then you would have moved your workload for no reason.


I wouldn't be so sure. The departure of these guys only opens new room for less 'pro-ethics' corpos to replace them.


The reason cited for this whole fiasco is that some of the Ministry of Defense's genocide work could be performed by servers in the EU, which could expose Microsoft to legal or regulatory issues.

It's not that Microsoft was against this, it's that Microsoft was against themselves getting in trouble for this with the EU.


Well, they did put in their contracts with the Israeli government that their services can't be used for mass surveillance, which makes them slightly less evil than Google/Amazon.

