Hacker News | alecco's comments

> 4× RTX PRO 6000 Blackwell Workstation (GB202, 96 GB GDDR7, 600 W)

Those are SM120 so no tmem/tcgen05 and lack of support in main libraries (it's like everybody is focusing on B300/SM100).

For that money I'd buy a single B300: similar total AI TOPS, similar aggregate GPU bandwidth, only 25% less total memory (probably made up for by lower implementation complexity), and half the energy consumption...
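A quick back-of-the-envelope check of what those ratios imply, as a rough sketch. The 4-GPU numbers come from the quoted build; the single-GPU figures are only what "25% less" and "half" work out to, not vendor specs, and the variable names are just for illustration.

```python
# Figures quoted above for the 4-GPU build.
NUM_GPUS = 4
MEM_PER_GPU_GB = 96
POWER_PER_GPU_W = 600

total_mem_gb = NUM_GPUS * MEM_PER_GPU_GB
total_power_w = NUM_GPUS * POWER_PER_GPU_W

# Ratios claimed in the comment for the single-GPU alternative
# (assumptions taken from the comment, not from a datasheet).
MEM_REDUCTION = 0.25
POWER_REDUCTION = 0.50

implied_single_mem_gb = total_mem_gb * (1 - MEM_REDUCTION)
implied_single_power_w = total_power_w * (1 - POWER_REDUCTION)

print(f"4-GPU build: {total_mem_gb} GB, {total_power_w} W")
print(f"Implied single GPU: {implied_single_mem_gb:.0f} GB, {implied_single_power_w:.0f} W")
```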

Also by having all SMs local they have the special L1-level interconnect. SMs can collaborate on the same GEMM. And a bunch of other nice features.

Or, you know, rent it.


Yes, this has been on my mind as well. This was built one at a time, though, and overall I'm still very happy with them.

Bellard seems to be at the extreme tail of the distribution of talent x grit/perseverance.

Could be that the generated regression tests create actionable exploit code.

They are hands down the best at serving their model via API. Currently that's due to caching and their special Engram technology.

Good summary:

https://xcancel.com/jukan05/status/2066752564809605584

DeepSeek Completes First External Funding Round, Raising $7.4 Billion

- DeepSeek has completed its first external fundraising round, raising more than RMB 50 billion (approximately $7.4 billion). The company was reportedly valued at over $50 billion.

- Investors are investing through a limited partnership (LP) managed by CEO Liang Wenfeng, rather than directly into DeepSeek itself. This structure allows Liang [CEO/founder] to maintain absolute control over the company.

- The China National AI Industry Investment Fund is the sole exception, investing directly in DeepSeek and receiving voting rights. The fund has committed RMB 1 billion.

- Ordinary investors receive no voting rights, but gain access to financial information and priority rights in future financing rounds. All investor stakes are subject to a five-year lock-up period, designed to discourage short-term profit-taking.

- Major investors include Liang Wenfeng (RMB 20 billion), Tencent (RMB 10 billion), CATL (RMB 5 billion), JD.com, NetEase, and IDG Capital (RMB 3 billion each).

- DeepSeek’s management reportedly conducts *due diligence* on the identities of LPs backing investment funds, aiming to prevent the entry of unwanted investors.

- DeepSeek had previously operated without external funding, but rising compute costs and intensifying competition for AI talent have increased the need for capital.


Another CEO with AI psychosis [1]. LLMs are not true AI, they lack common sense (or whatever it's properly called). LLM-based systems still need somebody with deep domain knowledge at the wheel to keep them from doing stupid things. It's like an alien-made bicycle that gets you to the speed of sound if you are an Olympic cyclist.

Still, LLMs are extremely powerful pseudo-AI [2] and will bring a pseudo-singularity. But the impact is still scary if a tiny fraction of humans are augmented 1000x. And as better models become exponentially more costly, only the money people will be able to afford the new models. This is a very likely scenario and scares me to the point I dropped all my projects to work on affordable LLM-based tools to make the difference at most 10x instead of 1000x.

To my elder relatives I explain it like: imagine we are farmers in the 17th century and suddenly out of nowhere John Deere tractors, combines, etc. become available. But they cost more to run than all you and your fellow farmers have, so only a tiny handful of rich people take over everything.

[1] https://en.wikipedia.org/wiki/Chatbot_psychosis

[2] https://en.wikipedia.org/wiki/Chinese_room


> Another CEO with AI psychosis [1]. LLMs are not true AI, they lack common sense (or whatever it's properly called).

If you read the article, you'll find that it indeed relies on this claim:

> Within a couple of years, possibly much sooner, AI may achieve so-called closed-loop recursive self-improvement (RSI): the capacity to rewrite its own code to become more capable, without human intervention. Should that happen, the result could be an intelligence explosion of a kind for which there is no precedent and no map.

This may be optimistic and/or simplistic, but not impossible.


>the result could be an intelligence explosion of a kind for which there is no precedent and no map.

Or it will simply reach the end of the softmax faster.


> ... they cost more to run than all you and your fellow farmers have ...

But the financial situation is that running a single AI agent costs significantly less than you'd have to pay a human to do the same task.

And I don't see what you're getting from The Chinese Room - that thought experiment relies on there being no external difference at all, right?


DNNs/LLMs can only predict next tokens based on training data. They often make big direction mistakes as they are particularly bad at common sense. Kind of like the Paperclip Maximizer scenario. They need a human with deep knowledge to drive them and to catch them when they go off the rails.

"Next token prediction" isn't a system. It's an interface a system uses. Nothing precludes an arbitrarily simple or complex behavior from producing a token logit.

And with what we know of LLMs? Autoregressive transformers are Turing complete in theory, and we are yet to find anything that LLMs are "fundamentally incapable" of in practice. Even continuous learning is already approximated with in-context learning - both allow a system to learn from prior experience, both have practical limits on how far they go. That's what powers "trial and error" in today's agentic LLMs.

"LLMs can only predict next tokens based on training data" is comforting but misleading. It just isn't the saving grace you want it to be. It describes an interface, not a ceiling. And if there is some sort of fundamental "capability ceiling" that LLMs are heading towards, we are yet to see it. We know plenty of things LLMs are bad at, but they keep getting less bad at them release to release.

If there is none, then, simply improving over the current recipes iteratively might yield systems that only "need a human" in the same way you "need" to have a boss. Maybe less so.


>DNNs/LLMs can only predict next tokens based on training data.

How do they decide between using 'a' or 'an'?


I don't get the argument; how do you decide between using 'a' or 'an'?

You use 'an' when the word that comes after it begins with a vowel.

They pick random top-k next token based on their amazing 4chan/reddit training data, duh.
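For anyone unfamiliar with the term, here is a minimal sketch of what "top-k" sampling means, assuming a plain list of logits. The `sample_top_k` helper, the toy vocabulary, and the logit values are made up for illustration; real inference stacks do this on tensors and usually combine it with other filters.

```python
import math
import random


def sample_top_k(logits, k, temperature=1.0, rng=random):
    """Sample one token index from the k highest-scoring logits."""
    if k <= 0 or temperature <= 0:
        raise ValueError("k and temperature must be positive")
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over only those k logits (subtract the max for numerical stability).
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights itself.
    return rng.choices(top, weights=weights, k=1)[0]


# Toy example: hypothetical vocabulary and logits.
vocab = ["a", "an", "the", "banana"]
logits = [2.0, 1.5, 0.1, -3.0]
rng = random.Random(0)
print(vocab[sample_top_k(logits, k=2, rng=rng)])
```

With `k=2` only "a" and "an" can ever be picked here; with `k=1` it degenerates to greedy decoding.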

The smart path: Find good mentors (and return the favor); use LLMs not to do the work but to help you learn and exercise your brain: make them test you, using something akin to the teacher/Socratic method, make mistakes and get the mentor/LLM to review in a way that lets you figure out the answer.

I just found this comment from 15y ago on the ffmpeg/libav drama: https://www.reddit.com/r/linux/comments/vvdxn/comment/c57zdk...

I don't know ffmpeg but this resonates with my experience with other open source projects.


Sounds about right. Don't know about the internal politics around the original maintainer but the libav folks never seemed right to me. I was glad at the time that the distro I was using left the choice up to the user.

As for the accusations of rejecting patches and/or rewriting the code themselves, I can empathize. It's not always easy to take on maintenance of code that isn't written the way you want it to be, even if the difference is ultimately immaterial. Sucks when this happens to a fundamental project that is used everywhere though. A good maintainer does need to have some ego, but not too much, it seems.


I know there were several similar posts on the front page, but this one is much better than all the ones I've read so far.
