Hacker News | boristsr's comments

It's absolutely mind-boggling to see model distillation called theft, a class of attack, and all sorts of other things, all while Meta is in court for copyright violation and Anthropic has had to settle a case with authors. With distillation "attacks", at least they paid API fees.

Anthropic had to settle with authors because they literally pirated books! Their behavior regarding distillation is genuinely beyond parody.

There are 2 things worth separating.

1) China distills and is therefore morally bad.

As you rightly point out, that's not a great argument.

2) China distills and is therefore possibly not that competent.

I think that makes sense. If they only catch up to the frontier through distillation then 1) Their model will never be as good as the model they are distilling from. 2) They will never reach the frontier - they need someone else to do it first.


This is literally a repeat of the whole “China only makes low quality cheap stuff” argument.

“All they do is copy.”

And now, oops, they are world leaders in EVs, batteries, solar and drones, just to name a few of the biggest consumer-facing things.


"Success leaves clues"

You gotta start somewhere and you can start at page 1 or page 10 and that time, energy and cost you saved starting 9 pages later can be put into making whatever it is you're building better than the original.

The US, and every other country, is full of derivatives or straight up copies. No one is getting super mad at the generic cheerios at the grocery store. It's hypocrisy.


>2) China distills and is therefore possibly not that competent.

I think deepseek at least has done enough innovative work that you could grant them a baseline of competency.

In general, there are enough papers coming out of China to suggest that there are quite a few people there who know what they are doing.


You're correct and I shouldn't have used the word competent. Perhaps "and is therefore not elite enough to be state of the art"?

I also have a soft spot for deepseek because they write such readable papers. I don't have a degree in anything but with a little work I can understand their papers - which I really appreciate.

But I still think my point stands - if you need distillation you won't be SOTA


> China distills and is therefore possibly not that competent.

I heard that argument more than a year ago, when chain of thought and reasoning cycles started to be hidden to protect against distillation.

Meanwhile, models such as DeepSeek and MiMo are nothing short of excellent nowadays.

Ever since I switched away from OpenAI to DeepSeek I never felt the need to go back.


Deepseek Flash V4 really was a "holy shit" moment and deserves the praise/hype it's been getting from users. For the last year I've maintained a multi-tier subscription strategy: 1. a $20-$30 plan (first Claude, now Codex) for "SOTA"; 2. Gemini via the extra $10/mo or so on my Google One plan; 3. a cheap fallback plan.

Together it gives me plenty of head room/model performance for $40ish/mo, plus letting me compare the various models over time.

Originally I'd been using the Z.AI plan (that I'm still grandfathered into for <1 yr) as my cheap plan, but it wasn't keeping up with SOTA progress and is slow/limited now. So I subscribed to the Opencode Go plan and use Deepseek Flash V4 almost exclusively, and it is insane how much usage I can get for $10/mo.

I did the math on my Flash usage vs. what I'm paying Opencode and I'm typically not even exceeding $10 in API costs! So it's actually sustainable pricing, not rug-pull pricing, at least for me. I can pound it with requests/agentic loops, have it running for 30 min doing whatever the fuck, check back, and have spent literal pennies for what would have cost $30+ on my work's Github Copilot plan.

I know the enterprise world works under different rules and isn't price sensitive in the same way an individual is, but I truly don't see how it is sustainable in the long term for the US AI giants to maintain something like a 25x+ markup for a 1.25x performance benefit.

IMO it does help explain the recent emphasis on secret, scary "super models" like Mythos to muddy the waters for decision makers with hype and FOMO at a time when companies are beginning to seriously scrutinize their token spending for the first time.


Man, I decided to try DS with a healthy dose of skepticism.

I canceled ChatGPT because I was going to be on vacation. Codex was pretty great, but I thought "Let me put 10 bucks on Deepseek API and plug it into Claude Code".

I was completely blown away. I found it even better than Claude or Codex. And those 10 bucks? They lasted for more than a month.

I don't see myself coming back to Claude/OpenAI.


Agreed. I’ve cancelled all plans except OpenCode Go. OpenRouter for API spend. Feels so nice to a) not feel like I have to code when I don’t want to just because I need to make my subscription worth the cost and b) know that this level of performance won’t be yanked away. Super pleased with DeepSeek V4 Pro.

I'm simply a postgrad from a no name university, but I'm not sure that the future frontier will come from the current approach of OpenAI and Anthropic. Distilling just seems like another avenue to collect useful data, like using books or scraping the human-net, not necessarily copycat behavior. Chucking another 1000 TWh at it and scaling on already pillaged human output is yielding diminishing returns.

The next "frontier", as in an order of magnitude higher model capability, might come from eschewing the bitter lesson and trying to be clever instead of piling on GPUs. (I'm sure US labs also try clever new architectures.) Maybe the plateau is permanent in the medium term. The frontier will be the exercise of turning the 10T monster models into something that can be run locally with minimal degradation.

That jibes a lot more with "AI is just new tech" attitude than the (genuine or otherwise) "we will build the Godhead in our image".


Deepseek models are on the Pareto frontier of cost/performance. That's far more important than just making a top scoring model.
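To make "on the Pareto frontier" concrete, here is a minimal sketch. The model names and numbers are made-up placeholders, not real prices or benchmark scores; the point is only the dominance test: a model is on the frontier if no other model is at least as cheap and at least as good, and strictly better on one of the two.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: float   # hypothetical cost per million tokens (lower is better)
    score: float  # hypothetical benchmark score (higher is better)


def pareto_frontier(models):
    """Return the models that no other model dominates, sorted by cost."""
    frontier = []
    for m in models:
        dominated = any(
            o.cost <= m.cost and o.score >= m.score
            and (o.cost < m.cost or o.score > m.score)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.cost)


# Placeholder data, purely illustrative.
MODELS = [
    Model("cheap", 0.5, 70.0),
    Model("mid", 3.0, 72.0),
    Model("pricey_weak", 10.0, 71.0),  # dominated by "mid": costs more, scores less
    Model("top", 15.0, 80.0),
]

frontier_names = [m.name for m in pareto_frontier(MODELS)]
print(frontier_names)
```

A top scoring model is always on the frontier, but so is the cheapest one; the frontier says nothing about which trade-off is right for a given budget.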

I think it depends on the depth of the summary, and the purpose. You can do quite an in-depth analysis as part of educational material for example, which is one of the tests of fair use.

I think a key thing to remember when assessing your own liability is fair use is a defense, not an automatic guaranteed right for blanket uses.

Leaking spoilers of unpublished works can definitely cause market harm, and serves no wider good for the market the same way educational material would.

I wouldn't like to be on the receiving side of this lawsuit. At the very least it's going to be expensive to defend against.


That's the rub. When it comes to copyright, money makes right. The one with more money and willingness to go to court will win. Not who is actually legally right.


That's not just copyright, it's our entire legal system. A corporation can intentionally murder hundreds of thousands of people and get nothing but a slap on the wrist fine.


Yes please!


That's not true, or they wouldn't have settled for $1.5 billion specifically for training on pirated material.

https://apnews.com/article/anthropic-copyright-authors-settl...


As I said, the initial piracy was an issue. That is what they settled over. Your link covers this:

> A federal judge dealt the case a mixed ruling in June, finding that training AI chatbots on copyrighted books wasn’t illegal but that Anthropic wrongfully acquired millions of books through pirate websites.

With more details about how they later did it legally, and that was fine, but it did not excuse the earlier piracy:

> But documents disclosed in court showed Anthropic employees’ internal concerns about the legality of their use of pirate sites. The company later shifted its approach and hired Tom Turvey, the former Google executive in charge of Google Books, a searchable library of digitized books that successfully weathered years of copyright battles.

> With his help, Anthropic began buying books in bulk, tearing off the bindings and scanning each page before feeding the digitized versions into its AI model, according to court documents. That was legal but didn’t undo the earlier piracy, according to the judge.


I understand your thoughts. I've had similar motivation problems about blogging since the release of ChatGPT. Feels like you are writing for a machine rather than readers. Definitely seen a decline in readers since December 2023 on older articles that previously had steady traffic for years.

Also, I just purchased LazyVim For Ambitious Developers. I've used the online edition a number of times in recent months. Thanks for your work!


Excellent deep dive and explanation of the process of tracking down and fixing it. Thanks for sharing it, it was a fun read. Will definitely keep this in mind next time I fire up Far Cry for some nostalgia!


This reminds me very much of the fun of the "old internet", late '90s, early 2000s GeoCities fun! Love it.


That's not quite true. There is support for "strict" tables, which do have more stringent rules around types.

https://sqlite.org/stricttables.html
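A minimal sketch of the difference, using Python's built-in `sqlite3` module. STRICT tables need SQLite 3.37.0 or newer, so the `STRICT_SUPPORTED` flag below is there to tell you whether the second result means what you think it means; the table and helper names are just for illustration.

```python
import sqlite3


def try_insert(create_sql: str) -> str:
    """Create table t, insert a non-numeric string into its INTEGER column,
    and report what SQLite did with it."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(create_sql)
        con.execute("INSERT INTO t(n) VALUES ('not a number')")
        (stored_type,) = con.execute("SELECT typeof(n) FROM t").fetchone()
        return f"stored as {stored_type}"
    except sqlite3.Error as exc:
        return f"rejected: {exc}"
    finally:
        con.close()


# On older SQLite builds the STRICT keyword itself is a syntax error.
STRICT_SUPPORTED = sqlite3.sqlite_version_info >= (3, 37, 0)

flexible = try_insert("CREATE TABLE t(n INTEGER)")
strict = try_insert("CREATE TABLE t(n INTEGER) STRICT")

print("STRICT supported:", STRICT_SUPPORTED)
print("ordinary table:", flexible)
print("strict table:  ", strict)
```

The ordinary table keeps the string as text despite the INTEGER declaration, while the strict table refuses the insert.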


MongoDB also disables safety features by default to improve their benchmarks. But MongoDB gets lots of criticism for that and SQLite gets none.


Nanite is a lot more than just a continuous lod system. The challenges they needed to solve went above and beyond that. Continuous lod systems have been used for literal decades in things like terrain. The challenges of continuous lod for general static meshes are around silhouette preservation, UV preservation and so on. One of Nanite's insights was that a lot of the issues with automatic mesh decimation (major mesh deformation, poor results) just disappear when you are dealing with triangles that are only a few pixels in size (down to single pixel triangles). The catch with small triangles is quad overdraw: graphics cards rasterize triangles in blocks of 2x2 pixels, so you end up shading the same pixels many times over, which is very wasteful. So the solutions they came up with in particular were:

- switch to software rasterization for small triangles. This required a good heuristic to choose between the hardware and software rasterization paths. It also needed newer shader stages that are earlier in the geometry pipeline. These are hardware features that came with shader models 5 and 6.

- using deferred materials which drastically improves their ability to do batched rendering.
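To put a number on quad overdraw, here is a rough CPU-side sketch. It is not how Nanite or any GPU is implemented; it just counts the pixels whose centers a triangle covers and compares that with the pixels that get processed if every touched 2x2 quad is shaded in full. The function name and the sample triangles are made up for illustration.

```python
import math


def quad_overdraw(tri):
    """Return (covered, shaded) pixel counts for one screen-space triangle.

    covered: pixels whose centers lie inside the triangle.
    shaded:  pixels processed if every touched 2x2 quad is shaded in full.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri

    def edge(ax, ay, bx, by, px, py):
        # Signed area test: which side of edge a->b the point p is on.
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    if edge(x0, y0, x1, y1, x2, y2) == 0:
        return 0, 0  # degenerate (zero-area) triangle

    covered = 0
    quads = set()
    for py in range(math.floor(min(y0, y1, y2)), math.ceil(max(y0, y1, y2))):
        for px in range(math.floor(min(x0, x1, x2)), math.ceil(max(x0, x1, x2))):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel center
            w0 = edge(x0, y0, x1, y1, cx, cy)
            w1 = edge(x1, y1, x2, y2, cx, cy)
            w2 = edge(x2, y2, x0, y0, cx, cy)
            # Accept either winding order.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered += 1
                quads.add((px // 2, py // 2))
    return covered, 4 * len(quads)


tiny = quad_overdraw([(0.2, 0.2), (1.2, 0.2), (0.2, 1.2)])
large = quad_overdraw([(0.0, 0.0), (64.0, 0.0), (0.0, 64.0)])
print("tiny triangle  (covered, shaded):", tiny)
print("large triangle (covered, shaded):", large)
```

The tiny triangle covers a single pixel but still costs a full quad, while the large one wastes only a thin strip along its diagonal edge. That gap is the motivation for a separate path for very small triangles.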

It's actually the result of decades of hardware, software and research advancements.

The 2 solutions posted in recent days seem heavily focused on just the continuous lod without the rest of the Nanite system as a whole.

And yes, there were also challenges around the sheer amount of memory for such dense meshes and their patches. The latest NVMe streaming tech makes that a little easier, along with quantizing the vertices, which can dramatically lower memory usage at the expense of some vertex position precision.
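As a rough illustration of the vertex quantization trade-off (a generic bounding-box scheme, not Nanite's actual encoding): positions are snapped to a fixed-point grid spanning the mesh bounds, so three 16-bit integers (6 bytes) stand in for three 32-bit floats (12 bytes), and the worst-case error per axis is half a grid step. The function names are illustrative.

```python
def quantize(vertices, bits=16):
    """Snap xyz positions to an integer grid spanning the mesh bounding box.

    Returns (packed, lo, scale): integer triples plus the per-axis origin
    and step size needed to decode them.
    """
    levels = (1 << bits) - 1
    lo = [min(v[i] for v in vertices) for i in range(3)]
    hi = [max(v[i] for v in vertices) for i in range(3)]
    # Fall back to a step of 1.0 on flat axes to avoid dividing by zero.
    scale = [(hi[i] - lo[i]) / levels or 1.0 for i in range(3)]
    packed = [
        tuple(round((v[i] - lo[i]) / scale[i]) for i in range(3))
        for v in vertices
    ]
    return packed, lo, scale


def dequantize(packed, lo, scale):
    """Recover approximate float positions from the integer grid."""
    return [tuple(lo[i] + q[i] * scale[i] for i in range(3)) for q in packed]


verts = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (0.3, 1.7, 2.9)]
packed, lo, scale = quantize(verts)
restored = dequantize(packed, lo, scale)
max_error = max(
    abs(a[i] - b[i]) for a, b in zip(verts, restored) for i in range(3)
)
print("max position error:", max_error)
```

Fewer bits shrink the data further but coarsen the grid, which is the precision cost mentioned above.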


There are also pros and cons to this method of rendering, in terms of performance. The triangulation cost imposes a significant overhead compared to traditional scene rendering methods, though it scales far better with scene size and detail. For that quality of rendering, making it viable requires a good amount of memory bandwidth and streaming speeds only possible with modern SSDs.

So it’s only really practical because GPUs have the power to render games with a certain level of fidelity, and RAM and SSD size and speeds for consumer gear are becoming capable of it.

Also there are significant benefits for a developer, especially if using photogrammetry or off-the-shelf high-detail models like Quixel scans, so there’s a reason Epic is going all-in.


Thanks to both of you for the detailed explanation!


Nanite does a few things:

- continuous lod as this library does

- software rasterization for small single pixel triangles which reduces quad overdraw

- deferred materials (only material IDs and some geometry properties are written to the gbuffers in the geometry pass, with things like normal maps, base colour, roughness maps, etc. being applied later with a single draw call per material)

- efficient instancing and batching of meshes and their mesh patches to allow arrays of objects to scale well as object count grows

- (edit, added later as I forgot) various streaming and compression techniques to efficiently stream/load at runtime and reduce runtime memory usage and bandwidth like vertex quantization etc.
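The deferred materials idea from the list above can be sketched on the CPU like this. It is a toy model, not engine code: the "geometry pass" is assumed to have already produced a per-pixel material ID buffer, and each material is then evaluated once over all of its pixels. The names (`shade_deferred`, the shader callables) are hypothetical.

```python
from collections import defaultdict


def shade_deferred(material_ids, shaders):
    """Shade a material-ID buffer with one pass per material.

    material_ids: 2D list of per-pixel material IDs (None = nothing drawn).
    shaders: dict mapping material ID -> function(x, y) returning a colour.
    Returns (image, passes), where passes counts material passes issued.
    """
    # Bucket pixels by material so each material is handled exactly once.
    buckets = defaultdict(list)
    for y, row in enumerate(material_ids):
        for x, mat in enumerate(row):
            if mat is not None:
                buckets[mat].append((x, y))

    height = len(material_ids)
    width = len(material_ids[0]) if height else 0
    image = [[None] * width for _ in range(height)]

    passes = 0
    for mat, pixels in buckets.items():
        shader = shaders[mat]  # bind the material once
        passes += 1
        for x, y in pixels:
            image[y][x] = shader(x, y)
    return image, passes


ids = [[0, 0, 1],
       [None, 1, 1]]
shaders = {0: lambda x, y: "red", 1: lambda x, y: "blue"}
image, passes = shade_deferred(ids, shaders)
print(image, passes)
```

However many objects contributed pixels, the number of passes depends only on how many distinct materials are visible.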


I see, thank you!

