BillStrong · 2026-06-04T00:01:04 1780531264

That is a relatively simple system. It cuts out use cases.

For instance, in history, newer information is mixed with older, authoritative information.

The same goes for religious institutions, where the older items may be the more authoritative ones for their purposes.

BillStrong · 2026-05-31T14:10:47 1780236647

Already predicting which versions to avoid, huh.

BillStrong · 2026-05-31T14:07:55 1780236475

Yes, much more computation is required both to encode it and to decode it.

Caspy7 · 2026-05-31T15:21:39 1780240899

He only mentioned decode complexity. Would be interesting to know the average encode complexity compared to AV1.

kiicia · 2026-05-31T19:16:19 1780254979

Encoding speed even on a Mac Studio is atrocious; it's in the range of single frames per second, as opposed to realtime or better for even H.265.

F3nd0 · 2026-05-31T20:07:52 1780258072

The specification for AV2 has only been finalised very recently, so performant encoders have not yet been developed. Meaningful comparison to older codecs like H265 and AV1 will only be possible once that has changed. (It'll be slower, but almost certainly not one-frame-per-second slow.)

Dylan16807 · 2026-06-01T12:24:41 1780316681

Getting the full bitrate gains will be slower.

For any specific bitrate and quality target, there's a good chance it'll be faster.

BillStrong · 2026-05-31T10:08:37 1780222117

It is still ironic to me that my Steam Deck has AV1 decode acceleration, on a really old CPU/GPU combo.

shmerl · 2026-05-31T15:05:48 1780239948

AMD added AV1 encoding only in later SoCs though. Next Steam Deck will have both.

Narishma · 2026-06-01T16:32:43 1780331563

Why is it ironic?

BillStrong · 2026-05-31T10:07:19 1780222039

Essentially all of the processing of the video data, barring the container format, which the CPU uses to know which part of the data to send to the GPU and which to the audio chip or codec.

And HW acceleration is generally a preset, baked-in version of the encoder or decoder. These are mostly codec specific.

So, no using hardware from previous versions.

Now, you can see some software that tries to use the GPU itself, instead of the dedicated hardware acceleration, to decode, but that isn't HW accelerated, and may not operate in real time.

At the same time, that will consume much more power, eliminating some of the advantages of the pure HW rendition, which is especially important for mobile.

I could see an argument being made for encoding, if it is 2x or faster than the CPU, but I haven't looked at any in a while, so don't know the speeds.

BillStrong · 2026-05-26T13:22:48 1779801768

They work better for coding workloads. Essentially, the more regular the output, the more the faster model gets right, the less the big model has to do.

Writing tends to have more false positives. I haven't tried this particular one, however, but that is the general trend.
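To make the "more regular output, less work for the big model" point concrete, here is a minimal sketch, assuming the setup is draft-and-verify (speculative) decoding with greedy acceptance. The function names and the two toy "models" are hypothetical stand-ins, not any real library's API, and the per-position verification loop stands in for what a real system would do in one batched forward pass.

```python
def speculative_decode(target_next, draft_next, prompt, n_tokens, k=4):
    """Toy greedy draft-and-verify loop.

    target_next / draft_next: hypothetical callables mapping a token list
    to the next token. Returns (generated_tokens, big_model_passes).
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_tokens:
        # The small, fast model guesses k tokens ahead.
        guess = []
        for _ in range(k):
            guess.append(draft_next(out + guess))

        # One verification pass of the big model covers all k guesses.
        passes += 1
        accepted = []
        for g in guess:
            want = target_next(out + accepted)
            if g == want:
                accepted.append(g)       # guess matches, keep it
            else:
                accepted.append(want)    # first miss: take the big model's token
                break
        else:
            # Every guess matched, so the pass also yields one extra token.
            accepted.append(target_next(out + accepted))
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n_tokens], passes


# Toy models: the draft always predicts "previous token + 1".
draft = lambda seq: (seq[-1] + 1) % 10
regular_target = lambda seq: (seq[-1] + 1) % 10    # draft is always right
irregular_target = lambda seq: (seq[-1] + 2) % 10  # draft is always wrong

regular_tokens, regular_passes = speculative_decode(regular_target, draft, [0], 20)
irregular_tokens, irregular_passes = speculative_decode(irregular_target, draft, [0], 20)
print(regular_passes, irregular_passes)
```

In this toy setup the output is the same as what the big model would have produced alone; only the number of big-model passes changes with how often the draft is right.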

BillStrong · 2026-05-26T05:07:02 1779772022

It does to an extent. One thing I will give AI, because of the nature of LLMs, you are essentially arguing with the median level of the input that trained the model. So, for someone new to the subject, you get access to patterns that will bring them up to a certain level.

Getting past that is the problem we face now.

stuaxo · 2026-05-26T11:53:23 1779796403

That may well need more than the models. Someone put it better than me: these LLMs have no taste, nor can they as things are.

BillStrong · 2026-05-23T15:53:45 1779551625

As long as you don't keep calling out to the CPU, that is.

Tool calling, searches, cache movement if used, and even debug steps all stall the GPU waiting for the CPU.

There was a test that turned one of the under-1B Qwen3+ models into a single kernel that ran as one GPU pass without stalling on the CPU. It saw quite a bit of perf lift over vLLM, I believe, showing this is still an issue.

It's been a month, so I don't remember more details than this.
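As a back-of-the-envelope illustration of why those stalls matter, here is a rough cost model. It assumes each host round trip simply adds to the per-token time with no overlap; the function names and all numbers are made up for illustration, not measurements.

```python
def decode_time_ms(n_tokens, gpu_ms_per_token, host_sync_ms, syncs_per_token):
    """Total decode time if every CPU round trip stalls the GPU (no overlap assumed)."""
    return n_tokens * (gpu_ms_per_token + host_sync_ms * syncs_per_token)


def gpu_busy_fraction(gpu_ms_per_token, host_sync_ms, syncs_per_token):
    """Share of wall time the GPU spends computing rather than waiting."""
    return gpu_ms_per_token / (gpu_ms_per_token + host_sync_ms * syncs_per_token)


# Hypothetical numbers: 2 ms of GPU work per token, 0.5 ms per CPU round trip.
with_stalls = decode_time_ms(100, 2.0, 0.5, 1)
fused = decode_time_ms(100, 2.0, 0.5, 0)   # everything stays on the GPU
print(with_stalls, fused, gpu_busy_fraction(2.0, 0.5, 1))
```

The smaller the model, the smaller the per-token GPU time, so a fixed round-trip cost eats a larger share of each step.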

BillStrong · 2026-05-19T04:16:29 1779164189

Even in humans, negative stimuli carry more weight than positive ones, in the general case.

Without having read it yet, my first thought would be to test a general ratio, something similar to human interpersonal relationship ratios, like 30% negative to mostly positive, with the positives targeted: reinforcement not just for the good job, but for the improvement.

And ensure the negative is targeted, such that you point out tendencies to be avoided rather than just specific instances.

Of course, most human interaction online has none of this, so, would be hard to replicate.

sebastian · 2026-05-19T14:28:28 1779200908

Yeah, I like the ratio framing. That does seem like the kind of experiment you'd want to run next.

The thing I'd be curious to separate out is ratio vs density. The fiction examples were positive, but a lot of the tokens are still spent on normal story work. The targeted examples put much more of the training signal on the AI being in the relevant situation and choosing against the bad option.

That makes me think the next thing to test is not just the positive/negative mix, but how much of the data is actually about the failure mode.

BillStrong · 2026-05-18T17:53:48 1779126828

Look on HuggingFace, there is a template that is supposed to fix the updates for the Qwen Models.

https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates

Maybe it will help you?