sgsjchs · 2026-06-13T11:53:03 1781351583

Make multiple nodes do the same job, compare results.
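
A minimal sketch of the "compare results" step, assuming each node reports its result as a single `long` and that a strict majority is the acceptance rule. The helper name `majority_result` is hypothetical, not taken from any particular system.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 and writes the agreed value to *out when
   one value appears in more than half of the n reported results;
   returns 0 when the nodes disagree too much to pick a winner. */
int majority_result(const long *results, size_t n, long *out)
{
    long candidate = 0;
    size_t count = 0;

    /* Pass 1 (Boyer-Moore vote): find the only value that could be a majority. */
    for (size_t i = 0; i < n; i++) {
        if (count == 0) {
            candidate = results[i];
            count = 1;
        } else if (results[i] == candidate) {
            count++;
        } else {
            count--;
        }
    }

    /* Pass 2: confirm the candidate really is a strict majority. */
    size_t occurrences = 0;
    for (size_t i = 0; i < n; i++) {
        if (results[i] == candidate)
            occurrences++;
    }
    if (occurrences * 2 > n) {
        *out = candidate;
        return 1;
    }
    return 0;
}
```

With three nodes reporting 42, 42 and 7, this accepts 42; with 1, 2 and 3 it reports no agreement, and the job would presumably be rerun.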

sgsjchs · 2026-04-21T11:37:58 1776771478

You have it backwards.

This formal game-theoretic notion of fairness acknowledges that power disparity exists and that having less power than your counterparty allows them to inflict greater disutility on you without you being able to inflict disutility on them in turn to discourage this.

On the other hand, fairness "in the usual sense" pretends that power disparity doesn't exist and that, say, an armed robber is not allowed to take your stuff when you have nothing to defend yourself with. In reality, that only works as long as there is a powerful third party (the state) that will inflict disutility on the robber for it.

sgsjchs · 2026-02-07T21:02:48 1770498168

Does it really matter that English is not as precise if the agent can make a consistent and plausible guess at what my intention is? And when it occasionally guesses incorrectly, I can always clarify.

sgsjchs · 2026-01-24T19:46:43 1769284003

It's the other way around.

sgsjchs · 2025-11-30T13:09:35 1764508175

The trick is to provide dense rewards, i.e. not only once the full goal is reached, but a little bit for every random flailing of the agent in the approximately correct direction.
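
A toy sketch of the difference, assuming a one-dimensional world where the agent's position and the goal are integers and the distance to the goal is known to whoever writes the reward. The function names and the 0.1 bonus scale are illustrative assumptions.

```c
#include <stdlib.h>

/* Sparse: the agent is rewarded only when it actually reaches the goal. */
double sparse_reward(int next_pos, int goal)
{
    return next_pos == goal ? 1.0 : 0.0;
}

/* Dense: the same goal reward, plus a small bonus for each step that
   reduces the distance to the goal and a small penalty for each step
   that increases it. */
double dense_reward(int pos, int next_pos, int goal)
{
    int before = abs(goal - pos);
    int after = abs(goal - next_pos);
    double progress = 0.1 * (double)(before - after);

    return sparse_reward(next_pos, goal) + progress;
}
```

Moving from 0 to 1 with the goal at 5 earns nothing under `sparse_reward` but a small positive amount under `dense_reward`. The sketch only works because a distance measure is available, which is the hard part in realistic tasks.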

thegeomaster · 2025-11-30T13:39:28 1764509968

The article talks about all of this and references the DeepSeek R1 paper[0], section 4.2 (first bullet point, on PRM), on why this is much trickier to do than it appears.

[0]: https://arxiv.org/abs/2501.12948

Jaxan · 2025-11-30T14:04:24 1764511464

How do you know the correct direction? Isn’t the point of learning that the right path is unknown to start with?

jsnell · 2025-11-30T14:52:32 1764514352

The correct solutions and the viable paths probably are known to the trainers, just not to the trainee. Training only on problems where the solution is unknown but verifiable sounds like the ultimate hard mode, and pretty hard to justify unless you have a model that's already saturated the space of problems with known solutions.

(Actually, "pretty hard to justify" might be understating it. How can we confidently extract any signal from a failure to solve a problem if we don't even know if the problem is solvable?)

robotresearcher · 2025-11-30T18:41:33 1764528093

Your hard mode is exactly the situation in which RL is used, because it requires neither a corpus of correct examples, nor insight into the structure of a good policy.

> How can we confidently extract any signal from a failure to solve a problem if we don't even know if the problem is solvable?

You rule out all the stuff that doesn’t work.

Yes, this is difficult and usually very costly. Credit assignment is a deep problem. But if you didn’t find yourself in a hard mode situation, you wouldn’t be using RL.

sgsjchs · 2025-10-10T09:21:48 1760088108

I, too, enjoy the craftsmanship, but at the end of the day what matters is that the software works as required; how you arrive at that point doesn't matter.

AdieuToLogic · 2025-10-11T01:01:17 1760144477

For me, it is not a matter of craftsmanship so much as a repeatable approach for growing the minds of junior engineers such that they have the best chance to succeed.

sgsjchs · 2025-10-09T23:48:27 1760053707

> I still don't understand this decision.

Variable declaration `T v;` means "declare `v` such that the expression `v` has type `T`". Variable declaration `T *p;` means "declare `p` such that the expression `*p` has type `T`". And so on.
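
A short sketch of the same rule applied to a few more declarators, with `T` taken to be `int`. The names `seven` and `declaration_mirrors_use` are made up for the illustration.

```c
/* Helper so the function pointer below has something to point at. */
static int seven(void) { return 7; }

int declaration_mirrors_use(void)
{
    int v = 1;                 /* the expression v      has type int */
    int *p = &v;               /* the expression *p     has type int */
    int a[3] = {1, 2, 3};      /* the expression a[i]   has type int */
    int *q[2] = {&v, &a[2]};   /* the expression *q[i]  has type int */
    int (*f)(void) = seven;    /* the expression (*f)() has type int */

    /* Each declarator, used as an expression, yields an int. */
    return v + *p + a[2] + *q[1] + (*f)();
}
```

Reading each declarator as the expression you would later write is also how the precedence falls out: `*q[i]` indexes first and then dereferences, so `q` is an array of pointers, while `(*f)()` dereferences first and then calls, so `f` is a pointer to a function.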

kopirgan · 2025-10-10T03:14:14 1760066054

Nice explanation!

sgsjchs · 2025-10-06T16:42:34 1759768954

But in C that's just syntax sugar for pointer math.
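
A small sketch of what that means, assuming the definition used in the C standards up to and including C23, where `a[i]` is defined as `*((a) + (i))`. The function name is made up.

```c
/* Returns 1 if indexing and explicit pointer arithmetic agree everywhere. */
int index_matches_pointer_math(void)
{
    int a[4] = {10, 20, 30, 40};
    int ok = 1;

    for (int i = 0; i < 4; i++) {
        ok = ok && (a[i] == *(a + i));   /* same value */
        ok = ok && (&a[i] == a + i);     /* same address */
    }

    /* Because + is commutative, the operands of [] can even be swapped. */
    ok = ok && (2[a] == a[2]);

    return ok;
}
```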

uecker · 2025-10-07T06:01:53 1759816913

It still makes it possible to have bounds checking. (And it is also not true anymore for C2Y.)

pjmlp · 2025-10-06T16:46:57 1759769217

Except it makes the intention more obvious; it is about clarity for the reader.

1718627440 · 2025-10-06T21:50:08 1759787408

My point was indeed that if you don't use pointer arithmetic in C, you don't use arrays. I mean, when you declare arrays of a fixed size, you can also declare an equivalent number of primitive variables instead, but I would find that inconvenient. Hence the question.

smj-edison · 2025-10-06T22:38:10 1759790290

If I remember correctly, he meant that only array accesses are used, because their length can be checked (as all arrays have a static length due to no dynamic memory).

uecker · 2025-10-07T06:05:28 1759817128

Indeed, this is what many people do. But even if you use dynamic memory, if you replace pointer arithmetic by array indexing, you get bounds checking. And in C this also works for arrays of run-time length.

1718627440 · 2025-10-07T22:16:30 1759875390

But can't I put any pointer arithmetic in array brackets, so it wouldn't limit anything?

uecker · 2025-10-09T16:02:33 1760025753

Whatever index you compute can be checked against a bound.

1718627440 · 2025-10-10T17:41:41 1760118101

2[a*b] What bound?

uecker · 2025-10-11T09:42:44 1760175764

This does not even compile. For array indexing,

array[expression]

if "array" has a bound, whatever the expression evaluates to can be checked against that bound. If "array" is not a bounded array but a pointer or an unbounded array, then this does not work, but my point is that it is easy to avoid such code.
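
A hand-written sketch of the kind of check this makes possible, assuming a compiler that supports variably modified types (optional in C11, supported by GCC and Clang). The helper `checked_get` is hypothetical; a bounds-checking build would insert the equivalent test for you, for example with the `-fsanitize=bounds` option in GCC and Clang.

```c
#include <stddef.h>

/* Hypothetical helper, not a standard function. The bound n is part of the
   parameter's type (pointer to an array of n ints), so it travels with
   the array even when n is only known at run time. */
int checked_get(size_t n, int (*arr)[n], size_t i, int *out)
{
    size_t bound = sizeof *arr / sizeof (*arr)[0];  /* recovers n from the type */

    if (i >= bound)
        return 0;          /* out of bounds: refuse instead of reading */
    *out = (*arr)[i];
    return 1;
}

/* Returns 1 if an in-range index is accepted and an out-of-range one is rejected. */
int demo_checked_get(void)
{
    int a[4] = {10, 20, 30, 40};
    int value = -1;
    int in_range = checked_get(4, &a, 2, &value);      /* accepted, value becomes 30 */
    int out_of_range = checked_get(4, &a, 7, &value);  /* rejected, value unchanged */

    return in_range == 1 && out_of_range == 0 && value == 30;
}
```

If the function took a plain `int *` instead, there would be no bound in the type to check against, which is the pointer case described above.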

sgsjchs · 2025-10-06T16:32:59 1759768379

You would very rarely want scalar types that don't map directly to hardware-supported ones anyway.

sgsjchs · 2025-09-14T01:13:56 1757812436

Why would you want to store arbitrary individual passwords instead of deriving them on demand from the service name/domain and a common secret?

snailmailman · 2025-09-14T01:16:11 1757812571

If you are doing that,

- what if some site has weird password requirements and the derived password doesn’t work

- what if a site gets hacked and you need to rotate one password.

If you have to store data per-site anyway because of those cases, may as well just store passwords. You can (and should) still generate extremely high entropy passwords.

merlincorey · 2025-09-14T01:18:20 1757812700

Additionally, you can store other data. For example, you could keep scans of important documents in Pass, which means they are GPG-encrypted and backed by a git repository, so they are versioned and shared across multiple machines.

lucb1e · 2025-09-14T01:57:25 1757815045

indeed. Additionally:

- if your secret leaks and you don't know it (or you do know, but you need some time to change it), the attacker not only gets the snapshot of your password manager but also can derive all future passwords you'll generate, or past ones you long forgot about

- there's no way to know what you've entered before, since it's stateless. With data stored in a manager, I know what username I used and can associate other data. If your uniqueifying input is the domain, and let's say HN would become hn.yc or whatever and you visit it again in ten years, you'd have to remember that hn.yc accepts the password of what you entered as news.ycombinator.com

I have to admit though, hash(name+secret)=password is so simple and beautiful that it draws IT people like a fine artwork draws visitors. But for me, that doesn't outweigh the practical issues

akerl_ · 2025-09-14T01:17:29 1757812649

Because the former works with any site and circumstance and the latter does not.

gmuslera · 2025-09-14T01:53:40 1757814820

Not all sites are safe, either by design or by people running them. Having a common secret+service name as password AND having at least one of those sites leaking your plaintext password could mean that your derivation may go public and all your other passwords and services fall because of that.

listeria · 2025-09-14T02:57:06 1757818626

presumably the derivation would involve a cryptographically secure, non-reversible function so as to not compromise the secret should one of them be leaked.

jibal · 2025-09-14T05:43:17 1757828597

"deriving them" != op<+>