This formal game-theoretic notion of fairness acknowledges that power disparity exists: having less power than your counterparty lets them inflict greater disutility on you, while you cannot inflict disutility on them in return to discourage it.
Fairness "in the usual sense", on the other hand, pretends power disparity doesn't exist and that, say, an armed robber is not allowed to take your stuff when you have nothing to defend yourself with. In reality that only works as long as there is a powerful third party (the state) that will inflict disutility on the robber for it.
Does it really matter that English is not as precise if the agent can make a consistent and plausible guess what my intention is? And when it occasionally guesses incorrectly, I can always clarify.
The trick is to provide dense rewards, i.e. not only once the full goal is reached, but a little bit for every random flailing of the agent in the approximately correct direction.
The article talks about all of this and references the DeepSeek R1 paper[0], section 4.2 (first bullet point on PRM), on why this is much trickier to do than it appears.
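To make the sparse-versus-dense distinction concrete, here is a minimal sketch in C. It assumes a toy one-dimensional task where "progress" is simply distance to a known goal position; the function names, the tolerance parameter, and the 0.1 weight are all hypothetical. Real reasoning tasks rarely have such a clean progress measure, which is part of why this is harder than it looks.

```c
#include <assert.h>

/* Absolute value helper, so the sketch needs no math library. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Sparse reward: 1 only once the goal is reached (within tol), otherwise 0. */
double sparse_reward(double pos, double goal, double tol)
{
    assert(tol >= 0.0);
    return abs_d(goal - pos) <= tol ? 1.0 : 0.0;
}

/* Dense reward: the sparse reward plus a small bonus proportional to how
   much closer this step moved the agent to the goal (negative if it moved
   away). The 0.1 weight is an arbitrary illustrative choice. */
double dense_reward(double prev_pos, double pos, double goal, double tol)
{
    double progress = abs_d(goal - prev_pos) - abs_d(goal - pos);
    return sparse_reward(pos, goal, tol) + 0.1 * progress;
}
```

With the sparse version, a step from 0.0 to 0.5 toward a goal at 1.0 earns nothing; with the dense version it earns a small positive reward, and a step in the wrong direction earns a small negative one.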
The correct solutions and the viable paths probably are known to the trainers, just not to the trainee. Training only on problems where the solution is unknown but verifiable sounds like the ultimate hard mode, and pretty hard to justify unless you have a model that's already saturated the space of problems with known solutions.
(Actually, "pretty hard to justify" might be understating it. How can we confidently extract any signal from a failure to solve a problem if we don't even know if the problem is solvable?)
Your hard mode is exactly the situation in which RL is used, because it requires neither a corpus of correct examples nor insight into the structure of a good policy.
> How can we confidently extract any signal from a failure to solve a problem if we don't even know if the problem is solvable?
You rule out all the stuff that doesn’t work.
Yes, this is difficult and usually very costly. Credit assignment is a deep problem. But if you didn’t find yourself in a hard mode situation, you wouldn’t be using RL.
I, too, enjoy the craftsmanship, but at the end of the day what matters is that the software works as required; how you arrive at that point doesn't matter.
For me, it is not a matter of craftsmanship so much as a repeatable approach for growing the minds of junior engineers such that they have the best chance to succeed.
Variable declaration `T v;` means "declare `v` such that the expression `v` has type `T`". Variable declaration `T *p;` means "declare `p` such that the expression `*p` has type `T`", etc.
My point was indeed that if you don't use pointer arithmetic in C, that means that you don't use arrays. I mean, when you declare arrays of a fixed size, you can also declare an equivalent number of primitive variables instead, but I would find that inconvenient. Hence the question.
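A small sketch of that "declaration mirrors use" reading, with the claims checked at compile time via C11 `_Generic`. The function and variable names are made up for illustration.

```c
#include <assert.h>

/* Each declaration is followed by a compile-time check that the expression
   with the same shape as the declarator has the declared base type (int). */
int declaration_mirrors_use(void)
{
    int v = 1;                 /* the expression v has type int */
    static_assert(_Generic(v, int: 1, default: 0), "v is int");

    int *p = &v;               /* the expression *p has type int */
    static_assert(_Generic(*p, int: 1, default: 0), "*p is int");

    int a[3] = {1, 2, 3};      /* the expression a[i] has type int */
    static_assert(_Generic(a[0], int: 1, default: 0), "a[i] is int");

    int (*pa)[3] = &a;         /* the expression (*pa)[i] has type int */
    static_assert(_Generic((*pa)[0], int: 1, default: 0), "(*pa)[i] is int");

    int *ap[2] = {&v, &a[2]};  /* the expression *ap[i] has type int */
    static_assert(_Generic(*ap[0], int: 1, default: 0), "*ap[i] is int");

    /* Use every object through the expression its declaration describes. */
    return v + *p + a[1] + (*pa)[2] + *ap[1];
}
```

The same rule explains why `int (*pa)[3]` and `int *ap[2]` differ: the parentheses change which expression, `(*pa)[i]` or `*ap[i]`, ends up having type `int`.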
If I remember correctly, he meant that only array accesses are used, because their length can be checked (as all arrays have a static length due to no dynamic memory).
Indeed, this is what many people do. But even if you use dynamic memory, if you replace pointer arithmetic by array indexing, you get bounds checking. And in C this also works for arrays of run-time length.
If "array" has a bound, whatever the index expression evaluates to can be checked against the bound of the array. If "array" is not a bounded array but a pointer or an unbounded array, then this does not work, but my point is that it is easy to avoid such code.
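As a rough illustration of why a bound that is part of the type makes indexing checkable, even when the length is only known at run time, here is a hand-written check. The helper `get_checked` is hypothetical, and the sketch assumes a compiler that supports variably modified types (C99, or C23 where they are mandatory) and `n > 0`. The check is written out manually only to show that the information is available from the type; it is not a claim about what any particular compiler inserts automatically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read element i of an array whose length n is only
   known at run time. The parameter is a pointer to an array of n ints, so
   the bound travels with the type instead of being lost to a bare int *. */
int get_checked(size_t n, int (*array)[n], size_t i, int *out)
{
    assert(n > 0);  /* zero-length arrays are not valid here */

    /* The bound is recovered from the type itself; this evaluates to n. */
    size_t bound = sizeof *array / sizeof (*array)[0];

    if (i >= bound)
        return 0;            /* index out of range: report failure */

    *out = (*array)[i];      /* plain array indexing, no pointer arithmetic */
    return 1;
}
```

A caller with `int data[3]` would pass `&data` together with `3`. With a bare `int *` parameter there would be no `bound` to compute, which is the case the comment says is easy to avoid.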
- what if some site has weird password requirements and the derived password doesn’t work
- what if a site gets hacked and you need to rotate one password.
If you have to store data per-site anyway because of those cases, you may as well just store passwords. You can (and should) still generate extremely high entropy passwords.
Additionally, you can store other data. For example, you could keep scans of important documents in Pass, which means they are GPG-encrypted and backed by a git repository, so they are versioned and shared across multiple machines.
- if your secret leaks and you don't know it (or you do know, but you need some time to change it), the attacker not only gets the snapshot of your password manager but also can derive all future passwords you'll generate, or past ones you long forgot about
- there's no way to know what you've entered before, since it's stateless. With data stored in a manager, I know what username I used and can associate other data. If your uniqueifying input is the domain, and let's say HN would become hn.yc or whatever and you visit it again in ten years, you'd have to remember that hn.yc accepts the password of what you entered as news.ycombinator.com
I have to admit though, hash(name+secret)=password is so simple and beautiful that it draws IT people like a fine artwork draws visitors. But for me, that doesn't outweigh the practical issues
Not all sites are safe, either by design or by people running them. Having a common secret+service name as password AND having at least one of those sites leaking your plaintext password could mean that your derivation may go public and all your other passwords and services fall because of that.
Presumably the derivation would involve a cryptographically secure, non-reversible function so as to not compromise the secret should one of the derived passwords be leaked.