Interestingly, despite it being much more detailed and a lot more process and procedure than what I currently do - which is more akin to the version 0 described, but in parallel - we come up at the same final problem: reviews and quality assurance.
I sign off the code I merged, part of company policy but also just to be sure it is actually decent. But reviewing has become the real draining bottleneck: even stacked PRs, if that total 5-6k lines is not a 5min job. Even if I brainstormed and set the plan, that's really the part that doesn't scale right now for me in this. But the author is very shy about that: either the changes arent that big in the end or they trust the process enough to review in a more casual manner. Being equally untrusting I can't do that ...
For decades, engineers understood that large code reviews are harder than small ones. Out of both politeness and a desire to receive better code reviews, we learned to break our large changes into smaller chunks. Some engineers took things even further and replaced code reviews with pair programming. But then LLMs showed up and everyone seems to have forgotten those lessons.
They can be still be applied now using coding agents, if you're willing to push back against the default setup and change your mode of thinking a little bit. Of course it doesn't help that an entire industry is dedicated to persuading us that maximizing token spend is the only way to get shit done.
I appreciate this probably seems like an extremist take, but I wrote some more about it here in case there's anybody out there who identifies with it:
> They can be still be applied now using coding agents, if you're willing to push back against the default setup and change your mode of thinking a little bit. Of course it doesn't help that an entire industry is dedicated to persuading us that maximizing token spend is the only way to get shit done.
Yeah the problem is the executives and managers around us are demanding we ship massive features as quickly as possible, and I like having a job and dread having to find a new one in this market...
I think that's reasonable. My only gripe is that making small sets of changes is often faster to do by hand than waiting on llm reasoning, so I've found it amounts to very little speedup.
Proper review should take longer than writing it yourself, because you need to know the correct solution, understand the proposed solution, and evaluate the difference between the two. When designing it yourself, you just need to know the correct solution and write it, and with modern high-level languages and IDEs with autocomplete writing it is hardly a bottleneck.
I disagree under the following circumstances, which in my experience is the common case: You don’t know from the outset all relevant considerations that go into implementing something. Coding yourself is an exploration process of those considerations. Being shown a finished solution doesn’t let you see and understand all the considerations and the possible options that you’d have contemplated when implementing it yourself. When reviewing, you still have to do that exploratory thinking to weigh the possible options. And the fact that you have to do that exploration purely mentally rather than in a process of working with code arguably makes it harder (similar to contemplating alternative solutions to a Sudoku purely mentally, actuallu).
There rarely is a single correct way of implementing some requirement or feature. It’s a trade-off between compromises, not binary correct or incorrect like a Sudoku puzzle. The insights that the exploration give you may even lead you to implement something significantly different from what you originally set out to.
Sudoku’s constraints are knownn and easy to build an harness for. Software has a more malleable structure. An harness is hard to build and the tests cases for the constraints can be a lot.
Imagine sudoku with hundreds of subtle, sometimes mutually exclusive rules, and no single valid solution.
This is not about LLMs, by the way. It’s about reviewing any code, including by a fellow human. It’s just that many people mistakenly feel like with LLMs they can lower their guard and accept even if they have not gone through the steps of themselves coming up with their solution and comparing it to the one suggested by the LLM.
The reason is that many correctly see proper review as duplicate work, and while it is justified with another human (because it is (A) instructive and (B) reducing bus factor) with LLMs most people simply can’t be bothered. If you personally can, you are a minority.
If I'm attentive during spec/plan creation I sort of build this "expectation" of what the actual PR will look like, the mental model of it. Then it's somewhat easier to review.
But the mental load is brutal tbh, and still not sure if it's "worth it"
Just because everything is a potential threat vector now: doesn't this also mean you could easily put AI specific malicious instructions into the PDF that the regular human would never notice?
Like the "white text between the lines that only appears when copy-pasted"-hack that some professors have been doing in their exercises to their students to include pink elephants in the output and stuff. But worse. Just thinking of a electricity bill pdf you provide as proof of address to some company that uses an LLM to extraxt that address and pre-process that doc. But instead we can command it to do something else that a regular human wouldn't even ever notice...
For quite some time the best approach to documents you didn't create is to rasterize and OCR. For at least 20 years, PDFs have been intentionally scrambled or have had extraneous text that appears in copy/paste but does not appear in the visible output.
Yes, although that's not new. The amount of different exploits and RCE I've seen in the past decade from just "opening" an PDF is mind blowing. Not sure if it's slowed down but around 8 years ago ghostcript would patch a couple of RCE from PDF processing every few months.
Oh this happens all the time. When Apple announced they would be scanning everyone’s private iCloud data for CSAM, they had some “PSI” system which would at some point consider the content of a grayscale and reduced quality version of the image.
The problem is that security researchers for years have known about pre-processing attacks where photos which appear as one thing (a dog in a yard) appear ad something completely different (a cat on a couch) once put through machine learning pre-processing.
> Just because everything is a potential threat vector now: doesn't this also mean you could easily put AI specific malicious instructions into the PDF that the regular human would never notice?
Yup and there's so many memes floating around regarding that being used to bypass AI "resume reviewers" that it got academically reviewed [1].
It's 2 different statements. The first is true, even if you don't like it. The "therefore" is something you completely made up to make your point and imply something I neither said nor suggested.
You might not like it either but an arm race isn't new. The tools changed but competition, and thus threats, remain.
I agree with you, but argue with the form of the person we both replied to. Alhough I would prefer universal peace and international morality, I maintain a generally neutral position on nuclear arms. I am also neutral on the evergreen innocent idiocies of youthfulness.
This is a form of argument known as reductio ad absurdum. I see it more and more frequently now, often in dismissal of a fairly throughtful point of view, usually with a mocking and disdainful tone, and therefore nuclear weapons are nothing to worry about.
I'm not sure if reductio ad absurdum was about my point or theirs but just to be explicit, I didn't say it wasn't a problem or a big deal, only that that threats and competitions are not new. I clearly didn't make a moral or ethical statement about nuclear weapons.
Yeah, same sentiment here. I agree that some more people need to be told to relax a little in our field, but on the other side, the product and project managers are constantly looking to ensure maximum utility, especially in startups and high pressure environments. And looking around with the large number of companies that laid off people in the last year or two, I see fewer devs having the choice to push back when that happens. It really reads like the author is in the comfortable position of a staff or principle engineer without a direct manager and gets to decide what their day and week looks like and pick what they work on. I am afraid fewer and fewer have that luxury...
I totally see where they are coming from, jitsu, too is making every change its own artifact. And it plays hard to the "faster shipping" that especially AI-driven teams are pushing hard, and find the review process to be the next big bottleneck now (as I just saw with my last client as well).
There has also been a lot of discussion about the value of the peer review process recently, in in general. But I wonder if this isn't all going into the wrong direction. Quite honestly, even with the previous review and discussion system of (squashed) commits how often did you really use git blame and opened up the original PR discussion of that changed line that caused the bug? And how often did it help you beyond learning it was done by that rockstar developer who has left long ago? And that the discussion on that PR was a point in time and the code around it has evolves beyond that and it would need looking at another 10 PR discussion to get the entire context.
What I am saying is that git (and before that Subversion and CVS) has a full history is so that it can resolve the latest state. Period. We made that commit ritual somewhat of a hallmark in putting more supposed meaning into recording ever more in that history. But we rarely stopped to check if that is all that useful. Recording even more, all the time, reminds me of these work group meetings that have minute records of every bike shedding meeting word said by everyone, that, honestly, no one ever looks at after the next meeting ever again. I don't think there is value in minute record keeping, it becomes too much noise that just makes it harder to parse. Now also adding all AI conversation and agent thinking to that tree? What's the value of that in like 3 months down the line? I don't see it.
It was a great hack, but it was always just that: a hack. We all always knew that the "certificate authority"-hierarchy is broken and can easily be abused by the ones in power. I appreciate everything that the let's encrypt peeps have done for the world, but the cert authority system really needs an overhaul.
I sign off the code I merged, part of company policy but also just to be sure it is actually decent. But reviewing has become the real draining bottleneck: even stacked PRs, if that total 5-6k lines is not a 5min job. Even if I brainstormed and set the plan, that's really the part that doesn't scale right now for me in this. But the author is very shy about that: either the changes arent that big in the end or they trust the process enough to review in a more casual manner. Being equally untrusting I can't do that ...
reply