
a-dub · 2026-06-13T18:06:15 1781373975

i read all the pkgbuild diffs, still doesn't give me a good sense. sure, i can verify that it's coming from the official repo but even then there's no guarantee that there isn't junk in there or that the git ref is actually pointing at the right thing.

it would be better if there were stronger community moderation and review that has stamps i can trust rather than this idea that eyeballing build scripts is a reasonable security posture.

embedding-shape · 2026-06-13T18:10:02 1781374202

> it would be better if there were stronger community moderation and review that has stamps i can trust rather than this idea that eyeballing build scripts is a reasonable security posture.

Ok, so instead of having a reasonable security posture yourself, you'd rather rely on a number of random strangers who've eyeballed the PKGBUILD instead?

Generally, I think Arch tries to prevent users from relying on bad signals, and this principle might be applied here too.

> i read all the pkgbuild diffs, still doesn't give me a good sense. sure,

Do you have an example of a diff that doesn't give a good sense? I review all my diffs too, but I feel like all of them give me a good sense of whether it's safe to install or not. I mean, why else would I review them? What's the point of reviewing if you don't use it to decide whether to install?

a-dub · 2026-06-15T01:47:47 1781488067

pretty much all of them. the diffs only really show that it's coming from the same source, the changed hash and maybe some urls for some patches. actually looking at what's behind that changed hash is a much more complicated story. this gives end users a false sense of security ("i read the diffs" -- not really), and gives attackers a clean vector (all it takes is one bad commit that might not even be on a real branch, a linked patch, or a late-downloaded dependency in the package itself).

embedding-shape · 2026-06-15T10:17:34 1781518654

> the diffs only really show that it's coming from the same source

What else do you have to review? In the case of both binaries and source, the idea is that you already trust upstream; otherwise you shouldn't install software from them. And since you trust upstream, the only things you need to review in the PKGBUILD are quite literally: Where is this stuff coming from? Is it the official domain/repository? Are there other non-official dependencies? Are there patches applied?

Once you've reviewed those, you're done, and as safe as if you installed straight from upstream, zero false sense of security here.

You're mixing concerns here; what you describe is a completely different issue.
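The checklist above (where does each source come from, is it the official host, are patches applied) can be partly mechanized. What follows is a minimal sketch, assuming you already have the PKGBUILD's `source=()` entries as a list of strings. The `localname::` rename prefix and the `#branch=`/`#tag=`/`#commit=` fragments are standard makepkg syntax. `review_sources`, the host allowlist, and the `foo/bar` URLs are hypothetical. The sketch also flags mutable refs, which is a-dub's concern about a git ref not pointing at the right thing, but it only reads strings: it verifies no checksums and does not check what a tag actually resolves to.

```python
from urllib.parse import urlsplit


def review_sources(sources, allowed_hosts):
    """Map each PKGBUILD source=() entry to findings worth a closer look.

    Hypothetical review helper: it only inspects the strings, it never downloads anything.
    """
    report = {}
    for entry in sources:
        findings = []
        # makepkg allows an optional "localname::" prefix before the real location.
        location = entry.rpartition("::")[2]
        scheme, sep, rest = location.partition("://")
        if not sep:
            # No URL scheme: a file shipped next to the PKGBUILD (patches, scripts, units).
            findings.append("local file: read it in full")
        else:
            url_part, _, fragment = rest.partition("#")
            host = urlsplit("//" + url_part).hostname or ""
            if host not in allowed_hosts:
                findings.append(f"unexpected host: {host}")
            if "+" in scheme:  # VCS source such as git+https
                if not fragment or fragment.startswith("branch="):
                    findings.append("VCS source tracks a moving branch")
                elif fragment.startswith("tag="):
                    findings.append("pinned to a tag, which upstream can move")
        report[entry] = findings
    return report


if __name__ == "__main__":
    example = [  # hypothetical entries for illustration
        "bar-1.0.tar.gz::https://github.com/foo/bar/archive/v1.0.tar.gz",
        "git+https://github.com/foo/bar.git#tag=v1.0",
        "git+https://mirror.example.org/bar.git#commit=0123abcd",
        "fix-build.patch",
    ]
    for entry, findings in review_sources(example, {"github.com"}).items():
        print(entry, "->", findings or "nothing flagged")
```

An empty list only means nothing in the string looked unusual. You still have to read local patches and decide whether you trust what the pinned commit contains.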

a-dub · 2026-06-15T12:27:30 1781526450

blindly trusting upstream is not really a reasonable posture. that is pretty much the source of all software supply chain attacks.

there is work involved in figuring out how to get the complete diff of the code and dependencies that are included in the change, plus review time. this could range anywhere from 5-10 minutes to an hour per updated package, if not more.

zyuiop · 2026-06-13T18:10:20 1781374220

Well, Arch Linux has a product for you if you want packages that were vetted: the official repositories. AUR is just a centralized place to put user-created packages, like npm is a place to put user-created node packages.

a-dub · 2026-06-12T19:50:53 1781293853

two scenarios i can think of where there's additional risk for bio/nuclear weapons: 1) basement lab leaks and 2) improving the quality of execution for shops that are already resourced enough to hire experts, but whose experts maybe aren't that great.

i think the correct answer is probably to funnel more money to global (bio)security initiatives and maybe use ai leverage as a way to get more of the world on board (some kind of deal that trades access to nvidia or cloud ai or whatever for policy commitments, while that leverage lasts).

dannyw · 2026-06-12T20:21:08 1781295668

I just find it doubtful that an LLM is going to help, instead of hurt, any state actor that is capable of starting a nuclear weapons program.

a-dub · 2026-06-12T02:51:32 1781232692

  i built a turbofan
  https://app.confbuild.com/p/z459
  
  now I want to build a complete Airbus as detailed as possible with a given budget

a-dub · 2026-06-09T18:30:12 1781029812

the claimed inference cost is 2x. if that is true, it is massive and remarkable that they're able to do anything like this at all.

a-dub · 2026-06-06T17:48:10 1780768090

i thought this was all fixed with special modes of clone that are optimized and don't actually copy anything (i.e., they create a new, deficient process that can pretty much only exec)?

zbentley · 2026-06-07T18:56:41 1780858601

Kind of. Those exist, but because Linux’s formal ABI is syscalls and not libraries that combine them in known-safe ways, the clone speedups that make fork faster are a confusing and fragile API for low-level programmers to use.

That, and even those clone-without-pagetable-copy improvements leave a lot of slowness on the table. Being able to skip fork-oriented functionality entirely, rather than merely disabling it, would simplify code. Also, for programs that launch the same subprocess many times, a better API might allow caching away some of the pre-entrypoint initialization of exec.
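One way to use the no-copy clone path without touching the fragile raw API is through `posix_spawn`. On Linux, recent glibc implements it with `clone(CLONE_VM | CLONE_VFORK)`, so the child borrows the parent's address space just long enough to exec. Python (3.8+, POSIX only) exposes it as `os.posix_spawn`. Below is a minimal sketch. The `spawn_and_wait` helper is hypothetical, and whether the vfork-style path is taken depends on your libc.

```python
import os
import sys


def spawn_and_wait(path, argv):
    """Start a child via posix_spawn and return its exit code.

    With recent glibc on Linux, posix_spawn uses clone(CLONE_VM | CLONE_VFORK),
    so the parent's page tables are not copied the way a plain fork() would.
    """
    pid = os.posix_spawn(path, argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)  # Python 3.9+


if __name__ == "__main__":
    code = spawn_and_wait(sys.executable, [sys.executable, "-c", "raise SystemExit(3)"])
    print(code)  # 3
```

This does not address zbentley's second point. Each call still pays the full cost of exec and the child's startup; it only avoids the fork-side copying.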

a-dub · 2026-06-05T16:16:56 1780676216

yeah this is the actual key. an actually useful title and a stable link to the discussion around the change.

conventional commits are pleasing, but of questionable actual utility. the code speaks for itself. the actually useful information is a well chosen title and the context for the change.

a-dub · 2026-06-05T15:33:38 1780673618

here's an idea for an "experienced" technical interview structure. "we care about x, y and z. you have forty-five minutes to convince us that you will meaningfully help us achieve our goals. we then will take 30 minutes to push on technical details as we see fit. you will be judged based on technical content, choices, taste and your overall approach and strategy for moving us forward and convincing us that you're the right person to do it. good luck!"

BugsJustFindMe · 2026-06-05T15:39:11 1780673951

> we care about x, y and z. you have forty-five minutes to convince us that you will meaningfully help us achieve our goals. we then will take 30 minutes to push on technical details

This part is good.

> as we see fit. you will be judged based on technical content, choices, taste and your overall approach and strategy for moving us forward and convincing us that you're the right person to do it. good luck!

This part is confrontational and will lead to worse outcomes. "As we see fit" signals capriciousness. "You will be judged" signals hostility. "you're the right person" engineers conflict; there is no "the right" person, only "a good" person. And telling someone "good luck!" in this context is like telling them to try not to die while standing on a narrow plank over a pit of sharp spikes. No matter how you think you mean it, it will come across as callous to many.

FireBeyond · 2026-06-05T17:18:13 1780679893

I do agree with your assessment, on both parts, but I think it could survive with minor tweaking.

"We are going to push on the technical details, both those we agree with and disagree with. This is where the rubber meets the road. You've painted a picture, how does it stand up to scrutiny and challenge. In the end, while there may not be perfect alignment, we're looking to see how your ideas go through the process of validation and how we get there, communication, exploration, and collaboration-wise."

BugsJustFindMe · 2026-06-05T20:26:26 1780691186

I'm not sure what exactly that's meant to do other than using more words to say the same thing only this time with hackneyed cliches. IME it's best to avoid empty metaphor that does little more than occupy space.

Like...

> This is where the rubber meets the road. You've painted a picture, ...

Was this written by AI? It's so bad, lol.

FireBeyond · 2026-06-06T01:05:34 1780707934

Hah, it is, in hindsight, and I agree (and bleh, last thing I want to sound like is AI!). But I think my point is more - if it had been framed as something like "defending your thesis/dissertation", seeing if your choices stand up to scrutiny, versus the somewhat snooty down-the-nose feeling of "We are judging you. Speak now." then it is better - and real: your architecture and code decisions should stand up to review/critique, after all.

BugsJustFindMe · 2026-06-06T07:30:58 1780731058

Framing something as "defending" against "critique" creates an interaction that is immediately adversarial, which is not so far distant from judging down-the-nose, IMO. They are words of combat and criticism, and I think we get better outcomes when we find ways to ask a candidate to "discuss strengths and compromises" without that. I don't want any of my candidates to feel like they are on defense against me in a scenario where they know that there is a catastrophic power imbalance and I am able to use literally any reason I want to kick them to the street. People don't think well in that kind of situation.

> last thing I want to sound like is AI!

My suggestion of it signaling AI was somewhat tongue-in-cheek. Humans have been writing inexpertly and injecting cliched fluff since the invention of writing. Communicating with precision is one of the most difficult things that any of us ever do.

paulluuk · 2026-06-05T15:36:08 1780673768

I wonder how many great engineers with stage fright you would lose this way, though.

a-dub · 2026-06-03T12:58:11 1780491491

if anything, compilers and interpreters are probably the most suited for ai codegen as they have some of the most comprehensive test suites around and are relatively easy to verify.

that said, it's a fun debate. sure, the rust compiler eliminated a whole class of memory bugs- but did the convoluted borrow checker gymnastics plus agent whoopsiedaisies leave behind a plethora of hidden and exploitable logic bugs?

it'll be interesting to see, although we may never know as anthropic will probably use the latest frontier models to audit and quietly patch over the coming years.

a-dub · 2026-06-03T00:21:46 1780446106

cool to see the eu pushing on the zkp approach. that was my first instinct when i read about this. add a zkp header to http with optional upgrading to "adult" content and it's done.

a-dub · 2026-05-26T17:28:46 1779816526

i'm curious: how does the steady state error rate of a stochastic automated system like this compare with the downtime and errors that come from a (brittle) deterministic bridge that can fail with upgrades? what does the observability look like? (i'm guessing one feature is that the execution log including images/screenshots for each transaction gets saved, which is probably a huge improvement.)

fchishtie · 2026-05-26T17:49:52 1779817792

it’s a good q - we experimented a lot with computer use / agentic automation and found that at scale a hybrid solution where the automations run as deterministic code with agents for recovery is the best - running automations as code is faster & cheaper & when you’re doing critical tasks (like updating patient records) you don’t want an agent to potentially mess something up.

previously, writing RPA code took a long time - using AI (and its infinite patience) we can write more durable code that covers more edge cases

And since they’re code-based, it’s pretty straightforward to have agents monitor them and update their code when the underlying system is upgraded, etc…

for observability - we have workflow execution logs that store text, videos and screenshots so an agent or a human can debug them - lots and lots of webhooks when things break ! (:
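Here is a minimal sketch of the hybrid pattern described above. Automations run as plain deterministic code, and a recovery hook (an agent, in their setup) is only invoked when a step fails. `Step`, `run_workflow`, `WorkflowFailed`, the `recover` callback and the example selector are hypothetical names for illustration. The returned history stands in for a real execution log with screenshots and webhooks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]  # deterministic automation code for one step


class WorkflowFailed(Exception):
    """Raised when a step still fails after recovery; carries the execution history."""

    def __init__(self, step_name: str, history: list):
        super().__init__(f"step {step_name!r} failed")
        self.history = history


# recover(step, state, error) -> True if it fixed something and the step should be retried.
Recover = Callable[[Step, dict, Exception], bool]


def run_workflow(steps: list, state: dict, recover: Recover, max_recoveries: int = 1) -> list:
    """Run steps as plain code; only call the recovery hook (e.g. an agent) on failure."""
    history: list = []  # (step name, outcome) pairs, a stand-in for an execution log
    for step in steps:
        attempts = 0
        while True:
            try:
                step.run(state)
                history.append((step.name, "ok"))
                break
            except Exception as exc:  # broad on purpose: any step failure goes to recovery
                history.append((step.name, f"failed: {exc}"))
                if attempts >= max_recoveries or not recover(step, state, exc):
                    raise WorkflowFailed(step.name, history) from exc
                attempts += 1
    return history


if __name__ == "__main__":
    def update_record(state):
        # Hypothetical step that breaks until its UI selector is repaired.
        if state.get("selector") != "#save-v2":
            raise RuntimeError("selector not found")

    def agent_fix(step, state, exc):
        # Stand-in for an agent patching the automation after an upstream UI change.
        state["selector"] = "#save-v2"
        return True

    print(run_workflow([Step("update_record", update_record)], {}, agent_fix))
    # [('update_record', 'failed: selector not found'), ('update_record', 'ok')]
```

Capping recoveries per step keeps the agent out of the hot path. A step that keeps failing surfaces as `WorkflowFailed` with its history, so a human can review it instead of the agent retrying indefinitely.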

_crowecawcaw · 2026-05-27T04:54:50 1779857690

I also experimented with vision/screenshot based computer use tools for similar use cases but had inconsistent results. LLMs had trouble getting precise pixel coordinates from a screenshot to move a mouse. And the screenshots took extra tokens. I had a lot more success using accessibility APIs to replace screenshots + input simulation since accessibility data is easier for LLMs to process. The accessibility functionality is now released as a separate library for building automation tooling: https://xa11y.dev/

fchishtie · 2026-05-27T14:46:38 1779893198

cool! thank you for sharing - will check it out

a-dub · 2026-06-01T14:41:28 1780324888

that sounds like the way. also keeps runtime costs down. seems like the trick there would be to build (a|on top of) durable rpa librar(y|ies) that the agents and humans work in so that the automation recipes and their automated updates can be quickly skimmed and sanity checked when needed. add in some live automated testing (assuming you can make this happen with the legacy systems) and maybe you could get really close to fully automating all of it.