Speculating here, but perhaps your coworker was too ambitious? In my opinion, you should start with AI-generated PRs that do small, linting refactors and then work up from there. In particular, if this is done in parts, one of the strategies you can employ is to:
- add tests
- break files up into smaller parts
- test the smaller parts
- then actually improve behavior
(Which is no different than what you would do as a human)
It's not worth bothering with unless the task is very difficult, long-context, long-running, or all of the above. But, when it's worth using, it genuinely increases success rates and appears to amplify model intelligence.
Pleading has worked for me. I wrote “My job depends on this, please help me” and ChatGPT did a task it had previously claimed it wasn’t able to do (extracting text from an image; at first it claimed it couldn’t make it out).
Asking LLMs to do things in different ways does sometimes get them to answer correctly when they didn't with a previous, effectively equivalent prompt, but people really go nuts anthropomorphizing this behavior.
ChatGPT has no empathy for you keeping your job; you just lucked into a more helpful predictive text chain based on some combination of the input and the random sampling temperature.
Asking it to just 'try again, dummy' could have worked equally well (or not, it's all just probabilities after all).
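To make the "it's all just probabilities" point concrete, here is a minimal sketch of temperature sampling. The two-entry `logits` list and its values are made up for illustration (index 0 stands for "refuse", index 1 for "do the task"); real models sample over a whole vocabulary, and this is not how any particular product is implemented.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical scores: index 0 = "refuse", index 1 = "do the task".
logits = [1.0, 0.5]
rng = random.Random(0)

# At temperature 1.0 the less likely continuation still shows up fairly often.
warm = [sample_with_temperature(logits, 1.0, rng) for _ in range(1000)]
warm_rate = warm.count(1) / len(warm)

# At a very low temperature the top-scoring continuation dominates.
cold = [sample_with_temperature(logits, 0.05, rng) for _ in range(1000)]
cold_rate = cold.count(1) / len(cold)
```

With the same prompt and the same scores, simply retrying at a nonzero temperature can flip the outcome, which is all "pleading worked" needs in order to look like it worked.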
I did too, but then, out of frustration, I added something very similar ("must be accurate") to a prompt for an AI-backed feature, and sure enough it fixed the issue. Lord have mercy.
One of the best things you can do is start by having it do unit test coverage for existing behavior. A refactor with no tests breaks things pretty much no matter who does it, because they don't know what the right behavior is.
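A minimal sketch of what "test coverage for existing behavior" can look like, assuming a made-up `legacy_format_name` function as a stand-in for the untested code (nothing here comes from a real codebase). The tests record what the code does today, quirks included, so a later refactor has something to be checked against.

```python
import unittest

def legacy_format_name(first, last):
    # Hypothetical stand-in for existing, untested code, quirks and all.
    return (last.upper() + ", " + first).strip()

class CharacterizeFormatName(unittest.TestCase):
    """Pins down what the code does today, not what it 'should' do."""

    def test_typical_input(self):
        self.assertEqual(legacy_format_name("Ada", "Lovelace"), "LOVELACE, Ada")

    def test_empty_first_name_keeps_trailing_comma(self):
        # Arguably a quirk, but a refactor should preserve it until someone
        # decides otherwise on purpose.
        self.assertEqual(legacy_format_name("", "Lovelace"), "LOVELACE,")
```

Saved as a test module, this can be run with `python -m unittest` before and after each refactoring step. If a step changes behavior, the failing test tells you which behavior changed.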
While I could generally agree, in this specific instance if the AI were “thinking” correctly it should have found the mistake. I admit it was a difficult problem though (solving it required creativity).
To be more precise, the prompt actually pointed to where there could be issues, and the issue, which was exactly of the kind that was pointed at, was not found.
There are a lot of factors in "should have found" which my recommendation improves.
If you had told it to write unit test coverage first, you would have covered more of the codebase. That reduces the amount of context needed for the next mistake-finding investigation, because it will see that a lot of the paths are already covered. Then you say "Go look for issues" (or whatever you asked it to do) and it will be able to think more deeply about what's left over.
What specific model were you using, at what effort? How big was your context window?
I discovered Homebrew now sometimes asks whether I actually want to install a formula (e.g. `brew install ffmpeg` asks whether I want to install it because it has dependencies). Is there a way to disable this behavior and revert to the previous one?
Oh, I didn't read that part of the news. That's great. The ability to run x64 Docker images semi-natively was one of the big reasons I jumped to the M1 platform when it came out, and I was baffled that they would remove it.
The EU does not want privacy. Every so often they actually try to get rid of it (by adding backdoors to encrypted conversations). So far it has not worked out, but I’m afraid they will succeed at some point.
Following that line of thought, what they are actually asking for in the name of proper DMA compliance would probably go right in that direction, tbh.
I, for one, am happy Apple is taking a stance and, as a European, would very much like my government to stop asking for ridiculous things that do not benefit the consumer.
Believe it or not, the EU is not one single entity with one undivided goal, as is perfectly well demonstrated by Chat Control being proposed by one side and continually struck down by the other.
Probably not, but it's still available. The DMA would most likely require that users be able to benefit from the AI regardless of which email/photo/messaging provider they prefer to use.
Exactly. Even though Siri is completely lost today, my friend asks it a number of random things, all she wants is an answer. Currently it redirects to the web, it’s enough for her. I told her “next year it’ll work!” And boom. We’re in the EU. Sad.