I read it as saying that a model's performance is random, and that the observed differences in opinion result from overinterpreting random outcomes.
I think, however, that some people seem to be always lucky, which indicates that it is not random but rather reflects fixed differences between people and their environments.
A model's adherence to some configuration is a matter of probability. There might be some underlying pattern, but as far as I understand this is not documented, and it may even be impossible to document. So people are just trying stuff and sharing what appears to work. There is no causal link anywhere in these recommendations; they are just based on spurious correlations.
I think that's the issue, rather than 60K being small.
Most of the actual edits/changes I request from Codex are solved within 100-150K tokens. Beyond 200K I'd definitely try to restart the session as soon as I could, as all models are horrible once you get past ~20% of the total context size. And this is while working on million+ LOC codebases.
The problem, I guess, is that there is no solid, concrete evidence of this (to me [and seemingly others] obvious) degradation. It should be easy to prove, yet no one has time to sit down and show it :)
But the likelihood of a model getting minor details wrong seems to skyrocket once you're above some magical threshold between 15-20%, and I've hit that issue enough times that my workflow is now built around preventing it.
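For what it's worth, the restart rule of thumb above could be sketched roughly like this. The function name and the 1M-token window are assumptions for illustration (a 200K cutoff at ~20% implies a window of about that size), and the threshold is the anecdotal figure from this comment, not a measured one.

```python
def should_restart(tokens_used: int, context_window: int, threshold: float = 0.20) -> bool:
    """Return True once the session has used more than `threshold` of the window.

    `threshold` is the anecdotal ~20% figure from the comment above,
    not a documented or benchmarked limit.
    """
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return tokens_used / context_window > threshold


# Hypothetical 1M-token window: 150K is still fine, 250K is past the cutoff.
print(should_restart(150_000, 1_000_000))
print(should_restart(250_000, 1_000_000))
```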
What are y'all doing to hit that? Do you just not give it any pointers and let it churn away? What kind of context are you handing off?
I routinely get Claude to do things pretty decently and finish up easily in the 4-5 digit range of tokens. It seems to do the right kind of thing and not waste its time looking at 1000 files.
This is me. I found AI to be an incredible provider of structure, focus and productivity; it's an externalized executive function provider. No longer do I forget what last week's meetings were about, no longer am I paralyzed by seemingly insurmountable tasks. It all just flows, and I get to rubber duck against an endlessly patient system. I love it, and I'm somewhat bewildered by some of the takes in this thread. Different strokes for different folks, I suppose.
Berlin "boutique" tech consultancy here: we are seeing a noticeable increase in Israeli and US engineers entering our hiring pipeline. The brain drain from the autocratic countries is real.
I can (and do!) pay the power company a pittance to run a surprisingly strong local model on a little box next to my keyboard that does a fine job. Maybe not as fine as the billionaires' thinking machine, but good enough and often better. Given that fact, I consider reliance on LLMs as much of an "issue" as reliance on a computer.
I would have killed for access to an LLM during school. Not to do my homework (though that too, homework is an antipattern imo) but to fill my gaps at my own pace and level of patience. Just endlessly pestering the AI "ok, but why?" until I grokked it.
You handle a large codebase by enforcing best practices that should always have been enforced: proper up-to-date documentation, strict adherence to conventions and coding guidelines, cross review of deliverables, TDD, and so on. Just whispering "make me a dashboard" into the machine's ear is not how you drive agents to create maintainable and understandable code.
So you are saying AI is a smart/fast autocomplete and the actual intelligence is driven by humans?
Reality aside, it took me a lot of time just to get the version I wanted. What I find increasingly frustrating is that the solution chosen by the AI is not optimal and not production grade, which requires even more tokens and wastes more time. It's good for management, but for us it's difficult to maintain.
The scientific study of multitasking over the past few decades has revealed important principles about the operations, and processing limitations, of our minds and brains. One critical finding to emerge is that we inflate our perceived ability to multitask: there is little correlation with our actual ability. In fact, multitasking is almost always a misnomer, as the human mind and brain lack the architecture to perform two or more tasks simultaneously. By architecture, we mean the cognitive and neural building blocks and systems that give rise to mental functioning. We have a hard time multitasking because of the ways that our building blocks of attention and executive control inherently work. To this end, when we attempt to multitask, we are usually switching between one task and another. The human brain has evolved to single task.
Fair enough, so it's a misnomer. Let's call it task switching then, since we don't actually do tasks at the same time, but switch from one to the other. A Claude Code session helpfully prints a small tldr summary of the ongoing session, so that one can quickly onboard again to the task at hand. I do not find that draining, personally.
And how do you get this to work exactly? I keep getting variations of "Missing required parameter: redirect_uri" in the OAuth flow.
The solutions proposed by Gemini and Google's AI summaries all hallucinate agy subcommands that don't exist, hilariously.
Edit: after bouncing around several GitHub threads, I realized that the agy TUI framework is wrapping the URL in a way that causes spaces to be inserted where the URL wraps. That's hilarious.
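In case it helps anyone hitting the same error, here is a minimal sketch of the workaround: strip the whitespace that wrapping inserted before opening the URL. The function names, the example.com URL and its parameters are made up for illustration and are not from agy or any real provider. It assumes the real URL contains no literal whitespace, which holds for properly percent-encoded URLs.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_url(wrapped: str) -> str:
    """Remove spaces/newlines a terminal inserted where a long URL wrapped."""
    return "".join(wrapped.split())


def has_redirect_uri(url: str) -> bool:
    """Check that the query string actually carries a redirect_uri parameter."""
    return "redirect_uri" in parse_qs(urlsplit(url).query)


# Hypothetical URL copied from a TUI; a space landed in the middle of "redirect_uri".
copied = (
    "https://auth.example.com/authorize?client_id=abc&redi rect_uri="
    "http%3A%2F%2Flocalhost%3A8080%2Fcallback"
)

fixed = unwrap_url(copied)
print(has_redirect_uri(copied))  # the broken key is "redi rect_uri"
print(has_redirect_uri(fixed))
```

With the space in place, the server sees a parameter named "redi rect_uri" instead of "redirect_uri", which would explain the "Missing required parameter" message.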