6 Comments
Tian's avatar

Spot on. The 'babysitting' problem is indeed the biggest bottleneck to AI adoption in engineering workflows. Moving from AI as an autocomplete to a truly autonomous agent requires better context injection and robust feedback loops. For those looking for tools specifically designed to help agents understand complex architectural contexts with less supervision, https://open-code.ai is exploring some very interesting solutions in this space. Great insights!

BB's avatar

You've highlighted a lot of the conversations I've been having with myself lately. That is where I am right now: 2-3 agents working on independent git worktrees, with me checking in when they reach a stopping point. Then there is a future I know exists: 10+ agents working, some on semi-autonomous tasks that still require my input, and some on small, completely autonomous tasks. Much of my time lately has been spent mapping how to get there and also fixing the spectacular ways I've failed to get there.

However, just over the last 6-8 weeks, I've gone from thinking the 10+ agent future is impossible and dumb to thinking it's not only possible but desirable. Even as capability growth in the underlying models slows down, the change to our workflows seems to be speeding up. I'm not sure whether it's that I'm finally grappling with how to use the models, or whether there was some small capability threshold they needed to hit. Opus's ability to choose the correct tools in the correct circumstances has been a particular boon.
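The worktree-per-agent setup mentioned above can be sketched in plain git. This is a minimal illustration, not the commenter's actual setup; the repo location, branch names, and agent names are all made up for the example.

```shell
# Minimal sketch of a worktree-per-agent layout (illustrative names).
set -e
repo=$(mktemp -d)
wt=$(mktemp -d)
git -C "$repo" init -q -b main
git -C "$repo" config user.email agent@example.com
git -C "$repo" config user.name agent
git -C "$repo" commit -q --allow-empty -m "init"

# One worktree per agent, each on its own branch, so agents
# never stomp on each other's working copy.
for agent in agent-1 agent-2 agent-3; do
  git -C "$repo" worktree add -q -b "$agent" "$wt/$agent" main
done

git -C "$repo" worktree list
```

Each agent then works and commits only inside its own directory, and checking in at a stopping point amounts to reviewing and merging that agent's branch.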

Matthew Hawthorne's avatar

Hey - thanks for reading. The initial technical hurdles for me were reliable execution and generating useful logs to resolve, retry, or resume conversations when they failed. There's still some work to do there, but my workflow has mostly shifted to just... merging branches.

That was the key insight for me -- that, especially for simpler projects, everything could be automated, so we get to choose where to spend our time. Maybe it's manual reviews of larger diffs, maybe it's reviewing summaries of automated reviews, but either way, it should be a deliberate choice.

The AI Architect's avatar

Really impressive work on GZA. The asynchronous execution angle is something I've been thinking about a lot lately, since waiting around for agents to finish kills any productivity gains. When I tried running multiple coding tasks in parallel last month, the merge conflicts were brutal, but your approach of treating each task as its own branch makes total sense.

Matthew Hawthorne's avatar

Thanks for reading. Wanting to use branches for everything was one of the main reasons I built my own tool. Note that there can still be a lot of conflicts as you turn up the dial, but they're fairly easy to resolve. I briefly experimented with using a script to manually resolve conflicts, which I've now shifted to a Claude skill... I'll write more about this soon.
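The branch-per-task conflicts described above can be reproduced and auto-resolved in a few lines of git. This is a toy sketch under assumptions: two hypothetical task branches edit the same file, and "keep the incoming side" stands in for whatever resolver script or Claude skill actually does the work.

```shell
# Toy reproduction of branch-per-task merge conflicts (illustrative names).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email agent@example.com
git config user.name agent
printf 'base\n' > notes.txt
git add notes.txt
git commit -qm "base"

# Two "agent" branches editing the same file -> guaranteed conflict.
for task in task-a task-b; do
  git checkout -qb "$task" main
  printf '%s\n' "$task" > notes.txt
  git commit -qam "$task"
done

git checkout -q main
git merge -q task-a   # fast-forwards cleanly

# task-b conflicts; keep the incoming side as a crude stand-in
# for an automated resolver.
if ! git merge -q task-b; then
  git checkout --theirs notes.txt
  git add notes.txt
  git commit -qm "merge task-b (auto-resolved)"
fi
```

Real resolutions are rarely as simple as picking one side, but the control flow is the same: merge each task branch in turn and hand conflicts to the resolver.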

Deepak Shukla's avatar

The real risk isn't breaking more things; it's never knowing when to stop 🎯