Comment by logicchains

2 days ago

> and ultimately meld together to be able to achieve tasks on time horizons of multiple hours

It's already possible to achieve tasks on a time horizon of multiple days if you put the LLM into a sufficiently structured workflow (one where a separate program smartly manages its context). E.g. a standards-compliant HTTP/2 server where the code is 100% written by Gemini Pro (over 40k lines of code total, including unit tests, in around 120 hours of API time): https://open.substack.com/pub/outervationai/p/building-a-100...
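
To make "a separate program that manages its context" a bit more concrete, here is a minimal sketch of the general idea. It is not the linked project's code: `ContextManager`, `run_workflow`, the character-based budget, and the `llm` callable are all hypothetical names and simplifications chosen for illustration. The outer program decides what the model sees on each step, always including some pinned instructions and only as many recent notes as fit.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContextManager:
    """Hypothetical helper: keeps each prompt within a rough size budget."""
    budget_chars: int          # assumed budget, measured in characters for simplicity
    pinned: str                # instructions that are always included
    notes: List[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)

    def build_prompt(self, task: str) -> str:
        # Walk the notes newest-first and keep as many as fit in the budget.
        # Separator newlines are not counted, so the budget is approximate.
        remaining = self.budget_chars - len(self.pinned) - len(task)
        kept: List[str] = []
        for note in reversed(self.notes):
            if len(note) > remaining:
                break
            kept.append(note)
            remaining -= len(note)
        kept.reverse()  # restore chronological order
        return "\n".join([self.pinned, *kept, task])


def run_workflow(tasks: List[str],
                 llm: Callable[[str], str],
                 ctx: ContextManager) -> List[str]:
    """Run tasks one by one; `llm` is any prompt -> reply function (assumed)."""
    results = []
    for task in tasks:
        reply = llm(ctx.build_prompt(task))
        ctx.add(f"{task} -> {reply}")  # record the outcome for later steps
        results.append(reply)
    return results
```

A real setup would presumably count tokens rather than characters, summarise old notes instead of dropping them, and run tests between steps, but the shape is the same: the program, not the model, owns the context.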

blixt 2 days ago

This is very interesting, and nice learnings in there too, thank you for sharing! It seems the author monitored the LLM, stopped it from going off-track a few times, fixed some unit test code manually, etc. Plus this is strictly re-implementing a very well-specced library that already exists in the same programming language. So I think it's still a bit hard to say we can let an LLM work for multiple days, if we imply that this work should be domain-specific to a particular company. But it's very promising to see this was possible with very little interaction!