Comment by NitpickLawyer
21 hours ago
I built a similar thing around a year ago w/ autogen. The difference now is that models can really be steered towards "part" of the overall goal, and they actually follow that.
Before this, even the best "math" models were RL'd to death to only solve problems. If you wanted one to explore "method_a" of solving a problem, you were SoL. The model would start with "ok, the user wants me to explore method_a, so here's the solution: blablabla" and then do whatever it wanted, unrelated to method_a.
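Roughly the kind of steering I mean, as a minimal sketch. It uses the OpenAI chat client as a stand-in for whatever backend you run; "method_a" is just the placeholder above, and the drift check at the end is a naive heuristic I made up, not a real guard:

    # Sketch: pin a model to ONE solution method and detect drift.
    from openai import OpenAI

    client = OpenAI()

    def explore_method(problem: str, method: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: any steerable chat model
            messages=[
                {"role": "system", "content": (
                    f"Explore ONLY {method} for the problem below. "
                    "Do not switch approaches, even if it fails; "
                    "report the failure instead."
                )},
                {"role": "user", "content": problem},
            ],
        )
        answer = resp.choices[0].message.content
        # Naive check: older RL'd models would blow straight past
        # the system prompt; newer ones mostly stay on-method.
        if method.lower() not in answer.lower():
            raise ValueError(f"model drifted off {method}")
        return answer
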
Similar story for gathering multiple sources. Only recently can models actually pick the best thing out of many instances and work effectively at large context lengths. The earlier attempts at 1M context windows were gimmicks at best, IMO. Gemini 2.5 seems like the first model that can actually do useful work past 100-200k tokens.
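The "pick the best out of many instances" pattern, sketched the same way. Function names are mine, not autogen's, and the index parsing at the end is optimistic. The judge call is exactly where long context matters: N full candidates plus the task can easily run past 100k tokens, which is why this only became usable recently:

    # Sketch: fan the same task out N times, judge once, keep one.
    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: stand-in model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def best_of_n(task: str, n: int = 5) -> str:
        candidates = [ask(task) for _ in range(n)]
        numbered = "\n\n".join(
            f"[{i}] {c}" for i, c in enumerate(candidates)
        )
        verdict = ask(
            f"Task: {task}\n\nCandidates:\n{numbered}\n\n"
            "Reply with only the index of the best candidate."
        )
        # Optimistic parse; a real judge loop needs retries here.
        return candidates[int(verdict.strip().strip("[]"))]
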