Comment by jstummbillig
3 hours ago
I think that is always something that is being worked on in parallel. Recent paradigm seems to be the models understanding when they need to use more tokens dynamically (which seems to be very much in line with how computation should generally work).
No comments yet
Contribute on Hacker News ↗