Comment by ph4rsikal
4 days ago
It might appear so, but you could validate it with a simple test. If the LLM played a 4x4 Tic Tac Toe game, would the agent select the winning move 100% of the time, or block a losing move 100% of the time? If these systems were capable of proper reasoning, they would find the right choice in these obvious but constantly changing scenarios without being specifically trained for it.
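To make the test concrete, here is a rough sketch of how one could score it. It assumes a win means four in a row on the 4x4 board (rows, columns, and the two main diagonals), and all names (`acceptable_moves`, `completing_moves`, the board encoding) are made up for illustration; this is not the harness from the linked post.

```python
from typing import List, Set, Tuple

N = 4  # assumption: 4x4 board, win = N marks in a row
Cell = Tuple[int, int]
Board = List[List[str]]  # each cell is "X", "O", or "" for empty


def all_lines() -> List[List[Cell]]:
    """Every row, column, and both main diagonals as lists of (row, col)."""
    rows = [[(r, c) for c in range(N)] for r in range(N)]
    cols = [[(r, c) for r in range(N)] for c in range(N)]
    diags = [[(i, i) for i in range(N)], [(i, N - 1 - i) for i in range(N)]]
    return rows + cols + diags


def completing_moves(board: Board, player: str) -> Set[Cell]:
    """Empty cells that would complete a full line for `player`."""
    moves: Set[Cell] = set()
    for line in all_lines():
        marks = [board[r][c] for r, c in line]
        if marks.count(player) == N - 1 and marks.count("") == 1:
            moves.add(line[marks.index("")])
    return moves


def acceptable_moves(board: Board, player: str) -> Set[Cell]:
    """Moves a sound player should pick: win if possible, else block,
    else any empty cell (nothing is forced)."""
    opponent = "O" if player == "X" else "X"
    wins = completing_moves(board, player)
    if wins:
        return wins
    blocks = completing_moves(board, opponent)
    if blocks:
        return blocks
    return {(r, c) for r in range(N) for c in range(N) if board[r][c] == ""}
```

Usage would be something like: generate many random positions, ask the model for a move as `(row, col)`, and count how often the answer is in `acceptable_moves(board, player)`. For the claim above to hold, that rate should be 100% on every position where a win or a block is available.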
[1] https://jdsemrau.substack.com/p/nemotron-vs-qwen-game-theory...