Comment by ph4rsikal

4 months ago

It might appear so, but then you could validate it with a simple test. If the LLM would play a 4x4 Tic Tac Toe game, would the agent select the winning move 100% of all time or block a losing move 100% of the time? If these systems were capable of proper reasoning, then they would find the right choice in these obvious but constantly changing scenarios without being specifically trained for it.