Comment by rybosworld

14 hours ago

Based on the comments, I think a lot of people are missing what the AI agent actually got wrong here. Nowhere did the agent claim that 45 + 8 = 63.

You can see the agent's step-by-step thought process here (also linked in the article):

https://ibm-cuga.19pc1vtv090u.us-east.codeengine.appdomain.c...

The agent correctly entered the starting point (MIT), the ending point (Harvard), and the mode of transport (on foot). OpenStreetMap returns this as taking 45 minutes.

Then the agent reversed the directions and changed the mode of transport to car. What it should also have done is change the destination to Logan Airport. This is the part that the agent missed. OpenStreetMap then returns that the drive from Harvard to MIT takes 8 minutes.

The agent then returned the answer as being 45 minutes walking and 8 minutes driving. The first number is correct. The second is wrong because the agent chose the wrong destination, not because it did math incorrectly.
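
To make the distinction concrete, here is a rough Python sketch of the two queries as I read them from the trace. The `Leg` type and the `wrong_legs` helper are made up for illustration; they are not part of the agent or of OpenStreetMap, and the requested second leg (Harvard to Logan Airport by car) is my reading of the task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    origin: str
    destination: str
    mode: str
    minutes: int  # duration returned by the routing query


# What the task asked for, as (origin, destination, mode). Assumed from the trace.
requested = [
    ("MIT", "Harvard", "foot"),
    ("Harvard", "Logan Airport", "car"),
]

# What the agent actually queried, with the durations it got back.
queried = [
    Leg("MIT", "Harvard", "foot", 45),
    Leg("Harvard", "MIT", "car", 8),  # directions reversed, destination never changed
]


def wrong_legs(requested, queried):
    """Return the queried legs whose route or mode differs from the request."""
    return [
        leg
        for want, leg in zip(requested, queried)
        if want != (leg.origin, leg.destination, leg.mode)
    ]


bad = wrong_legs(requested, queried)
total = sum(leg.minutes for leg in queried)

print(bad)    # only the second leg is flagged
print(total)  # 53
```

The two numbers the agent reported add up to 53, so nothing in its own output asserts 63. The only flagged item is the second leg's destination.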

Seems like lots of readers are chomping at the bit to prove how stupid the models are rather than focus on the real problem the author is highlighting.

The model's scoring was done by another model though, no? That was the source of the answer being mislabeled as correct. So a different model thought that 45 + 8 = 63.