Comment by gonzobonzo

7 days ago

I agree with you but I'll go a step further - these benchmarks are a good example of how far we are from AGI.

A good base test would be to give a manager a mixed team of remote workers, half human and half AI, and see if the manager or any of the coworkers could tell the difference. We couldn't say that an AI that passed that test was necessarily AGI, since we would have to test it in other situations. But we could say that an AI that couldn't pass it wouldn't qualify, since it wouldn't be able to successfully accomplish some tasks that humans can.

But of course, current AI is nowhere near that level yet. We're left with benchmarks precisely because we all know how far away we are from actual AGI.

The AGI test I think makes sense is to put it in a robot body and let it navigate the world. Can I take the robot to my back yard and have it weed my vegetable garden? Can I show it how to fold my laundry? Can I take it to the grocery store and tell it "go pick up 4 yellow bananas and two avocados that will be ready to eat in the next day or two, and then meet me in dairy"? Can I ask it to dice an onion for me during meal prep?

These are all things my kids would do when they were pretty young.

  • I agree, I think of that as the next level beyond the digital assistant test - a physical assistant test. Once there are sufficiently capable robots, hook one up to the AI. Tell it to mow your lawn, drive your car to the mechanic and have it checked, box up an item, take it to the post office and have it shipped, pick up your dry cleaning, buy ingredients from a grocery store, cook dinner, etc. Basic tasks a low-skilled worker would do as someone's assistant.

  • I think the next harder level in AGI testing would be “convince my kids to weed the garden and fold the laundry” :-)

The problem with "spot the difference" tests, imho, is that I would expect an AGI to be easily spotted. There's going to be a speed-of-calculation difference, at the very least. If nothing else, typing speed would be completely different unless the AGI is supposed to be deceptive. Who knows what its personality would be like. I'd say a simple enough test is just to see whether an AGI could be hired as, for example, an entry-level software developer and keep its job based on the same criteria base-level humans have to meet.

I agree that current AI is nowhere near that level yet. If AI isn't even trying to extract meaning from the words it smiths or the pictures it diffuses, then it's nothing more than a cute (albeit useful) parlor trick.

  • Those could probably be mitigated pretty easily in testing situations - for example, by making sure all participants had a delay in chat conversations, or by running correspondence through an LLM to equalize the personality.

    However, I'm not sure an AGI test should be mitigating them. If an AI isn't able to communicate at human speeds, or isn't able to achieve the social understandings that a human does, it would probably be wrong to say that it has the same intelligence capabilities as a human (how AGI has traditionally been defined). It wouldn't be able to provide human level performance in many jobs.

Why even bother with the people in the mix? Just tell the AGI: make as much money as you can in 6 months. Preferably without breaking any laws.