
Comment by BeetleB

10 days ago

Anything people ask a human to do instead of a computer.

Humans are not the most reliable. If you're ok giving the task to a human, then you're ok with a lower level of reliability than a traditional computer program gives.

Simple example: Notify me when a web page meaningfully changes and specify what the change is in big picture terms.

We have programs to do the first part: Detecting visual changes. But filtering out only meaningful changes and providing a verbal description? Takes a ton of expertise.
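A rough sketch of that split, assuming the page has already been reduced to plain text: the mechanical part (what changed) is a plain diff, while the "is it meaningful, and how would you describe it" part is handed to a `describe` callable. `changed_lines`, `notify_if_changed`, and `describe` are hypothetical names for illustration; `describe` stands in for whatever human or LLM does the judging.

```python
import difflib
from typing import Callable, List, Optional


def changed_lines(old_text: str, new_text: str) -> List[str]:
    """Return the added ('+ ') and removed ('- ') lines between two snapshots."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


def notify_if_changed(
    old_text: str,
    new_text: str,
    describe: Callable[[List[str]], Optional[str]],
) -> Optional[str]:
    """Return a description of the change, or None if there is nothing to report.

    `describe` is the hard part (assumed, not implemented here): it decides
    whether the raw diff is meaningful and summarizes it in big-picture terms,
    returning None for changes it considers noise.
    """
    changes = changed_lines(old_text, new_text)
    if not changes:
        return None
    return describe(changes)
```

Everything up to the call to `describe` is deterministic and testable; the judgment about what counts as meaningful lives entirely inside that one callable.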

With MCP, I expect that by the end of this year a nonprogrammer will be able to have an LLM do it using just plugins in some piece of software.

Not anything - it wouldn't be a great idea to give an LLM the ability to spend money, but we let humans do it all the time.

  • With suitable safeguards or limits on what it can spend, why not? On the one hand, it might not fear repercussions as a human would; on the other hand, it’s far less likely to embezzle funds to support its overly lavish lifestyle or gambling addiction.

    • Yeah, you could marry an AI and share a bank account with it, and now it could buy you useful stuff it thinks you need without you doing anything, or even buy you presents.

  • I don't know about you, but even as a senior engineer, my employer hasn't given me the ability to spend money :-) It's not something employers normally do.

    And as was pointed out, if you use something like MCP, you can control what it spends on. You can limit the amount, and limit to a whitelist. It may still occasionally buy the wrong thing, but the wrong thing will be something you preapproved.

  • We don’t let LLMs spend money yet, but many businesses make bank letting computers automatically buy and sell things.

    • The software they "let" do that is at the opposite end of the scale in terms of how well it is understood, specified and tested. Or they "lose bank".
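For what it's worth, the "limit the amount, and limit to a whitelist" idea from upthread is the easy, well-specified part. A minimal sketch, assuming every purchase request from the LLM is routed through a check like this before any money moves. `SpendPolicy` and `authorize` are made-up names for illustration, not part of MCP or any real library, and amounts are integer cents to avoid float rounding.

```python
from dataclasses import dataclass


@dataclass
class SpendPolicy:
    """Hypothetical guard placed between an LLM tool call and real payments."""

    allowed_merchants: frozenset  # preapproved whitelist of merchant names
    per_purchase_limit: int       # maximum for a single purchase, in cents
    total_budget: int             # maximum cumulative spend, in cents
    spent: int = 0                # running total of approved purchases

    def authorize(self, merchant: str, amount: int) -> bool:
        """Approve and record a purchase only if it passes every limit."""
        if merchant not in self.allowed_merchants:
            return False
        if amount <= 0 or amount > self.per_purchase_limit:
            return False
        if self.spent + amount > self.total_budget:
            return False
        self.spent += amount
        return True


policy = SpendPolicy(
    allowed_merchants=frozenset({"office-supplies.example"}),
    per_purchase_limit=5000,
    total_budget=8000,
)
print(policy.authorize("office-supplies.example", 3000))  # within all limits
print(policy.authorize("casino.example", 100))            # not on the whitelist
```

The guard can't stop the model from buying the wrong thing, only from buying something outside the preapproved set or over budget, which is the same tradeoff described above.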

To elaborate: the task definition itself is vague enough that any evaluation will necessarily be vibes-based. There is fundamentally no precise definition of correctness/reliability.