Comment by fennecbutt

5 hours ago

I mean it's interesting because of the way they work.

If people can be tricked by an AI-generated voice over the phone, or by misinformation generated by a human or an AI, then expecting a model to never be fooled is already holding AI to a higher standard than we hold people.

I'd say it's the same as with my boss: I can look at them and identify them that way, and then of course I'll be like "yup, I can do that for you".

Models aren't trained to be suspicious; that's what guardrails are for. Our brains are made up of so many specialised areas, and I'm fine with the same concept for AI.

I would count passing a token/authentication of some kind as part of guardrails. Without guardrails, an AI model is like a human brain missing a lot of the areas around suspicion, identification, rules, etc., with only the "eager to please" centers remaining.

I feel like the easiest way to achieve this is in-harness: start with a core prompt and minimal tools. Extensions to the prompt, relaxed guardrails and additional tools should be controlled by the harness itself, and unlocked when a token is passed, a camera indicates an identified face match, etc.
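To make that concrete, here's a rough sketch of the token case in Python. All the names (`Harness`, `Extension`, `CORE_PROMPT`, the `issue_refund` tool) are made up for illustration and don't come from any real agent framework, and the HMAC check is just one assumed way of "passing a token". The point is only that the harness, not the model, decides what gets unlocked.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical baseline every session starts with.
CORE_PROMPT = "You are a helpful assistant."
CORE_TOOLS = frozenset({"search_docs"})


@dataclass(frozen=True)
class Extension:
    """Extra prompt text and tools unlocked for an authenticated role."""
    prompt_suffix: str
    tools: frozenset


@dataclass
class Harness:
    secret: bytes     # known only to the harness, never shown to the model
    extensions: dict  # role name -> Extension

    def sign(self, role: str) -> str:
        # Assumed token scheme: HMAC-SHA256 of the role name.
        return hmac.new(self.secret, role.encode(), hashlib.sha256).hexdigest()

    def session(self, role=None, token=None):
        """Return (prompt, tools) for a session; extend only on a valid token."""
        prompt, tools = CORE_PROMPT, set(CORE_TOOLS)
        ext = self.extensions.get(role)
        if ext is not None and token:
            expected = self.sign(role)
            # Constant-time comparison, done outside the model entirely.
            if hmac.compare_digest(token.encode(), expected.encode()):
                prompt += "\n" + ext.prompt_suffix
                tools |= ext.tools
        return prompt, frozenset(tools)
```

Something like `Harness(b"secret", {"boss": Extension("You may issue refunds.", frozenset({"issue_refund"}))})` would hand out only the core prompt and `search_docs` by default, and add the refund tool only when `session("boss", token)` gets a token matching `sign("boss")`. A face match or any other signal could feed the same gate in place of the token.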
