Comment by noirbot

1 month ago

Also, the UX of your potential "remote workers" is vitally important! The difference between a good and a bad remote worker is almost always how good they are at communicating - both how well they read and understand the tickets of work to be done, and how well they explain, annotate, and document the work they do.

At the end of the day, someone has to be checking the work. This is true of humans and of any potential AI agent, and the UX of that is a big deal. I can get on a call and talk through the code another engineer on my team wrote and make sure I understand it and that it's doing the right thing before we accept it. I'm sure at some point I could do that with an LLM, but the worry is that the LLM has no innate loyalty or sense of its own accuracy or honesty.

I can mostly trust that my human coworker isn't bullshitting me, that any mistakes are honest mistakes we'll learn from together, and that we're both in the same boat: if we write or approve malicious or flagrantly defective code, our jobs are on the line. An AI agent that's written bad or vulnerable code won't know it, will completely seriously assert that it did exactly what it was told, doesn't care if it gets fired, and may say completely untrue things in an attempt to justify itself.

Any AI "remote worker" is a totally different trust and interaction model. There's no real way to treat it like you would another human engineer because it has, essentially, no incentive structure at all. It doesn't care if the code works. It doesn't care if the team meets its goals. It doesn't care if I get fired. I'm not working with a peer, I'm working with an industrial machine that maybe makes my job easier.

It's hilarious that people don't see this. The UX of an "LLM product" is the quality of the text going in and the text coming out. An "aligned model" is one with good UX. Instruct tuning is UX. RLHF is UX.