Comment by wouldbecouldbe
1 day ago
The issue is, it almost always outperforms knowledge workers.
IF the right questions are asked, and IF it is steered and corrected at a few crucial points. If not, it goes off in the wrong direction really quickly, and that problem has remained mostly unsolved over the last 2 years.
And that can be catastrophic in high risk environments, like legal, medical or high risk software products where being wrong in the wrong place can mean bankruptcy or even cost a life.
I help run a few marketing websites where I let the CEOs run wild with Claude cowork. They are making PRs like madmen, but they are not allowed to touch any of the APIs & platforms where there is real user data & sensitive information.
Ya, while the tools are really solid and have seen huge leaps these past two years, in no way will an LLM be able to do any of it unguided in two years. Just a humble opinion that I would love to see be wrong.
"in no way will an LLM be able to do any of it unguided in two years"
IDK "not any of it" seems a bit strong, especially thinking towards 2028. For a lot of knowledge professions, there is a surprising number of tasks that are just dumb work compared to the rest.
There's a huge difference between one shot and few shot versus building a robust harness with deterministic and adversarial quality gates. And I'm finding that agents can actually do a pretty good job of a surprising number of things if you are very clear about your dimensions of quality and the rubrics that you get agents to research and then use to validate against those dimensions of quality.
Make sure to use a deterministic pipeline or harness to go step by step so agents aren't checking their own work. I sometimes get alpha from having Codex check the work of Claude, but I am seeing pretty good output across multiple domains when I have three independent quality gates and a loop which only spits it out to a human if it doesn't converge at a reasonable cost.
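To make that concrete, here is a rough sketch of the kind of loop I mean. Everything in it is made up for illustration: `run_harness`, the toy producer, and the three gates are hypothetical stand-ins, and the flat `cost_per_round` is an assumption (real cost would come from token usage). In practice the producer would be one model and at least one gate a different model or a plain deterministic check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateResult:
    passed: bool
    feedback: str = ""


@dataclass
class Outcome:
    draft: str
    converged: bool
    cost: float
    needs_human: bool


# A gate inspects a draft; a producer turns (task, feedback) into a new draft.
Gate = Callable[[str], GateResult]
Producer = Callable[[str, List[str]], str]


def run_harness(task: str, produce: Producer, gates: List[Gate],
                cost_per_round: float = 1.0, budget: float = 5.0) -> Outcome:
    """Loop produce -> gates until every gate passes or the budget runs out."""
    feedback: List[str] = []
    cost = 0.0
    draft = ""
    while cost + cost_per_round <= budget:
        draft = produce(task, feedback)
        cost += cost_per_round
        # The producer never grades itself: only the independent gates decide.
        failures = [r for r in (gate(draft) for gate in gates) if not r.passed]
        if not failures:
            return Outcome(draft, True, cost, False)
        feedback = [r.feedback for r in failures]
    # Did not converge within budget: hand it to a human.
    return Outcome(draft, False, cost, True)


# Toy stand-ins (hypothetical): a real producer would call a model here.
def toy_producer(task: str, feedback: List[str]) -> str:
    draft = f"Summary of {task}."
    if "add a citation" in feedback:
        draft += " [1]"
    return draft


def not_empty(draft: str) -> GateResult:
    return GateResult(bool(draft.strip()), "draft is empty")


def has_citation(draft: str) -> GateResult:
    return GateResult("[1]" in draft, "add a citation")


def short_enough(draft: str) -> GateResult:
    return GateResult(len(draft) <= 200, "shorten the draft")


def never_satisfied(draft: str) -> GateResult:
    return GateResult(False, "still not good enough")


outcome = run_harness("Q3 report", toy_producer,
                      [not_empty, has_citation, short_enough])
stuck = run_harness("Q3 report", toy_producer, [never_satisfied], budget=3.0)
```

The point of the design is that the retry loop and the pass/fail decision live in ordinary code, so the only way out is either all gates passing or the budget check kicking it to a person.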
> Just a humble opinion that I would love to see be wrong
Out of curiosity, why would you love to be wrong about that? What possible outcome could you see being a net positive for society if the vast majority of knowledge workers (and ultimately, as robotics progress, most workers in general) are replaced by AI?
I believe it was Blink-182 who said, "Work sucks". You have to pay people to do that stuff; they don't want to be there. And then you get into second-order effects: costs plummet for anything labor-intensive, including medical care, prepared food, cleaning, and private tutors. Then on to tertiary effects: if you can spin up a million genius researchers to attack a problem, you start seeing massive progress in every important area, and it isn't tied to population growth.
I get that you might have a 'UBI/alternative general welfare is impossible' up your sleeve, but you've written this like it's somehow unfathomable that not forcing everybody to work just to survive would be a good thing. Of course it would be good! It's just a matter of dealing with the (huge) side effect of lost income.
In a way, we are betraying something here. My reading is: solving the social problems of capitalism feels so impossible, that reducing the need for anyone to do work is a liability. In a way this sentiment should make extremists of us all?
Yeah, it can do things unguided if the tests to confirm its correctness are very solid. That's where a lot of progress has been made and where agents are good, but this is domain-specific, and an opportunity for startups to shine.
> And that can be catastrophic in high risk environments, like legal, medical or high risk software products where being wrong in the wrong place can mean bankruptcy or even cost a life.
Which also happens with humans – does it do so at a lower rate? On its own, it kind of sounds like similar anti-self-driving-car arguments.
Yeah, that's why I mentioned it works well IF guided by the correct expert.
I agree that you can create a set of domain-specific rules and reinforcement-layer validation tools, like with self-driving, that vastly improve the accuracy of AI & LLMs, making humans less and less needed. But where the magic of LLMs comes from generic knowledge, this will be the opposite: narrowing it down.
I kinda disagree. High-risk environments just mean that they will have to keep a human in the loop for longer, which drastically reduces the skill required of that human (it still requires high skill, just not stupidly high).
The employers will think it requires less skill, whereas in fact it might actually require more skill to do a good job of being the human-in-the-loop.
For example, my sister is a translator and she says that checking AI translations is actually harder in many ways than doing a translation in the first place, but the agencies pay less for checking than actual translation.
I used to do audio transcription and some video captioning. Found it a bit drudgerous and fatiguing in rather specific ways, but I was effective at it and could find some satisfaction in it. It's been some years now, so I haven't had a chance to try out the kind of thing they're doing now, but I'm pretty sure I wouldn't want to. I can raise my blood pressure just sitting here and thinking about what it would be like to have to go through a Word doc and correct the bot's errors. But, even putting aside my professional pride (or indignation), I can only imagine that it would make all kinds of mistakes I never would, and wouldn't be any help with the parts I'd have trouble with. And I'm pretty sure that, at least often enough for it to be an issue, the priming of reading what the bot thought something was could easily make it way harder to hear it correctly, if I notice there's something wrong in the first place. I assume there's a similar problem for your sister along the lines of throwing off how it would occur to her to express something in the target language.
Doesn't it increase the skill required? You need to be able to jump in at the perfect time, while waiting patiently for 99% of the time. It's like self-driving that requires you to "jump in" at the worst possible time (0.5 seconds from a crash), and stay put the rest of the time--but don't get bored or inattentive. The only way to do that would be to be so naturally good at the danger point that you can do it basically reflexively.
I think the opposite, only the most skilled will be required.
But it depends on the skill:
- For landing pages & simple SaaS solutions: marketeers & founders have more skill, since they understand the user best. The real skill is not the basic coding, but understanding the market.
- For security risks/architecture: senior devs can spot things in seconds
I'm not a doctor or lawyer, but I'm sure there are cases where AI is really good in a similar way and cases where it misses the most crucial aspects.
> drastically reduce the skill required for such human
I mean, that's what is wanted by some companies.
The problem, especially for things like legal, is that it requires someone more skilled to read through and understand whether the argument is bollocks, or whether the law/precedent they are banking on is in fact the right one.
We have a tool that auto-writes letters to our management companies when they break SLAs. We have a slider that goes from polite to we are going to extract your first born.
That's simple-ish to do for LLMs, and low risk.
Drafting contracts is also something we could probably do, as it's mostly boilerplate. However, the consequence of mis-drafting a contract can be millions of dollars.
Man, this comment made me think of a Kafkaesque future where two AI lawyers and an AI Judge are stuck in an infinite loop arguing over a case, meanwhile the defendant is running around trying to get anyone in the legal system to recognize that the AI is stuck.
If the human involved has no skill then they might as well not be there, since they're just a fall guy when things go wrong and won't do anything to prevent it from happening.
I said that still requires skill, just not as much.
The end game of this is just a human capable of taking the blame when AI makes an occasional mistake and being paid for that service and risk.
Yeah but even what you describe makes it an extremely useful tool and productivity boost. Sure, we're not going to deploy a lawyer agent with full autonomy and no more oversight than a real lawyer. But isn't it wild that's now the frontier?
It's not like self-driving cars, where better than a human 80% of the time isn't good enough and they aren't really usable until it's 95%, 99%, etc.