Comment by gguncth
2 days ago
Sure, but in two years AI has gone from “impressive tool, but not a replacement for knowledge workers” to “the study where it beats our highest caliber of knowledge workers may have some methodological deficits.” In another two years it’s going to be curtains.
The issue is, it almost always outperforms knowledge workers IF the right questions are asked, and IF it's steered and corrected at a few crucial points. IF not, it goes off in the wrong direction really quickly, and that's a problem that has remained mostly unsolved over the last 2 years.
And that can be catastrophic in high risk environments, like legal, medical or high risk software products where being wrong in the wrong place can mean bankruptcy or even cost a life.
I help run a few marketing websites where I let the CEOs run wild with Claude cowork. They are making PRs like madmen, but they are not allowed to touch any of the APIs & platforms that hold real user data & sensitive information.
Ya, while the tools are really solid and have seen huge leaps these past two years, in no way will an LLM be able to do any of it unguided in two years. Just a humble opinion that I would love to see be wrong.
"in no way will an LLM be able to do any of it unguided in two years"
IDK, "not any of it" seems a bit strong, especially looking ahead to 2028. For a lot of knowledge professions, there are a surprising number of tasks that are just dumb work compared to the rest.
There's a huge difference between one-shot or few-shot prompting and building a robust harness with deterministic and adversarial quality gates. I'm finding that agents can actually do a pretty good job of a surprising number of things if you are very clear about your dimensions of quality and about the rubrics you get agents to research and then use to validate against those dimensions.
Make sure to use a deterministic pipeline or harness that goes step by step, so agents aren't checking their own work. I sometimes get alpha from having Codex check Claude's work, but I'm seeing pretty good output across multiple domains when I have three independent quality gates and a loop that only kicks the result out to a human if it doesn't converge at a reasonable cost.
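A minimal sketch of the loop described above, in Python. Everything in it is hypothetical: `generate` stands in for whatever agent call you use, and each gate is any callable returning `(passed, feedback)`, e.g. a deterministic test, a rubric check, or a second model reviewing the first (none of this is a real Codex or Claude API). The cost budget plays the role of "a reasonable cost"; if the gates don't all pass before it runs out, the task is escalated to a human.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not a real agent API):
# a gate returns (passed, feedback); a generator returns (draft, cost_of_this_call).
Gate = Callable[[str], Tuple[bool, str]]
Generator = Callable[[str, List[str]], Tuple[str, float]]


@dataclass
class Result:
    output: Optional[str]      # None when escalated
    escalated: bool            # True means "hand this to a human"
    cost: float
    feedback: List[str] = field(default_factory=list)


def run_harness(task: str, generate: Generator, gates: List[Gate],
                max_cost: float) -> Result:
    """Generate, run every independent gate, retry with the failures as
    feedback; escalate if the budget runs out before all gates pass."""
    cost = 0.0
    feedback: List[str] = []
    while cost < max_cost:
        draft, step_cost = generate(task, feedback)
        cost += step_cost
        # The generator never grades its own draft; only the gates do.
        failures = [msg for ok, msg in (gate(draft) for gate in gates) if not ok]
        if not failures:
            return Result(output=draft, escalated=False, cost=cost)
        feedback = failures  # only the latest round of failures is fed back
    return Result(output=None, escalated=True, cost=cost, feedback=feedback)


# Toy usage: a fake "model" that fixes its draft once it receives feedback.
def toy_generate(task: str, feedback: List[str]) -> Tuple[str, float]:
    return (task.upper() if feedback else task), 1.0


gates: List[Gate] = [
    lambda d: (d.isupper(), "must be upper case"),  # deterministic check
    lambda d: (len(d) < 50, "too long"),             # rubric-style check
    lambda d: ("TODO" not in d, "contains TODO"),    # stand-in for a reviewer model
]

result = run_harness("summary", toy_generate, gates, max_cost=5.0)
print(result.output, result.escalated, result.cost)  # SUMMARY False 2.0
```

The budget is the only stop condition here, so a generator that reports zero cost would loop forever; a real harness would probably also cap the number of rounds.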
> Just a humble opinion that I would love to see be wrong
Out of curiosity, why would you love to be wrong about that? What possible outcome could you see being a net positive for society if the vast majority of knowledge workers (and ultimately, as robotics progress, most workers in general) are replaced by AI?
Yeah, it can do things unguided if the tests that confirm its correctness are very solid. That's where a lot of progress has been made and where agents are good, but this is domain-specific, and an area where startups can shine.
> And that can be catastrophic in high risk environments, like legal, medical or high risk software products where being wrong in the wrong place can mean bankruptcy or even cost a life.
Which also happens with humans – does it do so at a lower rate? On its own, it sounds a lot like the old anti-self-driving-car arguments.
Yeah, that's why I mentioned it works well IF guided by the correct expert.
I agree that you can create a set of domain-specific rules and reinforcement-layer validation tools, like in self-driving, that vastly improve the accuracy of AI & LLMs, making humans less and less needed. But whereas the magic of LLMs comes from generic knowledge, this will be the opposite: narrowing them down.
I kinda disagree. High-risk environments just mean that they will have to have a human in the loop for a longer time, which drastically reduce the skill required for such human (it still requires high skill, just not stupidly high).
The employers will think it requires less skill, whereas in fact it might actually require more skill to do a good job of being the human-in-the-loop.
For example, my sister is a translator and she says that checking AI translations is actually harder in many ways than doing a translation in the first place, but the agencies pay less for checking than actual translation.
Doesn't it increase the skill required? You need to be able to jump in at the perfect time, while waiting patiently for 99% of the time. It's like self-driving that requires you to "jump in" at the worst possible time (0.5 seconds from a crash), and stay put the rest of the time--but don't get bored or inattentive. The only way to do that would be to be so naturally good at the danger point that you can do it basically reflexively.
I think the opposite, only the most skilled will be required.
But it depends on the skill:
- For landing pages & simple SaaS solutions: marketers & founders have more skill, since they understand the user best. The real skill is not the basic coding, but understanding the market.
- For security risks/architecture: senior devs can spot things in seconds
I'm not a doctor or lawyer, but I'm sure there are cases where AI is really good in a similar way and cases where it misses the most crucial aspects.
> drastically reduce the skill required for such human
I mean, that's what some companies want.
The problem, especially for things like legal, is that it takes someone more skilled to read through it and tell whether the argument is bollocks, or whether the law/precedent it is banking on is in fact the right one.
We have a tool that auto-writes letters to our management companies when they break SLAs. We have a slider that goes from polite to we are going to extract your first born.
That's simple-ish for LLMs to do, and low risk.
Drafting contracts is also something we could probably do, as it's mostly boilerplate. However, the consequences of mis-drafting a contract can run into millions of dollars.
If the human involved has no skill then they might as well not be there, since they're just a fall guy when things go wrong and won't do anything to prevent it from happening.
The end game of this is just a human capable of taking the blame when AI makes an occasional mistake and being paid for that service and risk.
Yeah but even what you describe makes it an extremely useful tool and productivity boost. Sure, we're not going to deploy a lawyer agent with full autonomy and no more oversight than a real lawyer. But isn't it wild that's now the frontier?
It's not like self-driving cars, where better than a human 80% of the time isn't good enough and they aren't really usable until it's 95%, 99%, etc.
> the study where it beats our highest caliber of knowledge workers may have some methodological deficits
The point is that if the study can't validate the claims being made, then we can't actually extrapolate from those claims. What you're predicting may or may not come true, but the study (which is the topic at hand) isn't useful for supporting the assertion.
> Sure, but in two years AI has gone from “impressive tool, but not a replacement for knowledge workers” to “the study where it beats our highest caliber of knowledge workers may have some methodological deficits.”
With that kind of logic ... anything is possible.
I'd say if it does have methodological deficits, it should be ignored. Measuring a length with a wet spaghetti can only result in nonsense.
Autopilots have been able to land planes for years (decades?), and yet they still aren't landing passenger planes any more often.
Assuming it keeps improving at the same rate, which I think we can already see isn’t happening. If you compare the first six months after GPT truly hit the mainstream with the past six months, the recent improvements are not nearly as evident. That isn’t to say they aren’t noticeable; I could definitely tell it’s improving, but not nearly at the pace it once was.
There’s also the fact that they can’t possibly keep improving frontier models at the same rate (i.e. with the same training investment) once investment starts slowing down. The amount of cash being burned is completely unsustainable, and you’re already seeing some pullback.
On the other hand, we keep seeing only marginal generational improvements in the CPU space, yet CPU performance gains over the last 10 years are very material.
Every new model might not be a leap like it used to be, but give it enough time and improvements add up.
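To put that in numbers, take a purely hypothetical 10% improvement per model generation (an illustrative assumption, not a measured figure). Over ten generations the gains multiply rather than merely add:

$$
(1 + 0.10)^{10} \approx 2.59 \qquad \text{versus} \qquad 1 + 10 \times 0.10 = 2.0
$$

So under that assumption, a string of unspectacular steps still compounds to roughly a 2.6x gain.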
Nobody is disputing that. I specifically said that I can see the improvements from the last six months. What I’m saying is we can’t assume that every two years it will improve at the same rate.
The further we get into this, the more AI feels like 3-D printing: significantly bigger, and it will be more widely used for sure, but nowhere near the “new industrial revolution” that all these companies are making it out to be.
The issue is that before GPT, models were basically useless for any conversation. We are literally in science fiction territory. From a text conversation perspective, the gap between where we are at and what’s left to get to is relatively small.
In my opinion, the main thing we need to do is have training happen continuously. And probably more real world data (from sensors).
> what’s left to get to is relatively small
Not necessarily. In many (most?) areas of tech the rate of advancement follows a logarithmic curve. That is to say, the first 90% is achieved quickly but the last 10% takes significantly more time.
The ELIZA effect has been around since 1966. I think lots of folks feel “AI” has advanced much more quickly than it really has because of the nature of its many past boom / bust cycles.
It's also worth keeping in mind that a lot of the 'improvements' are actually advancements in harnesses and tools.
This is the hot button right here. Most of the advancements have also come at the cost of excess: exponential token use in exchange for marginal gains.
Context is still a large limiting factor, and we already have band-aids around that area. And the further along we go, the more LLM systems get split into additional distributed pieces.
As for the original article and sentiment I'm sure AI will be a boon for law. It's going to be much easier for the general consumer / person / small business to represent themselves which feels like a win. The downside is I feel like we're tracking towards a digital hell of "virtual lawyers" that will be at the whim of any org. Consumer laws really need to change now to help avoid this dystopian path we're on.
I agree. But notice that you assume there is a metric with which you can measure improvement. That's fine if you are measuring against your personal taste.
But it might be that the optimization target itself has a ceiling. If you're training toward human approval ratings from a broad population, you converge toward what median preference selects for. The plateau is baked into what you're measuring against.
It doesn't even need to 'improve' at the same rate to have extraordinary impact in society. Even if the frontier models stayed roughly the same in cost and capability for just 1-2 years, the harnesses and processes built around them would mature. We have not yet metabolized these models. Frankly, a lot of this feels like late 80s early 90s complaints about how office computerization wasn't happening yet--it was, just not at the rate promised by the companies selling computers to businesses. We don't look back at those people in the 80s saying that paper was here to stay as visionaries just because they noticed that propaganda temporarily outran the business environment.
I just wish people would take a step back and think about the timescales here. Language Models are Unsupervised Multitask Learners was in 2019. Here we are seven years later and LOOK AROUND. The landscape is unrecognizable. It's worth thinking about who, in those seven years, had an accurate estimate of the future and whose estimate fundamentally failed. And just as it is valuable to note where propaganda about progress speeds past where we are, we should remember that it is costless to announce that at some unspecified future time all of this will settle down and things will go back to the way they were.
People can understand all this and still disagree with you.
I will never trust an AI as much as a person
>the study where it beats our highest caliber of knowledge workers may have some methodological deficits.
That isn’t even remotely what this study is looking at.
Your “some methodological deficits” is doing a lot of work.
What if the methodological deficits are actually causing the paper to underestimate the quality of the AI responses? Why assume any deficits would bias the AI's competence upwards instead of downwards?
Why not assume the AI is god and the rapture is happening tomorrow?
I mean, my shoe could beat the highest caliber of knowledge workers with enough methodological deficits.
"the study that claims it beats our highest caliber of knowledge workers has methodological deficits" ftfy
so extrapolating from that, in another two years it will continue to bamboozle