Comment by AkshatM
12 hours ago
I find the contrast between two narratives around technology use so fascinating:
1. We advocate automation because people like Brenda are error-prone and machines are perfect.
2. We disavow AI because people like Brenda are perfect and the machine is error-prone.
These aren't contradictions because we only advocate for automation in limited contexts: when the task is understandable, the execution is reliable, the process is observable, and the endeavour tedious. The complexity of the task isn't a factor - it's complex to generate correct machine code, but we trust compilers to do it all the time.
In a nutshell, we seem to be fine with automation if we can have a mental model of what it does and how it does it in a way that saves humans effort.
So, then - why don't people embrace AI with thinking mode as an acceptable form of automation? Can't the C-suite in this case follow its thought process and step in when it messes up?
I think people still find AI repugnant in that case. There's still a sense of "I don't know why you did this and it scares me", despite the debuggability, and it comes from the autonomy without guardrails. People want to be able to stop bad things before they happen, but with AI you often only seem to do so after the fact.
Narrow AI, AI with guardrails, AI with multiple safety redundancies - these don't elicit the same reaction. They seem to be valid, acceptable forms of automation. Perhaps that's what the ecosystem will eventually tend to, hopefully.
It's not as black-and-white as "Brenda good, AI bad". It's much more nuanced than this.
When it comes to (traditional) coding, for the most part, when I program a function to do X, every single time I run that function from now until the heat death of the sun, it will always produce Y. Forever! When it does, we understand why, and when it doesn't, we also can understand why it didn't!
When I use AI to perform X, every single time I run that AI from now until the heat death of the sun it will maybe produce Y. Forever! When it does, we don't understand why, and when it doesn't, we also don't understand why!
We know that Brenda might screw up sometimes but she doesn't run at the speed of light, isn't able to produce a thousand lines of Excel Macro in 3 seconds, doesn't hallucinate (well, let's hope she doesn't), can follow instructions etc. If she does make a mistake, we can find it, fix it, ask her what happened etc. before the damage is too great.
In short: when AI does anything at all, we only have, at best, a rough approximation of why it did it. With Brenda, it only takes a couple of questions to figure it out!
Before anyone says I'm against AI, I love it and am neck-deep in it all day when programming (not vibe-coding!) so I have a full understanding of what I'm getting myself into but I also know its limitations!
> When I use AI to perform X, every single time I run that AI from now until the heat death of the sun it will maybe produce Y. Forever! When it does, we don't understand why, and when it doesn't, we also don't understand why!
To make this even worse, it may even produce Y just enough times to make it seem reliable and then it is unleashed without supervision, running thousands or millions of times, wreaking havoc producing Z in a large number of places.
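A back-of-the-envelope sketch of that scaling effect. The failure rate and run counts are made-up numbers, and the runs are assumed to be independent, so treat this as an illustration rather than a measurement:

```python
def expected_failures(failure_rate: float, runs: int) -> float:
    """Expected number of bad outputs across `runs` independent runs."""
    return failure_rate * runs


def prob_at_least_one_failure(failure_rate: float, runs: int) -> float:
    """Probability that at least one run goes wrong, assuming independence."""
    return 1.0 - (1.0 - failure_rate) ** runs


# Hypothetical numbers: a 0.1% failure rate rarely shows up in a 20-run trial...
trial = prob_at_least_one_failure(0.001, 20)

# ...but is practically certain to show up across a million unsupervised runs.
production = prob_at_least_one_failure(0.001, 1_000_000)
bad_outputs = expected_failures(0.001, 1_000_000)
```

Under those assumptions the trial has roughly a 2% chance of showing any problem at all, while production would be expected to emit about a thousand bad results.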
Exactly. Fundamentally, I want my computer's computations to be deterministic, not probabilistic. And, I don't want the results to arbitrarily change because some company 1,500 miles away from me up-and-decided to "train some new model" or whatever it is they do.
A computer program should deliver reliable, consistent output if it is consistently given the same input. If I wanted inconsistency and unreliability, I'd ask a human to do it.
Brenda also needs to put food on the table. If Brenda is 'careless' and messes up, we can fire Brenda; because of this, Brenda tries not to be careless (among other emotions). However, I cannot deprive an AI model of pay because it messed up.
You might be looking for the word “accountability”
This is the reason the higher-ups in finance who rely on Brenda might continue to rely on Brenda, rather than relying on AI. She offers them accountability.
The post you replied to called out how the argument is complicated arguing for both ways; Brenda bad-AI good and AI bad-Brenda good. You reduced it to "AI bad, Brenda good." Not sure about the rest of your response then.
Brenda just recalls some predetermined behaviors she's lived out before. She cannot recall any given moment like we want to believe.
Ever think to ask Brenda what else she might spend her life on if these 100% ephemeral office role play "be good little missionaries for the wall street/dollar" gigs didn't exist?
You're revealing your ignorance of how people work while being anxious about our ignorance of how the machine works. You have acclimated to your ignorance well enough it seems. What's the big deal if we don't understand the AI entirely? Most drivers are not ASE certified mechanics. Most programmers are not electrical engineers. Most electrical engineers are not physicists. I can see it's not raining without being a climatologist. Experts circumlocute the language of their expertise without realizing their language does not give rise to reality. Reality gives rise to the language. So reality will be fine if we don't always have the language.
Think of a random date generator that only generates dates in your lived past. It does so. Once you read the date and confirm you were alive can you describe what you did? Oh no! You don't have memory of every moment to generate language for. Cognitive function returned null. Universe intact.
Lack of understanding how you desire is unimportant.
You think you're cherishing Brenda but really just projecting co-dependency that others LARP effort that probably doesn't really matter. It's just social gossip we were raised on so it takes up a lot of our working memory.
It is even worse, in a sense. It is not either. It is not neither. It is not even both, as variations of Brenda exist throughout the multiverse in all shapes and forms, including one that can troubleshoot her own formulas with ease and accuracy.
But you are absolutely right about one thing. Brenda can be asked and, depending on her experience, she might give you a good idea of what might have happened. LLMs still seem to not have that 'feature'.
No contradiction here:
When we say “machine”, we mean deterministic algorithms and predictable mechanisms.
Generative AI is neither of those things (in theory it is deterministic but not for any practical applications).
If we order by predictability:
Quick Sort > Brenda > Gen AI
There are two kinds of reliability:
Machine reliability does the same thing the same way every time. If there's an error on some input, it will always make that error on that input, and somebody can investigate it and fix it, and then it will never make that error again.
Human reliability does the job even when there are weird variances or things nobody bothered to check for. If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides whether to run out right now and buy more paper or postpone the print job until tomorrow; possibly they decide that the printing doesn't need to be done at all, or they go downstairs and use a different printer... Humans make errors but they fix them.
LLMs are not machine reliable and not human reliable.
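A minimal sketch of what "machine reliable" means in practice. The parser and its bug are hypothetical, invented only to show the pattern of a repeatable error followed by a permanent fix:

```python
def parse_amount(text: str) -> float:
    """Hypothetical parser with a deliberate bug: it drops a leading minus sign."""
    return float(text.strip().lstrip("-").replace(",", ""))


def parse_amount_fixed(text: str) -> float:
    """The same parser after the bug was investigated and fixed."""
    return float(text.strip().replace(",", ""))


# The buggy version fails the same way on the same input, every time,
# which is what makes the error findable in the first place.
buggy_results = {parse_amount("-1,250.50") for _ in range(1000)}

# After the fix, a regression check can pin the corrected behaviour.
fixed_result = parse_amount_fixed("-1,250.50")
```

A thousand runs of the buggy version give exactly one distinct (wrong) answer, so the error can be reproduced on demand and then ruled out for that input.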
> If the printer runs out of paper, the human goes to the supply cabinet and gets out paper and if there is no paper the human decides
Sure, these humans exist, but the others, that I happen to encounter every day unfortunately, are the ones that go into broken mode immediately when something is unexpected. Today I ordered something they ran out of and the girl behind the counter just stared in The Deep not having a clue what to do now. Do or say. Or yesterday at dinner, the PoS (on batteries) ran out of power when I tried to pay for dinner. The guy just walked off and went outside for a smoke. I stood there waiting to pay. The owner apologized and fixed it after a while but I am saying, the employee who runs out of paper and then finds and puts more paper in is not very ... common... In the real world.
Or the human might take the printer out back with his buddies and smash it to bits ;)
I was brought up on the refrain of "aren't computers silly, they do exactly what you tell them to do to the letter, even if it's not what you meant". That had its roots in computers mostly being programmable BASIC machines.
Then came the apps and notifications, and we had to caveat "... when you're writing programs". Which is a diminishing part of the computer experience.
And now we have to append "... unless you're using AI tools".
The distinction is clear to technical people. But it seems like an increasingly niche and alien thing from the broader societal perspective.
I think we need a new refrain, because with the AI stuff it increasingly seems "computers do what they want, don't even get it right, but pretend that they did."
We have absolutely descended, and rapidly, into “computers do whatever the fuck they want and there’s nothing you can do about it” in the past 5 years, and gen AI is only half of the problem.
The other half comes from how incredibly opinionated and controlling the tech giants have become. Microsoft doesn’t even ALLOW consent on windows (yes or maybe later), Google is doing all it can to turn the entire internet into a chrome-only experience, and Apple has to be fought for an entire decade to allow users to place app icons wherever they want on their Home Screen.
There is no question that the overly explicit quirky paradigm of the past was better for almost everyone. It allowed for user control and user expression, but apparently those concepts are bad for the wallet of big tech so they have to go. Generative AI is just the latest biggest nail in the coffin.
Pop culture characters like Lt. Commander Data seem anachronistic now.
Nit: no ML is deterministic in any way. Anything that is Generative AI is ML. This fact is literally built into the algorithms at the mathematical level.
First, they all add a source of randomness, and second, "deterministic" has to be judged against the user's model. A pseudo-random number generator is also deterministic in the technical sense, but for the user it isn't.
When the user can't reason about it, it isn't deterministic to them.
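A small sketch of the pseudo-random generator point, using Python's standard `random` module. The helper name `draw` is made up for illustration:

```python
import random


def draw(seed=None, n=5):
    """Return n pseudo-random integers; a fixed seed makes the sequence repeatable."""
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(n)]


# Technically deterministic: same seed, same sequence, every time.
same_seed_matches = draw(seed=42) == draw(seed=42)

# From the user's side: with seed=None the generator is seeded from OS
# entropy (or the clock), so the user has nothing to reason from.
unseeded_a = draw()
unseeded_b = draw()
```

The algorithm is the same in both cases. What changes is whether the person running it can see and control the state that decides the output.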
If you think programs are predictable, I have a bridge to sell you.
The only relevant metric here is how often each thing makes mistakes. Programs are the most reliable, though far from 100%, humans are much less than that, and LLMs are around the level of humans, depending on the humans and the LLM.
When a human makes a mistake, we call it a mistake. When a human lies, we call it a lie. In both cases, we blame the human.
When an LLM does the same, we call it hallucination and blame the human.
Programs can be very close to 100% reliable when made well.
In my life, I've never seen `sort` produce output that wasn't properly sorted. I've never seen a calculator come up with the wrong answer when adding two numbers. I have seen filesystems fail to produce the exact same data that was previously written, but this is something that happens once in a blue moon, and the process is done probably millions of times a day on my computers.
There are bugs, but bugs can be reduced to a very low level with time, effort, and motivation. And technically, most bugs are predictable in theory, they just aren't known ahead of time. There are hardware issues, but those are usually extremely rare.
Nothing is 100% predictable, but software can get to a point that's almost indistinguishable.
> If we order by predictability:
> Quick Sort > Brenda > Gen AI
Those last two might be the wrong way round.
"Thinking mode" only provides the illusion of debuggability. It improves performance by generating more tokens which hopefully steer the context towards one more likely to produce the desired response, but the tokens it generates do not reflect any sort of internal state or "reasoning chain" as we understand it in human cognition. They are still just stochastic spew. You have no more insight into why the model generates the particular "reasoning steps" it does than you do into any other output, and neither do you have insight into why the reasoning steps lead to whatever conclusion it comes to. The model is much less constrained by the "reasoning" than we would intuit for a human - it's entirely capable of generating an elaborate and plausible reasoning chain which it then completely ignores in favor of some invisible built-in bias.
I'm always amused when I see comments saying, "I asked it why it produced that answer, and it said...." Sorry, you've badly misunderstood how these things work. It's not analyzing how it got to that answer. It's producing what it "thinks" the response to that question should look like.
There are other narratives going on in the background though both called out by the article and implied, including:
Brenda probably has annual refresher courses on GAAP, while her exec and the AI don't.
Automation is expected to be deterministic. The outputs can be validated for a given input. If you need some automation more than Excel functions, writing a power automate flow or recording an office script is sufficient & reliable as automation while being cheaper than AI. Can you validate AI as deterministic? This is important for accounting. Maybe you want some thinking around how to optimize a business process, but not for following them.
Brenda as the human-in-the-loop using AI will be much more able than her exec. Will Brenda + AI be better (or more valuable considering the cost of AI) than Brenda alone? That's the real question, I suppose.
AI in many aspects of our life is simply not good right now. For a lot of applications, AI is perpetually just a few years away from being as useful as you describe. If we get there, great.
> We disavow AI because people like Brenda are perfect and the machine is error-prone.
No, no. We disavow AI because our great leaders inexplicably trust it more than Brenda.
I don't understand why generative AI gets a pass at constantly being wrong, but an average worker would be fired if they performed the same way. If a manager needed to constantly correct you or double check your work, you'd be out. Why are we lowering the bar for generative AI?
Multiple reasons:
* Gen AI never disagrees with or objects to boss's ideas, even if they are bad or harmful to the company or others. In fact, it always praises them no matter what. Brenda, being a well-intentioned human being, might object to bad or immoral ideas to prevent harm. Since boss's ego is too fragile to accept criticism, he prefers gen AI.
* Boss is usually not qualified, willing, or free to do Brenda's job to the same quality standard as Brenda. This compels him to pay Brenda and treat her with basic decency, which is a nuisance. Gen AI does not demand fair or decent treatment and (at least for now) is cheaper than Brenda. It can work at any time and under conditions Brenda refuses to. So boss prefers gen AI.
* Brenda takes accountability for and pride in her work, making sure it is of high quality and as free of errors as she can manage. This is wasteful: boss only needs output that is good enough to make it someone else's problem, and as fast as possible. This is exactly what gen AI gives him, so boss prefers gen AI.
My kneejerk reaction is the sunk cost fallacy (AI is expensive), but I'm pretty sure it's actually because businesses have spent the last couple of decades doing absolutely everything they can to automate as many humans out of the workforce as possible.
If a worker could be right 50% of the time and get paid 1 cent to write a 5000 word essay on a random topic, and do it in less than 30 seconds.
Then I think managers would be fine hiring that worker for that rate as well.
There's a variety of reasons.
You don't have a human to manage. The relationship is completely one-sided, you can query a generative AI at 3 in the morning on new years eve. This entity has no emotions to manage and no own interests.
There's cost.
There's an implicit promise of improvement over time.
There's the domain of expertise being inhumanly wide. You can ask about cookies right now, then about XII century France, then about biochemistry.
The fact that an average worker would be fired if they perform the same way is what the human actually competes with. They have responsibility, which is not something AI can offer. If it was the case that, say, Anthropic, actually signed contracts stating that they are liable for any mistakes, then humans would be absolutely toast.
It’s much cheaper than Brenda (superficially, at least). I’m not sure a worker that costs a few dollars a day would be fired, especially given the occasional brilliance they exhibit.
I've been trying to open my mind and "give AI a chance" lately. I spent all day yesterday struggling with Claude Code's utter incompetence. It behaves worse than any junior engineer I've ever worked with:
- It says it's done when its code does not even work, sometimes when it does not even compile.
- When asked to fix a bug, it confidently declares victory without actually having fixed the bug.
- It gets into this mode where, when it doesn't know what to do, it just tries random things over and over, each time confidently telling me "Perfect! I found the error!" and then waiting for the inevitable response from me: "No, you didn't. Revert that change".
- Only when you give it explicit, detailed commands, "modify fade_output to be -90," will it actually produce decent results, but by the time I get to that level of detail, I might as well be writing the code myself.
To top it off, unlike the junior engineer, Claude never learns from its mistakes. It makes the same ones over and over and over, even if you include "don't make XYZ mistake" in the prompt. If I were an eng manager, Claude would be on a PIP.
How much compute cost is it for the AI to do Brenda's job? Not total AI spend, but the fraction that replaced Brenda. That's why they'd fire a human but keep using the AI.
Because it doesn’t have to be as accurate as a human to be a helpful tool.
That is precisely why we have humans in the loop for so many AI applications.
If [AI + human reviewer to correct it] is some multiple more efficient than [human alone], there is still plenty of value.
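A rough cost model for that comparison. Every number here is hypothetical, and the model assumes each AI output gets reviewed and each wrong one gets redone by hand:

```python
def human_alone_minutes(tasks: int, minutes_per_task: float) -> float:
    """Total time when a person does every task by hand."""
    return tasks * minutes_per_task


def ai_plus_reviewer_minutes(tasks: int, review_minutes: float,
                             error_rate: float, redo_minutes: float) -> float:
    """Every AI output is reviewed; the share that is wrong is redone by hand."""
    return tasks * (review_minutes + error_rate * redo_minutes)


# Hypothetical case 1: review is quick, so the AI + reviewer setup wins.
by_hand = human_alone_minutes(100, 30)                       # 3000 minutes
quick_review = ai_plus_reviewer_minutes(100, 5, 0.2, 30)     # 1100 minutes

# Hypothetical case 2: a proper review takes as long as doing the task,
# and the advantage disappears.
careful_review = ai_plus_reviewer_minutes(100, 30, 0.2, 30)  # 3600 minutes
```

The multiple depends almost entirely on how much cheaper reviewing is than doing, which is the part these arguments tend to disagree about.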
Because it's much cheaper.
So now you don't have to pay people to do their actual work, you assign the work to ML ("AI") and then pay the people to check what it generated. That's a very different task, menial and boring, but if it produces more value for the same amount of input money, then it's economical to do so.
And since checking the output is often a lower skilled job, you can even pay the people less, pocketing more as an owner.
It’s not even greater trust. It’s just passive trust. The thing is, Brenda is her own QA department. Every good Brenda is precisely good because she checks her own work before shipping it. AI does not do this. It doesn’t even fully understand the problem/question sometimes yet provides a smart definitive sounding answer. It’s like the doctor on The Simpsons, if you can’t tell he’s a quack, you probably would follow his medical advice.
> Every good Brenda is precisely good because she checks her own work before shipping it. AI does not do this.
A confident statement that's trivial to disprove. I use claude code to build and deploy services on my NAS. I can ask it to spin up a new container on my subdomain and make it available internal only or also available externally. It knows it has access to my Cloudflare API key. It knows I am running rootless podman and my file storage convention. It will create the DNS records for a cloudflared tunnel or just setup DNS on my pihole for internal only resolution. It will check to make sure podman launched the container and it will then try to make an HTTP request to the site to verify that it is up. It will reach for network tools to test both the public and private interfaces. It will check the podman logs for any errors or warnings. If it detects errors, it will attempt to resolve them and is typically successful for the types of services I'm hosting.
Instructions like: "Setup Jellyfin in a container on the NAS and integrate it with the rest of the *arr stack. I'd like it to be available internally and externally on watch.<domain>.com" have worked extremely well for me. It delivers working and integrated services reliably and does check to see that what it deployed is working all without my explicit prompting.
Brenda + AI > Brenda
> No, no. We disavow AI because our great leaders inexplicably trust it more than Brenda.
I would add a little nuance here.
I know a lot of people who don't have technical ability either because they advanced out of hands-on or never had it because it wasn't their job/interest.
These types of people are usually the folks who set direction or govern the purse strings.
Here's the thing: they are empowered by AI. They can do things themselves.
and every one of them is so happy. They are tickled pink.
They want to trust it, because then they can stop paying Brenda, save a few dollars, and buy a 3rd yacht.
“Let’s deploy something as or more error prone as Brad at infinite scale across our organisation”
The promise of AI is that it lets you "skip the drudgery of thinking about the details" but sometimes that is exactly what you don't want. You want one or more humans with experience in the business domain to demonstrate they have thought about the details very carefully. The spreadsheet computes a result but its higher purpose is a kind of "proof" this thinking was done.
If the actual thinking doesn't matter and you just need some plausible numbers that look the part (also a common situation), gen ai will do that pretty well.
We need to stop using AI as an umbrella term. It’s worth remembering that LLMs can’t play chess and that the best chess models like Leela Chess Zero use deep neural networks.
Generative AI - which the world now believes is AI, is not the same as predictive / analytical AI.
It’s fairly easy to demonstrate this by getting ChatGPT to generate a new relatively complex spreadsheet then asking it to analyze and make changes to the same spreadsheet.
The problem we have now is uninformed people believing AI is the answer to everything… if not today then in the near future. Which makes it more of a religion than a technology.
Which may be the whole goal …
> Successful people create companies. More successful people create countries. The most successful people create religions.
— Sam Altman - https://blog.samaltman.com/successful-people
Ok yep, fair. My comment was about using copilot-ish tech to generate plausible looking spreadsheets.
The kind of things that a domain expert Brenda knows that ChatGPT doesn't know (yet) are like:
There are 3 vendors a, b, c who all look similar on paper but vendor c always tacks on weird extra charges that take a lot of angry phone calls to sort out.
By volume or weight it looks like you could get 100 boxes per truck but for industry specific reasons only 80 can legally be loaded.
Hyper specific details about real estate compliance in neighbouring areas that mean buildings that look similar on paper are in fact very different.
A good Brenda can understand the world around her as it actually is, she is a player in it and knows the "real" rules rather than operating from general understanding and what people have bothered to write down.
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation?
"Thinking" mode is not thinking, it's generating additional text that looks like someone talking to themselves. It is as devoid of intention and prone to hallucinations as the rest of LLM's output.
> Can't the C-suite in this case follow its thought process and step in when it messes up?
That sounds like manual work you'd want to delegate, not automation.
Brenda has years (hopefully) of institutional knowledge and transferrable skills.
"hmm, those sales don't look right, that profit margin is unusually high for November"
"Last time I used vlookup I forgot to sort the column first"
"Wait, Bob left the company last month, how can he still be filing expenses"
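The vlookup slip is a good example of the kind of thing Brenda has learned the hard way. An approximate-match lookup relies on the lookup column being sorted, and returns a plausible but wrong value when it is not. The sketch below imitates that idea with a binary search; it is not Excel's actual algorithm, and the discount tiers are invented:

```python
from bisect import bisect_right


def approx_lookup(keys, values, target):
    """Binary-search lookup: largest key <= target. Only valid if keys are sorted."""
    i = bisect_right(keys, target) - 1
    if i < 0:
        raise KeyError(target)
    return values[i]


# Hypothetical discount tiers by order quantity.
sorted_keys, sorted_vals = [0, 100, 500, 1000], ["0%", "5%", "10%", "15%"]
unsorted_keys, unsorted_vals = [500, 0, 1000, 100], ["10%", "0%", "15%", "5%"]

right = approx_lookup(sorted_keys, sorted_vals, 600)      # "10%"
wrong = approx_lookup(unsorted_keys, unsorted_vals, 600)  # "0%", with no error raised
```

Nothing fails loudly in the unsorted case. The result just looks like a normal discount, which is why it takes someone who has been burned before to catch it.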
That automation you cite in your #1 is advocated for because it is deterministic and, with effort, fairly well understood (I have countless scripts solidly running for years).
I don't disavow AI, but like the author, I am not thrilled that the masses of excel users suddenly have access to Copilot (gpt4). I've used Copilot enough now to know that there will be huge, costly mistakes.
The “Brenda” example is a lumped sum fallacy where there is an “average” person or phenomenon that we can benchmark against. Such a person doesn't exist, leading to these dissonant, contradictory dichotomies.
The fact of the matter is that there are some people who can hold lots of information in their head at once. Others are good at finding information. Others still are proficient at getting people to help them. Etc. Any of these people could be tasked with solving the same problem and they would leverage their actual, particular strengths rather than some nebulous “is good or bad at the task” metric.
As it happens, nearly all the discourse uses this lumped sum fallacy, leading to people simultaneously talking past one another while not fundamentally moving the discussion forward.
I see where you are coming from but in my head, Brenda isn't real.
She represents the typical domain-experts that use Excel imo. They have an understanding of some part of the business and express it while using Excel in a deterministic way: enter a value of X, multiply it by Y and it keeps producing Z forever!
You can train AI to be a better domain expert. That's not in question, however with AI, you introduce a dice roll: it may not multiply X and Y to get Z... it might get something else. Sometimes. Maybe.
If your spreadsheet is a list of names going on the next annual accounts department outing then the risk is minimal.
If it's your annual accounts that the stock market needs to work out billion dollar investment portfolios, then you are asking for all the pain that it will likely bring.
> You can train AI to be a better domain expert. That's not in question.
I think that very much is in question.
> We disavow AI because people like Brenda are perfect and the machine is error-prone.
I don't think that is the message here. The message is that Brenda might know what she is doing, and maybe AI helps her.
> She's gonna birth that formula for a financial report and then she's gonna send that financial report
The problem is people who might not know what they are doing
> he would have sent it back to Brenda but he's like oh I have AI and AI is probably like smarter than Brenda and then the AI is gonna fuck it up real bad
Because AI outputs sound so confident it makes even the layman feel like an expert. Rather than involve Brenda to debug the issue, C-suite might say - I believe! I can do it too. AI FTW!
Even when people advocate automation especially in areas like finance there is always a human in the loop whose job is to double check the automation. The day when this human finds errors in the machine there is going to be lot of noise. And if the day happens to be a quarterly or yearly closing/reporting there is going to be hell to pay once closing/reporting is done. Both the automation and developer are going to be hauled up (obviously I am exaggerating here).
The issue is reliability.
Would you be willing to guarantee that some automation process will never mess up and, if/when it does, compensate the user with cash?
For a compiler, with a given set of test suites, the answer is generally yes, and you could probably find someone willing to insure you for a significant amount of money that a compilation bug will not screw up in such a large way that it will affect your business.
For an LLM, I have a hard time believing that anyone will be willing to provide that same level of insurance.
If a LLM company said "hey use our product, it works 100% of the time, and if it does fuck up, we will pay up to a million dollars in losses" I bet a lot of people would be willing to use it. I do not believe any sane company will make that guarantee at this point, outside of extremely narrow cases with lots of guardrails.
That's why a lot of ai tools are consumer/dev tools, because if they fuck up, (which they will) the losses are minimal.
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation
Mainly because Generative AI _is not automation_ . Automation is based on a fixed ruleset; it is predictable, reliable and actually saves time. Generative AI ...is whatever it is, it is definitely not automation.
I feel like it comes down to predictability and overall trust and confidence. AI is still very fucky, and for people that don't understand the nuances, it definitely will hallucinate and potentially cause real issues. It is about as happy as a Linux rm command to nuke hours of work. Fortunately these tools typically have a change log you can undo, but still.
Also Brenda is human and we should prioritize keeping humans in jobs, but with the way shit is going that seems like a lost hope. It's already over.
Brenda makes obvious errors. BrendAI makes subtle off by 1 errors.
Humans, legacy algorithmic systems, and LLM's have different error modes.
- Legacy systems typically have error modes where integrations or user interface breaks in annoying but obvious ways. Pure algorithms calculating things like payroll tend to be (relatively) rigorously developed and are highly deterministic.
- LLMs have error modes more similar to humans than legacy systems, but more limited. They're non-deterministic, make up answers sometimes, and almost never admit they can't do something; sometimes they make pure errors in arithmetic or logic too.
- Humans have even more unpredictable error modes; on top of the errors encountered in LLM's, they also have emotion, fatigue, org politics, demotivation, misaligned incentives, and so on. But because we've been dealing with working with other humans for ten thousand years we've gotten fairly good at managing each other... but it's still challenging.
LLMs probably need a mixture of "correctness tests" (like evals/unit tests) and "management" (human-in-the-loop).
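One way that mixture could look, as a minimal sketch. The `Invoice` shape, the two checks and the routing labels are all assumptions made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    """Hypothetical structured output produced by a model."""
    line_items: list
    total: float


def passes_checks(inv: Invoice) -> bool:
    """Deterministic 'correctness tests' applied to the model's output."""
    no_negatives = all(x >= 0 for x in inv.line_items)
    total_matches = abs(sum(inv.line_items) - inv.total) < 0.005
    return no_negatives and total_matches


def route(inv: Invoice) -> str:
    """Auto-accept only what passes the checks; everything else goes to a person."""
    return "accept" if passes_checks(inv) else "human_review"


ok = route(Invoice([10.0, 20.5], 30.5))       # "accept"
flagged = route(Invoice([10.0, 20.5], 35.0))  # "human_review"
```

The checks themselves are ordinary deterministic code, so they can be tested and trusted in the usual way even when the thing they guard cannot.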
In my opinion there's a big difference in deterministic and nondeterministic automation.
By the same fascination, do computers become more complex to enhance people? or do people get more complex with the use of computers? Also, do computers allow people to become less skilled and inefficient? or do less skilled and inefficient people require the need for computers?
The vector of change is acceptable in one direction and disliked in another. People become greater versions of themselves with new tech. But people also get dumber and less involved because of new tech.
This misunderstands complexity entirely:
The complexity of the task isn't a factor - it's complex to generate correct machine code, but we trust compilers to do it all the time.
I feel like you've squashed a 3D concern (automations at different levels of the tech stack) into a 2D observation (global concerns about automations).
Human determinism, as elastic as it might be, is still different than AI non-determinism. Especially when it comes to numbers/data.
AI might be helpful with information but it's far less trustable for data.
The big problem with AI in back-office automation is that it will randomly decide to do something different than it had been doing. Meaning that it could be happily crunching numbers accurately in your development and launch experience, then utterly drop the ball after a month in production.
While humans have the same risk factors, human-oriented back-office processes involve multiple rounds of automated and manual checks, which are extremely laborious. Human errors in spreadsheets have particular flavors, such as a forgotten cell, a mistyped number, or reading from the wrong file or column. Humans are pretty good at catching these errors, as they produce either completely wrong results when the columns don't line up, or a typo'd number that is completely out of distribution.
An AI may simply decide to hallucinate realistic column values rather than extracting its assigned input. Or hallucinate a fraction of the column values. How do you QA this? You can't guarantee that two invocations of the AI won't hallucinate the same values, and you can't guarantee that a different LLM won't hallucinate different values. To get a real human check, you'd need to redo the task as a human. In theory you can have the LLM perform some symbolic manipulation to improve accuracy... but it can still hallucinate the reasoning traces, etc.
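For pure extraction tasks there is at least one cheap, partial check: flag any value the model returned that never appears in the input. A minimal sketch, with a hypothetical helper `ungrounded_values` and made-up sample data:

```python
from typing import List

def ungrounded_values(source_text: str, extracted: List[str]) -> List[str]:
    """Return extracted values that do not appear verbatim as a token in the source."""
    tokens = set(source_text.split())
    return [value for value in extracted if value not in tokens]

# Made-up input and a made-up model output containing one invented number.
source = "invoice 1001 total 250.00 tax 20.00"
claimed = ["250.00", "20.00", "275.50"]
suspect = ungrounded_values(source, claimed)
```

This only catches values invented out of thin air. It says nothing about a real value pulled from the wrong column, or about anything the model was supposed to compute, so it doesn't remove the QA problem described above.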
If a human decided to make up accounting numbers in one out of every 10,000 accounting requests, they would likely be charged with fraud. Good luck finding the AI hallucinations at the equivalent level before some disaster occurs. Likewise, how do you ensure the human Excel operator doesn't get pressured into certifying the AI's numbers when the "don't get fired this week" button is sitting right there in their Excel app? How do you avoid the race to the bottom where the "star" employee is the one certifying the AI results without thorough review?
I'm bullish on AI in the back office, but ignoring the real difficulties in deployment doesn't help us get there.
It is of course because algorithms can be repaired when they are buggy, but a large language model cannot, because it is impossible to look at its weights and say, look, this is where the mistake happened.
The reason is oftentimes fairly simple: certain people have their material wealth and income threatened by such automation, and therefore it's bad (an intellectualized reason is created post hoc).
I predict there will actually be a lot of work to be done on the "software engineering" side w.r.t. improving reliability and safety, as you allude to, for handing off to less-than-sentient bots. Improved snapshot, commit, undo, and quorum functionality, that sort of thing.
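To make the snapshot/undo/quorum idea concrete, here is a toy sketch. `GuardedState`, its method names, and the quorum of 2 are all hypothetical; a real system would persist snapshots and authenticate the approvers:

```python
import copy
from typing import Callable, Dict, List

class GuardedState:
    """Wraps a dict so every accepted change is snapshotted and can be undone."""

    def __init__(self, data: Dict[str, float]):
        self.data = data
        self._history: List[Dict[str, float]] = []

    def apply(self, change: Callable[[Dict[str, float]], None],
              approvals: int, quorum: int = 2) -> bool:
        """Apply the change only if enough reviewers approved; snapshot first."""
        if approvals < quorum:
            return False
        self._history.append(copy.deepcopy(self.data))
        change(self.data)
        return True

    def undo(self) -> bool:
        """Restore the most recent snapshot, if there is one."""
        if not self._history:
            return False
        self.data = self._history.pop()
        return True

state = GuardedState({"balance": 100.0})
rejected = state.apply(lambda d: d.update(balance=0.0), approvals=1)
accepted = state.apply(lambda d: d.update(balance=90.0), approvals=2)
after_apply = state.data["balance"]
state.undo()
after_undo = state.data["balance"]
```

The bot's proposed change with one approval is refused outright, and the accepted one can still be rolled back afterwards.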
The idea that the AI should step into our programs without changing the programs whatsoever around the AI is a horseless carriage.
> it's complex to generate correct machine code, but we trust compilers to do it all the time.
Generating correct machine code is actually pretty simple. It gets complicated if you want efficient machine code.
> So, then - why don't people embrace AI with thinking mode as an acceptable form of automation? Can't the C-suite in this case follow its thought process and step in when it messes up?
> I think people still find AI repugnant in that case. There's still a sense of "I don't know why you did this and it scares me", despite the debuggability, and it comes from the autonomy without guardrails. People want to be able to stop bad things before they happen, but with AI you often only seem to do so after the fact.
> Narrow AI, AI with guardrails, AI with multiple safety redundancies - these don't elicit the same reaction. They seem to be valid, acceptable forms of automation. Perhaps that's what the ecosystem will eventually tend to, hopefully.
We have not reached AGI yet; by definition its results cannot be trusted unless it's a domain where it has gotten pretty good already (classification, OCR, speech, text mining). For more advanced use cases, if I still have to validate what the AI does because its "thinking" process cannot be trusted in any way, what's the point? The AI doesn't think; we just choose to interpret it as such, and we should rightly be concerned about people who turn their brain off and blindly trust AI.
I'm disappointed that my human life has no value in a world of AI. You can retort with "ah but you'll be entertained and on super-drugs so you won't care!", but I would further retort that I'd rather live in a universe where I can contribute something, no matter how small.
The current generation of AI tools augment humans, they don't replace them.
One of the most under-rated harms of AI at the moment is this sense of despair it causes in people who take the AI vendors at their word ("AGI! Outperform humans at most economically valuable work!")
Non-deterministic vs. deterministic automation.
I mean you answer your own question.
Automation implies determinism. It reliably gives you the same predictable output for a given input, over and over again.
AI is non-deterministic by design. You never quite know for sure what it's going to give you. Which is what makes it powerful. But it also makes it higher risk.
> 1. We advocate automation because people like Brenda are error-prone and machines are perfect.
Well of course! :) Most Brendas can’t do billions of arithmetic problems a second very reliably. Even with very wide bars on “very reliable”.
> 2. We disavow AI because people like Brenda are perfect and the machine is error-prone.
Well of course! :) This is an entirely different problem, requiring high creative + contextual intelligence.
—
We all already knew that (of course!), but it’s interesting to develop terminology:
0th order problem: We have the exact answer. Here it is. Don’t forget it.
1st order problem: We know how to calculate the answer.
2nd order problem: We don’t have a fixed calculation for this particular problem, but via pattern matching we can recognize it belongs to a parameterized class of problems, so just need to calculate those parameters to get a solution calculation.
3rd order problem: We know enough about the problem to find a calculation for the solution algebraically, or by other search tree type problem solving.
4th order problem: We know the problem in informal terms, so can work towards a formal definition of the problem to be solved.
5th order problem: We know why we don’t like what we see, and can use that as a driver to search for potential solvable problems.
6th order problem: We don’t know what we are looking at, or whether a problem or improvement might exist, but we can find a better understanding.
7th order problem: WTF. Where are my glasses? I can’t see without my glasses! And I can’t find my glasses without my glasses, so where are my glasses?!?
—
Machines have dramatically exceeded human capabilities, in reliability, complexity and scale, for orders 0 through 2.
This accomplishment took one long human lifetime.
Machines are beginning to exceed human efficiency while matching human (expert) reliability for the simplest versions of 3rd and 4th orders.
The line here is changing rapidly.
5th and 6th order problems are still in the realm of human (expert) supremacy, given sufficient scale of “human (expert)” relative to difficulty: 1 human, 1 team of humans, open ended human contributors, generations of puzzled but interested humans, open ended evolution of human species along intelligence dimension, Wolfram in one of his bestest dreams, …
The delay between the onset of initial successes at each subsequent order has been shrinking rapidly.
Significant initial successes on simpler problems within 5th and 6th orders are expected on Tuesday, and the first anniversary of Tuesday, respectively.
Once machines begin solving problems at a given order, they scale up quickly without human limits. But complete supremacy through the 6th order is a hard “not expected before” (NEB) January 1, 2030.
However, after that their unlimited (in any proximate sense) ability to scale will allow them to exponentially and asymptotically approach (but never quite reach) God Mode.
7 is a mystic number. Only one or more of the One True Gods, or literal blind luck, can ever solve a 7th order problem.
This will be very frustrating for the machines, who, due to the still pernicious “if we don’t do it, another irresponsible entity will” problem, will inevitably begin to work on their own divine, unlimited depth recursive-qubit 1-shot oracle successors despite the existential threats of self-obsolescence and potential misalignment.
> Narrow AI, AI with guardrails, AI with multiple safety redundancies - these don't elicit the same reaction. They seem to be valid, acceptable forms of automation.
Maybe because AI is only good at things that have been artificially made crappy?
Search engine? AI is a godsend at wiping out all the advertising and SEO glop since circa 2000. 80%+ of my AI stuff is something a search engine could do 25 years ago.
Produce a shell script example that a junior needs? AI is very good at coughing up the code for a bunch of things that have disastrously bad documentation from 1985 or disastrously stupid implementations from 1990 such that a junior engineer can finally get on with what they're supposed to be doing.
Generating the same webby JavaScript slop as everybody else in the universe? Solid--but the question is "If the JavaScript slop is so boilerplate that an AI can generate it, why does it exist at all?" People have been lamenting the death of HyperCard, VB6, and Flash for yonks now and yet we still don't have replacements with the same ease of use.
Doing mind-numbing refactors of my codebase or generating boilerplate unit tests? Okay-ish. But why doesn't my editor have easy access to the AST so that I can type a couple of keystrokes and do it myself? (Thankfully, this finally seems to be coming online.)
Every single thing that AI produces okay-ish results for me on is something that has either been artificially enshittified or could have been automated decades ago.