Comment by mediaman

1 day ago

Great read. One of the interesting insights from it is how difficult good application of AI is.

A lot of companies are just "deploying a chatbot" and some of the results from this study show that this doesn't work very well. My experience is similar: deploying simple chatbots to the enterprise doesn't do a lot.

For things to get better, two things are required, neither of which is easy:

- Integration into existing systems. You have to build data lakes or similar systems that allow the AI to use data and information broadly across an enterprise. For example, for an AI tool to be useful in accounting, it's going to need high quality data access to the company's POs, issued invoices, receivers, GL data, vendor invoices, and so on. But many systems are old, have dodgy or nonexistent APIs, and data is held in various bureaucratic fiefdoms. This work is hard and doesn't scale that well.

- Knowledge of specific workflows. It's better when these tools are built with specific workflows in mind that are designed around specific people's jobs. This can start looking less like pure AI and more like a mix of traditional software with some AI capabilities. My experience is that I sell software as "AI solutions," but often I feel a lot of the value created is because it's replacing bad processes (either terrible older software, or attempting to do collaborative work via spreadsheet), and the AI tastefully sprinkled throughout may not be the primary value driver.

Knowledge of specific workflows also requires really good product design. High empathy, ability to understand what's not being said, ability to understand how to create an overall process value stream from many different people's narrower viewpoints, etc. This is also hard.

Moreover, this is deceiving because for some types of work (coding, ideating around marketing copy) you really don't need that much scaffolding at all because the capabilities are latent in the AI, and layering stuff on top mostly gets in the way.

My experience is that this type of work is a narrow slice of the total amount of work to be done, though, which is why I'd agree with the overall direction this study is suggesting: that creating actual measurable major economic value with AI is going to be a long-term slog, and that we'll probably gradually stop calling it AI in the process as we acclimate to it and it starts being used as a tool within software processes.

> how difficult good application of AI is.

The only interesting application I've identified thus far in my domain in Enterprise IT (I don't do consumer-facing stuff like chatbots) is in replacing tasks that previously would've been done by NLP: mainly extraction, synthesis, classification. I am currently working on a long-neglected dataset that needs a massive remodel, and I think that would've taken a lot of manual intervention and a mix of different NLP models to whip into shape in the past, but with LLMs we might be able to pull it off with far fewer resources.

Mind you, at the scale of the customer I am currently working with, this task also would've never been done in the first place, so it's not replacing anyone.

> This can start looking less like pure AI and more like a mix of traditional software with some AI capabilities

Yes, the other use case I'm seeing is in peppering already existing workflow integrations with a bit of LLM magic here and there. But why would I re-work a workflow that's already implemented and well-understood in Zapier, n8n or Python with total reliability?

> Knowledge of specific workflows also requires really good product design. High empathy, ability to understand what's not being said, ability to understand how to create an overall process value stream from many different people's narrower viewpoints, etc. This is also hard.

> My experience is that this type of work is a narrow slice of the total amount of work to be done

Reading you, I get the sense we are on the same page on a lot of things and I am pretty sure if we worked together we'd get along fine. I'm struggling a bit with the LLM delulus as of late so it's a breath of fresh air to read people out there who get it.

  • As I see it, three-letter organizations have been using frameworks like Apache UIMA to build information extraction pipelines that are manual at worst and hybrid at best. Before BERT the models we had for this sucked: they were only useful for certain things, and usually required training sets of 20,000 or so examples.

    Today the range of things for which the models are tolerable to "great" has greatly expanded. In arXiv papers you tend to see people getting tepid results with 500 examples; I get better results with 5000 examples and diminishing returns past 15k.

    For a lot of people it begins and ends with "prompt engineering" of commercial decoder models, and evaluation isn't even an afterthought. For information extraction, classification and such, though, you often get good results with encoder models (e.g. BERT) put together with serious eval, calibration and model selection. Still, the system looks like the old systems if your problem is hard and has to be done in a scalable way, but sometimes you can make something that "just works" without trying too hard, keeping your train/eval data in a spreadsheet.
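To make the "serious eval, calibration" part concrete, here is a minimal sketch of one common calibration check, expected calibration error (ECE). It is my own illustration, not the commenter's pipeline. The function name and the ten equal-width bins are assumptions, and the inputs stand in for whatever confidence scores and gold labels you keep in your eval spreadsheet.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes in proportion to how many predictions it holds.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

A model that says 0.9 on four examples and gets three right is overconfident by 0.15 on this measure. Comparing this number across candidate models on a held-out set is one simple way to do the model selection step.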

LLM chatbots are a step forward for customer support. Well, ours started hallucinating a support phone number that, while a real number, is not our number. Lots of people started calling, which was a bad time for everyone, especially for the person whose number it actually is. So maybe two steps forward and occasionally one back.
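One cheap guard against this kind of failure is to check every phone-like string in a draft reply against an allowlist before the reply goes out. The sketch below is only an illustration: `APPROVED_NUMBERS`, the placeholder number in it, and the regular expression are all assumptions, and the pattern is deliberately loose, so it will sometimes flag things that are not phone numbers.

```python
import re
from typing import List, Set

# Hypothetical allowlist; the number is a placeholder, not a real support line.
APPROVED_NUMBERS: Set[str] = {"+15550100"}

# Loose pattern: an optional "+", then digits mixed with common separators.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def normalize(number: str) -> str:
    """Strip everything except digits and prefix with '+' for comparison."""
    return "+" + re.sub(r"\D", "", number)


def unapproved_numbers(reply: str, approved: Set[str] = APPROVED_NUMBERS) -> List[str]:
    """Return phone-like strings in the reply that are not on the allowlist."""
    return [n for n in PHONE_PATTERN.findall(reply) if normalize(n) not in approved]
```

If `unapproved_numbers(draft)` returns anything, the draft can be blocked or routed to a human. Over-flagging is the intended trade-off here, since a false alarm costs far less than sending customers to a stranger's phone.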

  • As a customer, LLM chatbots are fifteen steps backwards and have approximately zero upsides. I hate them with a deep and abiding passion.

An analogy I find useful is the search engine (Google).

Yeah, the search engine improved the productivity of almost everyone, but didn't change any workflows.

I work on an application that uses AI to index and evaluate any given corpus of knowledge (papers, knowledge bases, etc.) and it has been a huge help here. I know it's because we are dealing with what is effectively structured data that can be well classified once identified, and we have relatively straightforward ways of doing identification. The real magic is when the finely tuned AI started to correctly stitch together pieces of information that previously didn't appear to be related. That is the secret sauce beyond simply indexing for search.

Code is similar: programming languages have rules that are well known. Couple that with proper identification and pattern matching, and that's how you get to these generated prototypes[0] done via so-called 'vibe coding' (not the biggest fan of the term, but I digress).

I think these are early signs that this generation of LLMs, at least, is likely to augment many existing roles as opposed to strictly replacing them. Productivity will increase by a good magnitude once the tools are well understood and scoped to task.

[0]: They really are prototypes. You will eventually hit walls by having an LLM generate the code without understanding the code.

> Integration into existing systems.

Integration alone isn't enough. Organizations let their data go stale, because keeping it updated is a political task instead of a technical one. Feeding an AI stale data effectively renders it useless, because it doesn't have the presence of mind to ask for assistance when it encounters an issue, or to ask colleagues if this process is still correct even though the expected data doesn't "fit".

Automations - including AI - require clean, up-to-date data in order to function effectively. Orgs who slap in a chatbot and call it a day don't understand the assignment.
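A freshness check in front of the AI is one way to make staleness visible instead of silently feeding it in. This is a rough sketch under assumed names: the `id` and `last_updated` fields and the `stale_records` helper are hypothetical, and the right `max_age` depends entirely on the process the data describes.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping, Optional


def stale_records(
    records: Iterable[Mapping],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the ids of records last updated more than max_age ago.

    Assumes each record has an "id" and a timezone-aware "last_updated".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [rec["id"] for rec in records if now - rec["last_updated"] > max_age]
```

The returned ids can go to whoever owns the data rather than to the model. The code only surfaces the problem; getting someone to fix it is still the political part.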

I think when the costs and latencies of reasoning models like o1-pro, o3 and o4-mini-high come down, chatbots are going to be much more effective for technical support. They're quite reliable and knowledgeable, in my experience.

The pivot to cloud had a decade warmup before HOWTO was normalized to existing standards.

In the lead-up, a lot of the same naysaying we see about AI was everywhere. AI can be compressed into less logic on a chip and bootstrapped from models, requiring less of the state management tooling software dev relies on now. We’re slowly being trained to accept a downturn in software jobs. No need to generate the code that makes up an electrical state when we can just tune hardware to the state from an abstract model deterministically. Energy based models are the futuuuuuure.

https://www.chipstrat.com/p/jensen-were-with-you-but-were-no...

A lot of the same naysaying about Dungeons and Dragons and comic books in the past too. Life carried on.

Functional illiterates fetishize semantics, come to view their special literacy as key to the future of humanity. Tale as old as time.