Comment by Fraterkes
2 days ago
I think the dichotomy you see in how positive people are about ai has almost entirely to do with the kind of questions they ask.
That seems obvious, but a consequence of that is that people who are sceptical of ai (like me) only use it when they've exhausted other resources (like google). You ask very specific questions where not a lot of documentation is available, and inevitably even o3 ends up being pretty useless.
Conversely, there are people who love ai and use it for everything, and since the majority of the stuff they ask about is fairly simple and well documented (e.g. "Write me some typescript"), they rarely have a negative experience.
I think there are also other aspects:
- Some people simply ask a lot more questions than others (regardless of whether they like or dislike AI), i.e. some people prefer to find things out by themselves, and thus also treat other resources like Google or Stack Overflow as a last resort. So their questions to an AI will likely be more complicated, because they already figured out the easy parts by themselves.
- If I have to make the effort to explain to the AI in a sufficiently exhaustive way what I need (which I often have to do), I expect the answers of the AI to be really good. If they aren't, having explained my problem to the AI was simply a waste of time.
> I expect the answers of the AI to be really good. If they aren't, having explained my problem to the AI was simply a waste of time.
I find the worst part to be when it doesn't correct flaws in my assumptions.
For example, yesterday I asked it "what is the difference between these two Datadog queries?" It replied with something that was semi-correct, but it didn't discover the fundamental flaw: the first one wasn't a valid query at all because of unbalanced parens. In fact, it turns out that the two strings (plus another one) would get concatenated, and only then would the result be a valid query.
A simple "the first string is not a valid query because of a missing closing paren" would have saved a lot of time spent trying to understand this, and I suspect that's what I would have received if I had prompted it with "what's the problem with this query?", but LLMs are just too sycophantic to help with these things.
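(To make the kind of check I wanted concrete, here is a minimal sketch. The query fragments are invented stand-ins, not the real queries, and the check is just a plain paren counter rather than anything that understands Datadog's query syntax.)

    # Hypothetical stand-ins for the two query strings; only the paren
    # imbalance matters for this illustration, not the actual query syntax.
    fragment_a = "sum:trace.http.request.hits{env:prod AND (service:checkout"
    fragment_b = " OR service:payments)} by {resource_name}"

    def paren_balance(s: str) -> int:
        """Open parens minus close parens; 0 means balanced."""
        return s.count("(") - s.count(")")

    for name, frag in [("first", fragment_a), ("second", fragment_b)]:
        if paren_balance(frag) != 0:
            print(f"the {name} string is not a valid query on its own "
                  f"(paren balance {paren_balance(frag):+d})")

    # Concatenated, the fragments balance out and can form a valid query.
    print("combined balance:", paren_balance(fragment_a + fragment_b))

That is roughly the observation I wanted the model to volunteer on its own.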
I have found that o3, specifically, will tell me relevant information that I didn't ask for.
But most other models don't.
I do have a custom instruction in place that tells it to ask whether I'm aware of concepts related to my question; perhaps in coming up with those, it notices when something relevant hasn't been mentioned.
I don't think that dichotomy is true at all, at least not with experienced software people.
Many folks I know are skeptical of the hype, or maybe full-on anti/distrustful, for reasons I think are valid. But many of those same people have tried llm tools, maybe chatgpt or copilot or cursor, and recognize the value even w/ huge misgivings. Some have gone further with tools like claude code and seen the real potential there, quite a step beyond fancy auto-complete or just-in-time agents... but even there you can end up down rabbit-holes, drowning in horrible design.
On your incredibly reductive scale, I'm closer to 'love' than 'skeptical', but I'm often a bit of both. But I'd never write a prompt like 'write me some typescript' for any real work, or honestly anything close to that, unless it's just for memes or demonstrations.
But no-one who programs for a living uses prompts like that, at least not for real work. That is just silly talk.
I obviously don't mean that people literally write "write me some typescript", because nobody wants code that does something arbitrary. I'm also not saying that every reaction to ai falls somewhere between "love" and "skeptical": I wrote a three-sentence comment on a complex topic to sketch out an idea.
The tone of your comment suggests that my comment upset you, which wasn't my intent. But you have to try to be a little generous when you read other people's stuff, or these discussions will get very tedious very quickly.
Ironically, his comment does not come off at all as offended, but yours does.
I use it very routinely to generate tikz diagrams. The output is, of course, somewhat wrong and I need to tweak it manually a little. But the hardest part is often getting something working in the first place, and at this AI is first class. It gets me 90% of the way there, and the rest is me.
I think you touched on an important aspect, but did not explore it further.
If we accept that AI is a tool, then the problem is the nature of the tool, as it will vary heavily from individual to individual. This partially accounts for the ridiculous differences between the self-reported accounts of people who use it on a regular basis.
And then there is the possibility that my questions are not that unusual and/or are well documented (quite possible), so my perception of the usefulness of those answers is skewed.
My recent interaction with 4o was pretty decent on a very new (by industry standards) development, and while documentation for it exists, it is a swirling vortex of insanity from where I sit. I was actually amazed to see how easily 4o saw some of those discrepancies and listed them for me, along with likely pitfalls that may come with them. We will be able to find out whether that prediction holds very soon.
What I am saying is that it has its uses.
The thing about tools is that they need to be predictable. I can't remember the source, but it's a concept I read that really stuck with me. A predictable tool can be used skillfully and accurately because the user can anticipate how it works and deploy it effectively. It will always be aligned with the user's intent because the user decides how and when it is used.
A tool that constantly adapts to how it is used will frequently be misaligned with user intent. A language model constantly changes its behavior based on the specific phrasing you give it, the context you deploy it in, and the inherent randomness in token generation. Its capacity to be used as a tool will be inherently limited by this unpredictability.
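As a toy illustration of that last point, here is a minimal sketch with a made-up five-word vocabulary and made-up probabilities (nothing from any real model): the same "prompt" yields a different continuation on each run simply because each next token is drawn from a probability distribution.

    # Toy illustration of the randomness inherent in token generation.
    # The vocabulary and probabilities are invented for this example; a real
    # LLM makes the same kind of weighted draw over tens of thousands of
    # tokens, conditioned on everything generated so far.
    import random

    vocab = ["the", "a", "tool", "model", "."]
    next_token_probs = [0.35, 0.15, 0.25, 0.20, 0.05]  # hypothetical weights

    def generate(n_tokens: int = 6) -> str:
        return " ".join(random.choices(vocab, weights=next_token_probs, k=n_tokens))

    for run in range(3):
        # Same "prompt", same weights, and yet each run comes out different.
        print(f"run {run}: {generate()}")

A hammer used the same way twice does the same thing twice; a sampled model, by construction, does not.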
Sounds to me like you're not describing its limited usefulness in general, you're describing its limited usefulness when considered as a tool.
That might seem like a distinction without a point, but people are not tools and people are useful. If someone argues people are tools, then they don't fit your definition any more than an LLM does.
LLMs are as useful as people I would say, and as unpredictable. I run a small consulting company and we develop custom software for businesses, government, whomever. I often have to interface with other teams and other people. To get the most out of working with people you have to know how to talk with them.
Also, I noticed you didn't say anything negative about LLMs :) You just compared them with tooling. So I'm curious: setting aside their unsuitability for consideration as a tool, do you find LLMs useful?
Well, I use it before google, since it generally summarizes webpages and removes the ads. Quite handy. It's also very useful for checking whether you understand something correctly. And for programming specifically, I found it really useful for help with naming stuff (which tends to be hard, not least because it's subjective).
> You ask very specific questions where not a lot of documentation is available, and inevitably even o3 ends up being pretty useless.
Do you have any example questions where o3 failed to be helpful?
I use it pretty similarly to you, basically only resorting to it to unblock myself; otherwise I'm mostly the one doing the actual work, with LLMs helping on specific functions, specific blockers, or exploring new "spaces". But almost every time I've gotten stuck, o3 (and o3-pro mode) has managed to get me unstuck, once I've figured out the right way to ask the question, even when my own searching and reading didn't help.
I am personally somewhere in-between these two places. I've used ChatGPT to get unstuck a few times this past week because I was at the end of my rope with regards to some GPU crashes that I couldn't make heads or tails of. I then used it for less headache-inducing things and overall it's been an interesting experience.
For research I'm enjoying asking ChatGPT to annotate its responses with sources and reading those; in some cases I've found SIGGRAPH papers that I wouldn't have stumbled upon otherwise, and it's nice to get them all in a response.
ChatGPT (4o, if it's of any interest) is very knowledgeable about DirectX12 (which we switched to just this week) and I've gained tons of peripheral knowledge about the things I've been battling with, but only one time out of four has it been able to directly diagnose what the issue was; the other three times the cause was something it didn't really bring up or note in any meaningful way. What helped was really just me writing about the problem and thinking about everything around it, and for that it's been very helpful.
Realistically, if someone let an agent loose on this stuff in our code base, it would likely end up wasting days of time and still not fix the issue. Even worse, the results would have to be tested on a specific GPU to even trigger the issue to begin with.
It seems to me that fancy auto-complete is still likely the best this can do, and I actually like it for that. I don't use LLM-assisted auto-complete anymore, but I used to use GitHub Copilot back in 2022, and it was more productive than my brief tests of agents.
If I were to regularly use LLMs for actual programming, it would most likely be just for tab-completion of "rest of expression" or one line at a time, but probably with local LLMs.
I had to create a Cython module wrapping some C; I used Claude 4 and GPT 4.1, and they were worse than useless. One can imagine why I needed help with that project.
It's kind of true. I only use it for simple stuff that I don't have time for, for example how to write a simple diagram in tikz. The AI does the simple busywork of providing a good-enough approximation, which I can then tweak to get what I want.
For hard questions, I prefer to use my own skills, because AI often regurgitates what I'm already aware of. I still ask AI on the off-chance it comes up with something cool, but most often I have to do it myself.
I find that in the latter case it's at least a serviceable rubber duck.