Comment by aydyn
5 days ago
Learn to work on interesting problems? If the problem you are working on is novel and hard, the AI will stumble.
Generalizing your experience to everyone else's betrays a lack of imagination.
> Generalizing your experience to everyone else's betrays a lack of imagination.
One guy is generalizing from "they don't work for me" to "they don't work for anyone."
The other one is saying "they do work for me, therefore they do work for some people."
Note that the second of these is a logically valid generalization. Note also that it agrees with folks such as Tim Gowers, who work on novel and hard problems.
No, that's decidedly not what is happening here.
One is saying "I've seen an LLM spectacularly fail at basic reasoning enough times to know that LLMs don't have a general ability to think" (but they can sometimes reproduce the appearance of doing so).
The other is trying to generalize "I've seen LLMs produce convincing thought processes therefore LLMs have the general ability to think" (and not just occasionally reproduce the appearance of doing so).
And indeed, only one of these is a valid generalization.
When we say "think" in this context, do we just mean generalize? LLMs clearly generalize (you can give one a problem that is not exactly in its training data and it can solve it), but perhaps not to the extent a human can. But then we're talking about degrees. If it was able to generalize at a higher level of abstraction, maybe more people would regard it as "thinking".
s/LLM/human/
This is my experience. For rote generation, it's great, saves me from typing out the same boilerplate unit test bootstrap, or refactoring something that exists, etc.
Any time I try to get a novel insight, it flails wildly, and nothing of value comes out. And yes, I am prompting incrementally and building up slowly.
[flagged]
We've banned this account for repeated abusive comments to fellow community members. Normally we give warnings, but when it's as extreme and repetitive as we can see here, an instant ban is appropriate. If you don't want to be banned, you can email us at hn@ycombinator.com and demonstrate a sincere commitment to use HN as intended in future.
Even people who do actual hard work need a lot of ordinary scaffolding done for them.
A secretary who works for an inventor is still thinking.
Research mathematicians have been finding the tools useful [1][2]. I think those problems are interesting, novel, and hard. The AI might stumble sometimes, but it also produces meaningful, quality results sometimes. For experts working on interesting problems, that is enough to be useful.
[1] https://mathstodon.xyz/@tao/115420236285085121 [2] https://xcancel.com/wtgowers/status/1984340182351634571
That's a motte-and-bailey fallacy. Nobody said that they aren't useful; the argument is that they can't reason [1]. The world is full of useful tools that can't reason or think in any capacity.
[1] That does not mean that they can never produce texts which describes a valid reasoning process, it means that they can't do so reliably. Sometimes their output can be genius and other times you're left questioning if they even have the reasoning skills of a 1st grader.
I don't agree that LLMs can't reason reliably. If you give them a simple reasoning question, they can generally make a decent attempt at coming up with a solution. Complete howlers are rare from cutting-edge models. (If you disagree, give an example!)
Humans sometimes make mistakes in reasoning, too; sometimes they come up with conclusions that leave me completely bewildered (like somehow reasoning that the Earth is flat).
I think we can all agree that humans are significantly better and more consistently good at reasoning than even the best LLM models, but the argument that LLMs cannot reliably reason doesn't seem to match the evidence.
Even most humans will stumble on hard problems; that's the reason they're hard in the first place.
I'm genuinely curious: what do you work on that is so "novel" that an LLM doesn't work well on it?
I feel like so little is TRULY novel. Almost everything is built on older concepts, and to some degree expertise can be applied or repurposed.
LLMs struggle with anything relatively new in a technology, especially if the documentation is lacking.
Godot in ChatGPT, for example.
It may no longer be the case, but the documentation for Godot was lacking, and samples written by others often didn't have a version number associated with them. So the samples it would suggest never worked, and even when you told it the version number it failed to generate workable code.
The other stuff I've noticed is custom systems. One I work with is a variation of Java, but LLMs kept treating it as JavaScript. I had to create a LoRA just to stop the model from trying to write JavaScript answers. Even then it could never work, because it had never been trained on real-world examples.
It doesn't have to be very novel at all. Anything but the most basic TODO-list app.
Literally anything in the science domain. Adding features to your software app is indeed usually not novel.
That's where the bar is now?
Dude. We don't all work for NASA. Most day to day problems aren't novel. Most jobs aren't novel. Most jobs can't keep a variety of sometimes useful experts on hand. I do my job and I go home and do my hobbies. Anything I can use at work to keep friction down and productivity up is extremely valuable.
Example prompt (paraphrasing and dumbed down, but not a ton): Some users across the country can't get to some fileshares. I know networking, but I'm not on the networking team so I don't have full access to switch, router, and firewall logs/configurations. It looks kind of random, but there must be a root cause, let's find it.
I can't use Python (security team says so) and I don't have access to a Linux box that's joined to the domain and has access to the shares.
We are on a Windows domain controller. Write me a PowerShell 5.1 compatible script to be run remotely on devices. Use AD Sites and Services to find groups of random workstations and users at each office and try to connect to all shares at each other site. Show me progress in the terminal and output an Excel file and a Dot file that clearly illustrate successful and failed connections.
---
And it works. OK, I can see the issue comes from certain sites that use x AND y VPN IPsec tunnels to get to particular cloud resources. I give this info to networking and they fix it right away. Problem resolved in less than an hour.
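For a sense of what that looks like, here's a stripped-down sketch of the kind of script you get back. To be clear, this is my reconstruction rather than the actual output: the share paths, OU layout, sample size, and file names are all invented, and I've swapped the Excel export for plain CSV since Export-Excel requires the third-party ImportExcel module.

    # Sketch only -- share paths, OU layout, and sample size are invented.
    Import-Module ActiveDirectory

    $shares = '\\fs01.corp.example.com\data', '\\fs02.corp.example.com\apps'

    $results = foreach ($site in Get-ADReplicationSite -Filter *) {
        # Assumes workstations sit in an OU named after their site -- an
        # assumption for brevity; real code would map computers to sites
        # via Get-ADReplicationSubnet and their IP addresses.
        $ou  = "OU=$($site.Name),OU=Workstations,DC=corp,DC=example,DC=com"
        $pcs = Get-ADComputer -Filter 'Enabled -eq $true' -SearchBase $ou |
               Get-Random -Count 3

        foreach ($pc in $pcs) {
            foreach ($share in $shares) {
                Write-Host "Testing $share from $($pc.Name) [$($site.Name)]"
                # Run the check on the remote workstation itself
                $ok = Invoke-Command -ComputerName $pc.Name -ScriptBlock {
                    Test-Path $using:share
                } -ErrorAction SilentlyContinue
                [pscustomobject]@{
                    Site  = $site.Name; Computer = $pc.Name
                    Share = $share;     Success  = [bool]$ok
                }
            }
        }
    }

    $results | Export-Csv .\share-tests.csv -NoTypeInformation

    # Graphviz Dot file: green edges for working site->share paths, red for broken
    $edges = $results | ForEach-Object {
        $color = if ($_.Success) { 'green' } else { 'red' }
        '  "{0}" -> "{1}" [color={2}];' -f $_.Site, $_.Share, $color
    }
    @('digraph shares {') + $edges + '}' | Set-Content .\share-tests.dot

Render the Dot file with Graphviz (dot -Tpng share-tests.dot -o share-tests.png) and the failing site-to-share pairs jump out immediately.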
First of all, a couple years ago I wouldn't have been able to justify writing something like this while an outage was occurring. Could I do it myself? Sure, but I'd have to look up the specifics of syntax and certain commands and modules. I don't write PowerShell for a living or for fun, but I do need to use it; I'm familiar with it and know how to write it. But I sure as fuck couldn't sit down and spend an hour or two screwing around building a goddamn Dot file generator. Yes, years ago I had a whole pile of little utility modules I could use. But that's a far cry from what I can do now to fit the exact situation in < 15 minutes while I do other things, like pick up the phone, message coworkers, etc.
Secondly, rather than building little custom tools to hook together as I need, I can just ask for the whole thing. I don't need to save any of that stuff anymore and re-figure out what the CheckADFSConns(v2).PS1 I wrote 8 months ago does and how to use it. "Oh, that's not the one, what did I name that? Where did I put it?"
I work in an environment that is decades old, at a company over 100 years old that isn't a tech company; I didn't build any of it myself, and it has tons of tech debt and weird shit. AI is insanely useful. For any given problem there are dozens of different rabbit holes I could go down because of decades of system overhauls. Today, I can toss a variety of logs at AI and, if nothing else, get a sense of direction on why a handful of PCs are rejecting some web certificates. (A combination of a new security policy and their clocks mismatching the domain controller, because the DC was new and NTP wasn't configured properly. I wasn't even looking for timestamps, but it noticed the event offsets and pointed it out.)
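(Once you suspect time sync, it's quick to confirm from an elevated prompt on one of the affected PCs; these are standard w32tm switches, though what you see will obviously vary:)

    w32tm /query /status   # where the clock thinks it's syncing from
    w32tm /resync          # force a sync once NTP is fixed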
I feel like this community isn't very familiar with what that's like. We aren't all working on self-driving cars or whatever seems hard at a brand-new company with new everything and no budget. Some of us need to keep the systems running that help people to make actual things. These environments are far from pristine and are held together by underpaid and underappreciated normies through sheer willpower.
Is this kind of work breaking technical frontiers? No. But it's complicated, difficult, and unpredictable. Is it novel? The problems are, sometimes.
Generalizing your experience to everyone else's betrays your lack of self-awareness, sir.