Comment by geon

5 days ago

Having seen LLMs so many times produce incoherent, nonsensical and invalid chains of reasoning...

LLMs are little more than RNGs. They are the tea leaves and you read whatever you want into them.

They are clearly getting useful and meaningful results at a rate significantly better than chance (for example, the fact that ChatGPT can play chess well even though it sometimes tries to make illegal moves shows that there is a lot more happening there than just picking moves uniformly at random). Demanding perfection here seems odd given that humans also make bizarre errors in reasoning (though generally at a lower rate, and in a distribution of kinds of errors we are more used to dealing with).

  • The fact that a model trained on the internet, on which the correct rules of chess are written, is unable to determine what is and is not a legal move, seems like a sign that these models are not reasoning about the questions asked of them. They are just giving responses that look like (and often are) correct chess moves.

    • It's a sign that they are 'reasoning' imperfectly. If they were just giving responses that 'looked like' chess moves, they would be very bad at playing chess.

      (And I would hazard a guess that they are primarily learning chess from the many games that are posted, as opposed to working things out from the rules. Indeed, if you make up a game and tell ChatGPT the rules, it tends to be even worse at following them, let alone figuring out optimal play. But again, it will do so significantly better than random chance, so it's doing something with the information you give it, even if it's not doing so very well. I think it's reasonable to call this thinking, or reasoning, though that mostly becomes an argument about semantics. Either way, they do it significantly better than random chance but still not tremendously well. If your expectation is that they cannot work with anything novel, you're going to be continually surprised; but if your expectation is that they're as good as a human who has 'learned' from all the material they've been given, especially material that's in-context and not in the training data, then you're also going to be disappointed.)

Ridiculous. I use it daily and get meaningful, quality results. Learn to use the tools.

  • > Learn to use the tools.

    Thing is, you wouldn't need to learn to use the tools if the tool were able to think. A thinking entity can adapt to other parties who lack that learning. This confirms that LLMs are little more than fancy RNGs.

    > I use it daily and get meaningful, quality results.

    That's what the tea leaf readers say too, funnily enough.

  • Learn to work on interesting problems? If the problem you are working on is novel and hard, the AI will stumble.

    Generalizing your experience to everyone else's betrays a lack of imagination.

    • > Generalizing your experience to everyone else's betrays a lack of imagination.

      One guy is generalizing from "they don't work for me" to "they don't work for anyone."

      The other one is saying "they do work for me, therefore they do work for some people."

      Note that the second of these is a logically valid generalization. Note also that it agrees with folks such as Tim Gowers, who work on novel and hard problems.


    • This is my experience. For rote generation, it's great, saves me from typing out the same boilerplate unit test bootstrap, or refactoring something that exists, etc.

      Any time I try to get a novel insight, it flails wildly, and nothing of value comes out. And yes, I am prompting incrementally and building up slowly.


    • Even people who do actual hard work need a lot of ordinary scaffolding done for them.

      A secretary who works for an inventor is still thinking.


    • Even most humans will stumble on hard problems; that's why they're hard in the first place.

    • I'm genuinely curious what you work on that is so "novel" that an LLM doesn't work well on?

      I feel like so little is TRULY novel. Almost everything is built on older concepts, and to some degree expertise can be applied or repurposed.


    • Dude. We don't all work for NASA. Most day to day problems aren't novel. Most jobs aren't novel. Most jobs can't keep a variety of sometimes useful experts on hand. I do my job and I go home and do my hobbies. Anything I can use at work to keep friction down and productivity up is extremely valuable.

      Example prompt (paraphrasing and dumbed down, but not by a ton): Some users across the country can't get to some fileshares. I know networking, but I'm not on the networking team, so I don't have full access to switch, router, and firewall logs/configurations. It looks kind of random, but there must be a root cause; let's find it.

      I can't use Python (security team says so), and I don't have access to a Linux box that's joined to the domain and has access to the shares.

      We are on a Windows domain controller. Write me a PowerShell 5.1-compatible script to be run remotely on devices. Use AD Sites and Services to find groups of random workstations and users at each office, and try to connect to all shares at each other site. Show me progress in the terminal, and output an Excel file and a DOT file that clearly illustrate successful and failed connections.
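      A prompt like that might come back as something roughly like the sketch below. This is a hypothetical reconstruction, not the actual script: the share paths and the three-machines-per-site sampling are placeholders, it assumes the RSAT ActiveDirectory module and WinRM access to the workstations, and it writes a CSV plus a Graphviz DOT file instead of Excel:

      ```powershell
      # Hypothetical sketch (not the original script): sample a few workstations
      # per AD site and have each one test reachability of every share.
      Import-Module ActiveDirectory   # requires RSAT

      $shares = '\\siteA-fs01\public', '\\siteB-fs01\public'   # placeholder UNC paths

      $results = foreach ($site in Get-ADReplicationSite -Filter *) {
          # Simplified: a real site-to-computer mapping would go through AD subnets
          $sample = Get-ADComputer -Filter * | Get-Random -Count 3
          foreach ($pc in $sample) {
              foreach ($share in $shares) {
                  # Run the test on the remote workstation itself, not on the DC
                  $ok = Invoke-Command -ComputerName $pc.Name -ErrorAction SilentlyContinue `
                        -ScriptBlock { param($s) Test-Path -Path $s } -ArgumentList $share
                  Write-Host "$($site.Name)/$($pc.Name) -> $share : $([bool]$ok)"  # progress
                  [pscustomobject]@{
                      Site = $site.Name; Computer = $pc.Name
                      Share = $share;    Reachable = [bool]$ok
                  }
              }
          }
      }

      # CSV stands in for the Excel output; DOT edges are colored by result
      $results | Export-Csv share-tests.csv -NoTypeInformation
      $edges = $results | ForEach-Object {
          $color = if ($_.Reachable) { 'green' } else { 'red' }
          '  "{0}" -> "{1}" [color={2}];' -f $_.Computer, ($_.Share -replace '\\', '/'), $color
      }
      Set-Content share-tests.dot (@('digraph shares {') + $edges + '}' -join "`n")
      ```

      Rendering the DOT file (e.g. `dot -Tpng share-tests.dot -o shares.png`) then makes the failing site-to-share pairs visually obvious.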

      ---

      And it works. Ok, I can see the issue is from certain sites that use x AND y VPN ipsec tunnels to get to particular cloud resources. I give this info to networking and they fix it right away. Problem resolved in less than an hour.

      First of all, a couple of years ago I wouldn't have been able to justify writing something like this while an outage was occurring. Could I do it myself? Sure, but I'd have to look up the specifics of syntax and certain commands and modules. I don't write PowerShell for a living or for fun, but I do need to use it; I'm familiar with it and know how to write it. But I sure as fuck couldn't sit down and spend an hour or two screwing around building a goddamn DOT file generator. Yes, years ago I had a whole pile of little utility modules I could use. But that's a far cry from what I can do now, fitted to the exact situation, in under 15 minutes, while I do other things like pick up the phone, message coworkers, etc.

      Secondly, rather than building little custom tools to hook together as I need, I can just ask for the whole thing. I don't need to save any of that stuff anymore and re-figure out what the CheckADFSConns(v2).PS1 I wrote 8 months ago does and how to use it. "Oh, that's not the one; what did I name that? Where did I put it?"

      I work in an environment that is decades old. The company is over 100 years old, I didn't build any of it myself, it is not a tech company, and it has tons of tech debt and weird shit. AI is insanely useful. For any given problem, there are dozens of different rabbit holes I could go down because of decades of successive system overhauls. Today, I can toss a variety of logs at AI and, if nothing else, get a sense of direction on why a handful of PCs are rejecting some web certificates. (Combination of a new security policy and their clocks mismatching the domain controller, because the controller was new and NTP wasn't configured properly. I wasn't even looking for timestamps, but it noticed event offsets and pointed them out.)

      I feel like this community isn't very familiar with what that's like. We aren't all working on self driving cars or whatever seems hard at a brand new company with new everything and no budget. Some of us need to keep the systems running that help people to make actual things. These environments are far from pristine and are held together by underpaid and underappreciated normies through sheer willpower.

      Is this kind of work breaking technical frontiers? No. But it's complicated, difficult, and unpredictable. Is it novel? The problems are, sometimes.

      Generalizing your experience to everyone else's betrays your lack of self-awareness, sir.

  • The results are only meaningful and high-quality if you don’t know what you’re doing. But please do show some of this meaningful, quality work so I can be proven wrong.

    • Yes, please, this is literally what I want to see. I have yet to see an example where an LLM did anything sufficiently difficult. I'm not saying they can't be useful, but for anything past the basics they are really all over the place. And if we were paying anywhere near the true costs, it wouldn't even be worth trying.


    • See my comment to the parent. One example of many. You can say, "Oh, well, it just sounds like your company needs better tools and processes; you don't really need AI for any of that. You should just invest in a tool for this, and monitor that, and have management prioritize..."

      Yeah, I know, yet here we are and it saves me boatloads of time.
