Comment by Game_Ender

6 days ago

What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature, which made it easier to jump to each product and check it out (I am putting aside my fear that this feature will end up killing their web product with monetization).

0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...

1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...

The problem is that the same prompt will yield good results one time and bad results another. "Get better at prompting" is often just an excuse for AI hallucination. Better prompting can help, but often the prompt is totally fine; the tech is just not there yet.

  • While this is true, I have seen this happen enough times to confidently bet all my money that OP will not return and post a link to their incorrect ChatGPT response.

    Seemingly basic asks that LLMs consistently get wrong have lots of value to people because they serve as good knowledge/functionality tests.

  • If you want a correct answer the first time around, and give up if you don't get it, even if you know the thing can give it to you with a bit more effort (but still less effort than searching yourself), don't you think that's a user problem?

    • I am unconvinced that searching for this yourself is actually more effort than repeatedly asking the Mighty Oracle of Wrongness and cross-checking its utterances.

You say it's successful, but the response to your second prompt is all kinds of wrong.

The first product suggestion, `Tom’s of Maine Anticavity Fluoride Toothpaste`, doesn't exist.

The closest thing is Tom's of Maine Whole Care Anticavity Fluoride Toothpaste, which DOES contain SLS. None of Tom's of Maine's SLS-free formulations contain fluoride, and all of their fluoride formulations contain SLS.

The next product it suggests is "Hello Fluoride Toothpaste", again not a real product. There is a company called "Hello" that makes toothpastes, but they don't have a product called "Hello Fluoride Toothpaste", nor do the "e.g." items exist.

The third product is real and what I actually use today.

The fourth product is real, but it doesn't contain fluoride.

So, it's rife with made-up products, and the close matches don't meet the requirements.

This is the thing that gets me about LLM usage. They can be amazing, revolutionary tech, and yes, they can also be nearly impossible to use right. The claim that they are going to replace this or that is hampered by the fact that using them takes very real skill (at best) or that they just won't work most of the time (at worst). Yes, there are examples of amazing things, but the majority of things from the majority of users seem to be junk, and the messaging is designed around FUD and FOMO.

  • Just like some people who wrote long sentences into Google in 2000 and complained it was a fad.

    Meanwhile the rest of the world learned how to use it.

    We have a choice. Ignore the tool or learn to use it.

    (There was lots of dumb hype then, too; the sort of hype that skeptics latched on to to carry the burden of their argument that the whole thing was a fad.)

    • Arguably, the people who typed long sentences into Google have won; the people who learned how to use it early on with specific keywords now get meaningless results.

    • > Meanwhile the rest of the world learned how to use it.

      Very few people "learned how to use" Google, and in fact - many still use it rather ineffectively. This is not the same paradigm shift.

      ChatGPT is not a technology most people will "learn" to use effectively. Just like with Google, they will ask it to find them an answer. But the world of LLMs is far broader, with more implications. I don't think search and LLMs carry equal weight in terms of consequences.

      The TL;DR of this is ultimately: understanding how to use an LLM, at its most basic level, will not put you in the driver's seat, in exactly the same way that knowing about Google also didn't really change anything for anyone (unless you were an ad executive years later). And in a world of Google or no-Google, hindsight would leave me asking for a no-Google world. What will we say about LLMs?

  • The AI skeptics are the ones who never develop the skill, though; it's self-destructive.

    • People treat this as some kind of all or nothing. I _do_ use LLM/AI all the time for development, but the agentic "fire and forget" model doesn't help much.

      I will circle back every so often. It's not a horrible experience for greenfield work. A sort of "Start a boilerplate project that does X, but stop short of implementing A B or C". It's an assistant, then I take the work from there to make sure I know what's being built. Fine!

      A combo of using the web UI / CLI for asking layout and doc questions + in-IDE tab-complete is still better for me. The fabled 10x dev-as-AI-manager just doesn't work well yet. The responses to this complaint are usually to label one a heretic or Luddite and do the modern-day workplace equivalent of "git gud", which helps absolutely nobody and ignores that I am already quite competent at using AI for my own needs.

Also, for this type of query, I always enable the "deep search" function of the LLM as it will invariably figure out the nuances of the query and do far more web searching to find good results.

I tried to use ChatGPT a month ago to find systemic fungicides for treating specific problems with trees. It kept suggesting copper sprays (they are not systemic) or fungicides that don't deal with the problems that I have.

I also tried to ask it what the difference in action is between two specific systemic fungicides. It generated some irrelevant nonsense.

  • "Oh, you must not have used the LATEST/PAID version" or "you should have added magic words like 'be sure to give me a correct answer'" is the response I've been hearing for years now, through various iterations of the latest version and the magic words.

I feel like AI skeptics always point to hallucinations as the reason it will never work. Frankly, I rarely see these hallucinations, and when I do, I can spot them a mile away. I then either ask it to search the internet or use a better prompt, but I don't throw the baby out with the bathwater.

  • I see them in almost every question I ask: very often made-up function names, missing operators, or missed closure bindings. Then again, it might be Elixir and the lack of training data. I also have a decent bullshit detector for insane code generation output. It’s amazing how much better code you get almost every time by just following up with “can you make this more simple and using common conventions”.