Comment by malfist
6 days ago
Today I had a dentist appointment and the dentist suggested I switch toothpaste lines to see if something else works for my sensitivity better.
I am predisposed to canker sores, and if I use a toothpaste with SLS in it I'll get them. But a lot of the SLS-free toothpastes are new-age hippy stuff and are also fluoride free.
I went to ChatGPT and asked it to suggest a toothpaste that was SLS-free and had fluoride. Pretty simple ask, right?
It came back with two suggestions. Its top suggestion had SLS; its backup suggestion lacked fluoride.
Yes, it is mind-blowing, the world we live in. Executives want to turn our code bases over to these tools
What model and query did you use? I used the prompt "find me a toothpaste that is both SLS free and has fluoride" and both GPT-4o [0] and o4-mini-high [1] gave me correct first answers. The 4o answer used the newish "show products inline" feature, which made it easier to jump to each product and check it out (I am putting aside my fear this feature will end up killing their web product with monetization).
0 - https://chatgpt.com/share/683e3807-0bf8-800a-8bab-5089e4af51...
1 - https://chatgpt.com/share/683e3558-6738-800a-a8fb-3adc20b69d...
The problem is that the same prompt will yield good results one time and bad results another. "Get better at prompting" is often just an excuse for AI hallucination. Better prompting can help, but often the prompt is totally fine and the tech is just not there yet.
While this is true, I have seen this happen enough times to confidently bet all my money that OP will not return and post a link to their incorrect ChatGPT response.
Seemingly basic asks that LLMs consistently get wrong have lots of value to people because they serve as good knowledge/functionality tests.
If you want a correct answer the first time around, and give up if you don't get it, even if you know the thing can give it to you with a bit more effort (but still less effort than searching yourself), don't you think that's a user problem?
You say it's successful, but the response to your second prompt is all kinds of wrong.
The first product suggestion, `Tom’s of Maine Anticavity Fluoride Toothpaste`, doesn't exist.
The closest thing is Tom’s of Maine Whole Care Anticavity Fluoride Toothpaste, which DOES contain SLS. All of Tom’s of Maine’s SLS-free formulations lack fluoride, and all their fluoride formulations contain SLS.
The next product it suggests is "Hello Fluoride Toothpaste", again not a real product. There is a company called "Hello" that makes toothpastes, but they don't have a product called "Hello Fluoride Toothpaste", nor do the "e.g." items exist.
The third product is real and what I actually use today.
The fourth product is real, but it doesn't contain fluoride.
So, rife with made-up products, and the close matches don't fit the bill for the requirements.
This is the thing that gets me about LLM usage. They can be amazing, revolutionary tech, and yet they can also be nearly impossible to use right. The claim that they are going to replace this or that is hampered by the fact that very real skill is required (at best) or that they just won't work most of the time (at worst). Yes, there are examples of amazing things, but the majority of output from the majority of users seems to be junk, and the messaging seems designed around FUD and FOMO.
Just like the people who wrote long sentences into Google in 2000 and complained it was a fad.
Meanwhile the rest of the world learned how to use it.
We have a choice. Ignore the tool or learn to use it.
(There was lots of dumb hype then, too; the sort of hype that skeptics latched on to to carry the burden of their argument that the whole thing was a fad.)
The AI skeptics are the ones who never develop the skill, though; it's self-destructive.
Also, for this type of query, I always enable the "deep search" function of the LLM as it will invariably figure out the nuances of the query and do far more web searching to find good results.
I tried to use ChatGPT a month ago to find systemic fungicides for treating specific problems with trees. It kept suggesting copper sprays (which are not systemic) or fungicides that don't deal with the problems I have.
I also tried to ask it what the difference in action is between two specific systemic fungicides. It generated some irrelevant nonsense.
"Oh, you must not have used the LATEST/PAID version." or "added magic words like be sure to give me a correct answer." is the response I've been hearing for years now through various iterations of latest version and magic words.
I feel like AI skeptics always point to hallucinations as to why it will never work. Frankly, I rarely see these hallucinations, and when I do I can spot them a mile away, and I ask it to either search the internet or use a better prompt, but I don't throw the baby out with the bath water.
I see them in almost every question I ask: very often made-up function names, missing operators, or missed closure bindings. Then again, it might be Elixir and a lack of training data. I also have a decent bullshit detector for insane code-generation output; it's amazing how much better code you get almost every time just by following up with "can you make this simpler and use common conventions".
For reference I just typed "sls free toothpaste with fluoride" into a search engine and all the top results are good. They are SLS-free and do contain fluoride.
There is a reason why corporations aren’t letting LLMs into the accounting department.
Don’t bet on it. I’ve had to provide feedback on multiple proposals to use LLMs for generating ad-hoc financial reports in a Fortune 50. The feedback was basically ‘this is guaranteed to make everyone cry, because this will produce bad numbers’ - and people seem to just not understand why.
That is not true. I know of many private equity companies that are using LLMs for base-level analysis, with a separate validation layer to catch hallucinations (a rough sketch of the pattern below).
LLM tech is not replacing accountants, just as it is not replacing radiologists or software developers yet. But it is in every department.
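A minimal sketch of what that "generate, then validate" pattern can look like, in Python. llm_complete and lookup_ledger are hypothetical stand-ins (neither is a real vendor API); the point is only that the model's prose never carries a number that hasn't been reconciled against a system of record:

    # Hypothetical "generate, then validate" pipeline. llm_complete() and
    # lookup_ledger() are illustrative stand-ins, not any real vendor's API.

    def llm_complete(prompt: str) -> str:
        """Call whatever LLM is in use; stubbed out here."""
        raise NotImplementedError

    def lookup_ledger(account: str) -> float:
        """Query the system of record for the true figure; stubbed out here."""
        raise NotImplementedError

    def validated_analysis(question: str, accounts: list[str]) -> str:
        draft = llm_complete(f"Analyze: {question}")
        # Validation layer: every figure the draft cites must reconcile
        # against the ledger, or the draft is rejected outright. (A crude
        # substring check; a real layer would parse the cited numbers.)
        for account in accounts:
            truth = lookup_ledger(account)
            if f"{truth:,.2f}" not in draft:
                raise ValueError(f"draft disagrees with ledger on {account}")
        return draft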
That's not what the accounting department does.
This is false. My friend works in tax accounting and they’re using LLMs at his org.
This is where o3 shines for me. Since it does iterations of thinking/searching/analyzing and is instructed to provide citations, it really limits the hallucination effect.
o3 recommended Sensodyne Pronamel, and I now know a lot more about SLS and fluoride than I did before lol. From its findings:
"Unlike other toothpastes, Pronamel does not contain sodium lauryl sulfate (SLS), which is a common foaming agent. Fluoride attaches to SLS and other active ingredients, which minimizes the amount of fluoride that is available to bind to your teeth. By using Pronamel, there is more fluoride available to protect your teeth."
That is impressive, but it also looks likely to be misinformation. SLS isn't a chelator (as the quote appears to suggest). The concern is apparently that it might compete with NaF for sites to interact with the enamel. However, there is minimal research on the topic and what does exist (at least what I was quickly able to find via pubmed) appears preliminary at best. It also implicates all surfactants, not just SLS.
This diversion highlights one of the primary dangers of LLMs which is that it takes a lot longer to investigate potential bullshit than it does to spew it (particularly if the entity spewing it is a computer).
That said, I did learn something. Apparently it might be a good idea to prerinse with a calcium lactate solution prior to a NaF solution, and to verify that the NaF mouthwash is free of surfactants. But again, both of those points are preliminary research grade at best.
If you take anything away from this, I hope it's that you shouldn't trust any LLM output on technical topics that you haven't taken the time to manually verify in full.
Very interesting. It grabbed that from the marketing at https://www.pronamel.us/why-pronamel/how-pronamel-works/ so it's def still susceptible to marketing and sales as well.
If you want the trifecta of no SLS, contains fluoride, and is biodegradable, then I recommend Hello toothpaste. Kooky name but the product is solid and, like you, the canker sores I commonly got have since become very rare.
Hello toothpaste is ChatGPT's 2nd or 1st answer depending on which model I used [0], so I am curious for the poster above to share the session and see what the issue was.
There is known sensitivity (no pun intended ;) to the wording of the prompt. I have also found that if I am very quick and flippant, it will totally miss my point and go off in the wrong direction entirely.
0 - https://news.ycombinator.com/item?id=44164633
If you've not found a toothpaste yet, see if UltraDex is available where you live.
consider a multivitamin (or at least eating big varied salads regularly) - that seemed to get rid of my recurrent canker sores despite whatever toothpaste I use
fwiw, I use my kids' toothpaste (kids' Crest) since I suspect most toothpastes are created equal, and it's one less thing to worry about...
Try Biomin-F or Apagard. The latter is fluoride free. Both are among the best for sensitive teeth.
do you take lysine? total miracle supplement for those
What are you doing to get results this bad?
I tried this question three times and each time the first two products met both requirements.
Are you doing the classic thing of using the free version to complain about the competent version?
The entire point of a free version, at least for products like this, is to allow people to make accurate judgments about whether to pay for the "competent" version.
Well, in that case, the LLM company has made a mistake in marketing their product, but that's not the same as the question of whether the product works.
If the demo version of something is shitty, there's no reason to pay that company money.
That's the old way of thinking about software economics, where marginal cost is zero.
Marginal cost of LLMs is not zero.
I come from manufacturing and find this kind of attitude bizarre among some software professionals. In manufacturing we care about our tools and invest in quality. If the new guy bought a micrometer from Harbor Freight, found it wasn't accurate enough for sub-.001" work, ignored everyone who told him to use Mitutoyo, and then declared that micrometers "don't work," he would not continue to have employment.
"An LLM is bad at this specific example so it is bad at everything"
cool story
“an LLM made a mistake once, that’s why I don’t use it to code” is exactly the kind of irrelevant FUD that TFA is railing against.
Anyone not learning to use these tools well (and cope with and work around their limitations) is going to be left in the dust in months, perhaps weeks. It’s insane how much utility they have.
Once? Lol.
I presented a simple problem with well-defined parameters that LLMs can use to search product ingredient lists (which are standardized). This is the type of problem LLMs are supposed to be good at, and it failed in every possible way.
If you hired a master woodworker and he didn't know what wood was, you'd hardly trust him with hard things, much less simple ones.
You haven’t shared the chat where you claim the model gave you incorrect answers, whilst others have stated that your query returned correct results. This is the type of behaviour that AI skeptics exhibit (claim the model is fundamentally broken/stupid yet don't show us the chat).
They won't. The speed at which these models evolve is a double-edged sword: they give you value quickly... but any experience you gain dealing with them also becomes obsolete quickly. One year of experience using agents won't be more valuable than one week of experience using them. No one's going to be left in the dust because no one is more than a few weeks away from catching up.
Very important point, but there's also the sheer amount of reading you have to do, the inevitable scope creep, gargantuan walls of text going back and forth making you "skip" constantly, looking here then there, copying, pasting, erasing, reasking.
Literally the opposite of focus, flow, seeing the big picture.
At least for me, to some degree. There's value there, as I'm already using these tools every day, but it also seems like a tradeoff, and I'm not really sure how valuable it is yet. Especially with competition upping the noise too.
I feel SO unfocused with these tools and I hate it; it's stressful and feels less "grounded", "tactile" and enjoyable.
I've found myself in a weird new workflow loop a few times with these tools, mindlessly iterating on some stupid error the LLM keeps not fixing, while my mind simply refuses to just fix it myself way faster with a little more effort. That's honestly a bit frightening.
Surely if these tools were so magical, anyone could just pick them up and get out of the dust? If anything, the skeptics are probably better off because they haven't wasted all the time, effort and money in the earlier, useless days and instead spent it in the hypothetical future magic days.
> Surely if these tools were so magical
The article is not claiming they are magical, the article is claiming that they are useful.
> > but it’ll never be AGI
> I don’t give a shit.
> Smart practitioners get wound up by the AI/VC hype cycle. I can’t blame them. But it’s not an argument. Things either work or they don’t, no matter what Jensen Huang has to say about it.
I see this FOMO "left in the dust" sentiment a lot, and I don't get it. You know it doesn't take long to learn how to use these tools, right?
It actually does if you want to do serious work.
Hence these types of posts generate hundreds of comments: “I gave it a shot, it stinks”
Looking forward to seeing you live up to your hyperbole in a few weeks, the singularity is near!
Feel similarly, but even if it is wrong 30% of the time, you can (as the author of this op-ed points out) pour an ungodly amount of resources into getting that error rate down by chaining calls together so that you have many chances to catch the error. And as long as that only destroys the environment and doesn't cost more than a junior dev, they're going to trust their codebases with it. Yes, it's the competitive thing to do, and we all know competition produces the best outcome for everyone… right?
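For what it's worth, a rough sketch of that chaining in Python, with hypothetical generate/check stand-ins (the 30% figure is just the number above, and the independence assumption is generous):

    # Hedged sketch of "chain calls to catch errors". generate() and
    # check() are hypothetical stand-ins for two separate LLM calls.

    def generate(task: str) -> str:
        raise NotImplementedError  # one LLM call producing an answer

    def check(task: str, answer: str) -> bool:
        raise NotImplementedError  # a second LLM call critiquing it

    def answer_with_retries(task: str, max_attempts: int = 5) -> str:
        for _ in range(max_attempts):
            answer = generate(task)
            if check(task, answer):  # each pass is another chance
                return answer        # to catch a bad answer
        raise RuntimeError("no answer survived the checker")

    # If each pass independently fails ~30% of the time, all five fail
    # together only 0.3**5 ≈ 0.24% of the time -- at up to 5x the cost.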
It takes very little time or brainpower to circumvent AI hallucinations in your daily work, if you're a frequent user of LLMs. This is especially true of coding using an app like Cursor, where you can @-tag files and even URLs to manage context.
> it’s the competitive thing to do
I'm expecting there should be at least some senior executives who realize how incredibly destructive this is to their products.
But I guess time will tell.
Feels like you're comparing how LLMs handle the unstandardized, incomplete marketing crap that makes up virtually all product pages on the internet with how LLMs handle the corpus of code on the internet, which can generally be trusted to be at least semi-functional (it compiles or at least lints, and is often easily fixed when not 100%).
Two very different combinations it seems to me...
If the former combination was working, we'd be using chatgpt to fill our amazon carts by now. We'd probably be sanity checking the contents, but expecting pretty good initial results. That's where the suitability of AI for lots of coding-type work feels like it's at.
Product ingredient lists are mandated by law and follow a standard. It's hard to imagine a better-codified NLP problem.
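To illustrate, a toy version of the check in Python, assuming a plain comma-separated ingredient list (the name sets are illustrative, not exhaustive):

    # Ingredient lists are flat, comma-separated, standardized names,
    # so the SLS/fluoride check reduces to set membership tests.

    SLS_NAMES = {"sodium lauryl sulfate"}  # extend with related surfactants
    FLUORIDE_NAMES = {"sodium fluoride", "stannous fluoride",
                      "sodium monofluorophosphate"}

    def meets_requirements(ingredient_list: str) -> bool:
        ingredients = {i.strip().lower() for i in ingredient_list.split(",")}
        return bool(ingredients & FLUORIDE_NAMES) and not (ingredients & SLS_NAMES)

    print(meets_requirements(
        "Sodium Fluoride, Hydrated Silica, Glycerin, Cocamidopropyl Betaine"
    ))  # True: fluoride present, no SLS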
I hadn't considered that, admittedly. It seems like that would make the information highly likely to be present...
I've admittedly got an absence of anecdata of my own here, though: I don't go buying things with ingredient lists online much. I was pleasantly surprised to see a very readable list when I checked a toothpaste page on Amazon just now.
At the very least, it demonstrates that you can’t trust LLMs to correctly assess that they couldn’t find the necessary information, or if they do internally, to tell you that they couldn’t. The analogous gaps of awareness and acknowledgment likely apply to their reasoning about code.