
Comment by johnfn

2 years ago

I noticed that the author uses ChatGPT 3.5 rather than 4, which is a rather large difference. I don't have the knowledge to rerank all the questions the author asked, but I will say that a test of ChatGPT 4 leads me directly to youtube-dl, which is a better result than every other search engine listed returned.
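
For the curious, the youtube-dl answer it points to amounts to roughly the following; a minimal sketch using youtube-dl's Python API, with a placeholder URL rather than anything quoted from the chat itself:

    # Minimal sketch: download a video via youtube-dl's Python API.
    # The URL is a placeholder, not an example from the article or transcript.
    import youtube_dl

    options = {"format": "best"}  # grab the single best-quality file
    with youtube_dl.YoutubeDL(options) as ydl:
        ydl.download(["https://www.youtube.com/watch?v=PLACEHOLDER"])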

That was the first thing I checked when reading the article. Although the counter-argument would be that 3.5 is free, any comparison of systems against ChatGPT that isn't using ChatGPT 4 can be dismissed almost out of hand; there is not much point talking about ChatGPT if the test isn't using ChatGPT 4 and making proper use of its capabilities.

That is not to say that there aren't valid criticisms of and shortcomings in ChatGPT 4 - just that it's not useful to say "ChatGPT" when what's actually meant is 3.5.

  • This is silly; most people aren't going to pay for ChatGPT, just like they won't pay for Google or DDG. So using 3.5 in this case is perfectly acceptable when we're talking about free services.

  • >any comparison of systems against ChatGPT that isn't using ChatGPT 4 can be dismissed almost out of hand

    Does everyone, or even most people, use ChatGPT 4? The most used version is, of course, by far the most relevant.

    • ChatGPT 3.5 was great until 4 came out; now it is garbage in comparison.

      But I suppose what I really want is for everyone who includes ChatGPT in comparisons to explicitly say which version they are using (and, if they are using 3.5 in their comparison, I hope they at least try 4 first), and definitely not just say "ChatGPT" when they only mean 3.5. The difference really is that stark.

  • He gives the full queries - do you have ChatGPT 4 that you can run them against?

    • Sure. Bear in mind I have custom instructions active - which, if you want to make full and proper use of ChatGPT, you should configure, along with customised GPTs - so I get lots of dot-point descriptions, because that's what I've asked for.

      Also, I would not normally write ChatGPT queries the same way I write them for search engines, but for the sake of comparison I'll use their queries verbatim except where my custom instructions affect the context too much.

      > download youtube videos

      https://chat.openai.com/share/3e18e4f0-5527-4479-8a2f-ef17bd...

      I got good results. They got: "Very bad results (fails to return any kind of useful result) ChatGPT: basically refuses to answer the question, although you can probably prompt engineer your way to an answer if you don't just naively ask the question you want answered".

      > [What] ad blocker [can I use?]

      https://chat.openai.com/share/e1985d7a-c89f-4b5e-bb59-70bd11...

      Looks good to me

      > download firefox

      https://chat.openai.com/share/3a62e5ae-8dbd-4179-8eb0-cc38ee...

      Also good

      > Why do wider tires have better grip?

      > [Provide links to scientific sites that describe] why wider tires have better grip?

      https://chat.openai.com/share/8cbcd1dc-b23f-41f3-83ad-f43f3d...

      Honestly, I have no idea if this is a good answer or not. But I don't use ChatGPT for answers whose veracity I'm not confident I can verify; if I needed to know this with certainty, I'd use ChatGPT as a jumping-off point for my own research.

      > Why do they keep making cpu transistors smaller?

      > [Provide links to scientific sites that describe] why do they keep making cpu transistors smaller?

      https://chat.openai.com/share/dbb97ac0-840c-402c-a917-657af6...

      > vancouver snow forecast winter 2023

      > Environment Canada winter 2023

      https://chat.openai.com/share/aab017d7-f86b-49c9-b5c0-86a0b1...

      I don't know if almanac.com is any good, but giving it the specific "Environment Canada winter 2023" query gave the expected very good result.

      I think ChatGPT 4 generally provided very good results for the test queries, if you tailor the queries just slightly for the format.

> I will say that a test of ChatGPT 4 leads me directly to youtube-dl

And yet to other people it starts rambling about how that’s wrong and you shouldn’t do it and doesn’t give a usable answer.

https://news.ycombinator.com/item?id=38822040

It boggles the mind the extent to which people salivate over a system that cannot decide between a correct straight answer, something wrong but plausible, something wrong and impossible, or outright refusing to answer.

  • That's GPT 3.5. It sounds like you have a bit of an axe to grind with ChatGPT, but if you're going to do so, do try to grind it on the correct version.

    • The comment says it’s v4. Since there’s no information on the page either way (funny, considering the original complaint), I took them at their word. If you don’t believe them, that’s up to you.

      For what it’s worth, I do have access to v4 and it did give me an answer right now. But since I also know even v4 can give you wildly different answers to the same question even if you ask them one right after another, that doesn’t prove it either way.

I’ve come to recognize that any article that uses 3.5 has an agenda.

  • I also suspect as much, but obviously can't know for sure. IMHO it's intellectually lazy, if not dishonest, to benchmark against 3.5 and not make that fact clearly known upfront.

    A better benchmark would have had two entries for ChatGPT, showing both 3.5 and 4 results.

  • The agenda of not wanting to pay for something just to test it out when there is a free version?

    • The agenda of using the significantly shittier version to try to paint it in a poor light.