Comment by CamperBob2

8 months ago

Having read the reply in 2.5 Pro, I have to agree with you there. I'm surprised it whiffed on those details. They are fairly basic and rather important. It could have provided a better answer (I fed your reply back to it at https://g.co/gemini/share/7f87b5e9d699 ), but it did a crappy job deciding what to include in its initial response.

I don't agree that you can pick one cherry example and use it to illustrate anything general about the progress of the models, though. There are far too many counterexamples to enumerate.

(Actually I suspect what will happen is that we'll change the way we write documentation to make it easy for LLMs to assimilate. I know I'm already doing that myself.)

5 comments

alganet 8 months ago

> I don't agree that you can pick one cherry example

Benchmarks and evaluations are made of cherry-picked examples. What makes my example invalid, and benchmark prompts valid? (It's a rhetorical question; you don't need to answer.)

> write documentation to make it easy for LLMs to assimilate.

If we ever do that, it means LLMs failed at their job. They are supposed to help and understand us, not the other way around.

CamperBob2 8 months ago
> If we ever do that, it means LLMs failed at their job. They are supposed to help and understand us, not the other way around.
If you buy into the whole AGI thing, I guess so, but I don't. We don't have a good definition of intelligence, so it's a meaningless question.
We do know how to make and use tools, though. And we know that all tools, especially the most powerful and/or hazardous ones, reward the work and care that we put into using them. Further, we know that tool use is a skill, and that some people are much better at it than others.
> What makes my example invalid, and benchmark prompts valid?
Your example is a valid case of something that doesn't work perfectly. We didn't exactly need to invent AI to come up with something that didn't work perfectly. I have examples of using LLMs to generate working, useful code in advanced, specialized disciplines, code that I frankly don't understand myself and couldn't have written without months of study, but that I can validate.
Just one of those examples is worth a thousand examples like yours, in my book. I can now do things that were simply impossible for me before. It would take some nerve to demand godlike perfection on top of that, or to demand useful results with little or no effort on my part.
- alganet 8 months ago
  
  > We do know how to make and use tools
  It's the same principle. A tool is supposed to assist us, not the other way around.
  An LLM, "AGI magic" or not, is supposed to write for me. It's a tool that writes for me. If I am writing for the tool, there's something wrong with it.
  > I have examples [...] Just one of those examples is worth a thousand examples like yours
  Please, share them. I shared my example. It can be a very small "bug report", but it's real and reproducible. Other people can build on it, either to improve their "tool skills" or to improve LLMs themselves.
  An example that is shared is worth much more than an anecdote.
  