Comment by Bridged7756
3 days ago
Mirrors my sentiment. Those tools seem mostly useful for a Google alternative, scaffolding tedious things, code reviewing, and acting as a fancy search.
It seems that they got a grip on the "coding LLM" market and now they're starting to seek actual profit. I predict we'll keep seeing 40%+ more expensive models for a marginal performance gain from now on.
> Those tools seem mostly useful for a Google alternative, scaffolding tedious things, code reviewing, and acting as a fancy search.
Just to get a sense for the rate of change, imagine if you took a survey. Compare what people said about AI tools... 3 years ago, 2 years ago, 1 year ago, 6 months ago. Then think about what is plausible that people will be saying in 3 months, 6 months, 9 months ...
Moving the goalposts has always happened, but it is happening faster than I've ever seen it. Many people seem to redefine their expectations on a monthly basis now. Worse, they seem to be unaware they are doing it.
Fancy search? Ok, I'll bite. Compare today's "fancy search" to what we had ~3 years ago according to your choice of metric. Here's one: minutes spent relative to information found. Today, in ~5 minutes I can do a literature review that would have taken me easily 10+ hours five years ago. We don't need to argue phrasing when we can pick some prototypical tasks and compare them.
We're going to have different takes about where various AI technologies will be in these future timelines. It is much better to run to where the ball is likely to be, even if we have different ideas of where that is.
The human brain, at best, struggles to grasp even linear change. But linear change is not a good way to predict compounding technological change.
> Today, in ~5 minutes I can do a literature review that would have taken me easily 10+ hours five years ago.
And it will not yield the same outcome you would have had. Your own taste in clicking links and pre-filtering as you research is no longer being applied if you outsource this. I'm guilty of this myself. But let's not kid ourselves.
I've had GPT Pro think for 40 minutes about the ideal reverse osmosis setup for my home. It came up with something that could have supplied 10 houses and cost 20k, even though I told it all about my water consumers and asked it to research their peak usage. It just failed to observe that you can buffer water in a tank.
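The mistake is easy to see with back-of-the-envelope numbers. All figures below are hypothetical, just to illustrate the buffering point the model missed:

```python
# Hypothetical numbers: an RO system sized for raw peak demand is wildly
# oversized if a small storage tank can absorb the peaks instead.

peak_demand_lph = 600     # assumed worst case, several consumers at once (liters/hour)
avg_demand_lpd = 400      # assumed total daily household use (liters/day)
peak_duration_h = 0.5     # assumed length of a peak draw (hours)

# Without a buffer tank, the RO membrane must produce at the peak rate:
ro_no_tank_lph = peak_demand_lph

# With a tank, the RO only needs to keep up with the daily average,
# and the tank covers the difference during peaks:
ro_with_tank_lph = avg_demand_lpd / 24
tank_size_l = (peak_demand_lph - ro_with_tank_lph) * peak_duration_h

print(f"RO output needed, no tank:   {ro_no_tank_lph:.0f} L/h")
print(f"RO output needed, with tank: {ro_with_tank_lph:.1f} L/h")
print(f"Tank needed to cover peaks:  {tank_size_l:.0f} L")
```

With these (made-up) figures, a modest tank cuts the required membrane output by well over an order of magnitude, which is exactly the observation the model never made.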
There's a reason they let you steer GPT Pro as it goes, now.
I don't claim using AI is the same as doing it yourself. My point is that AI capabilities are much more extensive than "fancy search". By giving a metric and an example I hoped to make that point without getting into hair-splitting.
Your quoted example to make that point isn't particularly convincing, IMO. Cursor came out in 2023 and everything on that list would be a typical use case, plus ChatGPT for the search replacement.
Of course, it wasn't nearly as effective back then compared to current SOTA models, but none of those are hard to imagine someone recommending Cursor for anytime in 2024 or later.
If OP had instead said something like one-shotting an entire line-of-business app with 10k LoC, I would agree with your reminder about perspective. But it feels somewhat hype-y to say that goal posts are being moved "monthly" when most of their list has been possible for years.
I was attempting to give an example to say that AI/LLM technology is more than "fancy search", which to me sounds like "search engine".

I realize now that ChatGPT was released in late 2022, more than 3 years ago. Time flies.
> But it feels somewhat hype-y to say that goal posts are being moved "monthly"...
Here's what I mean. Imagine you kept a journal and, once a day, wrote down:
1. what impressed you about AI that day;
2. what you did with it that day that you pretty much took for granted ("just SoTA").
Then compare today against 30 days ago. A lot changes! My point is that it is getting harder to impress us: our standard for what we expect seems to be changing significantly on a ~monthly basis. What does this rate of change where you "just expect something to work as table stakes" feel like to you? Certainly faster than annually, right? 6 months? 3? 2? 1?
For me, a lot of this isn't just the raw technology but also socialization of what the tools can do and the personal experience of doing it yourself.
Can you explain this literature review process?
I don't believe you can do a job of the same quality with an LLM in 5 minutes.
I don't mean writing a literature review. I mean reviewing the literature to find what I need. My point is that this was not practical with "fancy search" three years ago by which I mean Google-like search engines.
My example: I wanted to get a sense of the feasibility of a project that blends Gaussian Processes, active learning, and pairwise comparisons, so I wanted to dig into the literature to find out what is out there. This took around 5 minutes with Claude. In this case, I don't think I could have found what I wanted in 10 hours of searching and reading. This is the kind of thing that great LLMs unlock.
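To make the topic concrete, the uncertainty-driven query loop at the heart of that combination can be sketched in a few lines. This is a toy with made-up numbers, not any specific paper's method, and it leaves out the pairwise-comparison likelihood entirely:

```python
# Toy sketch: a Gaussian Process actively choosing which candidate to
# label next, by querying wherever its posterior is least certain.
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between two 1-D point sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(Xtr, ytr, Xq, noise=1e-6):
    # Standard noise-free GP regression posterior mean and std dev.
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xq)
    sol = np.linalg.solve(K, Ks)
    mean = sol.T @ ytr
    var = np.diag(rbf(Xq, Xq)) - np.sum(Ks * sol, axis=0)
    return mean, np.sqrt(np.clip(var, 0, None))

f = lambda x: np.sin(3 * x)      # stand-in for the hidden score we want to learn
pool = np.linspace(0, 2, 200)    # unlabeled candidate points

X = np.array([0.0, 2.0])         # start from the two endpoints
y = f(X)
for _ in range(8):
    _, std = gp_posterior(X, y, pool)
    i = int(np.argmax(std))      # query where the model is least certain
    X = np.append(X, pool[i])
    y = np.append(y, f(pool[i]))

print("points queried:", np.round(np.sort(X), 2))
```

A real version of what I was researching would replace the direct labels `f(x)` with a preference likelihood over pairwise comparisons; the literature I found in those 5 minutes is about exactly that bridge.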
You're relying on the public's sentiment as a metric. Public sentiment is, more often than not, skewed, influenced by marketing, or flat-out wrong. That is not a good metric to rely on.
Did it ever occur to you that the ever changing goalposts might have more to do with the expensive marketing campaigns of the big LLM providers?
We could talk about what's a measurable metric and what's not. Certainly, we have little more than "benchmarks", whose veracity I honestly don't know, or whether the big LLM providers somehow cheat on them, or whether the performance is even stable. The core idea is that LLMs remain able to do exactly what they could do at release: text prediction. They got better in some regards, sure.
Your example is worrisome to me, and it should be to you too. You didn't write a literature review; you generated a scaffold of one, with the same vices as any LLM-based writing, and it still needs review and revision. I would hope you'd rewrite it to avoid having your work associated with LLM generation. For better or worse, you still normally need to revise your work because, once again (this point seems difficult to grasp), a text predictor is not a reliable source of information. We make tradeoffs, sacrificing reliability for ease of use, but any real work needs human review, which brings us back to my first point: in this example it's doing nothing other than being a fancy search and scaffolding tool.
The ball is likely to be in the same place because, once again, they're text predictors, not sentient or intelligent beings. Still generating text, still hallucinating, probably even more so thanks to the ever-increasing amount of LLM-written content on the internet, and initiatives like poison fountain doing a number on the generated content.
It's wild to me to make such claims about the rate of change of those tools. You're claiming we'll see exponential gains for those tools, I take it, while completely ignoring the base constraints those models will never be able to get rid of. They only know how to produce text. They don't know, and never really will know, whether it's right.
Hi. I read your message, and I considered it. I've also read some of your previous HN comments. Briefly, I'll just say I've argued at length against many of the claims you make (you certainly aren't alone in making them). I don't feel it would be useful to repeat these again here, but I'll reference a few, below, just to show that I do care about the subject matter and am happy to dig deeper ...
... but only with certain conversational norms. I say this because I predict we aren't (yet) matched up in a way that would make a conversation useful to us. The main reason (I guess) isn't our particular viewpoints, nor, e.g., "whether we're both critical thinkers". We're both demonstrating that frame, at least in our language. Instead, I think it is about the way we engage and what we want to get out of a conversation. To pick one particular guide star, I strive to follow Rapoport's Rules [1]. FWIW, the HN Guidelines are not all that different, so simply by commenting here, one explicitly joins a sort of social contract that points in that direction already.
Anatol Rapoport and Daniel Dennett were brilliant not only in their areas of specialty but also in teaching us how to criticize constructively in general. I offer the link at [1] in case you want to read the rules and give them a try here. We can start the conversation over (if you want).
---
In response to your comments about consciousness, intelligence, etc, here are some examples of what I mean by intelligence and why:
- intelligence: https://assets.edge.bigthink.com/uploads/attachment/file/151...
I just don’t see how they’ll be able to make a profit. Open models have the same performance on coding tasks now. The incentives are all wrong. Why pay more for a model that’s no better and also isn’t open? It’s nonsense
I wouldn't say the same, but it's pretty close. At this point I'm convinced that they'll continue running the marketing machine and that people, due to FOMO, will keep hopping onto whatever model Anthropic releases.
Which open model has the same performance as Opus 4.7?
They don't have to be at parity today.
If the frontier models reach a point of barely any noticeable improvement, the trade-off changes.
You do not need a perfect substitute if you are getting it for free...
People will factor in future expectations about the development of open-source vs frontier models. Why do you think OAI and Anthropic are pushing hard on marketing? It's for this reason. They want to get contractual commitments that firms have to honour while open source closes the gap.
Open models, in actual practice, don't match up even to the models one or two generations prior from Anthropic/OpenAI/Google. They've clearly been trained on the benchmarks. Entirely possible it was by mistake, but it's definitely happening.
GLM 5.1 is absolutely on par with Sonnet 4.5, sometimes better in practice (it holds abstractions over longer context windows better)
To be fair, it's about the only one at that level, though. They're all still useful!
That hasn’t been my experience. For coding at least I find little difference between closed and open models
I think that's precisely why they're paying thousands of people in those other jobs to perform their tasks while collecting new data. Software was easiest because it's already mostly written down, but other jobs can be quantified with enough data points. Just give it time.