o3-mini solved it
https://chatgpt.com/share/67ee9b1c-57d4-8005-b7ec-16fceab1ff...
o1-pro also solves it
https://chatgpt.com/share/67ee9bc0-2804-8005-85c3-2978e7a8ba...
Looking at this, I do not understand how people can be bearish about LLMs.
Because this is just simple substitution of common math/CS functions.
I think the usual bearish sentiment comes from:
1. The marketing (even "research" papers) vastly overstates these systems' capabilities
2. We've observed unacceptable failure rates for the majority of useful tasks
3. Despite a bunch of money being poured into it, failure rates are decreasing only slowly even with exponentially more training and inference resources, and there are precious few tasks where we've arrived at an alternative reframing that makes those errors tolerable
4. The capabilities are increasing slowly, and it's not obvious that pouring resources into the problem will lead to the phase shift necessary either to overcome (3) and (4) or to generate some novel application that makes it all worth it
Almost every attempted application I've seen has "worked" by using the AI to push externalities onto other parties. It hasn't actually made the composite system meaningfully more efficient, with notable exceptions (e.g., the chat interface, once you learn the tool and use it correctly, can meaningfully improve certain kinds of productivity).
Plus, for people who actually work in AI, the current approach is obviously wrong. There are roughly two sorts of architectures in use for variable-width data: networks that store everything and process all of it at every step (e.g., transformers), and recurrent networks that compress everything and process the entire compressed space (e.g., Mamba), with a special mention for compression-based architectures whose state space is large enough that the compression loses nothing, and another for slight variations that yield constant-factor speedups, like MoE. The problem with all of those ideas is that you're necessarily trading accuracy for compute: to have sufficient accuracy on arbitrary problems you _must_ store everything, and you _cannot_ meaningfully compress it. What isn't necessarily required, though, is processing everything at every iteration (a direction MoE works toward). Past a certain scale, unless we get vastly more efficient chips in a hurry (and depending on the target application, even if we do), it's obvious that you need a new architecture to build around.
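To make the tradeoff concrete, here's a toy sketch (illustrative only; the dimensions, matrices, and update rules are made up, not any real model's) of the two regimes: a KV cache that stores and processes everything, versus a fixed-size recurrent state that compresses everything:

    import numpy as np

    d = 16  # toy model width

    # "Store everything, process everything": a transformer-style KV cache.
    # Memory grows by one vector per token, and each step attends over all of it.
    kv_cache = []

    def transformer_step(x):
        kv_cache.append(x)                    # O(n) total memory
        keys = np.stack(kv_cache)             # process everything stored
        scores = keys @ x
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ keys                 # O(n) work per token

    # "Compress everything, process the compressed space": a fixed-size
    # recurrent state in the spirit of Mamba. Memory is O(1), but the
    # summary of the history is necessarily lossy.
    state_size = 64
    A = np.eye(state_size) * 0.9              # toy decay
    B = np.random.randn(state_size, d) * 0.1  # toy input projection
    state = np.zeros(state_size)

    def recurrent_step(x):
        global state
        state = A @ state + B @ x             # O(1) memory, lossy summary
        return state

    for _ in range(100):
        x = np.random.randn(d)
        transformer_step(x)  # kv_cache keeps growing
        recurrent_step(x)    # state stays 64 floats forever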
And so on. I work in AI (old-school ML for $WORK, new-school at home). I'm more bullish than a lot of people, especially since I see a direct cause and effect in things that make the business and my life better, but I'm still skeptical of the current craze. At a minimum, it's one of the most wasteful ways we could possibly do this research, and society isn't yet in a good place for the meager results we're getting. New-school AI is, currently, a net-negative to society, and I don't see that changing in the next several years.
Because it is reassuring.
One concern is that coding/logic puzzles are verticals where LLMs have lots of training data and only need a small context window, which is why they do well there, but they don't necessarily scale/generalize to other topics. For example, I have yet to see an agent that would grab, say, the Postgres codebase from GitHub, add a nontrivial feature, and send a patch that gets accepted.
log* is https://en.wikipedia.org/wiki/Iterated_logarithm
a.k.a. the inverse of tetration
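For the curious, a minimal sketch of the definition (my own illustration, not from the article):

    import math

    def log_star(x, base=2):
        # Iterated logarithm: the number of times log must be
        # applied before the result drops to 1 or below.
        count = 0
        while x > 1:
            x = math.log(x, base)
            count += 1
        return count

    print(log_star(65536))  # 4, since 65536 -> 16 -> 4 -> 2 -> 1

It grows absurdly slowly: log* of any number you'll ever compute with is at most about 5.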
It's an ad for a search engine.
I solved this in the worst way. Took a (correct) guess at what was meant by prefix, and then brute-forced a dozen of them. Did you know that zetta.ai is also an AI company?
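In case anyone wants to replicate the brute force, a minimal sketch, assuming (as the zetta/exa mentions suggest) that "prefix" means SI prefixes:

    import socket

    # SI prefixes, largest first.
    SI_PREFIXES = ["quetta", "ronna", "yotta", "zetta", "exa", "peta",
                   "tera", "giga", "mega", "kilo", "hecto", "deca"]

    for prefix in SI_PREFIXES:
        domain = prefix + ".ai"
        try:
            print(domain, "->", socket.gethostbyname(domain))  # one DNS lookup
        except socket.gaierror:
            print(domain, "-> no A record")

A dozen lookups, so no resolver is going to throttle you.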
All roads lead to Rome, right?
This problem does not look remotely interesting.
Yep. Just some mechanical calculation.
Haven't read past the problem statement to avoid spoilers.
How difficult can it be to brute-force this? (Try all possible domain names up to a reasonable length.)
Take help from AI while you're at it.
I think you might need API access to a nameserver to take that approach, because if you just throw 100,000,000 DNS lookups at your default server they are going to throttle you.
Depending on what you understand by the term "prefix", you can dramatically shrink the search space without doing any thinking.
As I understand it, yeah, there's a way to ask the .ai nameserver for its entire "zone" (the mapping from domain names to... everything else). That's a "zone transfer" a.k.a. "AXFR" request, which you can make by first locating a nameserver that knows about .ai:

$ dig NS ai.

Now you have the names of .ai's nameservers, and the glue records for some of them:

;; ADDITIONAL SECTION:
v0n3.nic.ai. 107 IN A 199.115.155.1
v0n3.nic.ai. 107 IN AAAA 2001:500:a3::1
v0n0.nic.ai. 7 IN A 199.115.152.1

Now you ask that nameserver for a zone transfer:

$ dig AXFR ai. @199.115.155.1

...And it quickly says "no, not to you; I don't know you and so I'm not going to spend the bandwidth to tell you all that."

; <<>> DiG 9.18.30-0ubuntu0.22.04.2-Ubuntu <<>> AXFR ai. @199.115.155.1
;; global options: +cmd
; Transfer failed.

But hey, that's how you'd ask. Now, if you were on the nameserver's whitelist, you'd see the whole zone, and the answer to the blog's puzzle would be somewhere in there. (But note that the answer is also at the end of TFA; you don't have to solve it yourself if you don't want to.)

I wouldn't use the words "API access" to describe "permission to make AXFR requests," but yeah, it's the same general idea: if you're not on the list, you can't do the thing.
More than I care to know about AXFR: https://cr.yp.to/djbdns/axfr-notes.html
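And if you'd rather script the attempt than run dig by hand, a sketch using the dnspython library (my choice of tooling, not anything from TFA); expect it to be refused exactly like the dig run above unless you're on the whitelist:

    import dns.query
    import dns.zone

    # Ask a .ai nameserver for its whole zone over TCP (AXFR).
    # Unless the server allows transfers to your address, this
    # raises dns.query.TransferError (rcode REFUSED).
    zone = dns.zone.from_xfr(dns.query.xfr("199.115.155.1", "ai."))
    for name in zone.nodes:
        print(name)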
I think that “prefix”, properly interpreted, makes this a small search set.
exa.ai/givemeprize