Comment by LPisGood
19 days ago
I think it is very intuitive that more space beats the pants off of more time.
In O(n) time you can touch at most O(n) cells on a tape, but a tape of length n (over a 2-symbol alphabet) has 2^n possible configurations, so you can do so much more with n space than with n time.
My intuition: the value of a cell can represent the result of multiple (many) time units used to compute something. If you cannot store enough intermediate results, you may end up needing to recalculate the same / similar results over and over - at least in some algorithms. So one cell can represent the results of hundreds of time units, and being able to store / load that one value to re-use it later can then replace those same hundreds of time units. In effect, space can be used for "time compression" (like a compressed file) when the time is used to compute similar values multiple times.
If intermediate results are entirely uncorrelated, with no overlap in the work at all, that would not hold - space will not help you. Edit: This kind of problem is very rare. Think of a cache with 0 percent hit rate - almost never happens.
And you can't really do it the other way around (at least not in current computing terms / concepts): you cannot use a single unit of time as a stand-in / proxy for hundreds of cells, since we don't quite have infinitely-broad SIMD architectures.
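A minimal sketch of that "time compression" idea (just an illustrative toy, not anything from the article): naive recursive Fibonacci recomputes the same subproblems astronomically many times, while a small cache of intermediate results collapses the whole thing to linear time.

    import functools

    def fib_naive(n):
        # Recomputes the same subproblems over and over: the call count
        # grows exponentially in n.
        if n < 2:
            return n
        return fib_naive(n - 1) + fib_naive(n - 2)

    @functools.lru_cache(maxsize=None)
    def fib_cached(n):
        # Each value is computed once and stored; one stored cell then
        # stands in for all the time units that recomputing it would cost.
        if n < 2:
            return n
        return fib_cached(n - 1) + fib_cached(n - 2)

    print(fib_cached(90))   # instant
    # fib_naive(90) would need billions of billions of recursive calls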
There are many algorithms with a space vs. time tradeoff. But what works best depends a lot on the time/energy cost of re-computing something vs. the storage/bandwidth cost of caching results.
Expensive calculation, cheap storage → caching results helps.
Limited bandwidth / 'expensive' storage, simple calculation (see: today's hyper-fast CPU + L1 cache combos) → better to re-compute some things on the fly as needed.
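A toy sketch of those two regimes (table size and function chosen arbitrarily; where the crossover sits depends entirely on the hardware):

    import math

    # "Expensive calculation, cheap storage": precompute once, look up later.
    SIN_TABLE = [math.sin(2 * math.pi * i / 4096) for i in range(4096)]

    def sin_lookup(i):
        return SIN_TABLE[i % 4096]

    # "Limited bandwidth / cheap calculation": recompute on the fly each time.
    def sin_recompute(i):
        return math.sin(2 * math.pi * (i % 4096) / 4096)

Which one wins depends on whether the table stays resident in cache and how expensive the recomputation really is - that's the whole tradeoff.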
I suspect there's a lot of existing software (components) out there designed along the "save CPU cycles, burn storage" path, where in modern reality a "save storage, CPU cycles are cheap" would be more effective. CPU speeds have grown way way faster than main memory bandwidth (or even size?) over the last decades.
For a datacenter, supercomputer, embedded system, PC or some end-user's phone, the metrics will be different. But same principle applies.
As I understand it, this is the essential insight behind dynamic programming algorithms; the whole idea is to exploit the redundancies in a recursive task by memoizing the partial results of lower order stages.
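For example (standard textbook sketch, nothing specific to the article): edit distance memoizes the partial result for every pair of prefixes in a table, turning an exponential recursion into O(n*m) time at the cost of O(n*m) space.

    def edit_distance(a, b):
        # dp[i][j] = edit distance between a[:i] and b[:j]; each cell is a
        # memoized partial result of a lower-order stage, computed only once.
        n, m = len(a), len(b)
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dp[i][0] = i
        for j in range(m + 1):
            dp[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[n][m]

    print(edit_distance("kitten", "sitting"))  # 3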
I think this theorem applies well to modern LLMs: a large language model with pre-computed weights can be used to compute very complex functions that approximate human knowledge, which would otherwise have been impossible or would have required many orders of magnitude more compute.
Also, the O(1) random memory access assumption makes it easy to take memory for granted. Really it's something like O(n^(1/3)) when you're scaling the computer to the size of the problem, and you can see this in practice in datacenters.
I forget the name of the O(1) access model. Not UMA, something else.
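Back-of-the-envelope version of the n^(1/3) figure (a sketch assuming memory packed at constant density in a 3D volume, with signal speed capped at some constant c):

    \text{volume} \propto n \;\Rightarrow\; r \propto n^{1/3} \;\Rightarrow\; t_{\text{access}} \gtrsim r/c = \Omega\!\left(n^{1/3}\right)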
O(n^(1/2)) really, since data centers are 2 dimensional, not 3 dimensional.
(Quite aside from the practical "we build on the surface of the earth" consideration, heat dissipation considerations limit you to a 2 dimensional circuit in 3-space.)
More fundamentally O(n^(1/2)), due to the holographic principle, which states that the maximal amount of information encodable in a given region of space scales with its surface area rather than its volume.
(Even more aside to your practical heat dissipation constraint)
If you have rows of racks of machines, isn't that 3 dimensions? A machine can be on top of, behind, or next to another that it's directly connected to. And the components inside have their own non-uniform memory access.
Or if you're saying heat dissipation scales with surface area and is 2D, I don't know. Would think that water cooling makes it more about volume, but I'm not an expert on that.
On the other hand, actual computers can work in parallel when you scale the hardware, something that the TM formulation doesn't cover. It can be interesting which algorithms work well with lots of computing power subject to data locality. (Brains being the classic example of this.)
Multitape TMs are pretty well studied
Intuitive yes, but since P != PSPACE is still unproven it's clearly hard to demonstrate.
I think that since many people find it intuitive that P != NP, and PSPACE sits way on top of the polynomial hierarchy, it is intuitive even if it's unproven.
There's not even a proof that P != EXPTIME haha
EDIT: I am a dumbass and misremembered.
I think there is, right? It's been a long time, but I seem to remember it following from the time hierarchy theorem.
I thought there was some simple proof of this, but all I can think of is the time hierarchy theorem.
The article is about a new proof wherein P == PSPACE.
Something we all intuitively expected but someone finally figured out an obscure way to prove it.
--------
This is a really roundabout article that takes a meandering path to a total bombshell in the field of complexity theory. Sorry for spoiling, but uhhh, you'd expect an article about P == PSPACE to get to the point faster....
This article is not about a proof that P = PSPACE. That would be way bigger news since it also directly implies P = NP.
I think it really depends on the task at hand, and it's not that intuitive. At some point accessing the memory might be slower than repeating the computation, especially when the storage is slow.
On one hand yes; on the other, there might be some problems that are inherently difficult to parallelize (alternating-machine PTIME is the same as deterministic PSPACE) where space doesn't buy you much. The jump in the paper from t/log t to sqrt(t log t) is huge, but it still might be that unbounded parallelism doesn't buy you much more.
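To put rough illustrative numbers on that jump (t = 10^12 is an arbitrary choice here, and the log base doesn't matter for the comparison):

    import math

    t = 1e12                                 # arbitrary illustrative time bound
    old_bound = t / math.log2(t)             # ~2.5e10, the old t / log t style bound
    new_bound = math.sqrt(t * math.log2(t))  # ~6.3e6, the new sqrt(t log t) bound
    print(f"{old_bound:.1e} vs {new_bound:.1e}")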
But you also spend time on updating cells, so it is not that intuitive.
I’m not sure what you mean here. If you’re in the realm of “more space” then you’re not thinking of the time it takes.
More precisely, I think it is intuitive that the class of problems that can be solved in any time given O(n) space is far larger than the class of problems that can be solved in any space given O(n) time.
If your program runs in O(n) time, it cannot use more than O(n) memory (upper bound on memory usage).
If your program uses O(n) memory, it must run at least in O(n) time (lower bound on time).
A time-bounded TM is also space-bounded, because you need time to write to that many cells. But the converse doesn't hold.
This is demonstrably true: a Turing machine running in O(n) time must halt, while one in O(n) space is free not to.