
Comment by DoctorOetker

11 hours ago

EDIT: please change the article link to the most recent version (as of now still v2); it is currently pointing to the v1 version, which is missing the figures.

I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.

Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?

Got a multidimensional and multivariate function to model (with random samples or a full map)? Just do gradient descent and convert it to approximant EML trees.

Perform gradient descent on an EML function tree "phi" so that the derivatives in the Schroedinger equation match.
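Schematically, the fitting loop could look like the minimal sketch below. Caveat: the eml here is a stand-in smooth primitive (not the paper's actual definition), and the fixed tree shape and its parameterization are purely illustrative:

```python
import numpy as np

# Placeholder primitive -- NOT the paper's eml, just something smooth to fit with.
def eml(u):
    return np.exp(-np.log1p(np.abs(u)))  # = 1 / (1 + |u|)

def tree(params, x):
    # Tiny fixed-shape "tree": affine leaves feeding two eml nodes.
    a, b, c, d, e = params
    return e * eml(a * x + b) + eml(c * x + d)

def loss(params, x, y):
    return np.mean((tree(params, x) - y) ** 2)

x = np.linspace(0.1, 2.0, 200)
y = np.sqrt(x)                             # target function to approximate
p = np.random.default_rng(0).normal(size=5)

for _ in range(2000):                      # plain gradient descent; gradients
    g = np.zeros_like(p)                   # via central finite differences
    for i in range(p.size):
        dp = np.zeros_like(p)
        dp[i] = 1e-5
        g[i] = (loss(p + dp, x, y) - loss(p - dp, x, y)) / 2e-5
    p -= 0.05 * g

print(loss(p, x, y))                       # residual should have shrunk
```

A real attempt would use autodiff and would also have to search over tree shapes, which is where the combinatorics discussed in the replies come in.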

But as I said, still reading, this sounds too good to be true, but I have witnessed such things before :)

From my experience working in this problem domain for the last year, I'd say it is pretty powerful, but the "too good to be true" part comes from the fact that EML buys elegance through exponential expression blow-up. Multiplication alone requires depth-8 trees with 41+ leaves, i.e. a minimal operator vocabulary trades off against expression length. There's likely an information-theoretic sweet spot between these extremes.

It's interesting to see his EML approach, whereas mine was more about generating a context-sensitive homoiconic grammar.

I've had lots of success combining spectral neural nets (GNNs, FNOs, Neural Tangent Kernels) with symbolic regression, using operad theory and category theory as my guiding mathematical machinery.

  • In my experience this exponential expression blow-up is less a result of decomposing into a minimum of primitives than of repetition in expression trees:

    If we make an analogy with Bertrand Russell's Principia Mathematica: he derived fully expanded expressions, i.e. trees whose leaves may only refer to the same literals. Everyone claimed this madness underscored how formal verification of natural mathematics was a fool's errand, but nevertheless we see successful projects like Metamath (us.metamath.org) where this exponential blow-up does not occur. It is easy to see why: instead of representing proofs as full trees, the proofs are represented as DAGs. The same optimization would be required for EML to prevent exponential blow-up.

    Put differently: we already allow extra buttons besides {1, EML}. For example, to capture unary functions the authors mentally add an 'x' button, so the RPN calculator becomes {1, EML, x}; and if you want multivariate functions it becomes an RPN calculator with extra buttons, e.g. {1, EML, x, y, z}.

    But why stop there? In Metamath, proofs are compressed: if an expression or wff was proven before in the same proof, its first subproof is given a number, and any subsequent invocation of this N'th subproof refers to that number. Why only recall the input parameters x, y, z but not earlier computed values/functions?

    In fact, every proof in Metamath's set.mm that uses this DAG compressibility could be split into a main proof plus its repeatedly used substatements, with the latter automatically converted into explicitly separate lemma proofs; in that case Metamath could dispose of the single-proof DAG compression (though it would force proofs to be split up into lemmas + a main proof).

    None of the proven theorems in Metamath's set.mm displays the feared exponential blow-up.
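    The sharing that prevents it is just hash-consing / common-subexpression elimination. A minimal sketch of the idea (generic expression nodes, nothing EML-specific):

    ```python
    class Dag:
        """Expression builder with structural sharing (hash-consing)."""
        def __init__(self):
            self.nodes = {}              # (op, child ids) -> node id

        def intern(self, op, *children):
            key = (op, children)
            if key not in self.nodes:
                self.nodes[key] = len(self.nodes)
            return self.nodes[key]

    d = Dag()
    x = d.intern("x")
    # (x*x) + (x*x): the expanded tree has 7 nodes, the DAG stores only 3.
    left = d.intern("mul", x, x)
    right = d.intern("mul", x, x)        # already interned: same id as 'left'
    expr = d.intern("add", left, right)
    print(len(d.nodes), left == right)   # -> 3 True
    ```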

    • Yes, metamath uses a large collection of specialized but reusable building blocks, so it doesn't blow up exponentially. However, if you want to "just do gradient descent" on general trees built from a single universal primitive, you now have to rediscover all those building blocks on the fly. And while the final result may have a compact representation as a DAG by merging common subexpressions, you also need to be able to represent potential alternative solutions, and that's where the exponential blowup comes in.

      Or you could accept that there's already a large collection of known useful special functions, and work with shallower trees of those instead, e.g. https://arxiv.org/abs/1905.11481

  • > Multiplication alone requires depth-8 trees with 41+ leaves i.e. minimal operator vocabulary trades off against expression length.

    That is sort of comparable to how NAND gates simplify scaling.

    Division is hell on gates.

    Having a single component was the reason scaling went the way it did.

    There was only one gate structure that had to improve to make chips smaller; if a chip had used 3 different kinds, then scaling would've required more than one parallel innovation (sort of like how LED lighting had to wait for blue).

    If you need two or more components, then you have to keep switching tools instead of hammer, hammer, hammer.

    • I'm not sure what you mean by this? It's true that any Boolean operation can be expressed in terms of two-input NAND gates, but that's almost never how real IC designers work. A typical standard cell library has lots of primitives, including all common gates and up to entire flip-flops and RAMs, each individually optimized at a transistor level. Realization with NAND2 and nothing else would be possible, but much less efficient.

      Efficient numerical libraries likewise contain lots of redundancy. For example, sqrt(x) is mathematically equivalent to pow(x, 0.5), but sqrt(x) is still typically provided separately and is faster. Anyone who thinks the eml() function is supposed to lead directly to more efficient computation has missed the point of this (interesting) work.
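      A quick check shows that gap on most platforms (rough illustration; exact numbers vary by machine and libm):

      ```python
      import timeit

      # sqrt is usually a dedicated, faster entry point than the general pow:
      print(timeit.timeit("sqrt(2.0)", setup="from math import sqrt"))
      print(timeit.timeit("pow(2.0, 0.5)", setup="from math import pow"))
      ```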

  • Where do you see exponential blow-up? If you replace every function in an expression tree with a tree of eml functions, that is a size increase by a constant factor. And the factor does not seem unreasonable: somewhere in the range of 10 to 100.

    • The exponential blow-up is in the symbolic regression section, where searching among depth-K trees requires 2^K parameters.

      As an example, searching for sqrt(x) would require a tree of depth ~40, which is in the trillion-parameter regime.
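      Back-of-envelope for that regime, using the 2^K parameter count described above:

      ```python
      # One parameter per node of a depth-K complete binary tree, per the
      # 2^K scaling claimed above (illustrative, not from the paper).
      print(2 ** 8)   # 256: multiplication-sized trees stay cheap
      print(2 ** 40)  # 1099511627776: ~1.1e12, the "trillion-parameter regime"
      ```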


  • It's not too good to be true...

    It's a way to make mathematical formulas completely unreadable. It's a way to spend more time computing functions like log (3 emls required) while using more precision. It's a way to blow the minds of muggles reading Hacker News.

While I'm really enjoying this paper, I think you are way overstating the significance here. This is mathematically interesting, and conceptually elegant, but there is nothing in this paper that suggests a competitive regression or optimisation approach.

I might have misunderstood, but from the two "Why do X when you can do just Y with EML" sentences, I think you are describing symbolic regression, which has been around for quite some time and is a serious grown-up technique these days. But even the best symbolic regression tools do not typically "replace" other regression approaches.

> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?

Same reason all boolean logic isn't performed with combinations of NAND – it's computationally inefficient. Polynomials are (for their expressivity) very quick to compute.
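For example, Horner's rule evaluates a degree-n polynomial with only n multiplies and n adds (a standard trick, shown for contrast):

```python
def horner(coeffs, x):
    """Evaluate a polynomial, coefficients ordered from highest degree down."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

print(horner([2.0, -3.0, 1.0], 4.0))  # 2x^2 - 3x + 1 at x = 4 -> 21.0
```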

  • It's done with transistors, though. Transistors form an efficient, single element, universal digital basis.

    And they are a much less arbitrary choice than NAND vs. NOR vs. XOR, etc.

    Using transistors as conceptual digital logic primitives, in a world where power dissipation isn't a thing, Pass Logic is "The Way".

    • Single transistors aren't logic gates by themselves; they are amplifiers with a very specific gain function that makes it possible to use them as switches. Logic gates usually consist of at least two transistors. See https://en.wikipedia.org/wiki/CMOS for an example of how this is done in CMOS technology.


    • > Transistors form an efficient, single element, universal digital basis

      But transistors can be N- or P-channel, so they're not a single logical primitive like, e.g., NAND gates.

> Why use splines or polynomials or haphazardly chosen basis functions if you can just fit (gradient descent) your data or wave functions to the proper computational EML tree?

Because the EML basis makes simple functions (like +) hard to express.

Not to diminish this very cool discovery!

  • Consider how a wavefunction might be stored for numerically searching the ground state of a molecule or a crystal lattice. Humans appreciate 2D imagery of about 1 kilopixel x 1 kilopixel, so expanding this to 3 dimensions means the wavefunction would have to store 10^9 complex numbers (easily 4 GB at 16-bit precision for the real and imaginary components, i.e. 4 bytes per complex number). Do we really believe that a DAG variant of the EML construction would consume more than that to represent the analytically correct solution? Do we really believe that the 4 GB DAG variant of EML would produce a less accurate representation (i.e. less fidelity with the Schroedinger equation)? If the ground-state hydrogen atom is any indication, my money is on EML-style constructions, not naive 3D arrays modelling the wavefunction by brute force.
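    Back-of-envelope for that storage figure:

    ```python
    grid_points = 1000 ** 3       # 1k x 1k x 1k spatial samples
    bytes_per_complex = 2 + 2     # 16-bit real + 16-bit imaginary components
    print(grid_points * bytes_per_complex / 1e9, "GB")  # -> 4.0 GB brute force
    ```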

    This also re-opens a lot of "party pooper" results in mathematics, such as the impossibility of representing solutions to the general quintic (fine print: if we restrict ourselves to arithmetic and roots/radicals). In mathematics and physics there have been many "party pooper" results that later gave way to more profound and interesting positive results once the question was properly rephrased. A negative result for a myopic question isn't very informative on its own.

> I'm still reading this, but if this checks out, this is one of the most significant discoveries in years.

It seems like a neat parlour trick, indeed. But significant discovery?

This isn't all that significant to anyone who has done Calculus 2 and knows about Taylor series.

All this really says is that the Taylor expansions of e^x and ln x are sufficient to express trig functions, which is trivially true from Euler's formula as long as you're in the complex domain.

Arithmetic operations follow from the fact that e^x and ln x are inverses, in particular that e^ln(x) = x.
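Concretely, for positive operands the standard identities carry all the arithmetic (nothing EML-specific here):

```python
import math

a, b = 3.0, 7.0
print(math.exp(math.log(a) + math.log(b)))  # ~21.0     == a * b
print(math.exp(math.log(a) - math.log(b)))  # ~0.42857  == a / b
print(math.exp(0.5 * math.log(a)))          # ~1.73205  == sqrt(a)
```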

Taylor series seem a bit like magic when you first see them, but then you get to Real Analysis and find out there are whole classes of functions that they can't express.

This paper is interesting but it's not revolutionary.

I can't say I'm surprised at this result at all, in fact I'm surprised something like this wasn't already known.

What URL should we change it to?

  • IMO arXiv links should pretty much always be to the abstract (i.e. .../abs/...) as opposed to particular versions of the HTML or PDF.

  • The URL is already pointing at v2, which is apparently the newest version, as requested in the comment above.

    > Submitted on 23 Mar 2026 (v1), last revised 4 Apr 2026 (this version, v2)