Yon – a topos-oriented language with a content-addressed lattice heap
3 days ago (yon-lang.org)
Hello everyone. In the last two years I spent, as a dev, part of my free time stretching the limits of my knowledge. Not being a mathematician myself, I discovered that formalizing concepts in mathematical language could nonetheless be useful to improve symbolic reasoning about the concepts themselves. I made use of both books and AI, and I followed the development of the latter, mainly with a critical eye. I have several open projects, and from some observations and explorations on one of them I started asking myself what the current limits of reasoning, of logic, of mathematics itself are. So I explored categories, and topoi, above all starting from Mazzola's theory of music. I asked myself whether this could influence type theory in programming, and I ran some experiments. Out of this came this programming language, Yon, inspired by Yoneda and by morphisms. From another project I drew observations on the Leech lattice; from yet another, some experiments with mmap and coordinate-based allocation in a structure that would be advantageous, again, in a topological sense. The language certainly has mistakes here and there and I wrote the documentation in a hurry; the work took 3 weeks in total. It compiles to LLVM for performance reasons, and for now I preferred to avoid a VM and a GC. It contains unusual data structures that turn out to be performant. It's worth a look, and I hope it will win some converts, and that someone will want to help me with its development. I'd love for it to bring fresh stimuli to programming and maybe open a few new frontiers. A few concrete details, for those who want to look under the hood. The compiler is a real pipeline, not an interpreter: an OCaml frontend takes .yon source into a custom MLIR dialect I called "topos", where the categorical constructs live as first-class operations; its lowering passes take everything down to LLVM IR and from there to a native executable. A single command, yonc, drives the whole chain, and you can stop at any intermediate stage to see what a categorical construct actually becomes on its way to silicon. The runtime is where the Leech lattice observations ended up. The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address. That buys three things I would now find hard to give up: equality is a single machine comparison no matter how big the value is (string equality benches flat at ~17 ns up to 32,768-character strings, because it compares handles, never bytes); deduplication is global and automatic, with no interning logic in user code; and giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move. Concurrency I kept deliberately simple-minded: no threads, no shared mutable state. A program splits into isolated "Spaces" (separate processes, isolation enforced by the MMU) that talk over shared-memory channels with explicit failure semantics. About what is verified and what is just hope: the ground truth is a regression suite of 112 examples plus a cross-Space scenario suite, with exit codes identical on Linux x86-64 and macOS Apple Silicon (Intel Macs: untested). The book on the site, 21 chapters plus appendices, had every snippet compiled and run before being written down. The benchmarks appendix declares its environment and method; I tried not to publish any number without one. The limits of 1.0 are written down as well, in a baseline document that lists every fixed pool (256 heaps per chain, 64 Spaces, 16 concurrent RPC sessions, and so on), with the rationale that a hard limit that fails loudly is a specification, while a soft limit that degrades silently is a bug. For the license I went with the GCC model: compiler and toolchain are AGPLv3, the runtime is AGPLv3 with an explicit linking exception, so the language itself stays free, and the programs you write in it are entirely yours, under any license you choose.
Site + book: https://yon-lang.org Repo: https://github.com/yon-language/yon (tag v1.0.0)
Happy to answer anything: the topos dialect, why a lattice rather than a hash, what the categorical constructs lower to, what broke along the way.
I'm not sure where or how to convey this, because I've seen several of these languages designed with AI, documentation created using AI, etc -- posted on Hacker News in the last months or so, and I've responded to each one with roughly the same feedback (and I'm assuming good faith: that the intent is that the poster wishes to grow as a language designer).
Your audience, or whoever you aim your work at, should be treated with respect. Otherwise, why should they give you the time of day? Why would you expect them to respond positively to effort alone when effort (in code and in shit prose) is extremely cheap right now? Their time is not cheap ...
When I read the documentation, and it is extremely clear that you haven't taken the time to clarify your ideas, when much of it is LLM prose, when much of the content introduces highfalutin ideas without motivation, blending categorical concepts (which, by the way, should never be mixed with vague prose claims about the language), violating my reader context model, preventing me from understanding what problem exactly your language design is solving (where is that problem stated clearly?), it is a waste of my time.
> The work took 3 weeks in total ... it's worth a look, and I hope it will win some converts, and that someone will want to help me with its development.
You've gone too fast, too much is vague, nothing is clear.
I'd delete everything, start over, and try and explain just one of the ideas clearly. Seriously. This sounds harsh, but it's honestly the correct approach to something as subtle and nuanced as programming language design.
This isn’t about whether the writer uses LLM or not at all, nor is it about respect. The core novelty it tries to introduce is not hard to understand (even if it is not really that novel). If you don’t want to spend time thinking about what interesting idea it is exploring, that is fine, but pretending or insinuating that it is a LLM problem is just lazy.
> Your audience, or whoever you aim your work at, should be treated with respect.
I just want to amplify this point. As I was reading this, the LLMisms kept jumping out at me and each one felt like the author looking at me and deciding that my time spent reading this prose wasn't actually worth anything to them.
OP: I want YOUR thoughts, not the next token predictions of a gigantic pile of matrix multiplications. I want your awkward sentences, grammar mistakes, half-baked thoughts, self-doubt, silly jokes. I don't want this pile of grandiose mechanical slop completely devoid of humanity.
Just a comment: this sounds a lot like when someone I knew mildly succumbed to AI psychosis, and thought he, with Gemini, had made some physics/metaphysics breakthrough. If you’re losing sleep and feeling distressed or euphoric, maybe lay off for a few days, no matter how hard that is. Talk to friends and/or family about unimportant things. Get outside for a while. Go back to old hobbies (reading, hiking, just going to coffee shops or thrift stores - whatever) and then reassess.
This language looks interesting, but I don’t understand the concepts. Does this stuff make sense to other people?
The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address.
What is ‘Λ₂₄’? What is a ‘lattice point’?
giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move
This kind of sounds like you’re saying that there’s nothing to free, which implies that nothing takes up memory, which I presume is not the case. Do you mean everything is immutable and content-addressed (like Git)? Doesn’t stuff still need to be freed somehow when the programs done with it, otherwise memory will grow for ever?
Agreed. Everything is a weird mixture of poetry and mathematics jargon. Basically every page of the book contains some esotericism which makes empty claims. It's completely divorced from reality.
> Does this stuff make sense to other people?
Nope, and I actually learned about application of category theory to programming language in university.
I tried to get an idea about the main points, and then stumbled over
> a thing is what you can observe of it. > > [...] > > Content addressing is extensionality made physical (chapter 11): two values indistinguishable by observation are not merely equal, they are the same slot
That only works in a category because you have enough (a countably or uncountably infinite number) functions that you can compose and "test" so you don't need (or don't care) about the "value" itself.
But on a real computer that doesn't work, because you can't go beyond a countable number, and even then you run into the halting problem pretty soon. So equality in this model is not computable. Which is sort of bad if you want to somehow store values "in the same slot" just based on observability. It might work for string literals, and even for concatenated strings, but not in general.
Picking some random lattice (a lattice is a partially ordered structure with some extra conditions) as a base of addressing doesn't help...
So yes, crackpot AI slop. The words sort of make sense, but there's nothing solid behind it, and as soon as you look at details it falls apart.
I didn't even get that far; I found the syntax annoying.
There is nothing physics/metaphysics about this. If you don’t understand the terms, don’t pretend you do and write slop as a comment, it is really not that different from using LLM to generate slop.
https://en.wikipedia.org/wiki/Leech_lattice
Maybe I just don’t have the mathematics knowledge to understand it, but that doesn’t really tell me how you could represent one in memory, or use one as a backing store for a hash-addressed data structure.
As I understand it, content addressing function content is problematic because it does not actually "normalise" the content of functions into something interchangeable. A function of input A and output B with performance signature X can still be very different in terms of actual code, but the actual comparison between both is hard to specify.
I was exploring this as a means to solving the open source, or rather the github conundrum, the problem of sharing code socially is that we need a canonical source, and this is sociologically driven than performance driven, and as it turns out, have devastating consequences for FOSS funding. I wanted to explore sharing code "interchangeably" in some sense to avoid this problem, but ultimately this seems unsolveable, even with exploration by Unison etc.
The documentation is a work of art. Every time I try to work out what just one of the unexplained ideas is, it just introduces new unexplained ideas. I don't know where these ideas came from, how they fit together, or why putting them together is useful. I certainly don't know why I would want to write a program in this language, as opposed to any other language I already know.
Sibling comment suggests maybe it’s AI psychosis and that would clarify a lot.
I have a PhD in category theory and know what the Leech lattice is and I still don't understand what is going on here. What is the value of using the Leech lattice to store memory?
> Content addressing is extensionality made physical (chapter 11)
Actually, that's in chapter 12; 11 is the standard library. Maybe the LLM got confused because the chapters are 0-indexed.
I was curious about that topic but it seems over my head. I don't think it works outside of mathematics? In programming, one can have two objects that are identical in both structure and value but have different identities. It's why lisp has eq, eql, equal, etc. How'd you get around that other than adding an identity property?
Also:
> A handle, what your variables actually hold for strings, sections, lists, trees, is that slot index, carried as an f64
Why does the handle need floating point?
I noticed since 5.5 GPT has been adding "lattice" to a lot of things. Not sure if it is the new Gremlins.
Advanced AI psychosis.
Professional help might be necessary.
Honestly, as someone who has at least a moderate tolerance for PL jargon, most of this is completely impenetrable. It's like someone put the whole field of PL in an LLM-powered blender.
> No garbage collector
> Slots are stable for the life of the process; the heap grows with distinct content only.
So how is a program supposed to handle lots of unique content? Like a web server handling user requests?
I apologize, you’re absolutely right!
Taking it at face value, you'd spawn a subprocess to handle each web request and exit once it's done.
You must not mix up technical mathematical words with softer prose words. For instance, "Yon's data model is categorical. A world is a category, a semantic site;"
So if you want to define a world, I expect you to tell me how to supply objects + morphisms + the composition law + the site structure. I don't know what a "semantic site" is, just what a "site" is. You'd need to define it. Anyway, we then get to our first examples of declaring worlds:
This maybe gives me the first bit of data we need for a site. Definitely not the rest. Then we hit this
Two issues with this. One is stylistic. Why on earth would you call it a "subset"? It's not a set! "subworld" is the obvious choice... But the real issue is that like the initial definition, this doesn't tell me how to build `Sub`. I need to know which objects and morphisms of Currency to include into the category Sub? What's the site structure?
So now I think, "OK, maybe you just declare part of the structure and fill it in later, before you actually use it..."
But then your example disproves that notion! You have
with no mention of Account when you declared Shop, I'm still not sure what Code or X are, and then you give what is seemingly supposed to be some working code
So your motivating example really kills off the interest from your two main communities that would use this thing: 1) category theorists have no idea what you're talking about, because nothing here looks like categories - there are no morphisms, no site structure 2) computer/software folks look at your example and think "why on earth would I learn topos theory to do something that sure looks like OOP"
I think a "topos inspired programming language" would be kind of cool if you could pull it off, but I think you really need to figure out how to sell it in the docs to at least one of the two communities above.
Hmm. Only responder taking this seriously and the account is created minutes ago.
I took it seriously in so far as I made a good faith attempt to understand a piece of it and wrote something about why I failed...
Can't help I've just been a HN lurker so far :)
A few notes, because this is obviously vibe coded, and does not work in many ways.
1. Yon's documentation mentions "Homotopy type theory:"
2. Normalization can fail in Yon. Yon's docs say that its universe of propositions has booleans (https://yon-lang.org/book/heyting-core?_highlight=prop). It also says its logic is intuitionistic (AKA, constructive). However, it also says the logical connectives on booleans are CLASSICAL. This implies law of excluded middle, which is NOT constructive without careful sandboxing (e.g., Linear logic).
3. Yon's type definitional equality does not actually reduce types. See here. This is the function used by the type-checker to check if types are equal. https://github.com/yon-language/yon/blob/523e363a4a00e8da141... No reduction actually occurs, conveniently because none of the types actually contain terms (that is, it is simply typed).
> world W { Code is X }
> place Account in W { balance number }
> fun takes_sub(s: { a : Account where Pi(x: Account). Pi(y: Account). Id(Account, x, y) }): number { return 0 }
> fun main(): number { return 0 }
Notably, the identity is the only constructor for Ty indexed by a term. That is, Pi types can ONLY eliminate into the identity. What if I want my Pi type to eliminate into anything else living in Prop? e.g., an existential like \forall (x : Nat), \exists (y : Nat), x < y. Unfortunately impossible in Yon.
This project is clearly produced by AI, and clearly dangerously incorrect. Do not use this for anything serious.
Could you name a few languages you had in mind while developing this, their respective problems, and how your language improves them, feature by feature?
> Yon allocates into xleech2, a content-addressed heap whose geometry is the Leech lattice Λ24: exactly 196,560 slots per heap.
What is the computational complexity of memory allocation into this Leech lattice? What applications did you have in mind where making allocation a maths problem in order to save time on comparisons makes sense? What is going to happen when a program exhausts your little heap?
I have trouble with the idea that these lattice structures could be less computationally complex or less likely to collide than a good simple hash table. I guess they could be more guaranteed to have stable access times?
The more I try to understand, the more it appears that they are a hash table (hash-addressed-structure to be pedantic), but with way more complicated backing than a hash table.
I found this page, https://yon-lang.org/book/coming-from
It's such a weird mixture of poetry and math that it's hard to tell what's going on. I suspect the author does not speak English as a first language (or at all?) and has used an LLM to generate this stuff.
The buzzwords are strong in this one.
this has templeOS vibes
[dead]
[dead]