Yon – a topos-oriented language with a content-addressed lattice heap

3 days ago (yon-lang.org)

Hello everyone. In the last two years I spent, as a dev, part of my free time stretching the limits of my knowledge. Not being a mathematician myself, I discovered that formalizing concepts in mathematical language could nonetheless be useful to improve symbolic reasoning about the concepts themselves. I made use of both books and AI, and I followed the development of the latter, mainly with a critical eye. I have several open projects, and from some observations and explorations on one of them I started asking myself what the current limits of reasoning, of logic, of mathematics itself are. So I explored categories, and topoi, above all starting from Mazzola's theory of music. I asked myself whether this could influence type theory in programming, and I ran some experiments. Out of this came this programming language, Yon, inspired by Yoneda and by morphisms. From another project I drew observations on the Leech lattice; from yet another, some experiments with mmap and coordinate-based allocation in a structure that would be advantageous, again, in a topological sense. The language certainly has mistakes here and there and I wrote the documentation in a hurry; the work took 3 weeks in total. It compiles to LLVM for performance reasons, and for now I preferred to avoid a VM and a GC. It contains unusual data structures that turn out to be performant. It's worth a look, and I hope it will win some converts, and that someone will want to help me with its development. I'd love for it to bring fresh stimuli to programming and maybe open a few new frontiers. A few concrete details, for those who want to look under the hood. The compiler is a real pipeline, not an interpreter: an OCaml frontend takes .yon source into a custom MLIR dialect I called "topos", where the categorical constructs live as first-class operations; its lowering passes take everything down to LLVM IR and from there to a native executable. A single command, yonc, drives the whole chain, and you can stop at any intermediate stage to see what a categorical construct actually becomes on its way to silicon. The runtime is where the Leech lattice observations ended up. The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address. That buys three things I would now find hard to give up: equality is a single machine comparison no matter how big the value is (string equality benches flat at ~17 ns up to 32,768-character strings, because it compares handles, never bytes); deduplication is global and automatic, with no interning logic in user code; and giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move. Concurrency I kept deliberately simple-minded: no threads, no shared mutable state. A program splits into isolated "Spaces" (separate processes, isolation enforced by the MMU) that talk over shared-memory channels with explicit failure semantics. About what is verified and what is just hope: the ground truth is a regression suite of 112 examples plus a cross-Space scenario suite, with exit codes identical on Linux x86-64 and macOS Apple Silicon (Intel Macs: untested). The book on the site, 21 chapters plus appendices, had every snippet compiled and run before being written down. The benchmarks appendix declares its environment and method; I tried not to publish any number without one. The limits of 1.0 are written down as well, in a baseline document that lists every fixed pool (256 heaps per chain, 64 Spaces, 16 concurrent RPC sessions, and so on), with the rationale that a hard limit that fails loudly is a specification, while a soft limit that degrades silently is a bug. For the license I went with the GCC model: compiler and toolchain are AGPLv3, the runtime is AGPLv3 with an explicit linking exception, so the language itself stays free, and the programs you write in it are entirely yours, under any license you choose.

Site + book: https://yon-lang.org Repo: https://github.com/yon-language/yon (tag v1.0.0)

Happy to answer anything: the topos dialect, why a lattice rather than a hash, what the categorical constructs lower to, what broke along the way.

34 comments

amenn

mccoyb 3 hours ago

I'm not sure where or how to convey this, because I've seen several of these languages designed with AI, documentation created using AI, etc -- posted on Hacker News in the last months or so, and I've responded to each one with roughly the same feedback (and I'm assuming good faith: that the intent is that the poster wishes to grow as a language designer).

Your audience, or whoever you aim your work at, should be treated with respect. Otherwise, why should they give you the time of day? Why would you expect them to respond positively to effort alone when effort (in code and in shit prose) is extremely cheap right now? Their time is not cheap ...

When I read the documentation, and it is extremely clear that you haven't taken the time to clarify your ideas, when much of it is LLM prose, when much of the content introduces highfalutin ideas without motivation, blending categorical concepts (which, by the way, should never be mixed with vague prose claims about the language), violating my reader context model, preventing me from understanding what problem exactly your language design is solving (where is that problem stated clearly?), it is a waste of my time.

> The work took 3 weeks in total ... it's worth a look, and I hope it will win some converts, and that someone will want to help me with its development.

You've gone too fast, too much is vague, nothing is clear.

I'd delete everything, start over, and try and explain just one of the ideas clearly. Seriously. This sounds harsh, but it's honestly the correct approach to something as subtle and nuanced as programming language design.

danieltanfh95 10 minutes ago

This isn’t about whether the writer uses LLM or not at all, nor is it about respect. The core novelty it tries to introduce is not hard to understand (even if it is not really that novel). If you don’t want to spend time thinking about what interesting idea it is exploring, that is fine, but pretending or insinuating that it is a LLM problem is just lazy.
skulk 2 hours ago

> Your audience, or whoever you aim your work at, should be treated with respect.
I just want to amplify this point. As I was reading this, the LLMisms kept jumping out at me and each one felt like the author looking at me and deciding that my time spent reading this prose wasn't actually worth anything to them.
OP: I want YOUR thoughts, not the next token predictions of a gigantic pile of matrix multiplications. I want your awkward sentences, grammar mistakes, half-baked thoughts, self-doubt, silly jokes. I don't want this pile of grandiose mechanical slop completely devoid of humanity.

jrmg 3 hours ago

Just a comment: this sounds a lot like when someone I knew mildly succumbed to AI psychosis, and thought he, with Gemini, had made some physics/metaphysics breakthrough. If you’re losing sleep and feeling distressed or euphoric, maybe lay off for a few days, no matter how hard that is. Talk to friends and/or family about unimportant things. Get outside for a while. Go back to old hobbies (reading, hiking, just going to coffee shops or thrift stores - whatever) and then reassess.

This language looks interesting, but I don’t understand the concepts. Does this stuff make sense to other people?

The heap is content-addressed over Λ₂₄: every value is mapped to a lattice point and canonicalized under the Conway group Co₀ (via libmmgroup), so the same content always lives at the same address.

What is ‘Λ₂₄’? What is a ‘lattice point’?

giving up the GC stopped being a renunciation, since cells are immutable and content-addressed, so there is nothing to trace and nothing to move

This kind of sounds like you’re saying that there’s nothing to free, which implies that nothing takes up memory, which I presume is not the case. Do you mean everything is immutable and content-addressed (like Git)? Doesn’t stuff still need to be freed somehow when the programs done with it, otherwise memory will grow for ever?

leecommamichael 1 hour ago

Agreed. Everything is a weird mixture of poetry and mathematics jargon. Basically every page of the book contains some esotericism which makes empty claims. It's completely divorced from reality.
dirkt 1 hour ago
> Does this stuff make sense to other people?
Nope, and I actually learned about application of category theory to programming language in university.
I tried to get an idea about the main points, and then stumbled over
> a thing is what you can observe of it. > > [...] > > Content addressing is extensionality made physical (chapter 11): two values indistinguishable by observation are not merely equal, they are the same slot
That only works in a category because you have enough (a countably or uncountably infinite number) functions that you can compose and "test" so you don't need (or don't care) about the "value" itself.
But on a real computer that doesn't work, because you can't go beyond a countable number, and even then you run into the halting problem pretty soon. So equality in this model is not computable. Which is sort of bad if you want to somehow store values "in the same slot" just based on observability. It might work for string literals, and even for concatenated strings, but not in general.
Picking some random lattice (a lattice is a partially ordered structure with some extra conditions) as a base of addressing doesn't help...
So yes, crackpot AI slop. The words sort of make sense, but there's nothing solid behind it, and as soon as you look at details it falls apart.
- canyp 38 minutes ago
  
  I didn't even get that far; I found the syntax annoying.
danieltanfh95 9 minutes ago

There is nothing physics/metaphysics about this. If you don’t understand the terms, don’t pretend you do and write slop as a comment, it is really not that different from using LLM to generate slop.
esafak 2 hours ago
https://en.wikipedia.org/wiki/Leech_lattice
- jrmg 1 hour ago
  
  Maybe I just don’t have the mathematics knowledge to understand it, but that doesn’t really tell me how you could represent one in memory, or use one as a backing store for a hash-addressed data structure.

danieltanfh95 15 minutes ago

As I understand it, content addressing function content is problematic because it does not actually "normalise" the content of functions into something interchangeable. A function of input A and output B with performance signature X can still be very different in terms of actual code, but the actual comparison between both is hard to specify.

I was exploring this as a means to solving the open source, or rather the github conundrum, the problem of sharing code socially is that we need a canonical source, and this is sociologically driven than performance driven, and as it turns out, have devastating consequences for FOSS funding. I wanted to explore sharing code "interchangeably" in some sense to avoid this problem, but ultimately this seems unsolveable, even with exploration by Unison etc.

cjs_ac 3 hours ago

The documentation is a work of art. Every time I try to work out what just one of the unexplained ideas is, it just introduces new unexplained ideas. I don't know where these ideas came from, how they fit together, or why putting them together is useful. I certainly don't know why I would want to write a program in this language, as opposed to any other language I already know.

skrebbel 2 hours ago

Sibling comment suggests maybe it’s AI psychosis and that would clarify a lot.

Chinjut 23 minutes ago

I have a PhD in category theory and know what the Leech lattice is and I still don't understand what is going on here. What is the value of using the Leech lattice to store memory?

canyp 32 minutes ago

> Content addressing is extensionality made physical (chapter 11)

Actually, that's in chapter 12; 11 is the standard library. Maybe the LLM got confused because the chapters are 0-indexed.

I was curious about that topic but it seems over my head. I don't think it works outside of mathematics? In programming, one can have two objects that are identical in both structure and value but have different identities. It's why lisp has eq, eql, equal, etc. How'd you get around that other than adding an identity property?

Also:

> A handle, what your variables actually hold for strings, sections, lists, trees, is that slot index, carried as an f64

Why does the handle need floating point?

iterateoften 2 hours ago

I noticed since 5.5 GPT has been adding "lattice" to a lot of things. Not sure if it is the new Gremlins.

GreenSalem 2 hours ago

Advanced AI psychosis.

Professional help might be necessary.

swiftcoder 1 hour ago

Honestly, as someone who has at least a moderate tolerance for PL jargon, most of this is completely impenetrable. It's like someone put the whole field of PL in an LLM-powered blender.

wavemode 3 hours ago

> No garbage collector

> Slots are stable for the life of the process; the heap grows with distinct content only.

So how is a program supposed to handle lots of unique content? Like a web server handling user requests?

skrebbel 2 hours ago

I apologize, you’re absolutely right!
Retr0id 2 hours ago

Taking it at face value, you'd spawn a subprocess to handle each web request and exit once it's done.

z4lis 2 hours ago

You must not mix up technical mathematical words with softer prose words. For instance, "Yon's data model is categorical. A world is a category, a semantic site;"

So if you want to define a world, I expect you to tell me how to supply objects + morphisms + the composition law + the site structure. I don't know what a "semantic site" is, just what a "site" is. You'd need to define it. Anyway, we then get to our first examples of declaring worlds:

  world Currency { Code is EUR, USD }
  world Status { State is on, off }

This maybe gives me the first bit of data we need for a site. Definitely not the rest. Then we hit this

  world Sub subset of Currency

Two issues with this. One is stylistic. Why on earth would you call it a "subset"? It's not a set! "subworld" is the obvious choice... But the real issue is that like the initial definition, this doesn't tell me how to build `Sub`. I need to know which objects and morphisms of Currency to include into the category Sub? What's the site structure?

So now I think, "OK, maybe you just declare part of the structure and fill it in later, before you actually use it..."

But then your example disproves that notion! You have

  world Shop { Code is X }

  place Account in Shop {
    balance number
    owner String
  }

with no mention of Account when you declared Shop, I'm still not sure what Code or X are, and then you give what is seemingly supposed to be some working code

  fun main(): number {
    be a holds new Account { balance 40 owner "ada" }
    be _p holds String.print(a.owner)
    return a.balance + 2
  }

So your motivating example really kills off the interest from your two main communities that would use this thing: 1) category theorists have no idea what you're talking about, because nothing here looks like categories - there are no morphisms, no site structure 2) computer/software folks look at your example and think "why on earth would I learn topos theory to do something that sure looks like OOP"

I think a "topos inspired programming language" would be kind of cool if you could pull it off, but I think you really need to figure out how to sell it in the docs to at least one of the two communities above.

leecommamichael 1 hour ago
Hmm. Only responder taking this seriously and the account is created minutes ago.
- z4lis 33 minutes ago
  
  I took it seriously in so far as I made a good faith attempt to understand a piece of it and wrote something about why I failed...
  Can't help I've just been a HN lurker so far :)

omegafixedpoint 32 minutes ago

A few notes, because this is obviously vibe coded, and does not work in many ways.

1. Yon's documentation mentions "Homotopy type theory:"

  > the runnable HoTT fragment is refl/pair/fst/snd

  These are basic features of martin-lof type theory, not homotopy type theory. The documentation makes no reference of an interval type, which is generally the way to go for decidable type-checking in HoTT without univalence as an opaque axiom.
  
  Pi types are mentioned, but Yon does not have dependent types. From what I can tell, they are polymorphic, maybe even just simply-typed (except for identity types under Pi). See here in the repo: https://github.com/yon-language/yon/blob/523e363a4a00e8da1410a2521b1d7d1309d360ce/frontend/ast.ml#L37
  The datatype Ty only refers to other Ty's. Thus, it is not dependent. Terms cannot appear in types. Pi is explicitly indexed by a type defining its domain and codomain. A pi type is not a pi type if its codomain cannot depend on its domain.

2. Normalization can fail in Yon. Yon's docs say that its universe of propositions has booleans (https://yon-lang.org/book/heyting-core?_highlight=prop). It also says its logic is intuitionistic (AKA, constructive). However, it also says the logical connectives on booleans are CLASSICAL. This implies law of excluded middle, which is NOT constructive without careful sandboxing (e.g., Linear logic).

  Yon dangerously allows propositions to be lifted to booleans. If I am interpreting correctly (docs are very vague), this means propositions can be lifted to terms. This causes an obvious failure of normalization due to assumed proof irrelevance (otherwise Prop would not be distinguished) (also see Coquand's paper on this https://arxiv.org/abs/1911.08174).

3. Yon's type definitional equality does not actually reduce types. See here. This is the function used by the type-checker to check if types are equal. https://github.com/yon-language/yon/blob/523e363a4a00e8da141... No reduction actually occurs, conveniently because none of the types actually contain terms (that is, it is simply typed).

  Yon claims to be dependently-typed. See this in the repo: https://github.com/yon-language/yon/blob/523e363a4a00e8da1410a2521b1d7d1309d360ce/frontend/ast.ml#L21
  > Types in Yon Core kernel — the small dependent type theory used for the operational semantics.

  Suppose I'm reading the source code wrong. Conveniently, the comment one line below reveals, once again, that the "type theory" is simply-typed:

  >  *   T, U ::= Type_n                            universe of level n
  >  *         |  Pi(x:T). U                         dependent function

  Pi types eliminate into a FIXED type that does not depend on x. This means there is also no purpose for having a universe hierarchy.

  To confirm, I scoured the docs a bit for any examples using Pi types or Sigma types. I searched the docs, and could not find any, besides this example:

> world W { Code is X }

> place Account in W { balance number }

> fun takes_sub(s: { a : Account where Pi(x: Account). Pi(y: Account). Id(Account, x, y) }): number { return 0 }

> fun main(): number { return 0 }

Notably, the identity is the only constructor for Ty indexed by a term. That is, Pi types can ONLY eliminate into the identity. What if I want my Pi type to eliminate into anything else living in Prop? e.g., an existential like \forall (x : Nat), \exists (y : Nat), x < y. Unfortunately impossible in Yon.

This project is clearly produced by AI, and clearly dangerously incorrect. Do not use this for anything serious.

esafak 3 hours ago

Could you name a few languages you had in mind while developing this, their respective problems, and how your language improves them, feature by feature?

> Yon allocates into xleech2, a content-addressed heap whose geometry is the Leech lattice Λ24: exactly 196,560 slots per heap.

What is the computational complexity of memory allocation into this Leech lattice? What applications did you have in mind where making allocation a maths problem in order to save time on comparisons makes sense? What is going to happen when a program exhausts your little heap?

jrmg 1 hour ago

I have trouble with the idea that these lattice structures could be less computationally complex or less likely to collide than a good simple hash table. I guess they could be more guaranteed to have stable access times?
The more I try to understand, the more it appears that they are a hash table (hash-addressed-structure to be pedantic), but with way more complicated backing than a hash table.
leecommamichael 1 hour ago

I found this page, https://yon-lang.org/book/coming-from
It's such a weird mixture of poetry and math that it's hard to tell what's going on. I suspect the author does not speak English as a first language (or at all?) and has used an LLM to generate this stuff.

drob518 2 hours ago

The buzzwords are strong in this one.

vitriol83 11 minutes ago

this has templeOS vibes

Retr0id 2 hours ago

[dead]

tribal808 3 hours ago

[dead]