Comment by porcoda
4 hours ago
I noticed the same thing, having also written an interpreter for the Wolfram language that focused on the core rule/rewriting/pattern language. At its heart it’s more or less a Lisp-like language where the core can be quite small and a lot of the functionality built via pattern matching and rewriting atop that. Aside from the sheer scale of WL, I ended up setting aside my experiments replicating it when I did performance comparisons and realized how challenging it would be to not just match WL in functionality but performance.
Woxi reminds me of some experiments I did to see how far vibe coding could get me on similar math and symbolic reasoning tools. It seems like unless you explicitly and very actively force a design with a small core, the models tend towards building out a lot of complex, hard-coded logic that ultimately is hard to tune, maintain, or reason about in terms of correctness.
Interesting exercise with woxi in terms of what vibe coding can produce. Not sure about the WL implementation though.
(For context, I write compiler/interpreter tools for a living - have been for a couple decades)
I’ve personally had luck at correcting the complex one-off logic the agents produce with the right prompting.
and when I say prompting, I just mean code review feedback. All of this is engineering management. I review code. I’ll point out architectural flaws if they matter and I use judgement to determine if they matter. Code debt is a choice, and you can afford it in some situations but not others. We don’t nit over style because we have a linter. Better documentation results in better contribution quality. etc.
Agent coordination? Gastown? All I hear is organizational design and cybernetics