Comment by privong

18 hours ago

> You can pause, inspect objects, change values, and even redefine a broken function on the fly to test a fix in any environment (yes even in production, while running).

I see this mentioned often, and it sounds amazingly useful (especially the part about fixing in production!). But how truly widespread is it among the Lisp dialects to be able to connect to a running program, debug, and hotfix it? I understand Common Lisp has it, but I struggled to figure out how to do it in, say, Racket. Admittedly I'm a relatively inexperienced Lisp programmer, so maybe I wasn't looking in the right place or for the right words. Which Lisp dialects do indeed support the extreme version of this capability to inspect and edit running programs?

It’s trivially easy to do in Clojure (literally one line of code to start an nREPL server, after deps/requires), and often very useful in dev and personal, local projects. In practice, I’ve never once used it in a user-facing production system, in 16 years of writing Clojure.

Out of the box, there’s zero security or audit trail. Building that properly isn’t trivial and, even with it in place, many corporate infosec teams would have fits if you suggested that engineers can make arbitrary inspections/modifications to a running production system.

Where it could be appropriate, often you’re running the code in autoscaling containers or something similar. Modifying one instance then is rarely anything but a terrible idea.

Where I have used it is for things like long-running internal batch systems that run a single instance and never touch any sensitive data. Connecting a REPL in those cases is much more flexible and powerful than, say, building a dashboard UI or a control API over http, and you get it for free.

  • Yeah, I mean ... shipping an RCE backdoor gets you some cool hacker war stories but it's still shipping an RCE backdoor.

Yes, but I don’t know how someone familiar with a JetBrains IDE can claim that only Lisp has that feature. I love Common Lisp and SLIME, but most of what it can do, I can also do in Java with the IDE. Change a method definition while it’s running and then restart the method? No problem. Run any code within the context of the running method? Yes, Java can do it. Change local variables' values in the middle of a method? Easy!

The Lisp REPL is still superior because it comes with more stuff, like DECOMPILE, INSPECT and so on, which can only exist because the language is essentially a compiler even at runtime (which can also be a problem for sensitive domains). But in Java you can do all those things using the IDE, so the distance between what is possible in Lisp and in a language with good IDE support like Java or Kotlin is now negligible in my opinion.

  • I've frequently said that Java + JRebel gets the closest to the Common Lisp + slime experience (closer than Python) but as you say the Lisp experience is still superior, the Java ecosystem has yet to close the gap*. The widest part of that gap I'd mention is in not having the condition system built-in to Java (though I'm aware people have tried to make a comparable one as a library), lacking it degrades the debugging experience considerably (even though simple step-debugging is typically more pleasant than in Lisp). IntelliJ's drop frame feature isn't good enough. The other problem is needing Java + something. What you get with just a regular JVM running under your IDE is no better than what other languages offer (if they offer anything) as their cute hotswap/hotpatch feature and comes with big limitations. (Like no changing method signatures or no adding/removing methods or properties, or only applying changes to new objects.) Once you're doing something non-trivial, especially if you're trying to incrementally develop your program rather than just debug one specific problem, you'll have to restart. In contrast Common Lisp's got its disassemble, describe, inspect, compile, fmakunbound, ... all being functions callable at runtime, and update-instance-for-redefined-class is part of the standard language too. Support for live reloading of everything is baked into the language rather than a hack on top, slime is just a convenient way of working with it. It's still convenient to restart the program occasionally, but few things force you to.

    Unfortunately JRebel has killed their free tier, so I'd now point unwilling-to-pay programmers to something like https://github.com/JetBrains/JetBrainsRuntime which is IntelliJ/Eclipse/whatever-independent. I haven't tried it myself yet though... Given they only address the biggest class reloading concerns, I doubt it's actually comparable to JRebel for business-world Java. JRebel handles among other things dynamic reloading from XML changes and reinitializing autowired Spring beans that other classes use for dependencies.

    *Caveat, I've been out of the professional Java grind for a while, I'd be pleasantly surprised if some new version that's come out contradicts me.
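As a loose illustration of the "only applying changes to new objects" limitation mentioned above, here is a minimal Python sketch (Python rather than Java or Common Lisp; the `Greeter` class and its methods are hypothetical). It contrasts patching a method on a live class, which existing instances pick up, with rebinding the name to a brand-new class, which leaves old instances on the old definition until they are migrated by hand. The manual `__class__` reassignment is only similar in spirit to `update-instance-for-redefined-class`, which Common Lisp applies automatically.

```python
class Greeter:
    def greet(self):
        return "hello"


old_instance = Greeter()


# 1) Patch the method on the existing class object.
#    Instances created earlier see the change immediately.
def greet_v2(self):
    return "hello, world"


Greeter.greet = greet_v2
patched_result = old_instance.greet()


# 2) Bind the same name to a brand-new class object (roughly what a
#    naive module reload does). The old instance still points at the
#    old class, so it keeps the old behaviour.
class Greeter:
    def greet(self):
        return "hi from the new class"


stale_result = old_instance.greet()
fresh_result = Greeter().greet()

# 3) Migrate the old instance by hand (an assumption for illustration;
#    this only works when the two classes have compatible layouts).
old_instance.__class__ = Greeter
migrated_result = old_instance.greet()
```

The second step is the case where a restart, or an explicit migration like the third step, tends to become necessary.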

It's been my experience that when most people say "Lisp does this, that, or the other", what they usually mean is "Common Lisp does this, that, or the other". Often there's an implicit "with SLIME" in there as well.

  • This is doable in Common Lisp, Scheme/Racket, and Clojure. Yes, it might require some tooling.

    • Can you elaborate on how this is doable (in, say, Racket) and what tooling is needed? I'm afraid your reply doesn't add much information beyond the same assertion that I quoted that was in the article posted to HN. And I haven't been able to find information on this with Racket.

  • That could very well be it. I guess I had gotten my hopes up, seeing the statement in a piece that purported to be specifically about Scheme.

Python is not Lisp, but jumping into a Python REPL in a halfway-run program and poking at the internals easily is _very_ useful as a debugging tool, quickly getting you answers on some messier programs.

It's a shame that other scripting languages that theoretically have the capabilities to do this don't do this (looking at you, node! Chrome dev tools are fine but way too futzy compared to `import pdb; pdb.set_trace()` and "just" using stdin)

I do also use Emacs, and with Emacs Lisp `trace-function` means you can very quickly get call traces in your running instance without having to pull out a debugger and the like. Not like you can't trace functions with `gdb` of course. But the lowered barrier to entry and the ability to do in-process debugging dynamically means you just have access to richer debugging tools from the outset.
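The "jump into a REPL in a half-run program" idea from this comment can be sketched in Python without needing a terminal. This is a minimal sketch under stated assumptions: `run_scripted_repl` and `summarize` are hypothetical helpers, and the typed lines are scripted through the standard `code.InteractiveConsole` so the example runs unattended. In a real session one would put `breakpoint()` or `code.interact(local=state)` at the same spot and type at the prompt.

```python
import code
import io
from contextlib import redirect_stdout


def run_scripted_repl(namespace, lines):
    """Feed `lines` to a REPL over `namespace`, as if typed at a prompt.

    Returns whatever the REPL echoed to stdout.
    """
    console = code.InteractiveConsole(locals=namespace)
    captured = io.StringIO()
    with redirect_stdout(captured):
        for line in lines:
            console.push(line)
    return captured.getvalue()


def summarize(prices):
    # State of the half-run program that we want to poke at.
    state = {"prices": prices, "total": sum(prices)}
    transcript = run_scripted_repl(
        state,
        [
            "total",                # inspect a value
            "prices.append(5)",     # mutate live data
            "total = sum(prices)",  # rebind a name in the inspected state
        ],
    )
    return state, transcript


state, transcript = summarize([10, 20, 30])
```

After this runs, `transcript` holds the echoed value of `total` and `state["total"]` reflects the change made from inside the REPL.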

  • In Ruby it used to be common to ssh into a box, attach to the console, edit files from the REPL, and rerun the code to see if your patch worked. I haven’t touched it in years and I doubt many people do that anymore.

  • Yeah not having an equivalent to pdb.set_trace() is what turned me off compiled languages, but with AI I'm not even sure anymore.

People do it in Clojure all the time in the dev setup. And you technically can do it in your customer environments too, but it's of course a bit of a cowboy thing to do there.

  • "Cowboy thing" is putting it mildly. It invites/incentivises terrible behavioral patterns. The next guy looking has no idea what happened to that running system. (That next guy may well be you yourself a week or month later.)

That sort of hotfix workflow isn't really a thing in Racket or Scheme in general. Changing the definition of a function doesn't update everything else that calls that function like it does in CL.

Maybe Emacs Lisp works that way?

  • You know, after some testing with a bunch of different Scheme implementations, I take back what I said, at least for working in a REPL.

        (define (displayln msg) (display msg) (newline))
        (define (inner x) (+ x 1))
        (define (outer x) (inner x))
        (displayln (outer 5))
        (define (inner x) (+ x 2))
        (displayln (outer 5))
    

    outputs 6 and 7 in every one I tried, not the 6 and 6 I expected.

    • Perhaps you were thinking of lexical scope vs dynamic scope? Lexical scoping would prevent a local definition of inner from changing the definition used in outer.

        (let ((inner (lambda (x) (+ x 3)))) (outer 5))
        "7"
      

      But updating the definition of inner with set! or define changes the top-level definition.
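The same two behaviours can be reproduced outside Scheme. The following is a minimal Python sketch (an analogue, not the Scheme semantics themselves; `shadowed_call` is a hypothetical helper). Rebinding a top-level name changes what existing callers see, because the name is looked up at call time, while a local binding of the same name does not leak into `outer`.

```python
def inner(x):
    return x + 1


def outer(x):
    # `inner` is looked up in the module namespace at call time.
    return inner(x)


first = outer(5)


# Redefine the top-level function; `outer` is left untouched.
def inner(x):
    return x + 2


second = outer(5)


def shadowed_call():
    # A purely local binding, like the `let` in the Scheme example.
    # `outer` never sees it, because scoping is lexical.
    inner = lambda x: x + 3
    return outer(5)


third = shadowed_call()
```

Here `first` is 6, while `second` and `third` are both 7, matching the 6 and 7 reported for the Scheme REPL and the "7" from the `let` example.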

  • Clojure allows for that, giving you neat hot reload capabilities when working in ClojureScript. I believe Emacs Lisp works the same way, and allows for fairly fluid debugging sessions.

    Universal hot reload is really a messy beast though. For every "yeah we can just reload this without re-init'ing the structure" there's another "actually reloading causes weird state issues and you have to restart everything anyways" thing.

    I've found that hot reloading _specific, targeted things_ tends to get you closer to where you want. But even then... sometimes using browser dev tools to experiment on the output will get you where you want faster than trying to hot reload ClojureScript but having to "reset" state over and over again or otherwise work around weirdness.

    I think this flow works well in Emacs though because you're operating on an editor. So you can change things, press buttons, change things, and have a good mental model. Emacs Lisp methods tend to have very little state to them as well (instead the editor is holding a bunch of exposed state).

    Meanwhile React (for example) has _loads_ of hard-to-munge state, which means that swapping one component for another inline might be totally fine, might crash things in a weird way, or might do nothing at all. Sometimes just a full page refresh will save you from thinking about this.

I use it a lot for my one man projects; it is really fantastic in that setting. I use SBCL exclusively; it is very fast and robust and has image based development. I have my own versioning toolkit so I don't go insane.

It is obvious why it is not really used or recommended as it really falls flat in a team setting, mostly even when 2 people are involved. But fixing bugs live as they happen and then spitting out a new .exe for clients is still a lot faster than modern alternatives. Far more dangerous too.

  • What makes you think it falls flat in a team setting? There are plenty of N-pizza-sized teams successfully using Lisp to this day and you're probably aware of many teams successfully using Lisp in the past, too. There's also the success of Clojure. What's required to have a well functioning team is mostly programming language independent; Lisp itself won't save a team lacking those properties any more than, say, Java would.

    • Did you even read what I said or who I responded to? I am specifically talking about working inside an image, monkey patching functions and structures live in the running image. A practice almost no one uses anymore and of which I said that as a single dev on a project I use and find convenient, but I would not want to use it in a team; for that, modern workflows with versioning, beaming code, ci/cd, dev containers etc are preferred.

      I prefer lisp over most other things in life, and so does my team. I was specifically not talking about the language though.

Not Lisp, but for those interested in editing programs that are running in production:

I read some Erlang article saying that hot swapping is not actually very useful in production because of some reasons, and instead a blue-green deployment is preferred. Can't find the link atm. This was close: https://news.ycombinator.com/item?id=42405168 Hot swaps for small patches and bugfixes, and hard restarts for changing data structures and supervisor tree.

  • It's not that hot swapping isn’t useful, it’s just difficult to do well and you need to write your code in a way that supports it. If you need 0 downtime on a device that can do a blue green deployment then the BEAM has you covered. Most people just don’t need that, so the extra hassle isn’t worth constantly considering how to migrate data in flight.

A common workflow is to run code to test some function in the REPL and then promote it to a test when you are ready, and this process has been the smoothest in lisps, especially since you can create your own test harness if you need to.
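One way to picture "promote it to a test" is Python's standard `doctest` module, where a pasted REPL transcript becomes the test itself. This is a minimal sketch in Python rather than a Lisp; `slugify` and `run_promoted_examples` are hypothetical names chosen for illustration.

```python
import doctest


def slugify(title):
    """Turn a title into a URL slug.

    The lines below are a REPL session pasted in verbatim; doctest
    re-runs them and compares the echoed values.

    >>> slugify("Hello, World!")
    'hello-world'
    """
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return "-".join(cleaned.split())


def run_promoted_examples(func):
    """Run the REPL examples stored in `func`'s docstring."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, name=func.__name__,
                            globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_promoted_examples(slugify)
```

`results` reports how many of the pasted examples were attempted and how many failed, so the same transcript that was an experiment now acts as a regression check.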

Fun fact: giving AI a REPL also reduces error rates so much that you can save up to half the tokens/time or more.

Have had to do this live in a production MtG card management application. It worked well. The owner kept their MtG card money. Lisp saved the day.

It’s common in Clojure as well as other Lisps. I was doing exactly that earlier this week: modifying a running program in production, adding print calls to gather debugging information, and then modifying the code to fix the bug. The fix went live immediately and I verified the correct behavior.

I also see this mentioned often and have wondered the same. I can sort of envision this working in a single threaded application, but how would this work in a web application for example? If a problematic function needs to be debugged, can you pick what thread you're debugging? If not, do all incoming requests get blocked while you debug and step through stack frames?

  • Being paused in the debugger is per-thread. If the server's using a thread-per-request model, and you're stopped in the request, then other requests can proceed just fine. If some of those requests also trigger the debugger, they'll pause and have to wait; they won't interrupt your current debugging view. Extra care should be taken in any sort of production debugging, of course. (At a Java BigCo, production debugging was technically allowed but required multiple signoffs, the engineer wasn't the one in control but had to direct someone else, there were lots of barriers to prevent looking at arbitrary customer data, and of course it was still limited to what you can do with a standard JVM restarted in debug mode: mainly setting breakpoints and walking stack traces.)

    But the nicest part is that once you connect to the production application, apart from network lag it's no different than if you were developing and debugging locally on similarly specced hardware to the server, you have all the same tools. Many of the broader activities around "debugging" don't need to happen in a paused thread that was entered with an explicit breakpoint or error, they can happen in a separate thread entirely. You connect, then you can start inspecting (even modifying) any global state, you can define new variables, you can inspect objects, you can define new functions to test hypotheses, redefine existing functions... if you want all requests to pause until you're done, you can make it so. Or if you want to temporarily redirect all requests to some maintenance page, you can make that so instead.

    A simple thing I like doing sometimes when developing locally (and I could do it on a production binary too) is to define some (namespaced) global variable and redefine a singly-dispatched method to set it to the self object (possibly conditionally), and once I have it I might redefine the method again to have that bit commented out just so I know it won't change underneath me. Alternatively I can (and sometimes do) instead set this where the object is created. Then I have a nice variable independent of any stack frames that I can inspect, pass to other method calls, change properties of, whatever, at my leisure without really impacting the rest of the program's running operation.

    Another neat trick is being able to dynamically add/remove inherited mixin superclasses to some class, and when you do that it automatically impacts all existing objects of that class as well. Mixin classes are characterized by having aspect-oriented methods associated with them; you can define custom :before, :after, or :around methods independent of the primary method that gets called for some object.
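The "capture the self object in a global by redefining a method" trick described above can be sketched in Python as follows. This is a minimal sketch and an analogue only, not the Common Lisp workflow: the `Request` class, the `debug` namespace, and the `/admin` condition are all hypothetical.

```python
import types

# A namespaced global slot that lives outside any stack frame.
debug = types.SimpleNamespace(captured=None)


class Request:
    def __init__(self, path):
        self.path = path

    def handle(self):
        return f"handled {self.path}"


# Keep a reference to the original so it can be restored later.
original_handle = Request.handle


def capturing_handle(self):
    # Conditionally stash the first interesting object, then carry on
    # with the normal behaviour so the program is not disturbed.
    if debug.captured is None and self.path.startswith("/admin"):
        debug.captured = self
    return original_handle(self)


# Redefine the method on the live class.
Request.handle = capturing_handle

responses = [Request(p).handle() for p in ("/home", "/admin/users", "/admin/logs")]

# Put the original back so the captured object won't change underneath us.
Request.handle = original_handle
```

Once `debug.captured` is set, it can be inspected, passed to other calls, or modified at leisure while the rest of the program keeps running.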

nREPL is available even in newer dialects. It is as easy as installing the Calva VS Code extension for Clojure, or jacking in with CIDER. This makes it perfect for LLM interaction as well.