I am a MUMPS programmer – Ask me anything

12 years ago

The vast majority of my work involves maintaining a system written in MUMPS, running on InterSystems Caché.

This may be the Stockholm syndrome speaking, but it's pretty alright. And I say this while also working extensively with, among other things, Python and JavaScript. My educational background is an MSc in Physics, so I'm also familiar with everything from MATLAB to LabVIEW to Assembler to PHP.

Ask me anything!

Just to give everyone here a rough idea, the most annoying thing I found recently is that the function:

$ZCONVERT(stringVar, "O", "JS")

which escapes stringVar into a valid JavaScript string without quotation marks, doesn't escape the line separator or paragraph separator (U+2028 and U+2029), when it should.

This was also a bit of a problem in browsers, JSONP and the JSON spec a while ago. Life in the MUMPS world isn't as bad as you'd think. Except for a lack of nice libraries. You do not want to know when regexes made it into the language. Last year. But there has been something similar - pattern matching - which alleviated the need for them a bit. And calling out to DLLs is fairly easy. It's really driven more by the healthcare industry than anything else.
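
As a rough illustration of the U+2028/U+2029 point above: before ES2019, these two characters counted as line terminators inside JavaScript string literals even though JSON allows them raw, so an escaper has to handle them explicitly. Below is a minimal Python sketch of that behaviour. It is not how $ZCONVERT is implemented; `escape_js_string` is a hypothetical helper name, and it assumes the result will sit inside a double-quoted JavaScript literal.

```python
import json


def escape_js_string(text: str) -> str:
    """Escape text for a double-quoted JavaScript literal, without the quotes."""
    # json.dumps escapes quotes, backslashes and control characters.
    # With ensure_ascii=False it leaves U+2028 and U+2029 as raw characters,
    # which is exactly the gap described above.
    body = json.dumps(text, ensure_ascii=False)[1:-1]
    # Close the gap by escaping the two separators explicitly.
    return body.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")
```

Single quotes are left alone by this sketch, so the output would not be safe inside a single-quoted literal.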

Also, here's a short FizzBuzz I wrote:

  f i=1:1:100 w ! w:'(i#3) "Fizz" w:'(i#5) "Buzz" w:'$x i

Written in pseudocode:

  for i=1:1:100 {
    write newline
    write if not i%3 "Fizz"
    write if not i%5 "Buzz"
    write if not cursorposx i
  }

Shorter than any other FizzBuzz I've seen other than Perl, yet probably more readable.
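
For readers who don't know MUMPS, the trick in the one-liner is the last post-conditional: `$x` is the current cursor column, which is 0 right after the newline and non-zero once "Fizz" or "Buzz" has been written, so the number is only written on an otherwise empty line. A rough Python equivalent, with the hypothetical names `fizzbuzz_line` and `fizzbuzz`, might look like this:

```python
def fizzbuzz_line(i: int) -> str:
    """Build one output line the way the MUMPS one-liner does."""
    line = ""
    if i % 3 == 0:      # w:'(i#3) "Fizz"
        line += "Fizz"
    if i % 5 == 0:      # w:'(i#5) "Buzz"
        line += "Buzz"
    if not line:        # w:'$x i  (nothing written yet, cursor still at column 0)
        line += str(i)
    return line


def fizzbuzz(limit: int = 100) -> list[str]:
    """Return the lines for 1..limit."""
    return [fizzbuzz_line(i) for i in range(1, limit + 1)]


if __name__ == "__main__":
    print("\n".join(fizzbuzz()))
```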

"everything from MATLAB to LabVIEW to Assembler to PHP."

This is kind of a litany of relatively-unstructured programming languages and sounds like a relatively one-dimensional view of computer program organization techniques. Of those listed, PHP is the one with the most advanced code-organization and object model, but its object model is hardly renowned.

Have you ever considered learning something like Ruby in depth and messing around with intensive object-oriented techniques and write-your-own-DSL metaprogramming and the like, so you can know how the other half of the world gets to program?

(ed) oh, and here's me getting -1'd. wonder what that's about. probably someone with thin skin thinking that asking about highly structured programming implies a put-down on the other kind. maybe I could put in some words of praise for Matlab's awesome matrix handling and it'd help? :P

  • My languages of choice are JavaScript, SQL and Python and I've gone quite deep in all of them. I'd consider ObjectScript compiling down to efficiently work through SQL queries or to form objects and other things a metalanguage, so Caché isn't as one-dimensionally pure imperative as you'd think.

    I don't have any formal background in computer science, so thank you for pointing this out, and it's true. In a nutshell, you're telling me to work through SICP, right? :P

    Edit: Yes, you need to mention MATLAB's matrix handling. Probably also say something about NumPy and PyPy, I'm sure that'd help too.

    • Okay! Since you HAVE done something like Python that's good to know; knowledge of your experience colors my interpretation of your interpretations. :)

Are you looking for work? Because my company is hiring good MUMPS programmers -- we are nearly always on the lookout for qualified people.

I bring it up, not because I think you're likely to be looking for work, but because I thought the others reading this discussion thread might be interested in the fact that it is difficult to hire qualified MUMPS programmers.

I work at a bank, and our banking system (the system that keeps track of the balances in the accounts) is written in MUMPS -- probably because it dates back to the time when that was the shiny new programming language.

  • Yikes. I've not done MUMPS in a long while, but I've heard from old co-workers that Caché has had a number of significant security vulnerabilities in recent years.

    • There was a thing a little while ago, but haven't seen anything big in many years apart from that. Mostly something about database corruption on VMS or ECP or similarly obscure things not really relevant to us.

The first thing I would do if I had to work with something like MUMPS is to write, like, a MUMPS LLVM backend or something that takes a saner language and emits MUMPS, therefore abstracting away the crazy. Why don't you guys do that? Or maybe it's already been done?

  • It has been done. The most popular form of MUMPS out there, Caché ObjectScript, is a superset of MUMPS, so all existing MUMPS code will run on Caché.

    But it adds a big bunch of things from error handling to better variable scope to classes to, I wish I was kidding, proper 'if'.

    Something that's also been added (to GT.M as well, afaik), because it is so tightly integrated with its database, are tcommit, trollback and tstart as commands, which are strictly speaking database commands and not the kind of thing you'd expect to find in a programming language.
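
To give a feel for what transaction commands look like when they are plain statements in the program flow, here is a small sketch. It swaps in Python and SQLite for MUMPS and Caché, so it is only an analogy: the `account` table, `make_db`, `transfer` and `balances` are all hypothetical names, and the mapping of BEGIN/COMMIT/ROLLBACK to tstart/tcommit/trollback is approximate.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to the explicit
    # BEGIN/COMMIT/ROLLBACK statements below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    conn.execute("BEGIN")  # roughly tstart
    conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
    conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (src,)
    ).fetchone()
    if balance < 0:
        conn.execute("ROLLBACK")  # roughly trollback: undo both updates
        return False
    conn.execute("COMMIT")  # roughly tcommit
    return True


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A transfer that would overdraw the source account leaves both balances untouched, which is the whole point of having the rollback available as an ordinary command.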

Came here to plug the GT.M version of MUMPS, which is really great. It uses the underlying UNIX system as much as possible (so, for example, your routines are not stored in the database!)

http://tinco.pair.com/bhaskar/gtm/doc/books/

It's easy to put a CGI interface on top of GT.M - performance is quite good.

http://71.174.62.16/demo/TestCGI.htm

Personally I am working on a utility that wraps GT.M in an "environment" similar to a Python virtualenv, but I'm not sure I'm ready to show my baby to the world yet...

  • Ewww, CGI.

    InterSystems Caché ships with Apache as an administrative web server for its Management Portal, through which you can also run all applications.

    It ships with modules for Apache and IIS (ISAPI), and probably others. These come with a little ini file that's meant to sit in the same directory.

I appear to be an inferior version of you; you've described my job, I'm also familiar w/ MATLAB (loved the absolute pants off that language in class), python, and javascript, but I only have a B.S. in Physics. I also think that MUMPS is unfairly maligned.

Do you live in a place that rhymes with 'Radisson'?

Non-tech question, but with your background, you could have migrated to any field. Anything stand out as a motivation to move in the direction you did? LabVIEW/MATLAB were my early intros into CS. I haven't really touched MUMPS yet, and I should. Thanks for the recommendations.

  • After hundreds of applications sent out - in a vast variety of fields, considering that degree - and an interview with Red Hat I didn't succeed in ultimately, this was the next best thing. In retrospect, probably better. The employment market for scientists here in Australia is pretty much complete crapness unless you do a PhD, which I didn't go for. Plus, I've had the IT experience and interest and our system is a domain I'm really interested in. And the company is pretty awesome too, especially my coworkers who are all equally enthusiastic about both the product and our customers.

    Sorry if I'm not going into that much detail :P

    • Yeh, did not expect much work detail, but that makes sense. Most of my fellows voice op-eds of frustration, so it's nice to see someone defend it a bit ;-)

How does it feel to be using NoSQL so old it came back into fashion? :-P

  • We have dozens if not in the low hundreds of SQL tables.

    So, while we do have a huge pile of legacy code not using SQL, you can map your NoSQL data structures to SQL tables, and you can later convert to a more efficient format that gives you bitmap indices and so on, while still using the global storage backend.

    So, it may have been NoSQL until some point in the nineties and after that it was really NotOnlySQL.

    There's something a bit therapeutic about seeing the indices, including bitmap indices, and all the data on disk in a format that's intuitive and usable for humans that you can use without going through SQL, but either by accessing it directly or through the built-in ORM system. You don't get that with PostgreSQL or MySQL, or conversely, MongoDB or CouchDB. It's both worlds. Sure, there's a lot of stuff from PostgreSQL that I would kill for - any volunteers? - but as a compromise between the two worlds it works quite well indeed.

    Edit: Here's more info:

    http://docs.intersystems.com/cache20131/csp/docbook/DocBook....

    The awesomest points are the %ID pseudo-column, implicit joins, embedded SQL (which compiles SQL down to native MUMPS code, including cursors and all), near enough complete SQL-92 and DDL compliance. And the ORM stuff.

The obvious question would be why? I don't know Caché except what I read on wikipedia, but having worked with plenty of languages over the years it doesn't sound like anything you couldn't easily do in python with 100x readability improvement. I see little reason to go backwards with languages when we (the CS field) have made such awesome improvements over the years. Now, rather than worry about the bs stuff we can focus on algorithms and whether or not an idea is actually useful when working. I build things in Python all the time, throw away most, but the ones that look promising are productized.

  • A lot of systems — legacy and current — are written in MUMPS. Historically MUMPS has been very popular in health care systems (where it originated), and I believe it's still huge there; it is used and supported by a number of niche companies for things like patient data.

    In other words, MUMPS is a platform and an ecosystem as much as a language. Think of Java or Ruby: for a lot of companies, including MUMPS shops, staying with a specific "sub-ecosystem" is simply the most rational choice because they have so much invested in it already.

    If you look beyond tech that is currently considered "bleeding edge" — Go, JavaScript, Ruby and so forth — you will find a lot of companies who rely on what you may consider weird or even legacy software. For example, Delphi (a descendant of Borland's Turbo Pascal which is still based on ObjectPascal) is still very popular. In finance, languages like K are still popular. I believe finance still has a ton of stuff based on object databases such as Objectivity/DB, Versant, Matisse and GemStone (Smalltalk), which actually look a lot like today's document-oriented databases. InterSystems Caché, which is based on MUMPS, is a hybrid SQL/OODBMS. In other words, the software market has a lot of aging technology that is still working superbly for the parties involved. Old code is usually proven code.

    • InterSystems Caché is more like UNIX than it is like, let's say, MongoDB. Make the bottom of it efficient - that's where the runtime and the B-tree storage operate - and you can build a world on top. SQL from tables to views to indices, all the ORM and things like classes and MVC are implemented mostly as macros. And it works pretty well.

    • You are right about healthcare. For example, the entire US Veterans Administration runs on MUMPS, and I think Epic Systems also uses Caché. This area is ripe for disruption; they've been stuck with the same legacy stuff for 30 years.

  • We use Python too.

    This system was originally put together in the 80s and while a complete reimplementation from scratch in Python is possible, it wouldn't really give us that many benefits that we don't already have. We don't have to worry about algorithms all that much - we may just write an SQL query which takes care of things for us. One of the more complex things we do in new code is maybe two levels of $order, the equivalent of your for x in y loop.

    We're programming on a level quite a bit higher than C, nor are we exposed to the sheer verbosity of Java, thank god. You think Caché ObjectScript is awful? I think Java is awful.

    What Caché gives us is tight integration between the language and the database system, so in that way the choice isn't really between Caché ObjectScript and Python, it's more between PL/pgsql or PL/Python and Caché ObjectScript. Once you go there you'll realise that the code that's written in ObjectScript isn't really affected that much by the choice of programming language anyway. And PL/Python isn't really something you'd want to write a production system in anyway. Between that and built-in ORM, things really aren't that bad.

    And 100x readability improvement is just wrong. Sorry, it is. Sure, that may apply to some old MUMPS code that survived the 70s, when disk space was scarce, but even this is fairly straightforward to expand into something quite readable. We do have syntax highlighting, function calls look the same, and assigning a value to an object is 'set object.Property = "blah"' instead of 'object.Property = "blah"', a difference that's quite trivial.

    That's not to say the lack of libraries isn't annoying, but everything really important is there in the right places. I've written a decompressor for tar.gz in about 34 quite readable lines, with error handling and all. gzip is built in these days so it's really mostly .tar I needed to worry about. Alternatively, calling out is just a matter of $zf(cmdLine, -1). Similarly, I've just put chosen (https://github.com/harvesthq/chosen) into our web application, based on Caché. It was easy enough to do.

    And it's a pretty top-notch fast SQL implementation with a nice built-in language and a nice ORM and lots of other bonuses like full-text search and bitmap indices and OLAP cubes (yes, it speaks MDX even) if nothing else.
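
The "two levels of $order" mentioned in the comment above amount to walking a two-level subscripted structure in key order. A minimal Python sketch of the same shape follows; `walk_two_levels` and the sample data are hypothetical, and it assumes all keys are strings so that `sorted` is a reasonable stand-in for MUMPS subscript collation.

```python
def walk_two_levels(data: dict) -> list[tuple]:
    """Visit every (outer, inner, value) triple in sorted key order."""
    visited = []
    for outer in sorted(data):              # outer loop: step through first-level keys
        for inner in sorted(data[outer]):   # inner loop: step through second-level keys
            visited.append((outer, inner, data[outer][inner]))
    return visited


# Hypothetical data: surname -> visit number -> note
records = {
    "smith": {"2": "follow-up", "1": "initial"},
    "jones": {"1": "initial"},
}
for row in walk_two_levels(records):
    print(row)
```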

  • For one thing, there's not much of an incentive to port hundreds of thousands of lines of working legacy code to a different language just to make it more readable. Its age isn't a relevant issue either - sure, MUMPS debuted in 1966, but InterSystems Caché and GT.M are still being actively developed. Other databases are pretty old, too - the first version of Oracle was written in 1978.

    If Caché had no strengths, I would agree with you, but as a non-relational database, it's pretty good. Sparse associative arrays are the default data structure, so it's very popular in, e.g., medical applications, where you would want to be able to store thousands of different things, but any given patient will only need a few of them.
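
To make the sparse-array point concrete: in MUMPS one would write something like `set ^Patient(42,"allergy")="penicillin"` (the global name here is made up), and only the subscripts that are actually set take up space. A rough Python sketch of the same idea, using the hypothetical class `SparseRecordStore`:

```python
from collections import defaultdict


class SparseRecordStore:
    """Stores only the fields that were actually set for each patient."""

    def __init__(self) -> None:
        self._data = defaultdict(dict)

    def set(self, patient_id: int, field: str, value: str) -> None:
        self._data[patient_id][field] = value

    def get(self, patient_id: int, field: str, default: str = "") -> str:
        # Use .get on both levels so that reads never create empty entries.
        return self._data.get(patient_id, {}).get(field, default)

    def fields(self, patient_id: int) -> list[str]:
        """Fields present for this patient, in sorted order."""
        return sorted(self._data.get(patient_id, {}))
```

Thousands of possible field names cost nothing until a given patient actually has a value for one of them, and reading a missing field simply returns the default.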

    • > Sparse associative arrays are the default data structure, so it's very popular in, e.g., medical applications, where you would want to be able to store thousands of different things, but any given patient will only need a few of them.

      Caché has really been moving away from that though... CacheStorage stores data in a big list in the *D globals.

      I think the power in CacheStorage really comes from the indices and - very relevantly to the healthcare industry - all the relationships. They've got a very complex schema they need to support and Caché continues to be pretty good at that kind of thing - see the implicit joins in their SQL variety for example or Zen.

    • It's all legacy stuff; the incentive not to port is that it's too hard to do so. Most of the InterSystems customers I have worked with want to move away to something more modern but can't.

Don't really have anything to ask, just wanted to say MUMPS was always neat when I worked for a medical software company and did conversions from older versions of MUMPS to a Caché server.

I liked the one-letter verb abbreviations, even if it made the code feel somewhat write-only. Data right next to your front-end language was neat too.

  • The one letter abbreviations are neat.

    There's no substantial change in readability in going from:

    if condition: print "Hello World"

    Or: if (condition) { console.log("Hello, World\n"); };

    To:

    w:condition "Hello, World",!

    I never type 'write', but just 'w' instead. Here's a list of commands: http://docs.intersystems.com/cache20131/csp/docbook/DocBook....

    > Data right next to your front-end language was neat too.

    Don't know about neat, it made it very hard to separate your model from your logic and while convenient at the time that's also quite a bit of a pain.

Do you have any recommendations for books/learning materials/websites for people who want to learn MUMPS?

How well are you paid? You don't have to give specific numbers, just like in comparison with the average. I've heard rumors of people getting paid a shit ton of money to maintain these sorts of systems but those might be just rumors.

  • Sorry to disappoint you, but my job isn't as such the maintenance of a legacy system. We write new code in Caché, which includes things used by a wider world like JavaScript and SQL as well.

    Ask me again in two or three decades, but our codebase is continuously being touched in all places and there is an ongoing drive to weed out legacy code all the time. But just on the side I've been able to get rid of about 30% of all the legacy code here without spending that much time on it at all mostly with the help of grep, in part because our system has now moved to being 100% web from a desktop client.

    As for salary, never enough :P, but considering my age, the economy, and my lifestyle I'm pretty happy.

Kind of off-topic question but: Could you provide any option to contact you? I have a couple of questions with regards to MUMPS and I'd like to - if possible - drop you an e-mail.

I was a key programmer for a distributor of the MSM database in some countries, before it was bought by InterSystems. In my personal experience, MUMPS is better suited to database-related systems only, not fancy stuff.