Comment by hnriot
12 years ago
The obvious question would be why? I don't know Caché except what I read on Wikipedia, but having worked with plenty of languages over the years, it doesn't sound like anything you couldn't easily do in Python with a 100x readability improvement. I see little reason to go backwards with languages when we (the CS field) have made such awesome improvements over the years. Now, rather than worry about the bs stuff, we can focus on algorithms and on whether an idea is actually useful once it's working. I build things in Python all the time and throw most of them away, but the ones that look promising get productized.
A lot of systems — legacy and current — are written in MUMPS. Historically MUMPS has been very popular in health care systems (where it originated), and I believe it's still huge there; it is used and supported by a number of niche companies for things like patient data.
In other words, MUMPS is a platform and an ecosystem as much as a language. Think of Java or Ruby: for a lot of companies, including MUMPS shops, staying with a specific "sub-ecosystem" is simply the most rational choice because they have so much invested in it already.
If you look beyond tech that is currently considered "bleeding edge" (Go, JavaScript, Ruby and so forth), you will find a lot of companies that rely on what you may consider weird or even legacy software. For example, Delphi (a descendant of Borland's Turbo Pascal, based on Object Pascal) is still very popular. In finance, languages like K are still popular. I believe finance still has a ton of stuff based on object databases such as Objectivity/DB, Versant, Matisse and GemStone (Smalltalk), which actually look a lot like today's document-oriented databases. InterSystems Caché, which is based on MUMPS, is a hybrid SQL/OODBMS. In other words, the software market has a lot of aging technology that is still working superbly for the parties involved. Old code is usually proven code.
InterSystems Caché is more like UNIX than it is like, let's say, MongoDB. Make the bottom of it efficient (that's where the runtime and the B-tree storage operate) and you can build a world on top. SQL (from tables to views to indices), the ORM, and things like classes and MVC are mostly implemented as macros. And it works pretty well.
Sure. I didn't mean to include Caché when I referred to newer document-oriented databases. Caché has a different architecture. It's more similar in design to K and kdb+ [1], I suppose, which are also heavily built around vector operations on persistent arrays.
[1] https://en.wikipedia.org/wiki/K_(programming_language)
You are right about healthcare: for example, the entire US Veterans Administration runs on MUMPS, and I think Epic Systems also uses Caché. This area is ripe for disruption; they've been stuck with the same legacy stuff for 30 years.
We use Python too.
This system was originally put together in the 80s and while a complete reimplementation from scratch in Python is possible, it wouldn't really give us that many benefits that we don't already have. We don't have to worry about algorithms all that much - we may just write an SQL query which takes care of things for us. One of the more complex things we do in new code is maybe two levels of $order, the equivalent of your for x in y loop.
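For readers who haven't seen M, here is a rough Python sketch of what "two levels of $order" amounts to. It is a hypothetical model, not Caché's implementation: a global is represented as a dict keyed by subscript tuples, `order()` approximates M collation (numbers first, in numeric order, then strings), and "" acts as both the start and end marker, as it does with $order.

```python
from bisect import bisect_right


def collate(sub):
    """Approximate M collation: numbers first (numerically), then strings."""
    return (0, sub, "") if isinstance(sub, (int, float)) else (1, 0, sub)


def order(glob, prefix, sub=""):
    """Return the next subscript after `sub` one level below `prefix`,
    or "" when there is none (mirroring how $order signals the end)."""
    depth = len(prefix)
    # Distinct subscripts that exist directly under `prefix`, in collation order.
    children = sorted(
        {key[depth] for key in glob if len(key) > depth and key[:depth] == prefix},
        key=collate,
    )
    if sub == "":
        return children[0] if children else ""
    keys = [collate(c) for c in children]
    i = bisect_right(keys, collate(sub))
    return children[i] if i < len(children) else ""


def walk(glob):
    """Two nested $order loops: the M idiom for two nested for-loops."""
    out = []
    ward = order(glob, ())
    while ward != "":
        bed = order(glob, (ward,))
        while bed != "":
            out.append((ward, bed, glob.get((ward, bed))))
            bed = order(glob, (ward,), bed)
        ward = order(glob, (), ward)
    return out


# Hypothetical ward/bed data; note that bed 2 collates before bed 10.
beds = {
    ("North", 2): "Smith",
    ("North", 10): "Jones",
    ("East", 1): "Brown",
}
print(walk(beds))
# [('East', 1, 'Brown'), ('North', 2, 'Smith'), ('North', 10, 'Jones')]
```

In everyday Python you would just iterate over sorted keys; the point is that nested $order loops are the same shape of code, only spelled differently.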
We're programming at a level quite a bit higher than C, and we're not exposed to the sheer verbosity of Java either, thank god. You think Caché ObjectScript is awful? I think Java is awful.
What Caché gives us is tight integration between the language and the database system, so the choice isn't really between Caché ObjectScript and Python; it's more between PL/pgSQL or PL/Python and Caché ObjectScript. Once you go there, you'll realise that code written in ObjectScript isn't really affected that much by the choice of programming language anyway. And PL/Python isn't really something you'd want to write a production system in. Between that and the built-in ORM, things really aren't that bad.
And "100x readability improvement" is just wrong. Sorry, it is. Sure, that may apply to some old MUMPS code that survived from the 70s, when disk space was scarce, but even that is fairly straightforward to expand into something quite readable. We do have syntax highlighting, function calls look the same, and assigning a value to an object property is 'set object.Property = "blah"' instead of 'object.Property = "blah"', a quite trivial difference.
That's not to say the lack of libraries isn't annoying, but everything really important is there in the right places. I've written a decompressor for tar.gz in about 34 quite readable lines, with error handling and all. gzip is built in these days, so it was really mostly the .tar part I needed to worry about. Alternatively, calling out to an external command is just a matter of $zf(-1, cmdLine). Similarly, I've just put Chosen (https://github.com/harvesthq/chosen) into our Caché-based web application. It was easy enough to do.
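To give a sense of why a tar reader fits in a few dozen lines once gzip is handled for you, here is a rough Python sketch (not the commenter's ObjectScript code; `read_tar_gz` is a hypothetical name). It assumes plain ustar headers, validates each header checksum, and deliberately ignores GNU/pax long-name extensions, base-256 sizes and non-regular entry types.

```python
import gzip


def read_tar_gz(data: bytes) -> dict[str, bytes]:
    """Return {member name: contents} for the regular files in a .tar.gz blob."""
    raw = gzip.decompress(data)  # the built-in, easy part
    files = {}
    pos = 0
    while pos + 512 <= len(raw):
        header = raw[pos:pos + 512]
        if header == b"\0" * 512:  # a zero block marks the end of the archive
            break
        # Checksum: sum of all header bytes with the checksum field read as spaces.
        stored = int(header[148:156].rstrip(b"\0 ").decode("ascii") or "0", 8)
        computed = sum(header[:148]) + 8 * 32 + sum(header[156:])
        if stored != computed:
            raise ValueError(f"bad header checksum at offset {pos}")
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        prefix = header[345:500].rstrip(b"\0").decode("utf-8")
        if prefix:
            name = prefix + "/" + name
        size_field = header[124:136].rstrip(b"\0 ").decode("ascii")
        size = int(size_field, 8) if size_field else 0
        typeflag = header[156:157]
        pos += 512
        if pos + size > len(raw):
            raise ValueError(f"truncated member {name!r}")
        if typeflag in (b"0", b"\0"):  # regular file; skip dirs, links, etc.
            files[name] = raw[pos:pos + size]
        # Member data is padded up to a multiple of 512 bytes.
        pos += (size + 511) // 512 * 512
    return files
```

Python's own `tarfile` module does all of this and more; the sketch only shows how small the core format is.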
And if nothing else, it's a top-notch, fast SQL implementation with a nice built-in language, a nice ORM, and lots of other bonuses like full-text search, bitmap indices and OLAP cubes (yes, it even speaks MDX).
For one thing, there's not much of an incentive to port hundreds of thousands of lines of working legacy code to a different language just to make it more readable. Its age isn't a relevant issue either - sure, MUMPS debuted in 1966, but InterSystems Caché and GT.M are still being actively developed. Other databases are pretty old, too - the first version of Oracle was written in 1978.
If Caché had no strengths, I would agree with you, but as a non-relational database, it's pretty good. Sparse associative arrays are the default data structure, so it's very popular in, e.g., medical applications, where you would want to be able to store thousands of different things, but any given patient will only need a few of them.
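A minimal Python illustration of the sparse-array point, assuming a hypothetical catalogue of 5,000 possible fields: a fixed-column table reserves a cell for every patient/field pair, while a sparse structure stores only the nodes that were actually set. A missing field simply reads back as empty, roughly the way $get behaves.

```python
from collections import defaultdict

# Hypothetical catalogue of everything a clinical system *could* record.
FIELDS = [f"lab_{i}" for i in range(5000)]

# Sparse storage: only subscripts that were actually set exist.
patients = defaultdict(dict)
patients[1]["lab_12"] = 4.2
patients[1]["lab_4031"] = 130
patients[2]["lab_7"] = "negative"


def dense_cells(records):
    """Cells a fixed-column table would reserve: one per (patient, field)."""
    return len(records) * len(FIELDS)


def sparse_nodes(records):
    """Nodes the sparse structure actually holds."""
    return sum(len(fields) for fields in records.values())


def get_field(records, pid, field):
    """Like $get: an undefined node reads back as the empty string."""
    return records.get(pid, {}).get(field, "")


print(dense_cells(patients), sparse_nodes(patients))  # 10000 3
```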
> Sparse associative arrays are the default data structure, so it's very popular in, e.g., medical applications, where you would want to be able to store thousands of different things, but any given patient will only need a few of them.
Caché has really been moving away from that, though: CacheStorage stores data in a big list in the *D globals.
I think the power in CacheStorage really comes from the indices and, very relevantly to the healthcare industry, all the relationships. They've got a very complex schema they need to support, and Caché continues to be pretty good at that kind of thing; see the implicit joins in their SQL variety, for example, or Zen.
Unfortunately at my company we don't get to use the fancier Caché features. We have to stick to ANSI M.
We do make a lot of use of indices, though. Finding records can be incredibly fast.
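To make the data-global/index-global split concrete, here is a hypothetical Python stand-in, not how Caché works internally: one dict maps each ID to a list of property values (the role a D global plays), and a sorted list of (normalized name, ID) pairs plays the role of an index global, so a lookup is a binary search instead of a scan over every record. Uppercasing the key is an assumption standing in for whatever collation a real index uses.

```python
from bisect import bisect_left, insort


class PatientStore:
    """Hypothetical model: `data` ~ data global, `name_idx` ~ index global."""

    def __init__(self):
        self.data = {}       # id -> [name, dob], like one list node per row
        self.name_idx = []   # sorted (normalized name, id) pairs
        self.next_id = 0

    def insert(self, name, dob):
        self.next_id += 1
        self.data[self.next_id] = [name, dob]
        insort(self.name_idx, (name.upper(), self.next_id))
        return self.next_id

    def find_by_name(self, name):
        key = name.upper()
        # IDs start at 1, so (key, 0) sorts before every entry for `key`.
        i = bisect_left(self.name_idx, (key, 0))
        rows = []
        while i < len(self.name_idx) and self.name_idx[i][0] == key:
            rows.append(self.data[self.name_idx[i][1]])
            i += 1
        return rows


store = PatientStore()
store.insert("Smith", "1970-01-01")
store.insert("Jones", "1962-03-14")
store.insert("smith", "1985-05-05")
print(store.find_by_name("SMITH"))
```

The index costs an extra sorted structure to maintain on every insert, which is the usual trade for fast lookups.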
It's all legacy stuff; the incentive not to port is that it's too hard to do so. Most of the InterSystems customers I have worked with want to move away to something more modern but can't.
> Most of the InterSystems customers I have worked with want to move away to something more modern but can't.
We're pretty happy with it. If, theoretically, we needed to start from scratch, it would definitely be among our very few top choices, not least because of things like DeepSee.
I could certainly think of things much worse than Caché.