FYI: This only implements a subset of K (I'd estimate 1/3).
Calling it a 'release' is an overstatement. The docs state that it is a work in progress. It's also quite buggy (it's easy to get a segmentation fault). The version I saw in January was about 1/3 the size of this version, and also buggy. I hope that the final version of this code is less buggy and more usable.
If you want to learn the K language, don't use this version. Any of the other open source K projects are better than this (more complete, less buggy, better documented). This project is good if you want to learn more about the Arthur Whitney C coding style, because it is so small. Other projects written in this style (some open source K implementations, the J language) are huge by comparison.
> Any of the other open source K projects are better than this (more complete, less buggy, better documented).
One thing that puzzles me about array languages is that, despite several open source implementations already existing (like J), it's surprisingly difficult to find them packaged in Linux repositories. For example, you can't just "apt install J" or "apt install gnu-apl" on Ubuntu. In J's case, it seems the default is just compiling it from source. Is there something tricky about packaging them?
The closest to a repository-friendly array language I could find was the klongpy implementation of klong[0], which is pip-installable.
You can 'apt install apl' for GNU APL. Most open-source array languages, though, either have very few users or are moving quite fast (or both), so an apt-packaged version would be out of date basically always. Though, for example, nix has J, BQN, uiua, GNU APL, and Dyalog APL (based on quick searches), so the barrier to entry to apt is presumably also rather high.
Those are some very big claims with respect to performance. Has anyone outside of the author been able to reproduce the claims, considering you need to pay 100k/month just to do it?
I also wonder if the commercial version has anti-benchmark clauses like some database vendors. I've always seen claims that K is much faster than anything else out there, but I've never seen an actual independent benchmark with numbers.
I used to use K professionally inside a hedge fund a few years back. Aside from the terrible user experience (if your code isn’t correct you will often just get ‘error’ or ‘not implemented’ with no further detail), if the performance really was as stellar as claimed, then there wouldn’t need to be a no benchmark clause in the license.
It can be fast, if your data is in the right formats, but not crazy fast. And easy to beat if you can run your code on the GPU.
The last point is spot on... Pandas on GPUs (cudf) gets you both the perf + usability, without having to deal with issues common to stack/array languages (k) and lazy languages (dask, polars). My flow is pandas -> cudf -> dask cudf, spark, etc.
More recently, we have been working on GFQL (graph dataframe query language) with users at places like banks, where we translate down to tools like pandas & cudf. A big "aha" is that columnar operations are great -- not far from what array languages focus on -- and that having a static/dynamic query planner to optimize around that helps once you hit memory limits. Eg, dask has dynamic DFS reuse of partitions as part of its work stealing. More SQL-y tools like Spark may make plans like that ahead of time. In contrast, that lands more on the user if they stick with pandas or k, eg, manual tiling.
I've been using kdb/q since 2010. Started at a big bank and have used it ever since.
Kdb/q is like minimalist footwear. But you can run longer and faster with it on. There's a tipping point where you just "get it". It's a fantastic language and platform.
The problem is very few people will pay 100k/month for shakti. I'm not saying people won't pay and it won't be a good business. But if you want widespread adoption you need to create an ecosystem. Open sourcing it is a start. Creating libraries and packages comes after. The mongodb model is the right approach IMO.
Array languages are very fast for code that fits sensibly into arrays, and databases spend a lot of compute time getting correctness right. 100x faster than postgres on arbitrary data sounds unlikely-to-impossible, but on specific problems it might be doable.
Yes, when I took a look at shakti's database benchmarks before, they seemed entirely reasonable with typical array language implementation methods. I even compared shakti's sum-by benchmarks to BQN group followed by sum-each, which I'd expect to be much slower than a dedicated implementation, and it was around the same order of magnitude (like 3x slower or something) when accounting for multiple cores. I was surprised that something like Polars would do it so slowly, but that's what the h2o benchmark said... I guess they just don't have a dedicated sum-by implementation for each type. I think shakti may have less of an advantage with more complicated queries, and doing the easy stuff blazing fast is nice but they're probably only a small fraction of the typical database workload anyway.
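To make "dedicated sum-by implementation" a bit more concrete, here is a minimal C sketch of the idea: a single pass that accumulates straight into per-group totals, rather than first materializing each group and then summing it. This is only an illustration, not how shakti, BQN, or Polars actually do it; the function name `sum_by` and the assumption that keys are already dense group ids are made up for the example.

```c
#include <stddef.h>

/* Dedicated sum-by (illustrative sketch, not any real engine's code):
   one pass over the data, adding each value to its group's running total.
   Assumes keys[i] is already a dense group id in [0, ngroups). */
void sum_by(const int *keys, const double *vals, size_t n,
            double *out, size_t ngroups) {
    for (size_t g = 0; g < ngroups; g++) out[g] = 0.0;      /* clear totals */
    for (size_t i = 0; i < n; i++) out[keys[i]] += vals[i]; /* accumulate   */
}
```

A "group, then sum-each" version would build one array per group before summing, which costs extra allocation and a second pass; a per-type specialization of the loop above avoids both.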
#define _(e) ({e;})
//!< isolate expression e in its own lexical scope and clamp it with ;
//!< note the outer parens, which is a very powerful c trick: they turn _(e) into a so called
//!< r-value, which basically means we can do x=_(e) for as long as e evaluates to or returns
//!< at least anything at all, i.e. not void. this macro is fundamental to k/simple implementation.
I didn't know that corner of C. Removing the () from the macro does change what you can pass as e, and assigning the result of a block does work as one would expect.
edit:
-Wpedantic on gcc will tell me ISO C doesn't like the construct but it still compiles it happily.
Yes, this is called statement-expression (instead of expression-statement which is the normal "doer" statement that contains just an expression followed by a semicolon).
The Linux kernel makes quite a bit of use of them as far as I'm aware.
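For anyone who wants to poke at this, here is a small sketch of the `_(e)` macro in use. It relies on the GNU statement-expression extension, so it assumes GCC or Clang and is not ISO C. The helper names `sum_to` and `scoped_square` are invented for the example and do not come from the k source.

```c
/* GNU C extension: needs GCC or Clang; -Wpedantic will warn that it is not ISO C. */
#define _(e) ({e;})

/* Hypothetical helper: the block declares a local, runs a loop, and the
   value of the whole _(...) is its last expression, s. */
int sum_to(int n) {
    int total = _(int s = 0; for (int i = 1; i <= n; i++) s += i; s);
    return total;
}

/* Hypothetical helper: names declared inside _(...) stay inside it. */
int scoped_square(int x) {
    int t = 3;                    /* outer t */
    int y = _(int t = x * x; t);  /* inner t shadows the outer one */
    return y + t;                 /* x*x + 3: outer t is untouched */
}
```

The outer parentheses are what turn the braced block into something that can sit on the right-hand side of an assignment; without them you just have an ordinary compound statement.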
They’ve been charging that amount forever, it’s a crazy ask. But you’ll be happy to hear that the quoted price is about 80% off of the price in 2000, so take advantage of the discount. In 2000 it was $100K/month.
Inflation has been 80% since then, but that doesn't mean it's 80% off. $1000 in 2000 is $1800 today, so a discount of 44%. 80% off would imply it's $5000 today, but prices didn't 5x fortunately.
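The arithmetic in that correction can be checked with a couple of one-liners. This is just a sketch of the two formulas involved; the function names are made up, and the 80% inflation figure is the one quoted in the comment, not something verified here.

```c
/* Real-terms discount when the nominal price stays flat while prices in
   general rise by `inflation` (0.80 means 80%): 1 - 1/(1 + inflation). */
double real_discount(double inflation) {
    return 1.0 - 1.0 / (1.0 + inflation);
}

/* How many times prices must multiply for an unchanged nominal price to be
   `discount` off in real terms: 1/(1 - discount). */
double multiple_for_discount(double discount) {
    return 1.0 / (1.0 - discount);
}
```

With 80% inflation, `real_discount(0.80)` comes out at roughly 0.44, and `multiple_for_discount(0.80)` at roughly 5, matching the 44% and 5x figures above.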
k is widely used by a handful of big investment banks and big hedge funds for quant/finance stuff. There are only a handful of such companies in the world but they are extremely price insensitive, especially with regard to technology that lets them get a market edge. I suspect this is the dynamic that kx, the company, tapped into over the years. I also suspect this open source release is mainly because investment banks have come around on their desire for open source (rather than proprietary) software over the years, at least on some teams. You can see the open source release doc explicitly positions k vs Python, pandas, and polars.
For example, I have an old friend from a major investment bank who used to work on an internal (proprietary) pub/sub system but who, these days, works on integrations between that system and Apache Kafka.
For those who haven't heard of or aren't familiar with K, the Wikipedia page[1] has a remarkably helpful brief overview:
> K is a proprietary array processing programming language developed by Arthur Whitney and commercialized by Kx Systems. The language serves as the foundation for kdb+, an in-memory, column-based database, and other related financial products. The language, originally developed in 1993, is a variant of APL and contains elements of Scheme. Advocates of the language emphasize its speed, facility in handling arrays, and expressive syntax.
There was also a great thread on HN about it as well[2].
And many programming languages do, for example C# and Racket. I have a feeling it isn't very hard to implement, since that's what you typically do if you need complex numbers in Java.
For people not accustomed to the style of Whitney, you can read various HN threads from the past to learn more about why he writes programs the way he does.
I'm going to have to take another run at learning this corner of computing soon, but it's a prospect I'm not relishing. Everything about it rubs me up the wrong way.
If you'd like an antidote, have a read of Gerald Jay Sussman's books, where you'll see profound concepts from maths and physics captured in succinct and expressive (as opposed to merely terse) code, accompanied by eloquent explanations devoid of boasts or name dropping and provided free of charge online. That will change the way you think about computing too, but it will be a more pleasant experience.
That's a good place to start but it's primarily about programming itself. In "Structure and Interpretation of Classical Mechanics" and "Functional Differential Geometry" he applies his approach of using computer programs as a way of communicating concepts to humans to some fascinating maths and physics topics.
Personally I believe in a sort of "evolution" of software that operates independently of intentions of the programmers.
I can totally believe that he didn't intentionally obfuscate it, but its incomprehensibility made it harder for other people to make a knockoff and that's why it survived and became successful.
I don't understand how. How do you debug something like this? How do you go about fixing a bug? There are no docs nor tests. It seems like one would spend hours just trying to understand what's going on. Is there a good source on their methodology? Because my mind is blown.
I've heard it said that it becomes easy to spot common patterns and structures in this style. Without knowing the idioms it's difficult to read but apparently once you know the idioms, you can scan it and understand it well.
The only possible advantage I can think of is that you can fit more code on one screen so I guess in theory you can see your context more easily. But that seems pretty minor compared to... well, look at it!
I read a couple of other threads and some people try to claim less code = fewer bugs, but that's pretty clearly nonsense; otherwise minifiers would magically fix bugs.
As for why people actually use this (it seems like they really do), my guess would be that it's used for write-only code, similar to regexes.
Like, using a regex you can't deny that you get a lot of power from not many keystrokes, and that's really awesome for stuff you're never going to read again, like searching code in your editor or one-off throwaway shell commands.
But they also tend to be completely unreadable and error prone and are best avoided in production code.
He showed K @ Royal Society & a bunch of apl & aplus guys were there. Someone asked, where are the comments? AW said comments get out of date with all the changes, if you can’t read the code you shouldn’t be working on it. We then all looked at each other..
This immaturity is one reason I don't like array languages: They ape mathematical conventions but mathematicians are interested in communicating, if only with other mathematicians, and so write clearly and understandably. That source code (if you can call it that, as I'm not entirely convinced it's the form Whitney himself programs in) is the opposite of clear to the point of being comically dense, in every sense of the term.
i spent a day writing code like that. it becomes readable surprisingly quickly.
i don't think i'm going to get into the practice of doing it to that extreme, but i'll probably adopt the multiple-things-per-line part for side projects, and skip the shortening of keywords
with a wide monitor, you can have 3 files like this open at a time. you can fit a surprisingly large program on the screen at once, which is the benefit of coding like that
> if you can’t read the code you shouldn’t be working on it
An ultimate expression of insider elitism and knowledge hoarding, which can be a self-interested asset to profitable, closed-source, specialized software.
For everyone else, code should present no surprises whenever possible with semantic expressiveness and reserve comments for explanation of surprises, design choices, and protocols.
I don't know about you, but if you don't know the basics about woodworking and have no interest in learning, I certainly don't want you building my furniture. That, um, has nothing to do with elitism.
There is a real tradeoff between making code friendly to the uninitiated and making code ergonomic for the expert. It's completely natural that non-initiates feel unwelcome when the code is written for domain experts. In your company's codebase, is it really best to optimize for onboarding newcomers over optimizing for the productivity of your engineers? Where along that spectrum maximizes your goals?
Are we building chairs or are we building chair-builders?
> if you can’t read the code you shouldn’t be working on it
I don't know this AW guy, but to me that's a huge red flag and a sign that a programmer hasn't worked on anything substantial. Ie non-trivial stuff that's maintained by a team over time.
Being able to read the code is irrelevant, as the comments should tell you why the code is doing what it's doing.
For example, yeah I trivially can see the code is doing a retry loop trying to create a file with the same name.
That looks like a bug, if you can't create the file with that name, you change the name in the retry loop.
But the comment will tell me this is due to certain virus scanners doing dumb stuff, so we might have to try the same name a few times.
Sure, good code will have few comments as most of it should be self-documenting through good structure and names of classes and variables. But in non-trivial code there will always be places where it is not obvious why the code does what it does.
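As a sketch of what that looks like in practice: the loop below is trivially readable, but only the comment explains why it retries under the same name. The function name `create_with_retry` is hypothetical, the virus-scanner scenario is taken from the comment above rather than from any real codebase, and a real version would presumably also wait between attempts.

```c
#include <stdio.h>

/* Try to create `path` for writing, retrying under the SAME name. */
FILE *create_with_retry(const char *path, int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* WHY the same name: some virus scanners briefly hold on to freshly
           touched files, so early attempts can fail even though the name is
           fine. Do not "fix" this by picking a new name on each retry. */
        FILE *f = fopen(path, "wb");
        if (f) return f;
    }
    return NULL; /* still failing after max_attempts: let the caller decide */
}
```

Without the comment, the obvious "cleanup" is to change the name inside the loop, which would reintroduce the original problem.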
Defining substantial as written by a team over a long period of time seems to be putting the value in the wrong place. Is the important thing what the software does, or that it took a lot of people a long time to write it?
This isn't the case. It's more likely a lack of business socialization combined with individual hyper-achievement. Reminds me of Ian Pratt in some ways.
An annoyance in tech, both startups and corporate, is technically-capable people but with outsized egos.
This language is very popular among quant finance people associated with Morgan Stanley. I don’t see the appeal myself. Maybe it helps prevent people stealing the code since it’s so awful looking to work with! At one point I had to learn it and I think I’ve totally forgotten it now— it’s like my brain repressed it. Not my cup of tea, that’s for sure.
Thinking about problems and data manipulation in the way array languages enable is hard but, once you have pushed through what feels like a barrier of mainstream programming language thinking that is stopping you from “grokking” it, there is a sudden moment of clarity and then you “get it”.
Perhaps a half way house is sql. The difference between ORM-style CRUD and a power user using window functions to make the data dance shows there is still art to be had in programming :)
Agreed. Pushing through until you can think in array languages is well worth it! In my experience one of the top 30 highest ROI mental circuits you can develop.
That being said, I'm not convinced that the extremely minimal syntax is essential. I think it can be done another way ;)
Most people I know who actually learn an array language like k or j usually grow to appreciate the expressiveness and cleverness of these languages. Typically, the people who have your reaction have only looked at it and tried it very briefly. I'm surprised. Why did you have to learn it? Where?
Was working at a quant pod at Millennium for a bit where they used it. I was ultimately able to use it but everything took me 20x longer than using Numpy/Pandas. The irony was that the Python code was shorter because there were so many more library functions and better abstractions and syntax. So it was slow and unintuitive for zero benefit whatsoever.
Functionality is, at this time, extremely limited (and the first kfun was out in January so I don't think there's really any intention to get this to usability on a short timeframe). No support for paired syntax like parentheses, functions in braces, and square brackets for indexing and function calls. No stranding or tacit functions. I doubt it's Turing complete. Many primitives are unimplemented and others are flaky: for instance calling count (#) on an atom can give an arbitrary number, print garbage, segfault, or allocate memory until it's killed. But it's got vector instruction support.
If you're looking for a practical k implementation, I recommend ngn/k, and several other implementations are listed at https://k.miraheze.org/wiki/Running_K .
He might be a smart person, with a very high IQ and on a different level than the rest of us, but by writing with this style, with no comments, with no proper capitalization/style and with this attitude, he’s putting me off (IMHO).
Oftentimes, the way something is presented and how the language is used, might be as important as the thing itself ;-)
If you want you can switch out his terse names in the .h and .c and see if that helps. I'm not so sure it does, but experience with array languages and a couple of decades with rather advanced C will. As in, experience is what matters rather than "IQ".
This reads like the incomprehensible ramblings of a mentally ill patient scribbling on the walls. There is zero context, zero explanation. Very vague or incomplete statements scattered all over the place. I don't understand what someone not already familiar with the project is supposed to take away from this.
Is the terseness of the site meant to reproduce the terseness of the language? Is that the gimmick?
Is there some application that demonstrates the utility of this language?
E.g. it's tempting to dismiss Haskell as something invented by mathematicians more concerned with the elegance of their abstractions than actually getting things done, but Pandoc is so undeniably good and useful that you're forced to admit Haskell can be a good choice. What's the Pandoc of K?
That's not the idea. In a sense, the Pandoc of K is K itself. I mean it's designed for interactive, fast and terse scripting on financial data for quants. And it's incredibly good for that. So almost all substantial K is proprietary.
There is a lot to learn from him; tiny binaries, super fast performance; programming style you like or don't, that's fine. To have a 200kb binary that's a programming language + database is very nice. It's great we can study a part of it and probably more in the future. We went overboard with bloating and complexity; it's good to be shown you can write current enterprise/commercial products that fit in the memory of an 80s home computer without changing your style of programming or the tools you use for it. IMHO anyway.
Not for all cases, but he (and his team) take the time to squeeze performance out of things where others just say 'it's fast enough'. There was a month+ long conversation about why all the most-used JSON parsers are so terribly slow etc. Not many people take the time to try to optimise the last drop of blood out of everything, especially if you have shareholders or deadlines; you settle for 'good enough'.
Since this was posted, the source code was changed, and a makefile was added.
The new version requires ARM 64 or Intel 64 with AVX2. It requires clang-13 (clang-14 and later won't work). Gcc doesn't work.
With clang-14, I got build errors. First error:
./a.h:38:30: error: use of unknown builtin ‘__builtin_ia32_pminub256’ [-Wimplicit-function-declaration]
OMG, just yesterday I wrote a comment saying that I regret not learning K (I instead chose J) due to being too hung up on the notion of free software at the time... What a coincidence! Now I have no excuses anymore, time to learn K!
K specializes in financial data, i.e. lists of 1d arrays. Other APLs, and J, are more high-dimensional math oriented and specialize in true multidimensional arrays.
IIRC, some old UNIX versions had an APL interpreter in the userland. For me, a k interpreter could be the ultimate UNIX utility. But interoperability with pipes and other UNIX utilities is awkward to say the least, as is having to use other programming languages as duct tape.
Some big claims, but I wonder if there are some published repeatable benchmarks
Also, when someone claims 1000x better performance I want to know why. For example, the gain from MySQL or PostgreSQL -> ClickHouse I can clearly attribute to column store, compression, vectorization, parallel execution on multiple CPU cores and machines...
Just as Inform 7 works because people who write adventure games are the least likely to mind having to play "guess the verb", K source works because people who write vector languages are the least likely to mind expressing algorithms with, not display: block, but display: inline.
If you ask ChatGPT to reply in the style of Arthur Whitney, you get amazingly concise summaries. Like a language version of this code. I use that prompt often.
And I thought the Scala/FP people ("a monad in X is just a monoid in the category of endofunctors of X") were a pretentious sect. This is so much worse. Given their customer base, I immediately presume that this is nothing more than an approach to deliberately make things unintelligible for as many people as possible, so that your white collar hedge fund guy would have a heart attack just by glancing at this source code, not even trying to read or understand it. That is one way to treat people and do business, and it is despicable. The industry needs to formalise this into a well-known phenomenon, much like security through obscurity [1], so that kind-hearted, pragmatic people avoid this like the plague.
The website reads like an edgy script-kiddy blog. Is K actually a useful project, or is it just a passion project of someone who happens to be sort of famous?
It’s niche, but a large part of the financial industry relies on it, for the heavy lifting of pricing and modelling, often with higher level APIs in Python or Java.
Incidentally, if you run into a group full of Northern Irish developers at any big bank, you have probably found the K folks.
> The website reads like an edgy script-kiddy blog.
The code does, as well. Either Mr. Whitney's brain is not wired like a regular homo sapiens sapiens, or the entire thing smells of "I am smarter than you and I don't need to lower myself to your level."
I do not buy for a single second that for Mr. Whitney debugging IOCCC-level obfuscated code is easier than plain C code. One writes "normal code" because one will have to read it later, and they don't want to spend ages doing so, unless they have to keep an air of superiority about their abilities to their peers.
I get that APL is obtuse and dense. But writing obtuse and dense C doesn't turn it into APL.
I worked with a guy who could handle loops nested half a dozen deep with data dependencies woven through the structure with exactly the same apparent cognitive overhead as for i = 0, N. The sort of structure you get when you arrange a difficult calculation to match the cache hierarchy of the target machine. Didn't do comments or variable names with much enthusiasm.
He was superb at finding errors at code review. As in looking through code someone else had written and pulling out the mistakes. Presumably everything looked completely trivial to him, regardless of how tangled the control flow had got.
Script kiddies bust their butts phishing and installing black market ransomware. This Whitney fellow is probably sitting in his office somewhere expecting people to just throw $100k (per month!) at him. ;-)
[0]. https://t3x.org/klong/
I could swear that I used to install j902 from apt on Ubuntu. Am I misremembering this?
it's work-in-progress for fun/educational purposes (i.e. read and learn something)
Edit: according to https://mlochbaum.github.io/BQN/implementation/kclaims.html, commercial licenses do indeed come with anti-benchmark clauses, which makes it very hard to take the one in this post at face value.
Would you recommend K?
Is something else better (if so what)?
no benchmark clause sounds like webgpu
Quick for building a website? Probably not.
Quick for evaluating some idea you just had if you are a quant? Yes absolutely!
So imagine you have a massive dataset, and an idea.
For testing out your idea you want that data to be in an “online analytical processing” (OLAP) kind of database. These typically store the data by column rather than by row, and use other tricks to speed up crunching through reads, trading off write performance etc.
There are several big tech choices you could make. Mainstream king is SQL.
Something that was trendy a few years ago in the nosql revolution was to write some scala at the repl.
It is these that K is competing with, and being faster than.
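A rough picture of the "by column not row" point, as a C sketch. The trade schema and all names here are invented for illustration and have nothing to do with how kdb+ or any particular OLAP database lays out its data.

```c
#include <stddef.h>

/* Row-oriented: every field of a trade sits together (hypothetical schema). */
struct trade_row { long time; int sym; double price; double size; };

/* Column-oriented: one contiguous array per field. */
struct trade_cols { long *time; int *sym; double *price; double *size; size_t n; };

/* Summing one column reads only that column's memory, back to back. */
double total_size_cols(const struct trade_cols *t) {
    double s = 0.0;
    for (size_t i = 0; i < t->n; i++) s += t->size[i];
    return s;
}

/* The same query over rows steps past time, sym and price on every element. */
double total_size_rows(const struct trade_row *rows, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += rows[i].size;
    return s;
}
```

Both functions return the same answer; the columnar one simply drags less unrelated data through the cache on a read-heavy scan, which is the trade-off against write performance mentioned above (appending a trade now touches four arrays instead of one).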
I would probably use Matlab for that sort of stuff tbh. Is K faster than Matlab?
The comparison posted may be true, but on some level it doesn't make sense. It's like this old awk-vs-hadoop post https://adamdrake.com/command-line-tools-can-be-235x-faster-...
Yes, a very specific implementation will be faster than a generic system which includes network delays and ensures you can handle things larger than your memory in a multi-client system. But the result is meaningless - perl or awk will also be faster here.
If you need a database system, you're not going to replace it with K, if you need super fast in-memory calculations, you're not going to use a database system. Apples and oranges.
They at least used to have free evaluation licenses that were good for a month. Our license was even unlocked for unlimited cores.
I doubt they'd give them out to a random individual or small startup, but maybe still possible for a serious potential customer.
Interesting things in here.
I didn't know that corner of C. Removing the () from the macro does change what you can pass as e, and assigning the result of a block does work as one would expect.
edit:
-Wpedantic on gcc will tell me ISO C doesn't like the construct but it still compiles it happily.
Clang offers -Wgnu-statement-expression-from-macro-expansion
So it looks likely that this is the GNU statement expression extension after all and not a part of C. Shame.
You can use these to implement Rust-style `auto x = TRY(...);` in C++ which is pretty nice. Unfortunately MSVC doesn't support this extension.
Yes, this is called statement-expression (instead of expression-statement which is the normal "doer" statement that contains just an expression followed by a semicolon).
The Linux kernel makes quite a bit of use of them as far as I'm aware.
Yeah, it's a gcc extension. Details at https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Statement-Expr...
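For anyone who hasn't met the construct discussed above, here is a small sketch of a GNU statement expression used inside a macro. It assumes gcc or clang (it is not ISO C and MSVC rejects it); `MAX_ONCE` and `max_with_side_effect` are made-up names for this example, not anything from the k source.

```c
#include <assert.h>

/* GNU statement expression: ({ ... }) is an expression whose value is the
 * last expression inside the braces. Accepted by gcc and clang only.
 * MAX_ONCE is a hypothetical macro name chosen for this example. */
#define MAX_ONCE(a, b) ({        \
    __typeof__(a) a_ = (a);      \
    __typeof__(b) b_ = (b);      \
    a_ > b_ ? a_ : b_;           \
})

/* Each argument is evaluated exactly once, unlike the classic
 * ((a) > (b) ? (a) : (b)) macro, so the increment happens only one time. */
int max_with_side_effect(int *counter, int other)
{
    assert(counter);
    return MAX_ONCE((*counter)++, other);
}
```

Compiling with -Wpedantic on gcc should produce the ISO C warning mentioned above while still building.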
Everyone here charges too little for software.
https://groups.google.com/g/shaktidb/c/5SPufca3mo4
They’ve been charging that amount forever; it’s a crazy ask. But you’ll be happy to hear that the quoted price is about 80% off of the price in 2000, so take advantage of the discount. In 2000 it was $100K/month.
Inflation has been 80% since then, but that doesn't mean it's 80% off. $1000 in 2000 is $1800 today, so a discount of 44%. 80% off would imply it's $5000 today, but prices didn't 5x fortunately.
Good joke, though.
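The arithmetic in that correction can be checked with a few lines of C. This is only a sketch of the two formulas involved, taking the 80% inflation figure from the comment as given; the function names are invented for the example.

```c
#include <assert.h>

/* Real-terms discount when a nominal price stays flat while the general
 * price level rises by `inflation` (0.8 means 80%). */
double real_discount(double inflation)
{
    assert(inflation > -1.0);
    return 1.0 - 1.0 / (1.0 + inflation);
}

/* Inflation needed for a flat nominal price to end up `discount` cheaper
 * in real terms (0.8 means 80% off). */
double inflation_for_discount(double discount)
{
    assert(discount < 1.0);
    return 1.0 / (1.0 - discount) - 1.0;
}
```

With 80% inflation, `real_discount(0.8)` is 1 - 1/1.8, roughly 0.44, and `inflation_for_discount(0.8)` is about 4.0, i.e. prices would have had to rise 5x for a flat price to count as 80% off.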
k is widely used by a handful of big investment banks and big hedge funds for quant/finance stuff. There are only a handful of such companies in the world but they are extremely price insensitive, especially with regard to technology that lets them get a market edge. I suspect this is the dynamic that kx, the company, tapped into over the years. I also suspect this open source release is mainly because investment banks have come around on their desire for open source (rather than proprietary) software over the years, at least on some teams. You can see the open source release doc explicitly positions k vs Python, pandas, and polars.
For example, I have an old friend from a major investment bank who used to work on an internal (proprietary) pub/sub system but who, these days, works on integrations between that system and Apache Kafka.
I mean, it really depends on who your customers are. You can charge a lot if you can make the typical financial analyst 20% more productive.
Wouldn't work the same way if your core customer base is elementary school teachers.
For those who haven't heard of or aren't familiar with K, the Wikipedia page[1] has a remarkably helpful brief overview:
> K is a proprietary array processing programming language developed by Arthur Whitney and commercialized by Kx Systems. The language serves as the foundation for kdb+, an in-memory, column-based database, and other related financial products. The language, originally developed in 1993, is a variant of APL and contains elements of Scheme. Advocates of the language emphasize its speed, facility in handling arrays, and expressive syntax.
There was also a great thread on HN about it as well[2].
[1] https://news.ycombinator.com/item?id=28493283
Does it support imaginary numbers or did they take them out like everyone else...
J supports them well, so I'd guess K does too.
And many programming languages do, for example C# and Racket. I have a feeling it isn't very hard to implement, since that's what you typically do if you need complex numbers in Java.
For people not accustomed to the style of Whitney, you can read various HN threads from the past to learn more about why he writes programs the way he does.
It’s deliberate and powerful.
Here is a recent one: https://news.ycombinator.com/item?id=39026551
There was an epic post some years ago, but I couldn’t find it just now from my phone.
I'm going to have to take another run at learning this corner of computing soon, but it's a prospect I'm not relishing. Everything about it rubs me up the wrong way.
If you'd like an antidote, have a read of Gerald Jay Sussman's books, where you'll see profound concepts from maths and physics captured in succinct and expressive (as opposed to merely terse) code, accompanied by eloquent explanations devoid of boasts or name dropping and provided free of charge online. That will change the way you think about computing too, but it will be a more pleasant experience.
Which one do you recommend?
The first one I found was "Structure and Interpretation of Computer Programs"
https://web.mit.edu/6.001/6.037/sicp.pdf
That's a good place to start but it's primarily about programming itself. In "Structure and Interpretation of Classical Mechanics" and "Functional Differential Geometry" he applies his approach of using computer programs as a way of communicating concepts to humans to some fascinating maths and physics topics.
The code looks heavily obfuscated. It's more like "source available" than open source. E.g.
Edit: Looking at it a bit more, I can't tell if the code is obfuscated or if the author really wrote it like this...
That's the "Whitney style". See: https://code.jsoftware.com/wiki/Essays/Incunabulum
It's writing C in array-language style rather than intentional obfuscation.
Thanks. I'm now reading this where people are trying to explain what happened in the ref/ directory.
https://github.com/kparc/ksimple/blob/main/a.c
Personally I believe in a sort of "evolution" of software that operates independently of intentions of the programmers.
I can totally believe that he didn't intentionally obfuscate it, but its incomprehensibility made it harder for other people to make a knockoff, and that's why it survived and became successful.
You may not believe it, but that's how K/Q/J people write C code.
Bonus: Go visit and do "View Source" on that website. Even HTML has fragrance of K.
I don't understand how. How do you debug something like this? How do you go about fixing a bug? There are no docs or tests. It seems like one would spend hours just trying to understand what's going on. Is there a good source on their methodology? Because my mind is blown.
Does writing this way have any advantage in practical terms (as opposed to aesthetic reasons)?
I've heard it said that it becomes easy to spot common patterns and structures in this style. Without knowing the idioms it's difficult to read but apparently once you know the idioms, you can scan it and understand it well.
The only possible advantage I can think of is that you can fit more code on one screen so I guess in theory you can see your context more easily. But that seems pretty minor compared to... well, look at it!
I read a couple of other threads and some people try to claim less code = fewer bugs, but that's pretty clearly nonsense otherwise minifiers would magically fix bugs.
As for why people actually use this (it seems like they really do), my guess would be that it's used for write-only code, similar to regexes.
Like, with a regex you can't deny that you get a lot of power from not many keystrokes, and that's really awesome for stuff you're never going to read again, like searching code in your editor or throwaway shell commands.
But they also tend to be completely unreadable and error prone and are best avoided in production code.
That's just how Arthur Whitney writes code.
K by Arthur Whitney: http://archive.vector.org.uk/art10010830
vector the journal of the British APL Association
let them eat APL
When your eng dept comes to you and says "Boss, we want arrays!", then let them eat APL.
He showed K at the Royal Society, and a bunch of APL and A+ guys were there. Someone asked, where are the comments? AW said comments get out of date with all the changes, if you can’t read the code you Shldnt be working on it. We then all looked at each other..
So far the highest ratio of comment to code I've seen is Hsu's thesis (~200 pages of commentary[0] to ~20 lines of code[1]) at 10 pages/line.
[0] https://scholarworks.iu.edu/dspace/bitstreams/dcbd5240-8454-...
[1] https://www.bonfire.com/co-dfns-thesis-edition/
(a) come to think of it, theses are one and done
(b) thanks to Kragen for pointing out this 02019 work!
Heh, I just noticed Hsu's "above average" shirts: ⊢>+⌿÷≢
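For readers who don't read APL, the shirt's ⊢>+⌿÷≢ is a train that compares each element with the sum divided by the count. A rough C rendering follows, as a sketch only; the function name and the byte mask are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* C rendering of the APL train  ⊢>+⌿÷≢  ("above average"):
 * mask[i] is set to 1 where x[i] exceeds the mean of x, else 0. */
void above_average(const double *x, size_t n, unsigned char *mask)
{
    assert(n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];                    /* +⌿  sum of the elements   */

    double mean = sum / (double)n;      /* ÷≢  divided by the count  */

    for (size_t i = 0; i < n; i++)
        mask[i] = x[i] > mean;          /* ⊢>  each element > mean   */
}
```

Seven glyphs against a dozen lines is more or less the whole notation argument in miniature, whichever side of it you take.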
One of these days I need to (a) learn more about the Berber culture, and then (b) write an array language which exploits the ⵜⵉⴼⵉⵏⴰⵖ symbology.
https://en.wikipedia.org/wiki/Tifinagh#/media/File:Tifinagh_...
https://www.win.tue.nl/~aeb/natlang/berber/tifinagh/tifinagh...
https://www.edition-originale.com/media/h-3000-saint-exupery...
This immaturity is one reason I don't like array languages: They ape mathematical conventions but mathematicians are interested in communicating, if only with other mathematicians, and so write clearly and understandably. That source code (if you can call it that, as I'm not entirely convinced it's the form Whitney himself programs in) is the opposite of clear to the point of being comically dense, in every sense of the term.
Concision is the handmaiden of clarity.
i spent a day writing code like that. it becomes readable surprisingly quickly.
i don't think i'm going to get into the practice of doing it to that extreme, but i'll probably adopt the multiple-things-per-line part for side projects, and skip the shortening of keywords
with a wide monitor, you can have 3 files like this open at a time. you can fit a surprisingly large program on the screen at once, which is the benefit of coding like that
> if you can’t read the code you Shldnt be working on it
An ultimate expression of insider elitism and knowledge hoarding, which can be a self-interested asset to profitable, closed-source, specialized software.
For everyone else, code should present no surprises whenever possible with semantic expressiveness and reserve comments for explanation of surprises, design choices, and protocols.
I don't know about you, but if you don't know the basics about woodworking and have no interest in learning, I certainly don't want you building my furniture. That, um, has nothing to do with elitism.
There is a real tradeoff between making code friendly to the uninitiated and making code ergonomic for the expert. It's completely natural that non-initiates feel unwelcome when the code is written for domain experts. In your company's codebase, is it really best to optimize for onboarding newcomers over optimizing for the productivity of your engineers? Where along that spectrum maximizes your goals?
Are we building chairs or are we building chair-builders?
Where are the imaginary numbers?
> if you can’t read the code you Shldnt be working on it
I don't know this AW guy, but to me that's a huge red flag and a sign that a programmer hasn't worked on anything substantial, i.e. non-trivial stuff that's maintained by a team over time.
Being able to read the code is irrelevant, as the comments should tell you why the code is doing what it's doing.
For example, yeah I trivially can see the code is doing a retry loop trying to create a file with the same name.
That looks like a bug, if you can't create the file with that name, you change the name in the retry loop.
But the comment will tell me this is due to certain virus scanners doing dumb stuff, so we might have to try the same name a few times.
Sure, good code will have few comments as most of it should be self-documenting through good structure and names of classes and variables. But in non-trivial code there will always be places where it is not obvious why the code does what it does.
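The retry-loop example above might look roughly like this in C. It is a sketch under assumptions: `create_with_retry` is a hypothetical helper, and `flaky_create` is a stand-in that fails twice and then succeeds, imitating the scanner interference described in the comment rather than touching a real filesystem.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real file-creation call: fails on the first
 * two calls, then succeeds, imitating a scanner briefly blocking the name. */
int flaky_create(const char *path)
{
    static int calls = 0;
    (void)path;
    return ++calls >= 3;    /* 1 = created, 0 = failed */
}

/* Returns the attempt number that succeeded, or 0 if every attempt failed. */
int create_with_retry(const char *path, int max_attempts,
                      int (*try_create)(const char *))
{
    assert(path && try_create && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        /* WHY the same name on every attempt: certain virus scanners can
         * make creation fail transiently, so a retry with the identical
         * name is intended. Switching to a new name would look like the
         * obvious fix, but it is not what we want here. */
        if (try_create(path))
            return attempt;
    }
    return 0;
}
```

Without the comment inside the loop, a reviewer could reasonably "fix" this by changing the name on each retry, which is the point being made about comments explaining why.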
Defining substantial as written by a team over a long period of time seems to be putting the value in the wrong place. Is the important thing what the software does, or that it took a lot of people a long time to write it?
> a sign that a programmer hasn't worked on anything substantial
Maybe you want to check who he is?
> hasn't worked on anything substantial
This isn't the case. It's more likely a lack of business socialization combined with individual hyper-achievement. Reminds me of Ian Pratt in some ways.
An annoyance in tech, both startups and corporate, is technically-capable people but with outsized egos.
for reference: https://en.wikipedia.org/wiki/Arthur_Whitney_(computer_scien...
This language is very popular among quant finance people associated with Morgan Stanley. I don’t see the appeal myself. Maybe it helps prevent people stealing the code since it’s so awful looking to work with! At one point I had to learn it and I think I’ve totally forgotten it now— it’s like my brain repressed it. Not my cup of tea, that’s for sure.
Thinking about problems and data manipulation in the way array languages enable is hard, but once you have pushed through what feels like a barrier of mainstream programming language thinking that is stopping you from “grokking” it, there is a sudden moment of clarity and then you “get it”.
Perhaps a halfway house is SQL. The difference between ORM-style CRUD and a power user using window functions to make the data dance shows there is still art to be had in programming :)
Agreed. Pushing through until you can think in array languages is well worth it! In my experience one of the top 30 highest ROI mental circuits you can develop.
That being said, I'm not convinced that the extremely minimal syntax is essential. I think it can be done another way ;)
But tbh it looks worse than matlab
Most people I know who actually learn an array language like k or j usually grow to appreciate the expressiveness and cleverness of these languages. Typically, people have your reaction who have only looked at it and tried it very briefly. I'm surprised. Why did you have to learn it? Where?
Was working at a quant pod at Millennium for a bit where they used it. I was ultimately able to use it but everything took me 20x longer than using Numpy/Pandas. The irony was that the Python code was shorter because there were so many more library functions and better abstractions and syntax. So it was slow and unintuitive for zero benefit whatsoever.
Functionality is, at this time, extremely limited (and the first kfun was out in January so I don't think there's really any intention to get this to usability on a short timeframe). No support for paired syntax like parentheses, functions in braces, and square brackets for indexing and function calls. No stranding or tacit functions. I doubt it's Turing complete. Many primitives are unimplemented and others are flaky: for instance calling count (#) on an atom can give an arbitrary number, print garbage, segfault, or allocate memory until it's killed. But it's got vector instruction support.
If you're looking for a practical k implementation, I recommend ngn/k, and several other implementations are listed at https://k.miraheze.org/wiki/Running_K .
He might be a smart person, with a very high IQ and on a different level than the rest of us, but by writing with this style, with no comments, with no proper capitalization/style and with this attitude, he’s putting me off (IMHO).
Oftentimes, the way something is presented and how the language is used, might be as important as the thing itself ;-)
Might want to read Notation as a Tool of Thought: https://www.eecg.utoronto.ca/~jzhu/csc326/readings/iverson.p...
If you want you can switch out his terse names in the .h and .c and see if that helps. I'm not so sure it does, but experience with array languages and a couple of decades with rather advanced C will. As in, experience is what matters rather than "IQ".
But but look at all the Turing Award and Putnam Prize winners he has worked with.
I’ve always told myself: no matter how smart my idea is, if no one else understands it, it might as well not exist.
"But in science the credit goes to the man who convinces the world, not to the man to whom the idea first occurs" - Francis Darwin
Nobody needs to understand the code except for the people writing and maintaining it. The users just need it to run.
Well I know what the next episode of the arraycast is going to talk about now.
Already looking forward to when it might be addressed. :)
Just packaged it up on my private Guix channel. Here's the package def (along with the one for ngn/k as well) if anyone's interested: https://gist.github.com/xelxebar/c37ab9285b297fed3e9e0f9ce78...
This reads like the incomprehensible ramblings of a mentally ill patient scribbling in the walls. There is zero context, zero explanation. Very vague or incomplete statements scattered all over the place. I don't understand what someone not already familiar with the project is supposed to take away from this.
Is the terseness of the site meant to reproduce the terseness of the language? Is that the gimmick?
> I don't understand what someone not already familiar with the project is supposed to take away from this.
This isn't an advocacy piece directed at the general public. You're not his audience.
Fortunately, there is secondary commentary, like this thread, so we can get an idea what this is about.
Is there some application that demonstrates the utility of this language?
E.g. it's tempting to dismiss Haskell as something invented by mathematicians more concerned with the elegance of their abstractions than actually getting things done, but Pandoc is so undeniably good and useful that you're forced to admit Haskell can be a good choice. What's the Pandoc of K?
That's not the idea. In a sense, the Pandoc of K is K itself. I mean, it's designed for interactive, fast and terse scripting on financial data for quants. And it's incredibly good for that. So almost all substantial K is proprietary.
The "killer app" of K is KDB.
i know Formula 1 has enormous, voluminous, live data feeds streamed from the cars, and god knows where else; kdb/q supposedly enable such.
Is this line in a.c enough for MIT license?
nice to see an exuberantly verbose arthur
The weirdest part about that is that copyrights do not need to mention years, so it's verbose in unnecessary ways.
The loquaciousness brings two Calvin Coolidge anecdotes to mind, the punchlines to which are: "you lose" and "with the same hen?"
This is probably his last attempt at leaving a legacy given his age. I wonder if Wolfram would do something similar.
There is a lot to learn from him: tiny binaries, super fast performance. Whether you like the programming style or not, that's fine. To have a 200kb binary that's a programming language + database is very nice. It's great we can study a part of it and probably more in the future. We went overboard with bloat and complexity; it's good to be shown you can write current enterprise/commercial products that fit in the memory of an 80s home computer without changing your style of programming or the tools you use for it. IMHO anyway.
Sadly this isn't the norm.
Imho software size should reflect complexity of the problem domain. Not arbitrary metrics like say, the capabilities of a system executing it.
So "Hello World!" should weigh in at mere bytes. Not KBs or even MBs.
Can someone explain how it can be "faster" than anything else?
I think he's referring to the trillion-row regime: https://news.ycombinator.com/item?id=40522433
My money is on "it's not, and the benchmarks are cherry picked"
I mean faster at filtering data than a python script? Sure. Faster than a database or hand-rolled C code? Only if your benchmarks are misleading.
Not for all cases, but he (and his team) take the time to squeeze performance out of things where others just say 'it's fast enough'. There was a month-plus-long conversation about why all the most-used JSON parsers are so terribly slow, etc. Not many people take the time to optimise the last drop of blood out of everything, especially if you have shareholders or deadlines; you settle for 'good enough'.
Since this was posted, the source code was changed, and a makefile was added.
The new version requires ARM 64 or Intel 64 with AVX2. It requires clang-13 (clang-14 and later won't work). Gcc doesn't work.
With clang-14, I got build errors. First error: ./a.h:38:30: error: use of unknown builtin ‘__builtin_ia32_pminub256’ [-Wimplicit-function-declaration]
Seems to be related to this LLVM change which removed the above builtin: https://reviews.llvm.org/D117798
When I replaced __builtin_ia32_pminub256 with __builtin_elementwise_min and ditto for max, then it compiles and apparently works.
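A sketch of what that swap amounts to, for anyone hitting the same error. It assumes the operands are 32-byte vectors of unsigned bytes (which is what pminub256 works on); the type name `u8x32`, the function `min_u8x32`, and the `HAVE_ELEMENTWISE_MIN` guard are invented for this example and do not come from a.h. The scalar fallback is only there so the snippet also builds on compilers without the generic builtin.

```c
#include <assert.h>

/* 32 unsigned bytes, matching the 256-bit operand of pminub256.
 * vector_size is a GNU extension accepted by gcc and clang. */
typedef unsigned char u8x32 __attribute__((vector_size(32)));

/* Detect the generic builtin that newer clang provides in place of the
 * removed x86-specific __builtin_ia32_pminub256. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_elementwise_min)
#    define HAVE_ELEMENTWISE_MIN 1
#  endif
#endif

/* Lane-wise unsigned minimum of *a and *b, written to *out. */
void min_u8x32(const u8x32 *a, const u8x32 *b, u8x32 *out)
{
    assert(a && b && out);
#ifdef HAVE_ELEMENTWISE_MIN
    *out = __builtin_elementwise_min(*a, *b);
#else
    for (int i = 0; i < 32; i++)      /* portable per-lane fallback */
        (*out)[i] = (*a)[i] < (*b)[i] ? (*a)[i] : (*b)[i];
#endif
}
```

The same pattern with `__builtin_elementwise_max` would cover the max case mentioned above.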
OMG, just yesterday I wrote a comment saying that I regret not learning K (I instead chose J) due to being too hung up on the notion of free software at the time... What a coincidence! Now I have no excuses anymore, time to learn K!
Regret why? It's nice K is finally a viable language for communication and learning.
How do those two compare?
K specializes in financial data, i.e. lists of 1d arrays. Other APLs, and J, are more high-dimensional math oriented and specialize in true multidimensional arrays.
K is pragmatically business-oriented, J is what you get after you've been thinking about computing for half a century?
Other than ngn/k ...
https://ktye.github.io/kdoc.htm
https://github.com/ktye/i/releases/download/latest/k.c
IIRC, some old UNIX versions had an APL interpreter in the userland. For me, a k interpreter could be the ultimate UNIX utility. But interoperability with pipes and other UNIX utilities is awkward to say the least, as is having to use other programming languages as duct tape.
The link to the archive ‘k.zip’ has moved to https://shakti.com/ with terse documentation in the ‘education’ section.
The source is an IOCCC candidate and has zero tinkering value.
Does the page 404 for anyone else? Is this a Europe thing?
https://shakti.com/k/k.zip gives 404 not found
try https://shakti.com/k.zip
Some big claims, but I wonder if there are some published repeatable benchmarks
Also when someone claims 1000x better Performance I want to know why. For example MySQL or PostgreSQL -> Clickhouse I can clearly attribute to column store, compression, vectorization, parallel execution on multiple CPU cores and machines...
For what it's worth, there are some benchmarks of kdb+ (the database built on k) here - https://tech.marksblogg.com/billion-nyc-taxi-kdb.html (it's overall a fantastic series of blog posts).
The website, the comments in this thread, and the various C snippets shared here all make me feel very stupid.
Don’t. Feel inspired. Nothing worth learning comes easy.
I think the link for k.zip was just removed.
EDIT: shakti.com/k/k.zip is now returning 404.
Thanks, I thought I was going insane a little bit there :-)
That's ... er ... cryptic.
Just as Inform 7 works because people who write adventure games are the least likely to mind having to play "guess the verb", K source works because people who write vector languages are the least likely to mind expressing algorithms with, not display: block, but display: inline.
Could this run in Wasm on OPFS - like how SQLite got official support for OPFS?
How does it compare to the k that you can license from shakti or to ngn/k?
Ngn/k is GPL and thus more restrictive. https://codeberg.org/ngn/k/src/branch/master/LICENSE
as of right now it's a desktop calculator; functions, conditionals and loops are missing. you can see its scope here https://shakti.com/k/k.d
Eat your heart out:
https://codeberg.org/ngn/k/src/branch/master/0.c
Clicking through the page, I'm still not quite sure what I'm seeing.
It's rather obscure to say the least, but it's apparently an open source release of https://en.wikipedia.org/wiki/K_(programming_language)
Click on k, then k.zip.
If you ask ChatGPT to reply in the style of Arthur Whitney, you get amazingly concise summaries. Like a language version of this code. I use that prompt often.
New link https://shakti.com/k.zip
APL was always kind of a write-only language. But now you can probably paste all the code into ChatGPT and it will explain it.
How do you compile this code under clang 15?
Not very informative homepage... A lang+db? Fast on one machine? Or does it distribute?
I would be surprised if the database is here. Doesn't it rely on Q?
This is a new implementation not related to Q (same main author of course though).
“for: hedgefunds banks manufacturers formula1 ..”
... Ah, so this is regexes but for math. Got it.
And I thought the Scala/FP people ("a monad in X is just a monoid in the category of endofunctors of X") were a pretentious sect. This is so much worse. Given their customer base, I immediately presume that this is nothing more than an approach to deliberately make things unintelligible for as many people as possible, so that your white-collar hedge fund guy would have a heart attack just by glancing at this source code, without even trying to read or understand it. That is one way to treat people and do business, and it is despicable. The industry needs to formalise this as a well-known phenomenon, much like security through obscurity [1], so that kind-hearted pragmatic people avoid it like the plague.
[1] https://en.wikipedia.org/wiki/Security_through_obscurity
It’s not at all clear where you read pretentiousness. Is it the mere fact of its existence?
Try it or any other APL for 3 months. You will change your mind.
The website reads like an edgy script-kiddy blog. Is K actually a useful project, or is it just a passion project of someone who happens to be sort of famous?
It’s niche, but a large part of the financial industry relies on it, for the heavy lifting of pricing and modelling, often with higher level APIs in Python or Java. Incidentally, if you run into a group full of Northern Irish developers at any big bank, you have probably found the K folks.
> The website reads like an edgy script-kiddy blog.
The code does, as well. Either Mr. Whitney's brain is not wired like a regular homo sapiens sapiens, or the entire thing smells of "I am smarter than you and I don't need to lower myself to your level."
I do not buy for a single second that for Mr. Whitney debugging IOCCC-level obfuscated code is easier than plain C code. One writes "normal code" because one will have to read it later, and they don't want to spend ages doing so, unless they have to keep an air of superiority about their abilities to their peers.
I get that APL is obtuse and dense. But writing obtuse and dense C doesn't turn it into APL.
I worked with a guy who could handle loops nested half a dozen deep with data dependencies woven through the structure with exactly the same apparent cognitive overhead as for i = 0, N. The sort of structure you get when you arrange a difficult calculation to match the cache hierarchy of the target machine. Didn't do comments or variable names with much enthusiasm.
He was superb at finding errors at code review. As in looking through code someone else had written and pulling out the mistakes. Presumably everything looked completely trivial to him, regardless of how tangled the control flow had got.
Whitney may be similar.
What if he is smarter than you and doesn’t need to lower himself to your level.
Oh, it's just raking in millions of dollars from big bank users every year. Nothing a script-kiddy couldn't achieve...
Script kiddies bust their butts phishing and installing black market ransomware. This Whitney fellow is probably sitting in his office somewhere expecting people to just throw $100k (per month!) at him. ;-)
Consider, mutatis mutandis, https://www.smbc-comics.com/comic/why-i-couldn39t-be-a-math-...
Lagniappe: click on the circular red button underneath the comic, to the right of the orange "RANDOM" :-)