CatalaLang/catala: Programming language for law specification

3 years ago (github.com)

This project seems to implicitly assume that a formally specified code of laws, where statutes can be interpreted largely mechanistically, is a good thing (and by extension, that the existing system of human interpreters with discretion and margins of error is a problem to be overcome).

I don't disagree with this assumption outright, but it's certainly not obvious to me that it is correct, and the authors appear to present no arguments supporting the same.

  • I suspect the motivation of this project is a lot more practical than what you’re projecting onto it, and I suspect you might be discounting how often law is transformed into code that runs on state IT systems.

    As someone who is familiar with the process of turning statutes into code, I can appreciate what this project endeavors to do, even if the value is limited to providing clarification for software developers.

    • It is hugely useful. I worked on implementing law in code for 10 years and the big problem is this: there's constant communication between software devs and law people. For example: "did you code this like that?", "why did the system do that?", "if I code it this way, does it reflect the law precisely?". You have to constantly translate from the code base to the language of lawyers and vice versa. Doing that with regular code is no good: regular code ends up mixing law code and technical code. Making a clear separation with a dedicated language is a big plus.

  • This seems obvious to me, and seems like something the law already strives for: you end up with very precise law regardless via case law, but with a high legal cost to reach that point. At the very least, it's not something I'd expect to be on Catala's homepage.

    I think this is closely related to rule of law too. Per https://www.britannica.com/topic/rule-of-law

    > In particular, laws should be open and clear, general in form, universal in application, and knowable to all.... The law should... comprise determinate requirements that people can consult before acting, and legal obligations should not be retroactively established.

    What are the benefits of ambiguous laws?

    • "What are the benefits of ambiguous laws? "

      Freedom and flexibility. Reality is infinitely complex; trying to encode real-world problems 100% accurately into code can only work by rounding a lot, which results in oversimplification and lots of "unjust" rulings when you take into account only the words of the law and not its intent. We already have this problem (a lot): many things seem very wrong but are legal by the letter of the law.

      (Mobsters being released because of formalities, for example, even though everyone knew they were guilty.)

    • Laws very often look for intent of actions as well as the actions themselves.

      And they are ambiguous: "beyond reasonable doubt" is what, as a percentage, in your view?

    • > What are the benefits of ambiguous laws?

      That they can deal with the massive amounts of ambiguity & blurred lines / grey areas the world has to offer. Too many of our everyday concepts can’t be expressed rigorously for a formal encoding & determinate decision process - a famous example being obscenity’s famed “I know it when I see it” definition - and many are highly context dependent.

  • It fits quite well within French legal tradition. Remember that it was Montesquieu himself who said 'le juge n'est que la bouche de la loi'. When I was working on my law degree, I (being a programmer) was quite interested in this field, and (although my French wasn't/isn't great, so I didn't get to go as deep as I would have liked) I found a large community of such legalistic work in French-speaking parts of the world. The professor at my university who was into this sort of stuff also seemed to lean towards that sphere of influence.

For years I've tried to convince my lawyer friend that something exactly like this would be great to have, and then it turns out to have existed probably all the time.

I think this is truly awesome.

Every law should be written in a language like this, and presented publicly with syntax highlighting and consistent formatting rules.

Then it should be made part of the school curriculum to learn the law language.

I believe it would greatly improve everyone's ability to read laws and be confident about their understanding of them, which would be a huge boon for society.

  • This is quite a misunderstanding of how the law actually works, probably enhanced by lots of lawyer TV emphasizing obscure wording tricks.

    In reality, laws are already written in a relatively normal language, and the words almost always mean exactly what they mean in plain English. The only problem is that the legal concepts they describe are themselves complex, and often they end up in a tangle of references to other laws and regulations and legal precedent etc.

    • IANAL but from my experience reading laws, they're written in a way that looks like it's trying to replicate programming-language-esque nested logic, but in prose format—instead of using physical layout to establish the relationships between concepts, they use words, which I find more confusing. I would rather read laws written in a more structured format.

    • The words almost always mean almost exactly what they mean in plain English. That’s why the law is a huge mess.

      As a fun exercise, try to find how many definitions of “child” there are in US law and how many times it’s used undefined.

    • I don't watch that kind of television much, but since I'm not a lawyer, my view of the world is probably too simplistic.

      Still I'd argue that normal language is very poor at handling that tangle of references.

      A good programming language would make those references very easy to untangle and present in their untangled form.

      When I've read (Danish) laws, I've often thought that they would read better as if statements.

      It's not that I think those laws are written in legalese, it's that they are expressing logic in a suboptimal way. Like how "four plus four equals eight" is a suboptimal way to express what could be expressed with 4+4=8.

    • I think this is a big part of it. My sense (as a non-lawyer who has looked at a fair number of laws and contracts) is that, in addition, there are plenty of laws and contracts that are just poorly written, and plenty of wording or constructions that lawyers have retained out of caution or traditionalism. This last case, traditionalism and caution, is maybe a special case of the other cases, but it's not always obvious.

    • I don't know if what you say about laws is correct, but it's certainly not a correct description of contracts. The general public is constantly confronted with utterly unreasonable legal documents: far too long, far too unclear, far too complex. The lawyers writing these, and the entities paying them, both know that the public will never read or understand them. It's pure cynicism.

  • Although in principle I agree with you, the law generally depends too much on interpretation and precedent to be expressed and understood like you’re hoping for.

    • That information should be available in summarized anonymized form, referenced from the law language.

      The judiciary should keep it up-to-date.

  • And the links! OMG, you could trace precedents and justifications and citations all the way back to the ur-utterances of Hammurabi. Reading that stuff could be quite educational.

    • Former R&D director of legaltech company here. Lawyers already use tools that link everything.

      For instance, in the U.S. the Westlaw search engine for legal cases is such a tool: it can parse legal citations and turn them into hyperlinks. It ranks cases for a given query using a state-of-the-art, machine-learning-based IR method, and it is aware of cases in the ranking that are currently overturned by higher courts (shown as red flags) or being reviewed (shown as yellow flags).

      The software can also predict the outcome of a legal case and recommend courses of action that make winning more likely. It includes ruling statistics about all sitting judges.

      Curiously, lawyers like to see the world as static, they do not like e.g. search results to differ between sessions, but of course cases get decided dynamically every day, which must necessarily also change the search results for any given query.

      If people want to see what is the state of the art in AI and the law, I recommend you have a look at "AI & the Law" (ICAIL, the annual conference and the journal of the same name).

    • Am actually working on a project like this. Not quite to the level of insanity you're thinking about, but certainly a couple of thousand years.

  • A big problem is you lose all the case law that interprets the law. Like discarding two centuries of bug reports and patches.

    At the same time, laws have been codified (i.e. the case law rewritten coherently, with all the patches combined), such as the Uniform Commercial Code.

It would be interesting to also "weave" in test cases. The workface of logic statements is exactly where bugs are introduced.

Especially around temporal events, and that goes to formal models (and even more bugs).

Typically, if there is a rule around height, there would be at least three tests¹: one taller, one equal to, and one shorter. (Without types or something, then also negative, null, and max/min boundary inputs too.)
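A minimal sketch of that three-way split, in Python. The rule itself (a minimum height of 120 cm) and the names `MIN_HEIGHT_CM` and `may_ride` are made up for illustration; no real statute is being encoded here.

```python
# Hypothetical rule: a person must be at least MIN_HEIGHT_CM tall.
MIN_HEIGHT_CM = 120  # assumed threshold, for illustration only


def may_ride(height_cm: int) -> bool:
    """Return True when the (made-up) minimum-height rule is satisfied."""
    if height_cm < 0:
        # Without a type that rules this out, invalid input needs its own case.
        raise ValueError("height cannot be negative")
    return height_cm >= MIN_HEIGHT_CM


# One representative per partition, plus the boundary value itself.
cases = {
    MIN_HEIGHT_CM + 1: True,   # taller
    MIN_HEIGHT_CM: True,       # equal to
    MIN_HEIGHT_CM - 1: False,  # shorter
}
for height, expected in cases.items():
    assert may_ride(height) is expected
```

The "equal to" case is the one that forces the drafter to decide whether the rule means "at least" or "more than", which is exactly the kind of question prose tends to leave open.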

So you could have tests based on timelines, like

  Given a regulation is passed in 3 months
  And parties are prevented from exercising B
  But "17 tons" of waste are dumped anyway
  And ...
  When ...
  Then ...

Having a model checker integrated would be a boon. Maybe we could have DevOps-like pipelines in formally-verified legislature (or at least the encoding of language to code).

¹ https://en.m.wikipedia.org/wiki/Equivalence_partitioning

  • It will not compile (sarcasm).

    Many laws are written with a lot of double meaning (the recent EU regulations on allowing or not allowing Russian cars are a good example).

    Though it could be a good idea to find all the possible double meanings or vague definitions when trying to "digitise" the laws into the programming language.

    • Right, some laws are just FTP-dragged & dropped to prod, with the understanding that they don't compile yet. But if the legislator managed to express the _intention_, the courts will over time add test cases and make them all pass.

      I'm sure the mentality of a PHP developer running a successful but insane legacy site is a better model for this than a perfect OCaml project :)

  • This is great. I often look at unit tests to understand how to use legacy code. Seeing a law come with concrete examples that are part of the legal text, would allow us to test if a rule is logically consistent, and test our understanding as well. Sort of how you want to convince your reviewer that your code works by proving it with a unit/integration/e2e test.

    End to end tests for legal frameworks?

  • imagine... TDD applied to laws.

    Or take it a step further, write a test suite and automatically generate a range of possible laws that satisfy the tests.

I think what I'd rather see is a standardized test suite format for laws that spells out the intentions.

Once I lived in a state that proposed a very simple anti-child porn law with good intent, but it was too simple. It read sort of like "anyone sending explicit pictures of minors from a cell phone will be guilty of conveying child porn". It was written in the proper legal jargon, but wasn't a whole lot more detailed than that. I called the sponsor of the bill and asked if that meant if my hypothetical daughter sent a naked picture of herself to her boyfriend, then wouldn't she be a felon under his new law? He had an "oh, crap, that's not what I meant!" reaction and ended up withdrawing the bill so it could be re-written. (Aside: I felt pretty good about that. Props to the legislator for being quick to understand and respond appropriately!)

Imagine if that were handled like program code, with a test like:

* This law does not apply to minors sending pictures of themselves.

That would do a few big things:

It would make legislators be clear about what they mean. "Oh, we'd never use this online child safety law to ban pro-trans content from the Internet!" "Great! Let's add that as a test case then." I confess that this is a deal breaker: politicians don't like being pinned down like that.

It would probably make it easier to write laws that reflect those intentions. "Hey, that law as written would apply to a 15 year old sexting her boyfriend! The code doesn't pass the tests."

Future courts could use that to evaluate a law's intent. "The wording says it applies to 15 year olds sending selfies, but the tests are explicit that it wasn't meant to. Not guilty."

I'm sure this couldn't happen for a hundred reasons, but I can dream.
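As a rough sketch of what such an intent test could look like if the bill really were code: everything below is hypothetical, including the `Scenario` fields, the age of majority of 18, and both versions of the rule. It only shows the shape of the idea, not any real statute.

```python
from dataclasses import dataclass

ADULT_AGE = 18  # assumed age of majority, for illustration only


@dataclass(frozen=True)
class Scenario:
    """A hypothetical fact pattern in which a covered picture is sent."""
    sender_age: int
    subject_is_sender: bool  # the picture shows the sender themselves


def draft_bill_applies(s: Scenario) -> bool:
    """The bill as first drafted: anyone sending a covered picture is caught."""
    return True


def revised_bill_applies(s: Scenario) -> bool:
    """A possible rewrite that carves out minors sending pictures of themselves."""
    minor_sending_own_picture = s.sender_age < ADULT_AGE and s.subject_is_sender
    return not minor_sending_own_picture


# Intent test: "This law does not apply to minors sending pictures of themselves."
case = Scenario(sender_age=15, subject_is_sender=True)
assert draft_bill_applies(case)        # the first draft fails the stated intent
assert not revised_bill_applies(case)  # the rewrite passes it

# The rewrite should still catch an adult sending such a picture.
assert revised_bill_applies(Scenario(sender_age=30, subject_is_sender=False))
```

The point is less the code than the workflow: the stated intent is written down first, and a draft that contradicts it is visibly broken before it is ever voted on.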

  • I think for something like this to be effective, you need the actual intent encoded correctly (so this use case wouldn’t have been solved), and lawmakers acting in good faith (i.e., not drafting legislation that’s intentionally vague such that it can cast a wide net and force people to use the courts to dispute things).

    • You don't need those voting for the bill to act in good faith if you have as part of the system of passing bills that the opposition gets to write the (adversarial part of the) test suite. Then either that forces any loopholes (or other undesirable effects) to be: updated as explicitly intended (with bad publicity and potential for reversion upon a change of government); or taken into account and the bill updated to reflect that, or left in the test case for case law to cite as intended.

      Either way you really want intent to be encoded somehow.

  • 99% of the work is in coming up with the edge cases, and in law the most common thing to do with edge cases is call them out explicitly. I imagine the legislator went back and added a clause to the law that specified "it shall not be considered a violation of this section for a minor to send photos of themselves".

    Laws don't need to be computer-executable, they're about intent and the interpretation thereof, so the test suite itself is really part of the law and may as well just be embedded in it.

  • I partly like this idea in theory, but believe it is literally 100% impossible to come up with a better "test suite" than "the actual court system?"

    • The courts have to evaluate what they think the law's drafters meant: Yeah, it says this, but it's obvious the legislators didn't mean for it to be read that way. It'd be nice if there were footnotes that expounded on what the authors were trying to accomplish to help courts interpret the laws.

    • I don't want me or my male family members to be labelled sex-offenders whilst you "test" the "court system" to see if the laws work as intended. All because some overzealous prosecutor wanted to be "tough" on "toxic masculinity".

    • If it was 'literally 100% impossible', you wouldn't need to believe it, you'd know it to be so.

      As for test suites and courts - the two are complementary, so there's no need to compare them to one another.

    • Yeah, there’s no way that a modern computer could outdo the logical accuracy and processing power of our 300 year old legal system. Court rooms and arguing and paperwork, much more efficient than silicon.

  • law tests would be good, though they probably have to be "evaled" via the same mechanism which would apply them. Meaning, courts :\

    I personally would be happy if any country would attach rationale for the law to the law itself. And possibly some KPI to see if it works. So the law could be reevaluated later, to see if it works at all, or maybe counterproductive, or maybe some major actual application of the law is not why it was introduced.

  • It's great that your congressperson was enlightened enough to pull the bill. Some states (such as Minnesota) apparently actively prosecute children for sexting.

    [1] https://www.aclu.org/news/juvenile-justice/minnesota-prosecu...

    • I was pleasantly surprised.

      Minnesota is so bizarrely, irrationally wrong on that one. That poor kid needs an adult to sit her down and explain why sending out nude pics as a minor is a really bad idea, not to label her as a sex offender. Now, if someone (especially an adult) received those pics and shared them, go ahead and charge that person.

  • Mindblown. I went to law school and now work as a developer but never thought about it. Writing tests for laws should totally be a thing.

    • > Writing tests for laws should totally be a thing.

      It is a thing already. In both the US and Germany it is common for lawmakers and regulators (e.g. the FCC which was here on HN to solicit comments a few days ago) to provide drafts of laws and regulations to interest groups so that these can raise issues they find.

Something like this got passing mention in Greg Bear's [1] book Moving Mars (1993) [2], under the name Legal Logic. The (human) Martians used it with AI assistance to formulate legislation for their newly independent society.

For those who don't know, Greg Bear was a well-known SF author who died less than a year ago. His passing was discussed here at the time [3] [4].

He was one of the authors that influenced my youth a great deal, and I particularly remember this aspect of Moving Mars as catching my imagination, so will be interested to read what Catala has to offer.

[1] https://news.ycombinator.com/item?id=33675708

programmers love to propose using "programming language" or similar for law

But this fails to realize that _ambiguity (in some ways) is a fundamental important part of law_.

This is because the world itself is fundamentally ambiguous (in some ways) rather than clear cut.

Naturally not all ways of ambiguity are wanted.

But you can be sure that with "code as law" the ways loopholes are abused will get worse in my opinion.

I would even go so far as to say that many laws should be more focused on what should be upheld than on the details of how (which is fundamentally less clear cut/more ambiguous).

  • Agreed. Although I don't think this is a bad idea, I think the idea of perfectly defined and perfectly enforceable laws is terrifying. If every law on the books today could be perfectly enforced and perfectly monitored, our lives would be utterly miserable.

    I'm not going to argue that's a problem with laws vs. enforcement, but either way, our society is built around ambiguity and unequal enforcement of law.

    • I'm not speaking about perfect enforcement.

      I'm saying that law, contrary to what movies love to pretend, is not about clever word tricks and nit-picking formulations.

      (In court it still can be about clever arguing, including nit picking arguments if necessary.)

      But code _is_ about nit-picking formulations, at least if we ignore documentation, naming conventions etc. and look solely at what the code does.

      Code is meant to be precise.

      Law is meant to be only as precise as necessary but no more than that. Or you could say it's meant to be as imprecise as viable.

      Code is about the specific case (in general).

      Law is about the generic case (in general), avoiding specific cases where possible.

      Code is made for machines to consume.

      Law is meant to be consumed (by humans) within the ambiguously defined context of real situations.

      This is so deeply rooted in law that I would argue it's (in general, with exceptions) not possible to translate any current laws to code without accidentally changing their meaning in a lot of subtle but meaningful cases.

    • Enforcement and clarity are different concepts. We don't have a big problem with courts or police declining to enforce laws because they aren't precise enough.

    • Do you have any examples where ambiguity is truly beneficial?

      For all the examples I can think of, the most beneficial outcome is removing the law altogether.

  • I don't think there is any reason ambiguity would clash with a project like this. Take "value.fair_market" in the concepts section [0]. Sure, lawyers can argue over what this means, but these competing definitions can also be defined programmatically.

    I agree with your idea that our interest in laws shouldn't focus on implementation details but I think they should focus on outcomes. This requires a method to produce an evaluation function to measure the outcomes of a new law, and a system such as Catala to help model expected outcomes and to help select between competing laws (eg if our outcome = "we want less pollution" then our policy might be "ban polluting industries" or "tax pollution externalities." Both have complex consequences which would be better analyzed automatically and measured empirically.)

    [0] https://github.com/CatalaLang/catala#concepts

I don't think there's much of a problem with actually reasoning about a law's text that a computer can help solve. The complicated bit is weighing equities, which still requires humans and lawyers.

  • Absolutely. Although the clarity by creating algorithms from tax tables can be helpful, and sometimes the wording seems ambiguous. Although you probably also need lots and lots of examples. (It is as if you need unit tests!)

    • Most tax-related rules in the US are specified in an XML-based business rules language. That's partly how tax prep companies are able to get rules that don't finalize until 12/31 into products that have to ship 6 weeks later.

  • +1. I see one benefit of this language -- it could make it much easier to write programs to compute taxes and benefits. Beyond that I don't see what it could possibly offer.

    Are there any lawmakers, lawyers or judges excited about this, or is it only programmers?

  • I think you meant, "...humans, lawyers, and bribes."

    No, I don't have a lot of faith in our legal system. Why do you ask?

The naming choice is really unfortunate. It's like naming a programming language "français", or "Deutsch" (or "English").

From the bottom of the readme:

> The language is named after Pierre Catala

I'd suggest changing it to PierreLang then.

  • "The language is named after Pierre Catala, a professor of law who pionneered the French legaltech by creating a computer database of law cases, Juris-Data."

    As a native Catalan speaker I was quite surprised with the name! But it makes sense, since it's quite a common surname!

Laws would do well to follow the rules of software. Small modules with clear responsibilities with an emphasis on readability and test cases that are run before you go to prod, for example. Testing is expensive so I understand why the legal system would rather just push their code and fix bugs when they see them in the wild. The collateral damage for people caught up in real life test cases is tolerable, especially when it’s someone else footing the bill.

Linting and type checking the existing codebase would also be more helpful than rewriting everything in a new language. Enforcing size constraints on vocabulary and word count. Cross referencing between different legal systems. Throwing out dead laws that are no longer executed in prod. Profiling the efficiency of existing laws to find hot spots.

There’s little incentive to do this when the current system is run by a cadre of highly trained legacy COBOL programmers. I’d pick a very small part of the system — incorporate a new city and start from the ground up — and take it from there with the clear eyed expectation that a full rewrite is going to take a century.

  • Moreover, I think laws work similarly to software. People write a law, others find a loophole and use it, people fix it with patches, and so on...

I was a bit confused by the fact that my first language is Catalan, which in Catalan is spelled Català. So yeah, imagine someone proposing a language specification for the law called English.

Ambiguity is a feature, not a bug. I used to spend a lot of time on business process automation, and even in those more structured and restricted settings, trying to codify procedures most often fails. The reason is that reality (a) has so many edge cases that it very rapidly devolves into chasing down an ever-diminishing ROI, (b) is unknown to the middle management and business analysts, those that would have the authority to construct and sign off on it, and (c) relies on intelligent people applying creative, pragmatic solutions to keep the business running, and straightjacketing those people into inflexible automatons is the most surefire way to sink the ship.

I'd be interested in seeing something like this for verifying game designs / new game rules given an existing design

  • Why? What do game designs have to do with law?

    • Games are activities bound by rules. Laws are rules for government/governed.

      AFAIK there's not really a programming language specifically for describing how players interact in a game, although there's no reason you couldn't implement it in any old programming language. I guess the same thing could be said of the law too, until Catala.

I think having a "linter" for laws can be beneficial. It can help produce laws that are easier to read and understand.

Having a "compiler" for laws can help identify conflicts between different codes of law. E.g.: imagine getting a compiler error when a law is unconstitutional from a logical standpoint.

But verifying the "business logic" (e.g.: what is the spirit or intent of the law?) of the law will remain a human intelligence task.

My hunch is that in any sufficiently large rule set, there will be inconsistencies. Handwavily, think Gödel, or just the need for bounded domains in DDD.

Humans (or, well, AI) are needed to cope with inconsistencies.

That said, pointing out the fact of existence of inconsistencies could be very valuable. But a system needs to embrace them, not fight them.

  • Gödel’s theorems don’t imply inconsistency for all large systems (unless “large” is taken to mean something strange), just for systems which are both not super-weak in what they can say, and complete (or if they have their own consistency as a theorem).

    I don’t think Gödel’s theorems particularly support the claim you’re making.

    In fact, here is an argument that a consistent rule-set either can be extended to something consistent and complete, or can be extended to be arbitrarily large and consistent:

    take a ruleset which is consistent, but for which there is something for which it has no prescription one way or the other (neither explicitly nor implied collectively by other rules) (I.e. “not complete”). Then, add a rule specifying that thing and nothing else which isn’t implied by that thing. This will be consistent, as if it were not, then the negation of the rule added would have already been an implication.
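    The single extension step can be written out a little more formally. This is a sketch in standard first-order notation; the symbols $T$ (the ruleset) and $\varphi$ (the undecided thing) are introduced here for illustration, and the argument assumes a logic in which the deduction theorem holds.

    ```latex
    \text{Assume } T \nvdash \bot, \qquad T \nvdash \varphi, \qquad T \nvdash \neg\varphi .

    \text{If } T \cup \{\varphi\} \vdash \bot, \text{ then by the deduction theorem }
    T \vdash \varphi \to \bot, \text{ i.e. } T \vdash \neg\varphi,

    \text{which contradicts } T \nvdash \neg\varphi .
    \text{ Hence } T \cup \{\varphi\} \nvdash \bot .
    ```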

    This will either yield a larger ruleset of the same kind (consistent and incomplete), or it will yield one which is consistent and complete. Gödel’s theorems show that if the ruleset is an axiom system which is sufficiently expressive (e.g. contains Peano arithmetic) then the latter cannot be the result. So in this case, there are arbitrarily large extensions of the rule-set.

    If it isn’t an axiom system, or is one for a rather weak system, then you may land on the “result is a consistent and complete system” option; but in that case, well, why would you want it to be larger?

    Edit: perhaps what you are calling “inconsistencies” are what I would just call “exceptions”/“exceptional cases”?

    To my mind, “embracing an inconsistency” doesn’t seem to make much sense in the case of law? Something has to be what actually happens. We (whether fortunately or unfortunately) cannot bring an actual contradiction into reality.

    Well, I suppose if one takes a sub-truth(not sure if this is the right terminology? I mean the opposite of super-truth) approach to vague statements, one might say that a somewhat-bald man causes the statement “that man is bald, and also that man is not bald” to be true (and also false), and as such “bring a contradiction into reality”, but that’s not what I mean by the phrase.

    I mean there is no full precise-ification of any statement, which we can cause to be simultaneously true and false irl.

    Those acting as agents of the law must behave in some particular way.

    When legal requirements contradict, people will not satisfy both of them. Perhaps one will be considered to take priority. Perhaps a compromise position between the requirements will be sought. Perhaps it will be left to the judgement of those following it in a case-by-case basis.

    But in none of these cases is a contradiction implemented. Can they really be said to be embracing the contradiction?

    Upon writing this edit I realize that I’m probably misinterpreting that part of your comment. I suppose the thing you are saying to embrace is not the individual contradictions themselves, so much as the system’s rules-as-written having contradictions, and therefore the necessity of dealing with such contradictions when implementing the rules, as the scenarios to which the contradictory statements apply, occur.

    • I think parent might have been referring to the inconsistency that Gödel noticed in the US Constitution when applying for citizenship.

  • This is less of a problem in legal systems as the legal system self admits to resting on unproven axioms.

This paper definitely has thoughts on the matter https://law.mit.edu/pub/interpretingtherulesofcode/release/4

"As the Rules as Code movement gains momentum, questions are starting to be asked about the performance and practical effects of expressing law computationally. This article examines the strengths, weaknesses, and new opportunities of engaging with these emerging systems."

The Future of Coding podcast covered it recently https://futureofcoding.org/episodes/065

The abstract says, "Software code is built on rules. The way it enforces them is analogous in certain ways to the philosophical notion of legalism, under which citizens are expected to follow legal rules without thinking too hard about their meaning or consequences. By analogy, the opacity, immutability, immediacy, pervasiveness, private production, and ‘ruleishness’ of code amplify its ‘legalistic’ nature far beyond what could ever be imposed in the legal domain, however, raising significant questions about its legitimacy as a regulator."

It's a complex paper/topic that I personally need more time to grasp before throwing my opinions around too heavily. But my first, knee-jerk reaction so far is that moving laws into code is a bad idea. Specifically, as the paper says, "...code by its very nature tends toward a kind of strong legalism. This is the case regardless of the intent of the programmer, however vicious or virtuous that may be."

The "strong legalism" inherent in code means "the sovereign’s exercise of power is de facto legitimate, and thus not open to question." Not to be reductive, but that ain't good.

I feel we've seen evidence of this path already, with (easily refuted, but somewhat common) claims like "data can't be biased" (for example). The tendency to blindly follow a computer's dictate with, "Well, the computer says this is so, so it must be so." is strong in our society at times, I think.

The Catalan language name is written Català.

  • It is named after a French guy called Pierre Catala, not the language. The surname probably comes from the language, but according to the documentation it does not seem to be written with the grave accent.

    > The language is named after Pierre Catala, a professor of law who pioneered the French legaltech by creating a computer database of law cases, Juris-Data. The research group that he led in the late 1960s, the Centre d’études et de traitement de l’information juridique (CETIJ), has also influenced the creation by state counselor Lucien Mehl of the Centre de recherches et développement en informatique juridique (CENIJ), which eventually transformed into the entity managing the LegiFrance website, acting as the public service of legislative documentation.

    • Their research group is called 'Prosecco', and another of their projects at Inria is 'Squirrel'... Another one is F* (pronounced F star).

      I imagine that they find their naming choices amusing.

  • I also came to comment that the software's name might cause confusion with the human language, Catalan.

Law is a mess, in part because its authors take shortcuts. For example, from the first example on CatalaLang's README.md:

> If the property was acquired by gift [and various conditions apply], then for the purpose of determining loss the basis shall be such fair market value. [emphasis added]

I think (and I'm not a lawyer or a tax expert) that this means that the basis of an asset can have a different value for the purpose of determining gain or determining loss. Wow, basis isn't just a number, although one might not notice this if one didn't read the six emphasized words.

But the Catala code seems to completely ignore this. Oops. I filed an issue:

https://github.com/CatalaLang/catala/issues/514

In a real use case, I imagine that substantial refactoring of the parts that consume basis might be needed when one notices that the basis is not a number.
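
A minimal Python sketch of that refactoring concern (not the Catala code itself): the basis becomes a function of the purpose (gain or loss) rather than a single number. The names `Purpose`, `GiftedProperty`, and `basis` are hypothetical, and the condition used here (the donor's basis exceeding fair market value at the time of the gift) is an assumption standing in for the "[various conditions]" elided in the quote.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    GAIN = "gain"
    LOSS = "loss"


@dataclass(frozen=True)
class GiftedProperty:
    donor_basis: int   # donor's basis, in whole dollars
    fmv_at_gift: int   # fair market value when the gift was made


def basis(prop: GiftedProperty, purpose: Purpose) -> int:
    """Return the basis to use for the given purpose.

    Assumed condition: the special rule applies only when the donor's
    basis exceeds the fair market value at the time of the gift.
    """
    if purpose is Purpose.LOSS and prop.donor_basis > prop.fmv_at_gift:
        return prop.fmv_at_gift
    return prop.donor_basis
```

Any code that previously read a plain `basis` number would now have to say which purpose it is computing for, which is where the refactoring cost shows up.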

If anything, I'm grateful to this article for exposing me to this Bill Gates deposition. I had no idea he was such a repugnant and incompetent liar. Sometimes, in my mind, I try to justify clearly bad actions by big corporations by saying something like "maybe they don't know" or "maybe they are in a tight spot between many bad choices", but here it's loud and clear that it was deliberate. He and his buddies knew what they were doing, they knew it was wrong, and they didn't care. Nor did he have the guts to own up to his wrongdoing.

That's really cool!

I was puttering around with the idea of a ricardian compiler for legalese, basically a decompiler for something like this that could compile a legal text into clear logical rules. This would aid in proof checking for legal documents to ensure that they're compatible with existing law, that there are no (unintended lol) loopholes and the like. It would also be useful if you wanted to create self enforcing legal documents that can be enforced deterministically by machines, such as collateralized agreements, and finally, even though someone would still need to know legalese, it could make the development of such agreements easier for people and lower the bar tremendously.

I wonder if anyone has built anything like that, if these guys have, or if anyone has built other interesting ricardian compilers.

Would be so funny if all the compute moves from AI to finding loopholes in the programmatically defined law.

  • "haha! actually, if you give it this input, the program crashes! i can finally steal pillows from the bookstore!"

Although I agree in principle, the closest thing we have to 'formal law' is smart contracts, and already billions of dollars have been stolen through bugs in these, despite barely anyone using them. I have some reservations about basing our entire legal system on code.

  • I think that comparison is way off.

    Smart contracts are much more comparable to "a webshop" than to actual logic describing rules of arbitrage or other concepts at play in "law".

    • A smart contract is a well-defined series of rules that anyone can choose to interact with and have certain well-specified guarantees of the outcome -- in that sense I think there are quite a few similarities, the main differences being that a smart contract deals with a way narrower set of concepts and is immutable.

I would have a much easier time understanding government legislation if everything was provided in such a language. I tried to compute my taxes by hand a month ago to see whether and how much money it would save if I enabled "loon middeling" (some Dutch law about income). But I couldn't figure it out. The explanations provided were ambiguous in some subtle way, leading me to incorrect assumptions. In the end I did figure it out by reverse engineering a free third party calculation tool (which also was not correct, but putting their insight and my insight together made something that came close to the number on my belastingaanslag).

This seems like a worthy project but I think its landing page should define the scope of what it aims to do in more concrete terms. Otherwise it runs the risk of being seen as overambitious and open ended without a concrete problem to solve.

E.g. it's not clear if there is an explicit or implicit ontology against which the validity of any codification can be checked.

This is interesting, and not to criticize, but I wonder if transformer models' accuracy in interpreting law will obviate the need for something like this.

It would be interesting to train Large Language Transformer Models to generate this code for you based on the text in the laws. This way you have a deterministic testable output, without risk of hallucinations.

  • Even LLMs would be better off with a clear unambiguous syntax to define human laws and rules.

What are the authors' goals, what is their intended purpose? I can't find a mission statement on their website.

Something to keep in mind is that the courts are not necessarily trying to determine the truth, but rather to create a place where two parties that represent different interests can duke it out. That's not always what the courts are used for, but it's a different mentality than science or programming.

This project is doing code -> text, right?

But then, the first line of the description on GitHub says:

> Catala is a domain-specific language for deriving faithful-by-construction algorithms from legislative texts.

This reads like 'text -> code', which is the opposite of what this project seems to be doing.

I'm actually in court later this month pertaining to some stolen traffic cones. Dismissed my lawyer just yesterday due to a dishonest swagger. I'll be preparing some documents using this after I prototype using a few frameworks. Will keep you posted.

This is really cool! But I guess its biggest drawback is being unable to deal with case law?

Super interesting. I also think we would win with a kind of versioning system for laws, including a definite objective for a law from the time of its creation, and constraints under which it should be questioned again.

Started something like this years ago with a company that ended up pivoting in a slightly different direction after a while. Glad to see something in the open source space.

No way would this ever be a good idea. Language changes too much, be it human or machine. This is why you need flexibility, not a rigid structure, in making law.

without formal training, one key thing I picked up is that most public understanding of legal concepts diverges from court understanding because law follows logical and/or gates

so “and” isn't a list of accepted criteria, it is a list of things that must be simultaneously satisfied

but it's only using logical gates most of the time

this is a good step in showing that. not a panacea but a good step!
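
A tiny Python sketch of the and/or point, with made-up criteria names (`resident`, `over_18`, `filed_on_time` are assumptions for illustration, not taken from any statute): a conjunctive test needs every element at once, while a disjunctive one is met by any single ground.

```python
from typing import Mapping, Sequence


def is_eligible(facts: Mapping[str, bool], required: Sequence[str]) -> bool:
    """Conjunctive ("and") test: every listed element must hold at once."""
    return all(facts.get(name, False) for name in required)


def any_ground(facts: Mapping[str, bool], grounds: Sequence[str]) -> bool:
    """Disjunctive ("or") test: one satisfied ground is enough."""
    return any(facts.get(name, False) for name in grounds)


facts = {"resident": True, "over_18": True, "filed_on_time": False}
print(is_eligible(facts, ["resident", "over_18", "filed_on_time"]))
print(any_ground(facts, ["filed_on_time", "resident"]))
```

Missing facts are treated as unsatisfied here, which is a design choice of the sketch rather than a legal rule.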

I had so much trouble getting catala to even compile that I got frustrated. It’s a piece of academic code that’s very much abandoned

  • It's hardly abandoned. The git history shows activity almost every day over the last few months. It's clearly being actively developed.

A software for people. For dictating human behavior. Is that what we're looking at here?

Doesn’t TurboTax have some kind of DSL they use to encode all of the tax rules?

"We" will interpret the law for you and the judges, and "we" are not suspicious at all of having a hidden agenda to replace "the law" by "how we see the law" to benefit ourselves.

Is this a joke?

Will it help?

On the one hand, I think it would be fantastic, if you had automated tests for the law. For example, when German politicians introduced the "hacker law", you could have pointed out that "This new law would break the 'security researchers need to be allowed to do penetration testing' test".

On the other hand, "Brexit is in conflict with the Good Friday Agreement, we need a solution for Northern Ireland." was known without machine-readable laws and tests, but politicians ignored it anyway.

Maybe what's needed is a law that outlaws test-breaking laws and requires politicians to fix the tests first, but I bet that would just result in a lot of "commented" tests.
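
To make the "tests for the law" idea concrete, here is a rough Python sketch. It is not a model of the actual German statute: the rule functions, the fact names, and the `PENTEST` scenario are all invented for illustration. Each rule says whether an act is an offence, and a regression case checks that an authorised penetration test stays legal under a given rule set.

```python
from typing import Callable, Mapping, Sequence

Act = Mapping[str, bool]
Rule = Callable[[Act], bool]  # a rule returns True when the act is an offence


def unauthorized_access(act: Act) -> bool:
    # Existing rule (hypothetical): accessing a system without consent.
    return act["accesses_system"] and not act["owner_consented"]


def possesses_hacking_tools(act: Act) -> bool:
    # Proposed rule (hypothetical): possessing the tools is itself an offence.
    return act["has_hacking_tools"]


def is_offence(act: Act, rules: Sequence[Rule]) -> bool:
    """An act is an offence if any rule in the set says so."""
    return any(rule(act) for rule in rules)


# Regression case: an authorised penetration test must stay legal.
PENTEST: Act = {
    "accesses_system": True,
    "owner_consented": True,
    "has_hacking_tools": True,
}


def pentest_stays_legal(rules: Sequence[Rule]) -> bool:
    return not is_offence(PENTEST, rules)
```

With only `unauthorized_access` in the rule set the check passes; adding `possesses_hacking_tools` makes it fail, which is the kind of signal a drafter would see before the law is enacted.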

Laws are often written in the vaguest language to allow wide interpretation, especially laws regarding treason, communism, foreign agents, anti-war speeches and such things. A programming language won't help here.

  • I'm also somewhat skeptical. How would a program deduce that "cookies and similar technologies" will mean localStorage, sessionStorage, IndexedDB, etc.? Remember that the law was written well before some of these technologies even existed.

Lawyer here.

Oh, this again. I suppose this looks relatively harmless, but I'm always wary of "law is like computer code."

The impulse to think this can strongly solve any real problem in the law is intuitively attractive, but I strongly predict this mostly never happens; it's the law's job to be intensely practical in the face of hard-edged "computer-like" rules.

If anything, you get goofy confusion about what things "are?" My go-to on this is always the "smart contract" -- which can be useful little bits of automated robot money moving code, but emphatically are neither "smart" nor "contracts."

  • I can assure you that the person who wrote this is very well aware of the subtleties involved with formalising the law. Law + Programming is an active research field (https://popl23.sigplan.org/home/prolala-2023), and it is very far away from anything like smart contracts, it is full of brilliant people who have no pretension of replacing the law with computers, but simply be helpers where they can.

    • Sure, I think I posted my response not because "it could never solve ANY problems," but instead "way too many non-lawyers, especially techy-non-lawyers, have the deeply misplaced idea that this is a very important, perhaps THE most important, problem to solve in the law." It's just not very high on the list at all.

    • I agree. The idea of logically representing law isn't the same as replacing law with computers. That's why I even posted this.

  • This is a problem, but not one that Catala suffers from on a first reading.

    Some elements of law are amenable to translation into source code, and indeed anyone working in fintech will probably have done that at some point. If the law gives a threshold for a tax allowance, for example, you need to encode that requirement in accordance with the law. Being able to mark up the text of each regulation should make it much easier to be confident you've not missed anything.

    Trying to write non-financial regulation as code is pretty much doomed to failure. But to the extent that tax or benefits regulations set out numbers that we have to translate into code anyway, it's good to have that code be verifiable against the specific regulatory text.
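
    A minimal Python sketch of that idea, assuming an invented rule: the threshold, rate, and the citation string `"Example Act s. 1(2)"` are all hypothetical. The point is only that each encoded number carries a pointer back to the text it was taken from, so a reviewer can check one against the other.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Allowance:
    citation: str       # pointer back to the regulatory text (made up here)
    threshold: Decimal  # income up to this amount is untaxed
    rate: Decimal       # rate applied to income above the threshold


def tax_due(income: Decimal, rule: Allowance) -> Decimal:
    """Tax on the portion of income above the allowance threshold."""
    taxable = max(income - rule.threshold, Decimal("0"))
    return (taxable * rule.rate).quantize(Decimal("0.01"))


EXAMPLE_RULE = Allowance(
    citation="Example Act s. 1(2) (hypothetical)",
    threshold=Decimal("10000"),
    rate=Decimal("0.20"),
)
```

    `Decimal` is used instead of floats so that monetary amounts round predictably.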

  • > Oh, this again. I suppose this looks relatively harmless, but I'm always wary of "law is like computer code."

    "To a man with a hammer, everything looks like a nail." (Twain?)

    Hasn't Cyc impressively demonstrated just how incredibly difficult and costly it is to formalize even the most basic matters of daily life?

    There already was a discussion two years ago: https://news.ycombinator.com/item?id=27059899

    • > Hasn't Cyc impressively demonstrated just how incredibly difficult and costly it is to formalize even the most basic matters of daily life?

      I would offer that the "cost/benefit" analysis for such a formalism exists on at least two axes: the concept domain which one is attempting to formalize, and the benefit (and/or size of consumers) of any such working system

      I can wholly understand that trying to translate the entirety of English into a formal logic system sounds overwhelming. But to side with a sibling commenter, why not at least start with the tax code which is a personal pain point, has (presumably) a correct outcome for some cases, and is mostly algorithms-in-English

      And then, for the consumer side: ok, if I snapped my fingers and Cyc existed and worked I struggle to think how exactly my life would change. If the formally-specified tax code existed and worked I wouldn't have to rage-upvote almost every comment on the annual tax hatred thread

      I would even offer that an incomplete version could still be useful if one left "fuzzy" variables in the corpus, and said "welp, we can't define what a $Person is because of the hundreds of years of precedent, so you'll need an actual Judge for that". I don't mean to say that 50% of the corpus can be undefined variables, that's just silly, but I'd hope the tax code isn't built upon 50% undefined behavior, even if accountants want you to think it is
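
      One way the "fuzzy variable" idea could look, as a Python sketch under invented assumptions (the fact names `is_person` and `has_income`, the `NEEDS_JUDGE` marker, and the rule itself are hypothetical): facts may be left undefined, and the evaluation only falls back to a human when the undefined fact actually affects the outcome.

```python
from typing import Mapping, Optional, Union

NEEDS_JUDGE = "needs a human decision"


def owes_tax(facts: Mapping[str, Optional[bool]]) -> Union[bool, str]:
    """Return True/False when decidable, else NEEDS_JUDGE.

    A fact mapped to None (or missing) is a "fuzzy" variable that the
    code deliberately refuses to define.
    """
    is_person = facts.get("is_person")    # left to precedent
    has_income = facts.get("has_income")
    if is_person is False or has_income is False:
        return False                      # decidable despite any fuzzy fact
    if is_person is None or has_income is None:
        return NEEDS_JUDGE
    return True
```

      This is ordinary three-valued logic for a conjunction: a known False settles the question, and only otherwise does an unknown escalate to a judge.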

  • Law is like computer code, if:

    - your compiler was AI-complete and adversarial and hated you

    - your compiler was also not bound by any hard rules and could emit undefined behavior at any time

    - your job scheduling and orchestration system was AI-complete and adversarial and actively hated you

    - your runtime library had 50 different incompatible canonical implementations and can only be run by being forked by publicly-elected officials who blindly merge patches from bad-faith lobbyists

    - the documentation for any of those 50 runtime libraries is paywalled per page behind https://pacer.uscourts.gov/pacer-pricing-how-fees-work if you're lucky

    - the IDE is Microsoft Word, and the linter is a summer associate on their tenth cup of coffee

    - you will inevitably get a non-technical client who thinks that the more times you have "notwithstanding the foregoing" in your code the more you can call yourself Web Scale

  • I like to turn this argument around to see how absurd it is.

    Why not just take the existing law, and have a machine execute it in the style of a computer program?

    We wouldn’t need judges juries or lawyers. You’d just type the specifics of your case and any supporting documents/evidence into the computer and a verdict would pop out.

    Of course, the system could be used for other stuff too, like checking building code compliance or engineering soundness, signing off on military and police action, setting the executive branch’s priorities, and so on.

    • Not a lawyer. My thinking is very likely naive as I have no experience in this matter.

      I see two potential issues:

      - Picking evidence. "Evaluating" the law might need access to all the evidence that could possibly exist, but that would certainly never be available, so you'd need someone who knows which evidence to present. You probably cannot rely on some interactive process asking you for such and such evidence, because it would be the presentation of evidence that triggers the evaluation of chunks of law. I would guess a lawyer with good knowledge of the law would probably be needed for this.

      - Setting precedents. Wouldn't the "automated" law evaluation run into unprecedented cases all the time? You'd need someone to constantly issue a verdict on unforeseen situations all the time, and I guess you'd need a judge for this.

      Maybe it could work on many "trivial" cases though.

    • Some people have argued that the fuzziness of the legal system can be a good feature for some reason, but you could always have a machine execute the law and a human make the final call. So you wouldn't need judges, juries, or lawyers, but you would need a team of legal shamans that sign off on verdicts

  • I agree with you, but this is literally what those in civil law jurisdictions believe.

    There are some areas where automating things can be effective – e.g. tax systems.

  • how many times have you fought in court to argue about "the spirit of the law"? I, for one, don't really care about this lang or the "law is computer code" thing. just wanted to know about a lawyer's life, I guess.

  • > If anything, you get goofy confusion about what things "are?" My go-to on this is always the "smart contract" -- which can be useful little bits of automated robot money moving code, but emphatically are neither "smart" nor "contracts."

    They are contracts—just not legal contracts. One of many types of contracts in the world that are not legal constructs.

What problem does this solve? It appears to add precision where it's mostly already clear - perhaps it can enforce some kind of rigor... but then like the example given uses "fair market value" as a term which I'd expect to be the kind of thing that's in contention, rather than any of the actual "logic", and it doesn't help with that.

The reason we have courts and lawyers is because of the need for interpretation beyond just writing good logic, so I don't see how this can really do anything. Or is it for something else?

  • Catala still produces plaintext legal documents at the end of the day but can be seen as a markup language for those documents. But because that markup language is a whole lot more precise than the legal text itself, it can be a bit more versatile.

    Examples of how this could be useful:

    - Reducing the overhead for maintaining a list of semantic translations of that legal code into other languages. Of course the official language is the only one that is "legal" but the other translations should be close enough to effectively express the nuance provided the language outputs are maintained by people who can actually speak those languages.

    - Producing machine executable proof or simulation code. This could be used for "fuzzing" the legal code to identify loopholes or unintended outcomes so that legislators can then propose improved terms to avoid those issues. This is by no means "making code law" but it provides an additional tool for understanding the law and how the many different parts of the legal code interact with each other.

    - Adding on to the previous example, sim code could be integrated into complex models for simulating the impact of legal changes on the economy at large or specific segments.

    - Finance related code can be used to generate a tool or API for validating tax, accounting, and compliance documents (as a first pass to catch errors early and reduce overhead) as well as to even prepare some of those documents. These tools often already exist but they are one or more steps removed from the actual legal definition which increases the risk of error as well as the overhead of maintaining them (which can potentially encourage rent seeking behavior by commercial providers of these tools).
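
    As a rough illustration of the "fuzzing" bullet, here is a Python sketch with toy rules. Every number is invented, and an exhaustive scan over a grid of incomes stands in for real fuzzing. It looks for an unintended outcome: a point where earning more leaves someone with less.

```python
def net_income(gross: int) -> int:
    """Toy rules (all numbers invented): 20% tax on income above 10,000,
    plus a flat 3,000 benefit withdrawn entirely once gross reaches 15,000."""
    tax = max(gross - 10_000, 0) * 20 // 100
    benefit = 3_000 if gross < 15_000 else 0
    return gross - tax + benefit


def find_cliffs(lo: int, hi: int, step: int = 100) -> list:
    """Return (gross, next_gross) pairs where earning more nets less."""
    cliffs = []
    for gross in range(lo, hi, step):
        if net_income(gross + step) < net_income(gross):
            cliffs.append((gross, gross + step))
    return cliffs


print(find_cliffs(0, 20_000))
```

    The scan flags the step from 14,900 to 15,000, where the benefit is withdrawn all at once; a legislator could then decide whether that cliff is intended.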

    France actually is already doing this to a reasonable degree, albeit the "codified" version is based on the law rather than the codified version producing plaintext law. The DGFiP [1] maintains a GitLab organisation [2] that includes both Catala and MLang [3] representations of different parts of the French legal code for exactly these purposes.

    1. https://fr.wikipedia.org/wiki/Direction_g%C3%A9n%C3%A9rale_d...

    2. https://gitlab.adullact.net/dgfip

    3. https://github.com/MLanguage/mlang

    • According to the example in the readme, it's specifically for legal text that produces code… So it should be a road to some kind of literate programming or proven implementation.

      Example of legal text that should/may become code somewhere: the senate votes to give pensions to veterans who meet some criteria… But lesser-known rules already exist for some cases, and they may be incompatible.

      I think that coupled with some kind of prolog, it may help detecting inconsistencies early.

    • > Of course the official language is the only one that is "legal"

      If only! Here in Canada there are two official languages. All laws are drafted, and enacted, in both English and French. Both versions are equally valid, equally binding. And, sometimes, they don't say the same thing.

  • >It appears to add precision where it's mostly already clear - perhaps it can enforce some kind of rigor...

    I'd argue that the imprecision of law is more feature than bug. Rules as written have edge cases and, as long as the law is written in natural language, you can get a feel for their intent, and that helps judges decide what to do in those situations.

  • I suppose a formalized legal language could:

    - help in quickly testing whether newly drafted laws contradict existing laws (without needing to memorize the existing legal code)

    - check for redundancies

    - check whether removing one law affects any others

    - statistically analyze legal systems in different countries

    Assuming any of those are important issues in law. I'm not sure

Completely unrelated to Catalan (Català), the language spoken in Catalonia (Catalunya). I think if someone wants to google a question about this, "catala language beginner hello world" won't help them much.

  • I can only conclude that they didn't know that. It's such a bad name; we won't even be able to google "catala lang" because ... Catala is also a "lang"!

    Imagine someone creates a programming language called Russian. Good luck googling "russian lang".

  • You'd write "catalalang" just like you write "golang", if you really need to. I think in most contexts, search engines would be able to infer the context.

Very cool. Pessimistically, I think that having a clear, understandable view of legal text so that people can navigate the law safely is against a lot of entrenched interests.

  • That's far too reductive. The law is abstract, and still needs to be interpreted and prudentially applied to specific situations, and you cannot capture all of that in the law a priori. Furthermore, having a computer crunch through a bunch of predicates is easy compared to getting the facts expressed in a form that is crunchable. So for specific and narrow applications where such representations are not costly to produce or already exist, such an application of computational law is feasible. But broadly? No.

    And then there's the distinction between lex and ius that I think needs to be considered in this context.

  • I don't think it's that. It's hard to write legal texts, and sometimes it's better to be vague, so the courts have some freedom when establishing jurisprudence.

    Treaties can be written with vague wording to allow parties to sign it, even if there isn't 100% agreement. That's an old practice.

    • On the contrary, it's much better to have very clear text, otherwise it will turn against the citizen.

      Imagine that you have an income tax where "income" isn't clearly defined. Someone will end up with an audit and a lawsuit from the tax office because their definition will be, of course, extensive (every income, including non-realized capital gains) whereas most citizens would only consider salaries.

      In the end, you create legal uncertainty, and give courts way too much power.

      For the record, I used to work for my country's government, and had to evaluate some laws in the making that were written in an abstruse way. When I asked why, the civil servant told me that it was so "they could pick the most favorable meaning in the case of a lawsuit".

    • It is probably better to be intentionally and clearly unclear when you want to be, and clearly algorithmic when you want that, than just stir it all together in the name of judicial discretion.

  • I don't think the example given increases legibility.

    Programming tends to use less understandable but more precise verbiage in general.

I know that naming is hard, and it has already been mentioned in the comments here but... I can't believe that somebody named a programming language with the exact same name as an existing natural language spoken by millions of people.

It just seems like a bizarre decision that can't be a benefit at all and can only have negative consequences. Just googling things about it is going to be hard. Why immediately create potential problems for yourselves when you can choose a name that's not an issue?

  • This hasn't seemed to help or hurt the popularity of other languages. You've got hot beverages, single letters, snakes, gemstones, two letter verbs, oxidized steel, languages where two thirds of the name are symbols, etc. It doesn't seem to matter. It appears that society, and search engines, are well-equipped to deal with the concept of homonyms.

    • All of the things you mentioned are not languages themselves. If you google "catala language" (try it! seriously) you're going to get results for the natural language, not this one. It's just an unneeded roadblock that they placed on themselves.

    • There are even languages named: basic, pascal, java, rust, go, zig, dart, eiffel, camel, python, ruby, julia, scheme, racket, joy, mad, coq, lean, ...

  • goofy; the language you're talking about is Catalan of Catalonia, not `catala`.

    > Just googling things about it is going to be hard

    when you're looking for docs on go do you google just "go"?

    edit: fine, it's called català in catalonian itself - this is so pedantic now that i might as well at this point say that the missing diacritic is sufficient to disambiguate.

    • > when you're looking for docs on go do you google just "go"?

      I wouldn't use Go as a good example of naming a language. It worked out because the language had the weight of Google behind it, but it's still awkward that you have to use a different name when searching for things than you do at other times.

    • > when you're looking for docs on go do you google just "go"?

      Golang will do the trick. Catala lang will not unless the language becomes massively popular.

    • Català is the name of the language in Catalan; it is totally equivalent to saying "English" if you are a native.

      This is obviously not an innocent choice. At this level I don't believe in coincidences, and CatalaLang makes it even more obvious. This looks like a veeeery obvious psy-op, or an independentist version of the old embrace, extend, extinguish.

      My bet is that as they can't stomach the basic legal concepts, they will try to silently replace them with the new "updated" meaning of those concepts.
