Comment by enriquelop
9 hours ago
I built a pipeline that converts all Spanish state legislation into version-controlled Markdown. Each law is a file, each reform is a real git commit with the historical date. 8,642 laws, 27,866 commits.
The idea: legislation is just patches on patches on patches. Git already solves this. Instead of reading "strike paragraph 3 and replace with...", you get an actual diff.
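To make the "actual diff" point concrete, here is a minimal sketch using Python's standard `difflib`. The article text and file labels are invented for illustration and are not taken from the repo or from any real law.

```python
import difflib

# Hypothetical before/after text of one article (not from any real law).
before = """Article 3.
1. The deadline is ten days.
2. Appeals are filed in writing.
""".splitlines(keepends=True)

after = """Article 3.
1. The deadline is fifteen days.
2. Appeals are filed in writing.
""".splitlines(keepends=True)

# A unified diff marks the removed line with "-" and the added line with "+",
# which is what `git diff` would show between two versions of the file.
diff = list(
    difflib.unified_diff(
        before, after, fromfile="ley.md (before)", tofile="ley.md (after)"
    )
)
print("".join(diff))
```

A reform that would otherwise read "replace ten with fifteen in paragraph 1" becomes a one-line removal and a one-line addition.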
The repo is the product. Browse any law, git log to see its full reform history, git diff to see exactly what changed.
Built the pipeline in ~4 hours with Claude Code. Source is BOE (Spain's official gazette) consolidated legislation API.
Exploring whether there's a business here — structured legislation API for legaltech/compliance, or just a useful open dataset. Curious what HN would build with this data.
The intent of a law is often clarified in court through judgments. If you can overlay the judgments on top of the corresponding law, at the correct points in time, I think that will have value. It might, for example, show which laws were referenced the most and which needed to be clarified the most. It might give insights into which legal language constructs stood the test of time and which had to be repeatedly clarified.
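One way to place a judgment "at the correct point in time" is to look up which version of the law was in force on the judgment date. The sketch below assumes you already have a law's reform dates in ascending order (for example from `git log`); the dates, commit ids, and the `version_in_force` helper are all hypothetical.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Hypothetical reform dates for one law, oldest first.
reform_dates = [date(1995, 11, 24), date(2003, 10, 1), date(2015, 7, 1)]
# Placeholder commit ids, one per reform date above.
reform_commits = ["a1", "b2", "c3"]


def version_in_force(on: date) -> Optional[str]:
    """Return the commit whose text was in force on the given date.

    Returns None if the date is earlier than the first known version.
    """
    # bisect_right counts how many reforms happened on or before `on`.
    i = bisect_right(reform_dates, on)
    return reform_commits[i - 1] if i else None


# A judgment dated 2010 would attach to the 2003 version of the text.
print(version_in_force(date(2010, 1, 1)))
```

Counting judgments per version would then give a rough measure of which wording needed the most clarification.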
That's true, but it might not be as important here.
Spain does not have a common law legal system like the US or the UK. It has a civil law system where prior court judgments do not form strictly binding precedent. Prior judgments can be important, but case law is not really a thing.
I wonder how true this is. We have the same system in Sweden: court judgments are not legally binding precedent for lower courts. But in practice lower courts will follow the rulings made by the high court.
Is it not the same in Spain at all?
Laws are often cascaded as well. Specifically in this case, Spain is subdivided into Comunidades Autonomas, each with its own elected parliament. And inside those are cities with their own local laws.
So while this project does track laws, is there any facility to determine which laws from which bodies are relevant to a specific activity in a specific location?
> And inside those are cities with their own local laws.
No, cities don't have their own laws, but the autonomous communities do have some influence in some laws and regulations (not all), like the amount of income tax you have to pay and so on. But cities within the autonomous communities don't have their own laws.
I may be wrong, but I think autonomous community legislation is not published in the BOE itself (the Official State Gazette), but rather in each of their corresponding official gazettes (e.g. DOGC for Catalonia, BOCM for Madrid, BOA for Aragon, BOJA for Andalusia, etc.).
Yes: Comunidades Autonomas can only define laws as "permitted" by the central government under an Estatuto de Autonomia (Autonomy statute? not good with legal jargon), which is effectively a law of its own. So at the central level the law says "in this particular region, matters of education are dealt with regionally", and that's when regional laws apply. Same for local laws. In essence, all laws emanate from the central government, but the central government decides to delegate some areas; technically, it could always take them back.
Another thought. Assuming such a dataset (laws + judgments) could be built, an argument can be made to Parliament to draft new laws that take into account all those judgments and then mark those judgments and old laws in a way that they can no longer be referenced (archived?). This might simplify future cases, leading to lower legal costs.
And who knows maybe a way could be found to create smart contracts (smart oracles? smart judges?) and those could lead to instant judgements.
Rarely in a civil law jurisdiction, essential in common-law jurisdictions.
Perhaps reference it in the commit trailer?
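A rough sketch of what a commit trailer could look like, assuming "it" here means a related judgment or source document. Git treats `Key: value` lines in the last paragraph of a commit message as trailers; the trailer names and values below (`Judgment-Ref`, `Source`) are made up for illustration and are not a convention the repo uses.

```python
def commit_message(subject: str, body: str, trailers: dict[str, str]) -> str:
    """Build a commit message whose final paragraph is a block of trailers."""
    lines = [subject, "", body, ""]
    # Each trailer is a "Key: value" line; git expects them grouped at the end.
    lines += [f"{key}: {value}" for key, value in trailers.items()]
    return "\n".join(lines) + "\n"


# Hypothetical reform commit with placeholder reference values.
msg = commit_message(
    "Reform article 12",
    "Replaces paragraph 3 of article 12.",
    {"Judgment-Ref": "EXAMPLE-123", "Source": "EXAMPLE-GAZETTE-ID"},
)
print(msg)
```

Because trailers are machine-readable, they could later be extracted from the log to link each reform to the documents that motivated it.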
> Exploring whether there's a business here — structured legislation API for legaltech/compliance, or just a useful open dataset. Curious what HN would build with this data.
Compiling legal data for specific domains and then selling processes that rely on your private compilation is a battle-tested business plan, but there's a lot of manual work involved and the cost of that work becomes a barrier to entry.
Generally speaking, the people who'd like to cross that barrier are both open to ideas and funded well enough to run little experiments.
Oooh, can you elaborate a bit on how the gazette is publishing them? Like, what format did you have to parse? And how many documents were there in total? I tried doing the same for German laws 1-2 years ago but LLMs weren't smart enough yet. And the costs would've been at least a couple of thousand €.
Edit: Never mind, I missed the "BOE (Spain's official gazette) consolidated legislation API" part. Sending jealous greetings from Germany. We just have a bunch of PDFs in Germany. And the private entity that has been publishing them for decades even claims copyright on them!
Do you mean DIN?
Heh we have the exact same status in Greece. It’s sad the upstream is so sloppy.
I looked into this a while back and IIRC, the consolidated legislation covers only a handful of laws, not all legislation.
Also, in my experience (having built in this space before), regulations aren’t really the issue. Court rulings are, because there’s no open data for them in Spain. And the potential users for a paid product (legal professionals) already know the law; the key players (big law firms) have their own databases of annotated and verified court rulings and other documents.
Very cool project. How are you thinking about indexing and discoverability? Git gives you the change history, but navigating the corpus itself seems like the harder problem: Finding related laws, understanding hierarchical relationships between statutes...
Have you considered embedding semantic hierarchical structure directly in the markdown? Something like https://github.com/wikibonsai/semtree ? It lets you build a navigable tree across markdown files using indented [[wikilinks]] as the organizational spine. Could be a natural fit for legislation that already has an inherent taxonomy (constitutional → organic → ordinary, or by subject area).
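To illustrate the indented-wikilink idea in general terms, here is a small parser that turns an indented outline of `[[wikilinks]]` into a parent-to-children map. This is not semtree's API or its exact file format, just a sketch of the concept; the outline contents and the two-space indent are assumptions.

```python
import re

# Hypothetical outline file; node names are placeholders.
OUTLINE = """\
- [[constitucion]]
  - [[leyes-organicas]]
    - [[ley-organica-ejemplo]]
  - [[leyes-ordinarias]]
"""

# Captures the leading whitespace and the wikilink target of a list item.
WIKILINK = re.compile(r"^(\s*)-\s*\[\[([^\]]+)\]\]")


def parse_tree(text: str, indent: int = 2) -> dict[str, list[str]]:
    """Map each node to its direct children, based on indentation depth."""
    children: dict[str, list[str]] = {}
    stack: list[str] = []  # stack[d] is the most recent node seen at depth d
    for line in text.splitlines():
        m = WIKILINK.match(line)
        if not m:
            continue
        depth = len(m.group(1)) // indent
        name = m.group(2)
        children.setdefault(name, [])
        del stack[depth:]  # drop nodes that are not ancestors of this line
        if stack:
            children[stack[-1]].append(name)
        stack.append(name)
    return children


tree = parse_tree(OUTLINE)
print(tree)
```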
I’ve thought of this many times! But you’re missing the most important part: every reform should be a PR. With discussions and all. That would be the purest form of voting. And all decisions to reach the current state of a law would be registered and available for everyone to read. Democracy 2.0.
Dreaming costs nothing.
This is brilliant. I had thought about this for a long while: you see laws that are just "go to law 132 and amend paragraph 4, then go to law 24 and amend paragraph 9". Basically "laws" are recorded as diffs, and then it's up to the reader to put together the final product in their heads. They should be doing it this way!
To be fair, the BOE website often offers the consolidated version for many important laws, which includes further amendments by other laws in one text.
This is really cool. I've thought about it for a long time as well but never had the idea of just using git, which is equal parts genius and "obvious" in hindsight, as most great ideas are.
I think the corollary that comes to mind is that reforms, with their git commits, are incrementally valuable if they refer to other parts of the legislation, previous commits, etc. to give more context as to the intent at the time of the law. So maybe there's a way to distill the legislative process into more PR and commit-oriented work—likely ex post as you did here, but perhaps in the future as part of an actual workflow.
And then maybe I'd pitch the idea to some technologically-inclined local government.
Please! Can you make the same for Portugal? Laws here are a mess of reforms...
is there a similar API for Portugal?
I’ve had the idea of playing with our laws and trying to ask questions about their growing volume and complexity. This is timely and dope Enrique - mil gracias!
It would be a good place to start if you wanted to hard fork the government.
Congratulations, this is a brilliant resource. You have done one of those countless things which I often think about doing, but my utter lack of follow-through and other distractions make it a fantasy. I cannot wait to clone the repo and explore it.
As to what can be done with the data, maybe one interesting step could be a graph-database regarding laws which reference other laws or the definitions that they depend on?
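A first step toward such a graph could be a plain citation scan before reaching for a graph database. The sketch below assumes laws cite each other in shapes like "Ley 39/2015" or "Real Decreto 203/2021"; the regex covers only those forms, and the sample texts and law numbers are placeholders, not claims about real cross-references.

```python
import re
from collections import defaultdict

# Assumed citation shapes: "Ley N/YYYY", "Ley Orgánica N/YYYY", "Real Decreto N/YYYY".
# The longer alternatives come first so "Ley Orgánica" is not cut down to "Ley".
CITATION = re.compile(r"\b(Ley Orgánica|Real Decreto|Ley)\s+(\d+/\d{4})")


def build_graph(laws: dict[str, str]) -> dict[str, set[str]]:
    """Return, for each law, the set of other laws its text cites."""
    graph: dict[str, set[str]] = defaultdict(set)
    for law_id, text in laws.items():
        for kind, number in CITATION.findall(text):
            target = f"{kind} {number}"
            if target != law_id:  # ignore self-references
                graph[law_id].add(target)
    return dict(graph)


# Hypothetical corpus: identifiers and texts are invented placeholders.
laws = {
    "Ley 1/2000": "Se modifica la Ley Orgánica 2/1999 y el Real Decreto 3/1998.",
    "Ley Orgánica 2/1999": "Texto sin referencias.",
}
graph = build_graph(laws)
print(graph)
```

The resulting edges could be loaded into a graph database later; in-degree alone would already show which laws are cited most.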
Too bad that author, and committer are individuals and not lists. It would be good to see who wrote them and how the voting went as well.
cool idea, how far back (in time) do those 27k commits go?
Just thinking how this could maybe be used for (automated) research / visualization on the evolution of (Spanish, in this case) law
> how far back (in time) do those 27k commits go
Looking at the commit dates (which seem to be derived from the original publication dates), the history seems quite sparse/incomplete(?). I mean, there have only been 26 commits since 2000.
It seems the commits aren't in proper date order. Here are some newer changes, placed before the latest commits: https://github.com/EnriqueLop/legalize-es/commits/master/?af...
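One possible way to get date-ordered history is to sort all reforms across all laws by publication date before creating any commit, and to pass the historical date through git's `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables. This is a sketch under that assumption, not a description of how the repo's pipeline works; the reform records, file names, and the fixed noon timestamp are hypothetical.

```python
from datetime import date

# Hypothetical reform records: (publication date, law file, summary).
reforms = [
    (date(2015, 10, 2), "ley-b.md", "Reform of article 4"),
    (date(1999, 1, 14), "ley-a.md", "Original text"),
    (date(2007, 3, 23), "ley-a.md", "Reform of article 12"),
]


def commit_env(published: date) -> dict[str, str]:
    """Environment variables that make git record the given date on a commit."""
    # The time of day is arbitrary here; only the date is known.
    stamp = f"{published.isoformat()}T12:00:00"
    return {"GIT_AUTHOR_DATE": stamp, "GIT_COMMITTER_DATE": stamp}


# Sort globally by date so commits land in chronological order in one history.
ordered = sorted(reforms, key=lambda r: r[0])
plan = [(path, summary, commit_env(published)) for published, path, summary in ordered]

for path, summary, env in plan:
    print(env["GIT_AUTHOR_DATE"], path, summary)
```

Each entry in `plan` could then be applied by writing the file and running `git commit` with that environment merged into the process environment.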