Comment by lucb1e
2 years ago
(TL;DR: this describes an old project of mine which tried to do something similar, which may be useful as inspiration for such a project. More testing is needed to say whether simple string replacements work sufficiently well, but initial results were promising. These days, LLMs are probably even more promising.)
I once tried building this for employment contracts: it would string-replace difficult phrases with a version that uncle Jack on his horse would understand. This was back when a computer's language understanding barely amounted to decomposing a sentence and working out what its subject is. Still, the simple string replacements worked so well that I figured we should just crowdsource a dozen contracts, and that should be able to kill 90% of the difficult language in any employment contract.
Example replacements: "to come to a transfer of the concerning intellectual property rights" with "to transfer copyright", or "Employee shall henceforth" with "you will" (I have trouble identifying "employee" versus "employer" in texts, so replacing it with 'you' and 'we' is a lot easier for me to read). It also just killed lesser-known words, like replacing "forthwith" with "immediately".
Example paragraph: If and insofar as Employee as part of the execution of their work activities on behalf of Employer whether or not together with others produces a work or other type of something on which intellectual property rights rest or can rest, Employee transfers these intellectual property rights already now just in case, or alternatively Employee grants an unlimited and irrevocable license and Employee lends at the first request of Employer forthwith all required cooperation to achieve a transfer of the concerning intellectual property rights. (Translated from a Dutch work contract I once received.)
Its replacement: If you make something during your work for us which can be copyrighted, you always transfer the copyright and grant a license and on request you immediately help transfer the copyright.
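For anyone who wants to try the idea, here is a minimal sketch of that kind of string replacement in Python. It is not the original project's code: the table only holds the example replacements mentioned above, and the names `REPLACEMENTS` and `simplify` are made up for illustration. Applying the longest phrases first and matching case-insensitively on word boundaries are also my assumptions, not something the original tool is known to have done.

```python
import re

# Hypothetical replacement table, seeded with the examples from this comment.
REPLACEMENTS = {
    "to come to a transfer of the concerning intellectual property rights":
        "to transfer copyright",
    "Employee shall henceforth": "you will",
    "forthwith": "immediately",
}


def simplify(text, replacements=REPLACEMENTS):
    """Replace each known difficult phrase with its plain-language version."""
    # Longest phrases first, so a short rule cannot break up a longer phrase
    # before the longer rule has had a chance to match.
    for phrase in sorted(replacements, key=len, reverse=True):
        plain = replacements[phrase]
        # re.escape treats the phrase as literal text; \b keeps the rule from
        # firing in the middle of a longer word.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement avoids backslash handling in the new text.
        text = re.sub(pattern, lambda _m, plain=plain: plain, text,
                      flags=re.IGNORECASE)
    return text


print(simplify("Employee shall henceforth notify Employer forthwith."))
# prints: you will notify Employer immediately.
```

Note that the replacement is inserted as written, so a rule that fires at the start of a sentence leaves a lowercase "you"; a real tool would need to fix capitalization afterwards.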
The project never went anywhere because I didn't pursue getting other contracts and seeing how well it works before marketing it as a useful tool for others. The main unknown is whether it would scale, or if the string replacements that work well in one contract start messing things up in others. Nowadays, though, I'd probably start by telling an LLM what kind of transformation it should make and then inputting the contract; that's probably 99% of the result in 0.1% of the time spent.
Very interesting :)
In legal settings, I would rather trust your approach than an LLM. Your approach has "backtraceability": because it is a series of localized transforms, you can build a log of how each transform worked, from the original text to the final one.
LLMs (at least to my understanding) cannot have that, because they work on the text globally. So there is a good chance that what one spits out is readable, but its one-to-one correspondence with the original legal document may not exist. That may lead you into a false sense of security.
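To make the "log of localized transforms" idea concrete, here is a rough Python sketch. It is purely illustrative: `Step` and `simplify_with_log` are hypothetical names, and recording offsets relative to the text as it stood just before each rule ran is one possible design, not a description of the original tool.

```python
import re
from dataclasses import dataclass


@dataclass
class Step:
    start: int        # offset in the text as it was just before this rule ran
    original: str     # the exact wording that was matched
    replacement: str  # the plain-language wording that replaced it


def simplify_with_log(text, replacements):
    """Apply the replacements and return (new_text, list of Step records)."""
    log = []
    # Longest phrases first, so short rules do not break up longer phrases.
    for phrase in sorted(replacements, key=len, reverse=True):
        plain = replacements[phrase]
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        # Record every match of this rule before rewriting the text.
        for match in pattern.finditer(text):
            log.append(Step(match.start(), match.group(0), plain))
        text = pattern.sub(lambda _m, plain=plain: plain, text)
    return text, log


rules = {"Employee shall henceforth": "you will", "forthwith": "immediately"}
result, steps = simplify_with_log(
    "Employee shall henceforth comply forthwith.", rules)
print(result)
for step in steps:
    print(step)
```

Each record ties a piece of the simplified text back to the exact wording it replaced, which is the kind of audit trail a free-form rewrite does not give you by default.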