Comment by sn41

2 years ago

Just a side thought related to this: could there be a community-supported initiative to parse what the TOS documents of different companies mean, and specifically, what to watch out for in each company's TOS compared with the norm in its industry?

Right now, the "gotcha" power is entirely one-sided. A wiki-like approach towards documenting TOS might make the user better aware of what to really watch out for when using a particular software.

I think what you're looking for is TOSDR (Terms of Service, Didn't Read): https://tosdr.org

It's been going for several years and has very thorough analyses of various ToS, done by volunteers who are often legal professionals.

I always thought it would be cool if Steam did that for their games. Kind of like the little feature tags they already use ("local multiplayer," "controller supported," etc.), but for EULAs (technically they already track whether there is a third-party EULA).

So you could just look at a game and it'd have bubbles like "phones home > sends your data > IP, operating system, language," "not responsible for online interactions," etc., or whatever other legal nonsense they stuff in those things. There are tens of millions of them, so I assume they can be compressed into a couple thousand rote legal "chunks" that can be filtered and sorted on.
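To make the "filter and sort on chunks" idea concrete, here is a minimal sketch. It assumes (hypothetically) that each EULA chunk is stored as a hierarchical tag string like the ones above, with " > " separating levels. Nothing here reflects how Steam actually stores its tags; `Game`, `has_tag`, and `without_tags` are made-up names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Game:
    name: str
    # Hypothetical EULA "chunk" tags, e.g. "phones home > sends your data > IP"
    eula_tags: set = field(default_factory=set)


def has_tag(tags, wanted):
    """True if a tag equals `wanted` or sits underneath it in the hierarchy."""
    return any(t == wanted or t.startswith(wanted + " > ") for t in tags)


def without_tags(games, excluded):
    """Keep only games whose EULA carries none of the excluded tags (or their children)."""
    return [g for g in games if not any(has_tag(g.eula_tags, e) for e in excluded)]


games = [
    Game("A", {"phones home > sends your data > IP"}),
    Game("B", {"not responsible for online interactions"}),
]
print([g.name for g in without_tags(games, ["phones home"])])
```

Excluding a parent tag like "phones home" also excludes everything nested under it, which is roughly how you'd want a "hide games that do X" filter to behave.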

The main problem with “what to watch out for” is that it seems like every TOS is full of the same BS, so your options when you spot something to “watch out for” are pretty slim. (Unless you want to live like the Amish, which tbh is looking pretty attractive sometimes!)

Typically that’s what a newspaper is supposed to do. With userbases larger than some countries, it could make sense.

(TL;DR: this describes an old project of mine which tried to do something similar, which may be useful as inspiration for such a project. More testing is needed to say whether simple string replacements work sufficiently well, but initial results were promising. These days, LLMs are probably even more promising.)

I tried making this once for employment contracts: it would string-replace difficult phrases with a version that uncle Jack on his horse would understand. This was back when a computer's understanding capabilities barely amounted to decomposing a sentence to find out what its subject is in the first place. However, the simple string replacements worked so well that I figured we should just crowdsource a dozen contracts, and that should be enough to kill 90% of the difficult language in any employment contract.

Example replacements: "to come to a transfer of the concerning intellectual property rights" with "to transfer copyright", or "Employee shall henceforth" with "you will" (I have trouble identifying "employee" versus "employer" in texts, so replacing it with 'you' and 'we' is a lot easier for me to read). It also just killed lesser-known words, like replacing "forthwith" with "immediately".
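A minimal sketch of that kind of replacement table, using only the example phrases above. The longest-first ordering is my own assumption (so a short rule can't break a longer phrase before it matches), not necessarily how the original project worked.

```python
# Example rules taken from the comment; a real table would be crowdsourced.
REPLACEMENTS = [
    ("to come to a transfer of the concerning intellectual property rights", "to transfer copyright"),
    ("Employee shall henceforth", "you will"),
    ("forthwith", "immediately"),
]


def simplify(text, replacements=REPLACEMENTS):
    """Apply plain string replacements, longest phrase first."""
    for old, new in sorted(replacements, key=lambda pair: len(pair[0]), reverse=True):
        text = text.replace(old, new)
    return text


print(simplify("Employee shall henceforth respond forthwith."))
```

Plain `str.replace` is case-sensitive and doesn't fix capitalization ("you will" stays lowercase at the start of a sentence), which is one of the ways this could start "messing things up" across different contracts.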

Example paragraph: If and insofar as Employee as part of the execution of their work activities on behalf of Employer whether or not together with others produces a work or other type of something on which intellectual property rights rest or can rest, Employee transfers these intellectual property rights already now just in case, or alternatively Employee grants an unlimited and irrevocable license and Employee lends at the first request of Employer forthwith all required cooperation to achieve a transfer of the concerning intellectual property rights. (Translated from a Dutch work contract I once received.)

Its replacement: If you make something during your work for us which can be copyrighted, you always transfer the copyright and grant a license and on request you immediately help transfer the copyright.

The project never went anywhere because I didn't pursue getting other contracts and seeing how well it works before marketing it as a useful tool for others. The main unknown is whether it would scale, or if the string replacements that work well in one contract start messing things up in others. Nowadays, though, I'd probably start by telling an LLM what kind of transformation it should make and then inputting the contract; that's probably 99% of the result in 0.1% of the time spent.

  • Very interesting :)

    In legal settings, I would rather trust your approach than an LLM. Your approach has "backtraceability": you can build a log of how each transform worked, from the original to the final text, because it's a series of localized transforms.

    LLMs (at least to my understanding) cannot offer that, because they rewrite the text globally. So there is a good chance that what they spit out is readable, but a one-to-one correspondence with the original legal document may not exist. That may give you a false sense of security.
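The "backtraceability" point can be sketched in a few lines: apply the rules one at a time and record, for every rule that fired, what matched, what replaced it, how often, and the full text just before that step. `Step` and `simplify_with_log` are hypothetical names for illustration, not part of the project described above.

```python
from typing import NamedTuple


class Step(NamedTuple):
    old: str     # phrase that was matched
    new: str     # plain-language replacement
    count: int   # how many occurrences were replaced
    before: str  # full text just before this step, for auditing


def simplify_with_log(text, replacements):
    """Apply (old, new) rules in order, logging every rule that actually changed the text."""
    log = []
    for old, new in replacements:
        count = text.count(old)
        if count:
            log.append(Step(old, new, count, text))
            text = text.replace(old, new)
    return text, log


out, log = simplify_with_log(
    "Employee shall henceforth act forthwith.",
    [("Employee shall henceforth", "you will"), ("forthwith", "immediately")],
)
for step in log:
    print(f"{step.old!r} -> {step.new!r} (x{step.count})")
print(out)
```

Each line of the output points at a specific phrase in the original, so a reader can check every change against the source, which is exactly what a single global LLM rewrite doesn't give you.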