Documind ripped our open source tool and swapped the license

3 days ago

Saw a Show HN post [0] today about an open source document extractor tool. I thought the workflow sounded pretty similar to our library, only to realize it's a direct rip of Zerox [1].

Looking through the code:

1. They removed the MIT License and added AGPL.

2. It's not a fork. They purposefully copied the code and did a find/replace to swap out Zerox for their own name.

3. There is no attribution or mention of the original library.

4. They are trying to market this as a competitive product.

If you inspect the source code, it's a verbatim copy. They literally just renamed `ZeroxOutput` to `DocumindOutput` [2][3]. I recognize that plenty of people are using Zerox within commercial products, but to copy the code and pitch it as your own open source product is pretty fraudulent, especially when you slap a copyleft license on it.

[0] https://news.ycombinator.com/item?id=42171311

[1] https://github.com/getomni-ai/zerox

[2] https://github.com/DocumindHQ/documind/blob/main/core/src/types.ts#L25

[3] https://github.com/getomni-ai/zerox/blob/main/node-zerox/src/types.ts#L35

Hello. I apologize that it came across this way. This was not the intention. Zerox was definitely used and I made sure to copy and include the MIT license exactly as it was inside the part of the code that uses Zerox.

I also mentioned that MIT license again in the root license file.

If there's anything else I can do, please let me know and I'll make the changes. Thanks.

  • Hey Tammilore. I see your updates to the README and it's appreciated.

    And I don't mind people building on top of zerox of course, that's why it's open source. I flagged this because it was originally passed off as your own work.

    That said, general best practice here would be to fork the repo. That way you could make your own updates and still pull in upstream changes as we roll them out. It seems like your implementation [1] could have very easily just pulled in the npm package.

    [1] https://github.com/DocumindHQ/documind/blob/main/extractor/s...

    (Also very minor nit, you don't need an await on the `generateMarkdownDocument` function)
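    To illustrate the nit, here is a minimal sketch that assumes the function is synchronous. The real signature of `generateMarkdownDocument` isn't shown in this thread, so `joinPagesSketch` below is a hypothetical stand-in, not the actual code:

    ```typescript
    // Hypothetical stand-in for a synchronous helper; not the real
    // generateMarkdownDocument, whose signature is not shown in this thread.
    function joinPagesSketch(pages: string[]): string {
      return pages.join("\n\n");
    }

    // Awaiting a non-Promise value only wraps and unwraps it.
    // The result is the same; the caller just has to be async.
    async function withAwait(pages: string[]): Promise<string> {
      return await joinPagesSketch(pages);
    }

    // The same call without await, usable from synchronous code.
    function withoutAwait(pages: string[]): string {
      return joinPagesSketch(pages);
    }
    ```

    If the function does return a Promise, the `await` is of course still needed.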

    • Thanks for pointing this out, and I get where you're coming from. I didn't mean to make it seem like Zerox wasn't part of the project, which is why I included the MIT license both in the relevant code and in the root file to make sure it was properly acknowledged.

      On the forking thing, I honestly didn’t think it was necessary since cloning and modifying seemed fine for the use case, but I see how forking might’ve made things clearer. I’ll keep that in mind going forward.

      Appreciate the heads-up about `generateMarkdownDocument`, too. I'll fix that!