
Comment by bawolff

4 days ago

98% sounds good enough for the use case suggested here.

Writing good validators for data is hard. You can be 100% sure there will be bad data in those 98%. From my own experience: I thought I had 50% of the books converted correctly, and then I found I still had junk data and gave up. It is not an impossible problem; I just was not motivated to fix it on my own. Working with your own copies is fine, but when you try to share them you run into legal issues that I just do not find interesting to solve.
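To illustrate why validation is hard, here is a toy sketch (not from the original comment; the checks and record fields are hypothetical) of a plausible-looking validator that a junk OCR record still passes, because the junk is printable, well-sized text with populated metadata:

```python
def looks_valid(book):
    """Naive validator sketch: non-empty metadata, mostly-printable text.

    The specific checks are hypothetical examples, chosen to show how
    junk data can satisfy all of them.
    """
    text = book.get("text", "")
    return (
        bool(book.get("title"))
        and bool(book.get("author"))
        and len(text) > 100
        and sum(c.isprintable() or c.isspace() for c in text) / len(text) > 0.95
    )

# A real-looking record passes...
good = {"title": "Moby-Dick", "author": "Melville",
        "text": "Call me Ishmael. " * 20}
# ...but so does OCR junk: printable characters, plausible length,
# intact metadata, just scrambled content ("tbe wbale", page headers).
junk = {"title": "Moby-Dick", "author": "Melville",
        "text": "Page 1 of 234 " * 10 + "rn rn tbe whale wbale " * 10}

print(looks_valid(good), looks_valid(junk))  # True True
```

Catching the junk record would require content-level checks (dictionary word rates, page-header detection, and so on), each of which has its own false positives, which is the treadmill the comment describes.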

Edit: my point is that I would like to share my work, but that is hard to do in a legal way. That is the main reason I gave up.

2% garbage, if it falls out the wrong way, is more than enough to seriously degrade search result quality.