Comment by sdesol
8 days ago
If you look at my post history, you can see an example of how Claude and OpenAI cannot tell that GitHub is spelled correctly. The end result won't make a difference, but it raises questions about how else they can misinterpret things.
At this moment I would not trust AI to automatically make changes.
My answer to this in my own pet project is to shield terms found by the NER pipeline from correction by masking them, replacing each with its entity type as a special token (e.g. [male person] or [commercial entity]). That alone dramatically improved grammar/spelling correction, especially because the grammatical "gist" of the masked words is preserved in the text presented to the LLM for "correction".
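For anyone curious what that masking step might look like, here is a rough sketch, not my actual project code. It assumes the NER pipeline has already produced non-overlapping character spans as (start, end, label) tuples; the names `mask_entities` and `unmask_entities` are made up for illustration, and the index appended to each token (e.g. `[commercial entity 0]`) is my own addition so the original text can be restored unambiguously afterwards.

```python
from typing import List, Tuple

# (start, end, label) character span, as an NER pipeline might report it.
# Assumption: spans do not overlap.
Entity = Tuple[int, int, str]


def mask_entities(text: str, entities: List[Entity]) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each entity span with a numbered placeholder token.

    Returns the masked text plus a (token, original) mapping for restoring it.
    """
    parts = []
    mapping = []
    cursor = 0
    for i, (start, end, label) in enumerate(sorted(entities)):
        token = f"[{label} {i}]"  # numbering is an assumption of this sketch
        parts.append(text[cursor:start])
        parts.append(token)
        mapping.append((token, text[start:end]))
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


def unmask_entities(text: str, mapping: List[Tuple[str, str]]) -> str:
    """Put the original entity text back in place of each placeholder."""
    for token, original in mapping:
        text = text.replace(token, original)
    return text


if __name__ == "__main__":
    text = "I pushed teh code to GitHub yesterday."
    start = text.index("GitHub")
    entities = [(start, start + len("GitHub"), "commercial entity")]

    masked, mapping = mask_entities(text, entities)
    print(masked)  # this is what would be sent to the LLM

    # Stand-in for the LLM's correction; a real call would go here.
    corrected = masked.replace("teh", "the")
    print(unmask_entities(corrected, mapping))
```

The model never sees "GitHub" at all, so it has nothing to "fix" there, but the placeholder still tells it what kind of thing sits in that slot. This only works if the model returns the placeholder tokens unchanged, so it is worth checking that every token in the mapping is still present before unmasking.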