Comment by minmax2020
2 years ago
May I ask what library you're using for kanji-to-hiragana transliteration? I'm working on a language product as well, and I'm using pykakasi, which is certainly prone to errors. I tried your app and noticed similar errors (for example, 大いばり should be read ooibari, not daiibari). I wonder if we can do better on transliteration.
MeCab with ipadic, plus a lot of custom Swift logic that fixes known problem patterns and matches tokens against JMDict entries as an additional check that the stemming/tokenization was done correctly. I’m also using a custom-generated JLPT classification (a more complete estimate of the full JLPT vocabulary, based on word frequency in ebooks) to choose the more likely candidate results. I haven’t improved this in a couple of years; it’s one of my upcoming focuses now that I have the app rewritten and released.
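To make the dictionary-check idea concrete, here is a minimal Python sketch. It is not the app's actual Swift code. It assumes the tokenizer can return several candidate segmentations, each a list of (surface, reading) pairs. `DICTIONARY` and `FREQUENCY_RANK` are hypothetical toy stand-ins for JMDict and an ebook-frequency/JLPT list. The scoring order (dictionary-confirmed characters first, then fewer and longer tokens, then more common words) is an assumption chosen for illustration.

```python
from typing import List, Tuple

# A token is (surface form, hiragana reading) as proposed by a tokenizer such as MeCab.
Token = Tuple[str, str]

# Hypothetical toy stand-in for JMDict: surface form -> set of accepted readings.
DICTIONARY = {
    "大いばり": {"おおいばり"},
    "大": {"だい", "おお"},
    "いばり": {"いばり"},
}

# Hypothetical toy stand-in for a frequency/JLPT ranking: lower rank = more common.
FREQUENCY_RANK = {
    "大": 500,
    "いばり": 15000,
    "大いばり": 20000,
}
UNKNOWN_RANK = 100_000


def score(segmentation: List[Token]) -> Tuple[int, int, int]:
    """Higher is better: dictionary-confirmed characters, then fewer tokens, then lower total rank."""
    confirmed_chars = sum(
        len(surface)
        for surface, reading in segmentation
        if reading in DICTIONARY.get(surface, set())
    )
    total_rank = sum(FREQUENCY_RANK.get(surface, UNKNOWN_RANK) for surface, _ in segmentation)
    return (confirmed_chars, -len(segmentation), -total_rank)


def choose_segmentation(candidates: List[List[Token]]) -> List[Token]:
    """Pick the candidate segmentation with the best score."""
    return max(candidates, key=score)


def reading_of(segmentation: List[Token]) -> str:
    """Join the token readings into one hiragana string."""
    return "".join(reading for _, reading in segmentation)


candidates = [
    [("大", "だい"), ("いばり", "いばり")],  # would be romanized "daiibari"
    [("大いばり", "おおいばり")],            # would be romanized "ooibari"
]
print(reading_of(choose_segmentation(candidates)))  # prints おおいばり
```

Both candidates are fully confirmed by the toy dictionary here, so the tie is broken by preferring the single longer dictionary match. A real system would need far richer data and rules than this.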
UniDic is also interesting, but it’s harder to use and its data size is huge.
I’m going to layer GPT on top to improve it further.
What're you working on?