Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library

Comment by yorwba

2 years ago

> We should implement a real Chinese lemmatizer there to chunk the words.

Or find all substrings that are listed in a dictionary (≈everyone uses cc-cedict https://www.mdbg.net/chinese/dictionary?page=cc-cedict ) and give translations for all of them. That way, the user won't be limited to any particular chunking granularity, which is always a finicky aspect of word segmenters to fine-tune.

0 comments

yorwba

Reply

No comments yet

Contribute on Hacker News ↗

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities