← Back to context

Comment by simedw

6 hours ago

Thank you.

I had a quick look at Farsi datasets, and there seem to be a few options. That said, written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?

> written Farsi doesn’t include short vowels… so can you derive pronunciation from the text using rules?

You can't, but Farsi dictionaries list the missing short vowels/diacritics/"eraab" for every word.

For instance, see this entry: https://vajehyab.com/dehkhoda/%D8%AD%D8%B3%D8%A7%D8%A8?q=%D8...

With the short vowel on the first letter it would be written حِساب (normally written as just حساب)

The dictionary entry linked shows that there is a ِ on the first letter ح

But you would have to disambiguate between homographs that differ only in the eraab.