Comment by mediaman

2 years ago

That’s a completely different test. You’re using its multimodal vision ability to decipher Chinese script, which essentially adds an OCR step to the process, and it’s not good at OCR of Chinese script.

Try feeding it actual Chinese characters as text instead. From what I understand, it’s somewhat competent with those.