Comment by fzeroracer

10 hours ago

> investing all the work to build his own.

I don't think you understand why the author did this on a fundamental level. Sometimes it isn't explicitly about getting the outcome directly, it's about putting in the work to understand how you get there.

The article says that the author wanted to have an analysis done, but ruled out the option of a human doing it. He did not rule out using a multimodal LLM. It's possible that he just is unaware that multimodal LLMs are capable of doing an analysis. Considering the approach wasn't mentioned in the article one can guess that the author just didn't know it was possible. But there is no way to know for sure.