Comment by aeonfox

17 hours ago

> I think the blame more rests on Apple for falsely representing the quality of their product

There was plenty of other concerning stuff in that article. And from a quick read it wasn't suggested or implied the VO2 max issue was the deciding factor for the original F score the author received. The article did suggest many times over the ChatGPT is really not equipped for the task of health diagnosis.

> There was another problem I discovered over time: When I tried asking the same heart longevity-grade question again, suddenly my score went up to a C. I asked again and again, watching the score swing between an F and a B.

> There was plenty of other concerning stuff in that article.

Yeah for sure, I probably didn't make it clear enough but I do fault OpenAI for this as much as or maybe more than Apple. I didn't think that needed to be stressed since the article is already blasting them for it and I don't disagree with most of that criticism of OpenAI.

The lack of self-consistency does seem like a sign of a deeper issue with reliability. In most fields of machine learning robustness to noise is something you need to "bake in" (often through data augmentation using knowledge of the domain) rather than get for free in training.