Comment by elzbardico
14 hours ago
LLMs are not a mythical universal machine-learning model that you can feed any input and have it magically do what a specialized ML model could do.
You can't feed an LLM years of time-series meteorological data and expect it to work as a specialized weather model, and you can't feed it years of medical time series and expect it to work like a model specifically trained and validated on that kind of data.
An LLM generates a stream of tokens. If you feed it a giant set of CSVs and it wasn't RL'd to do something useful with them, it will just try to make whatever sense of them it can and generate output that most probably has no strong numerical relationship to your data. It will simulate an analysis; it won't perform one.
You may have a giant context window, but attention is sparse; the attention mechanism doesn't see your whole data at the same time. It can do some simple comparisons, like figuring out that if I say my current pressure is 210/180 I should call an ER immediately. But once I send it a time series of my twice-a-day blood-pressure measurements for the last 10 years, it can't make any real sense of it.
Indeed, it would have been better for the author to ask the LLM to generate a Python notebook that does some data analysis on the measurements, then run the notebook and share the result with the doctor.
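Something along these lines is all the notebook needs (a rough sketch; the file and column names are hypothetical, just what a blood-pressure export might look like):

    import numpy as np
    import pandas as pd

    # hypothetical export: one row per reading, columns date/systolic/diastolic
    df = pd.read_csv("bp_readings.csv", parse_dates=["date"])
    df = df.set_index("date").sort_index()

    # 30-day rolling mean smooths out the twice-daily noise
    rolling = df[["systolic", "diastolic"]].rolling("30D").mean()

    # linear trend: OLS slope, converted to mmHg per year
    days = (df.index - df.index[0]).days
    slope = np.polyfit(days, df["systolic"], 1)[0] * 365.25

    print(df.describe())
    print(f"systolic trend: {slope:+.1f} mmHg/year")

pandas actually computes over every row; the LLM only has to write the code, not hold 10 years of numbers in attention.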
This is true as a technical matter, but this isn't a technical blog post! It's a consumer review, and when companies ship consumer products, the people who use them can't be expected to understand failure modes that are not clearly communicated to them. If OpenAI wants regular people to dump their data into ChatGPT for Health, the onus is on them to make it reliable.
> the onus is on them to make it reliable.
That is not a plausible outcome given the current technology or any of OpenAI's demonstrated capabilities.
"If Bob's Hacksaw Surgery Center wants to stay in business they have to stop killing patients!"
Perhaps we should just stop him before it goes too far?
> That is not a plausible outcome given the current technology or any of OpenAI's demonstrated capabilities
OpenAI has said that medical advice is one of the biggest use cases they see from users. It should be assumed they're investigating how to build out this product capability.
Google has LLMs fine-tuned on medical data. I have a friend who works at a top-tier US medical research university, and the university regularly works with ML research labs to generate doctor-annotated training data. OpenAI absolutely could be involved in creating such a product from this sort of source.
You can feed an LLM text, pictures, video, audio, etc., so why not train a model to accept medical time-series data as another modality? Obviously this could have a negative performance impact on a coding model, but it could be valuable for a consumer-oriented chatbot. Or, of course, they could create a dedicated model and tool-call that model.
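The tool-call route is the boring, workable one: the chat model never does arithmetic on the readings, it just routes them. Roughly (the tool name and schema here are made up, but this is the general shape of OpenAI's function-calling tool definitions):

    # hypothetical tool definition; a dedicated, validated model sits behind it
    analyze_tool = {
        "type": "function",
        "function": {
            "name": "analyze_medical_timeseries",  # made-up name
            "description": "Run a validated time-series model over patient "
                           "measurements and return a structured summary.",
            "parameters": {
                "type": "object",
                "properties": {
                    "measurement": {"type": "string"},  # e.g. "blood_pressure"
                    "csv_data": {"type": "string"},     # the raw readings
                },
                "required": ["measurement", "csv_data"],
            },
        },
    }

The chat model decides when to call it; the numbers get handled by something actually trained and validated for the job.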
You state a lot of things without testing them first?
An LLM has structures in its latent space that allow it to do basic math, and it has seen enough data that it probably has structures for detecting basic trends.
An LLM doesn't just generate a stream of tokens. It generates an embedding and searches/does something in its latent space, then returns tokens.
And you don't even know what LLM interfaces do in the background. Gemini creates sub-agents; there could easily already be a 'trend detector' in there.
I even did a test: I generated random data with a trend and fed it to ChatGPT. The output was very coherent and correct.
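Something like this produces that kind of series (the parameters here are illustrative, not the exact test):

    import numpy as np

    # synthetic daily readings: a slow upward trend buried in noise
    rng = np.random.default_rng(0)
    t = np.arange(365)
    values = 120 + 0.02 * t + rng.normal(0, 5, size=t.size)

    print("\n".join(f"day {d}: {v:.1f}" for d, v in zip(t, values)))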
This LLM is advertising itself in a medical capacity. You aren't wrong, but the customer has been fed the wrong set of expectations. It's the fault of the tool's marketing.