Comment by Cheer2171

7 days ago

> Final nutritional data is generated by providing a reasoning model with a large corpus of grounding data. The LLM is tasked with creating complete nutritional values, explicitly explaining the rationale behind each value it generates. Outputs undergo rigorous validation steps, including cross-checking with advanced auditing models such as OpenAI’s o1-pro, which has proven especially proficient at performing high-quality random audits. In practice, o1-pro frequently provided clearer and more substantive insights than manual audits alone.

This is not a dataset. This is an insult to the very idea of data. This is the most anti-scientific post I have ever seen voted to the top of HN. Truth about the world is not derived from three LLMs stacked on top of each other in a trenchcoat.

25 comments

creativeCak3 7 days ago

I agree so much with you. This is not a dataset. This is the vomit of an LLM making stuff up. Like...why couldn't you just collect the data that already exist?? Why do you need an LLM?

Adding an LLM to this just adds an unnecessary layer of complexity, for what benefit? For street cred?

joshdickson 7 days ago

There's an in-depth review of the reasoning for undertaking this project in general and this approach in particular in the Methodology/About section below, see "Current State of Nutritional Data".
Millions of people use food logging apps to drive behavioral change and help adhere to healthy lifestyles. I believe there is immense societal good in continuing to offer improved tools to accomplish this, especially for free, and that's why I created the project and chose to open source the data.
https://www.opennutrition.app/about#current-state-of-nutriti...

justsid 7 days ago

I find this actually very upsetting. My wife does calorie counting and all of the apps for it are horrible, especially the market leaders. But those have one thing going for them: Databases of nutritional information, which can be used for easy meal calorie counting. Just enter the ingredients (usually you can scan a barcode) and how much you ate of the total and it tells you where you are standing on caloric and nutritional intake. But even those datasets aren’t always bang on, especially here in Canada where some products share bar codes with US products but they have different nutritional values. Reading the title, I was very excited about the ability to make my wife a better app to support her needs. Unfortunately this is not at all usable for this use case or really any? What’s the point of having data that you just can’t trust at all?

joshdickson 7 days ago
[flagged]
- rendaw 7 days ago
  
  By 100+ comment discussion I assume you mean this HN post in its whole? People here aren't checking the facts, so the fact that only one person found an issue doesn't mean much.

ZunarJ5 7 days ago

As soon as I saw "AI enhanced for Accuracy" I laughed and wondered if this was a belated April Fools joke.

tmpz22 7 days ago

Imagine how much more efficient government would be if we just generate all the data with LLMs.

NewJazz 7 days ago

Stop. Giving. Them. Ideas.
https://www.reddit.com/r/ABoringDystopia/comments/1jq8kzl/th...

pmichaud 7 days ago

[flagged]

rmah 7 days ago

It doesn't matter how accurate the models are, it's not a "data set" (in the scientific sense), it's more of a conclusion set. Maybe the conclusions are spot on. Maybe not. I have no idea.
- Cheer2171 7 days ago
  
  Right. At my most generous, this is a dataset about LLM behavior when asked to infer nutritional value. It is in no way a nutrition dataset. It is perhaps useful as half of a benchmark for accuracy, compared to actual ground truth. Unlike a scientist, you're not motivated or resourced enough to create the ground truth dataset. So you took a shortcut and hid it from the landing page.
  This workflow, this motivation, this business model, this marketing is an affront to truth itself.
- pmichaud 6 days ago
  
  I think there is a real conversation to be had about “data” in a post-LLM world, but I actually don’t care about debating definitions here, I care about whether the product works within a reasonable margin of error.
- joshdickson 7 days ago
  
  I envisioned many lines of inquiry from HN but the idea that a compressed TSV of nutritional data is not a "dataset" (definition: a collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer) was unexpected.
  
thi2 7 days ago

Tried it with unsweetened oat milk and the info was off in nearly every column.
Not representative, because I don't have US food, but since it's AI-enhanced I can't compare my stuff with the stuff in the "dataset" and be sure that's a US vs. Germany thing.
- joshdickson 7 days ago
  
  Would you mind posting/messaging me in some way (links in bio) what you expected it to show?
  It looks like for unsweetened oat milk:
  https://www.opennutrition.app/search/unsweetened-oat-milk-mt...
  ...it is leaning into a citation from the Australian Nutrient Database (e.g. Oat beverage, fluid, unfortified. Australian Nutrient Database. Public Food Key F006132.), which is what I instructed it to do if it thought there was an exact match from a governmental database.
  It's possible this is a poor general source for oat milk or that's not the beverage intended for the entry to stand for. I'll check it out, thank you for the report.
  