Comment by kylehotchkiss

1 year ago

This is really cool. I posted a photo of what I think was my great grandparents into it and it explained their circumstances in fascinating ways (to the point of mentioning aged clothing, a detail I overlooked).

I’ve been trying to figure out how to process hundreds of my own scanned photos to determine any context about them. This was convincing enough for me to consider Google’s Vision API. No way I’d ever trust OpenAI’s APIs for this.

Edit: can anybody recommend how to get similar text results (a prompt, or a processing pipeline that feeds a prompt)?

14 comments


ozzzy1 1 year ago

> consider google’s vision API. No way I’d ever trust OpenAI’s apis for this.

lol?

qingcharles 1 year ago

I use ChatGPT every day, and I just throw in a pic and say "alt text". That will give you insane detail, but it's also limited, because the prompt itself implies a shorter description suited to an HTML tag.

I just threw a pic in here of my gf holding a loaf she just made and part of it said "The slight imperfections on the bread's crust indicate it's freshly baked, and the woman's posture and facial expression suggest that she is very pleased with her creation."

kylehotchkiss 1 year ago

That description made me smile, not bad for alt text!

changoplatanero 1 year ago

Where is the difference in trust level between google and openai coming from?

kylehotchkiss 1 year ago
One company has the capacity to maintain HIPAA compliance and the other is best known for vacuuming up the entire web and users' prompts. For something as sensitive as family photos, I know which company/product I'd prefer for this potential project.
- superb_dev 1 year ago
  
  It has the capacity to maintain HIPAA compliance, maybe, but Google is not a HIPAA covered entity and thus is not subject to any of its rules.
- sneak 1 year ago
  
  Google also vacuums up the entire web and users’ prompts? I am thoroughly confused by this position.
  It seems like emotions, not facts.
  
- SoftTalker 1 year ago
  
  What is the sensitive nature of a photo of two people both long dead?
  

lph 1 year ago

Yeah, I know the point of this site is to give us a dystopian shock by showing us how much information Big Tech extracts from our photos, but it's inadvertently a pretty good advertisement for Google's Vision API. It did a fantastic job of summarizing the photos I threw at it.

vunderba 1 year ago

I mean, I wouldn't trust either entity. If you're serious about maintaining some semblance of privacy, then you should opt for a local solution such as the BakLLaVA or Llama-3.2-Vision models.

https://huggingface.co/llava-hf/bakLlava-v1-hf

https://huggingface.co/meta-llama/Llama-3.2-11B-Vision
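
For the batch question raised earlier in the thread (hundreds of scanned photos, one prompt), here is a minimal sketch of the pipeline side only. Everything in it is an assumption for illustration: the `describe` callable is a hypothetical placeholder you would back with whatever model you choose (local or hosted), and the prompt wording, the suffix list, and the function names are made up, not taken from any of the tools mentioned here.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Assumed set of scan formats; adjust to whatever your scanner produces.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}

# Hypothetical prompt; asking for more than "alt text" avoids the short-description bias.
PROMPT = (
    "Describe this scanned family photo in detail: the people, their clothing, "
    "the setting, and any clues about the era or circumstances."
)


def describe_folder(
    folder: Path,
    describe: Callable[[Path, str], str],
    prompt: str = PROMPT,
) -> Dict[str, str]:
    """Run `describe(image_path, prompt)` on every image under `folder`.

    `describe` is a placeholder for your model call; this function only
    handles file discovery and collecting the results.
    """
    results: Dict[str, str] = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            key = path.relative_to(folder).as_posix()
            results[key] = describe(path, prompt)
    return results


def save_descriptions(results: Dict[str, str], out_file: Path) -> None:
    """Write the collected descriptions to a JSON file."""
    out_file.write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Keeping the model call behind a plain callable means you can swap a hosted API for a local model later without touching the folder walk or the output format.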