Comment by kylehotchkiss
1 year ago
This is really cool. I uploaded a photo of what I think was my great grandparents and it explained their circumstances in fascinating ways (to the point of mentioning aged clothing, a detail I overlooked).
I’ve been trying to figure out how to process hundreds of my own scanned photos to extract any context about them. This was convincing enough for me to consider Google’s Vision API. No way I’d ever trust OpenAI’s APIs for this.
Edit: can anybody recommend how to get similar text results (prompt or processing pipeline to prompt)?
> consider Google’s Vision API. No way I’d ever trust OpenAI’s APIs for this.
lol?
I use ChatGPT every day. I just throw in a pic and say "alt text". That gives you insane detail, though the output is limited because the prompt itself implies a shorter description for an HTML tag.
I just threw a pic in here of my gf holding a loaf she just made and part of it said "The slight imperfections on the bread's crust indicate it's freshly baked, and the woman's posture and facial expression suggest that she is very pleased with her creation."
That description made me smile, not bad for alt text!
Where is the difference in trust level between google and openai coming from?
One company has the capacity to maintain HIPAA compliance and the other is best known for vacuuming up the entire web and users' prompts. For something as sensitive as family photos, I know which company/product I'd prefer for this potential project.
Has the capacity to maintain HIPAA compliance, maybe, but Google is not a HIPAA covered entity and thus is not subject to any of its rules.
Google also vacuums up the entire web and users’ prompts? I am thoroughly confused by this position.
It seems like emotions, not facts.
What is the sensitive nature of a photo of two people both long dead?
Yeah, I know the point of this site is to give us a dystopian shock by showing us how much information Big Tech extracts from our photos, but it's inadvertently a pretty good advertisement for Google's Vision API. It did a fantastic job of summarizing the photos I threw at it.
I mean, I wouldn't trust either entity. If you're serious about maintaining some semblance of privacy, then you should opt for a local model such as BakLLaVA or Llama-3.2-Vision.
https://huggingface.co/llava-hf/bakLlava-v1-hf
https://huggingface.co/meta-llama/Llama-3.2-11B-Vision
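For the "hundreds of scanned photos" question upthread, here is a rough sketch of what the batch side of such a pipeline could look like. It is only an outline under some assumptions: the prompt wording, the `describe_folder` and `find_scans` helpers, and the JSON output file are all made up for illustration, and the actual model call is left as a `describe` function you supply yourself (wrapping a local vision model or a hosted API, whichever you trust).

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterator

# Hypothetical prompt; tune the wording for your own photos.
PROMPT = (
    "Describe this scanned photo in detail: the people, their clothing, "
    "the setting, the likely era, and any visible text."
)

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_scans(folder: Path) -> Iterator[Path]:
    """Yield image files under folder (recursively), in sorted order."""
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def describe_folder(
    folder: Path,
    describe: Callable[[Path, str], str],
    out_file: Path,
) -> Dict[str, str]:
    """Call describe(image_path, prompt) for each scan and save results as JSON.

    Images already present in out_file are skipped, so an interrupted
    run can be resumed without paying for the same photo twice.
    """
    results: Dict[str, str] = {}
    if out_file.exists():
        results = json.loads(out_file.read_text(encoding="utf-8"))

    for path in find_scans(folder):
        key = path.relative_to(folder).as_posix()
        if key in results:
            continue
        results[key] = describe(path, PROMPT)
        # Write after every image so progress survives a crash.
        out_file.write_text(json.dumps(results, indent=2), encoding="utf-8")

    return results
```

You would call it as `describe_folder(Path("scans"), my_describe, Path("descriptions.json"))`, where `my_describe` is whatever function sends one image plus the prompt to your model of choice and returns the text. Keeping the model behind a plain function makes it easy to swap a hosted API for a local model later without touching the loop.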