Comment by rorylaitila
4 months ago
When I started cataloging my vintage ad collection (https://adretro.com), I began with a defined set of entities (like brands or product categories) and used OpenAI vision to extract the categories each ad belongs to. However, I found that it simply wasn't consistent in its classification. So I decided to let the model classify however it wants, and I map those results back to my desired ontology after the fact. In my case the mapping is manual, but I could see using techniques to cluster the values dynamically.
I would love to better understand what you mean by "classify however it wants." Is the output structured?
Yeah, the output is structured JSON, but I mean the entity value that is returned. A simple case is classifying the brand of the ad. It might return any of "Ford", "Ford Motor Company", "Ford Trucks", "The Ford Motor Company", or "Lincoln Ford", even on very similar ads. Rather than try to enhance the prompt with something like "always use 'Ford Motor Company' for every kind of Ford", I just accept whatever the value is. I have a dictionary on my end that maps all brands back to a canonical brand.
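For illustration, here is a minimal sketch of that kind of post-hoc mapping. It assumes a light normalization step (casefold, drop a leading "the", strip punctuation) before a plain dictionary lookup, and it collects unseen values in a set for manual review. The table contents and the names `BRAND_ALIASES`, `normalize`, and `canonical_brand` are hypothetical, not the actual setup described above.

```python
import re
from typing import Optional, Set

# Hypothetical alias table: normalized raw model output -> canonical brand.
# Mapping "lincoln ford" to Ford Motor Company is an assumed curation choice.
BRAND_ALIASES = {
    "ford": "Ford Motor Company",
    "ford motor company": "Ford Motor Company",
    "ford trucks": "Ford Motor Company",
    "lincoln ford": "Ford Motor Company",
}


def normalize(raw: str) -> str:
    """Casefold, drop a leading 'the', strip punctuation, collapse whitespace."""
    s = raw.casefold().strip()
    s = re.sub(r"^the\s+", "", s)
    s = re.sub(r"[^\w\s]", "", s)
    return re.sub(r"\s+", " ", s).strip()


def canonical_brand(raw: str, unmapped: Set[str]) -> Optional[str]:
    """Return the canonical brand, or None and record the key for manual mapping."""
    key = normalize(raw)
    brand = BRAND_ALIASES.get(key)
    if brand is None:
        unmapped.add(key)  # unseen value: queue it for a human to map later
    return brand


if __name__ == "__main__":
    pending: Set[str] = set()
    for value in ["The Ford Motor Company", "Ford Trucks.", "Chevrolet"]:
        print(value, "->", canonical_brand(value, pending))
    print("needs mapping:", pending)
```

Normalizing before the lookup keeps the dictionary smaller, since trivial variants like "The Ford Motor Company" collapse onto an existing key, while anything genuinely new still surfaces in the review set instead of being silently guessed.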
What are you using to build the dictionary? Particularly when it encounters something you've never seen before.
This is really interesting to me.