Comment by abeppu

1 year ago

I think this is a much more tractable problem if one doesn't think in terms of diversity with respect to identity-associated labels, but instead in terms of diversity of other features.

Consider the analogous task "generate a picture of a shirt". Suppose that in the training data, the image most often seen with "shirt" and no additional modifiers is a collared button-down. But if you generate k images per prompt, generating k button-downs isn't the most likely to leave the user satisfied; hedging your bets and showing a tee shirt, a polo, a henley (or whatever) likely increases the probability that at least one of the images will be useful. But of course, if you query for "gingham shirt", you should probably only see button-downs, because though one could presumably make a different cut of shirt from gingham fabric, the probability that you wanted a non-button-down gingham shirt but _did not provide another modifier_ is very low.

Why is this the case (and why could you reasonably attempt to solve for it without introducing complex extra user controls)? Because the thing worth maximizing is a _use-dependent_ utility function: the expected goodness of the overall response (including all of the generated images together), given past data about how responses get used. Part of the problem with current "demo" multi-modal LLMs is that we're largely just playing around with them, so that utility is hard to pin down.
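To make that concrete, here's a minimal sketch of what such an objective could look like, with entirely made-up intent probabilities and a deliberately crude coverage model: score a set of k images by the probability that at least one of them matches what the user actually meant, and pick the set greedily.

```python
# Minimal sketch: choose k images to maximize the probability that at least
# one of them satisfies the user's (uncertain) intent. All probabilities and
# candidate labels below are hypothetical, not real model outputs.

# Posterior over what "shirt" (no modifiers) probably meant, inferred from data.
intent_posterior = {
    "button-down": 0.55,
    "tee": 0.20,
    "polo": 0.15,
    "henley": 0.10,
}

# Candidate images we could return, keyed by the intent each one satisfies.
candidates = ["button-down", "button-down", "tee", "polo", "henley"]

def set_utility(images):
    """P(at least one image matches the user's intent), under the posterior."""
    return sum(p for intent, p in intent_posterior.items() if intent in images)

def greedy_select(candidates, k):
    """Greedily add the image with the largest marginal gain in set utility."""
    chosen = []
    for _ in range(k):
        best = max(candidates, key=lambda c: set_utility(chosen + [c]) - set_utility(chosen))
        chosen.append(best)
    return chosen

print(greedy_select(candidates, k=3))               # ['button-down', 'tee', 'polo']
print(set_utility(["button-down"] * 3))             # 0.55: three near-duplicates
print(set_utility(["button-down", "tee", "polo"]))  # 0.90: hedged set
```

Under this toy objective, repeating the single most likely image adds nothing after the first copy, which is exactly why hedging wins for an unmodified prompt and collapses back to "all button-downs" once the prompt (e.g. "gingham shirt") concentrates the posterior.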

This isn't specific to generative AI; I've seen a similar thing in product recommendation and product search. If, in your query and click-through data, the results that get click-throughs after a user searches "purse" are disproportionately likely to be orange clutches, that doesn't mean the whole first page of results for "purse" should be orange clutches. The implicit goal is maximizing the probability that the user is shown a product they like, and given the data, we have uncertainty about what they will like.
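Same logic with toy numbers (all made up): if only some fraction of "purse" searchers actually want an orange clutch, a page of nothing but orange clutches can't satisfy more than that fraction, while a page that covers several plausible intents can.

```python
# Hypothetical intent shares for the query "purse" -- not real data.
intent_shares = {"orange clutch": 0.30, "tote": 0.25, "crossbody": 0.20, "shoulder bag": 0.15}

# A first page of only orange clutches satisfies at most the users with that intent.
p_all_clutches = intent_shares["orange clutch"]   # 0.30

# A page covering several intents can satisfy users across all of them.
p_mixed_page = sum(intent_shares.values())        # 0.90

print(p_all_clutches, p_mixed_page)
```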