Comment by itishappy

6 days ago

Those examples seem quite unrelated to one another. The first reads as admitting intentional fraud and deceit, the second reads like dealing with imposter syndrome. I'd love to know the prompt.

Also, not sure how you can judge a style to be clearly better than another. The workflow of generating a bunch of stories in the style of different authors and then voting on a favorite just seems like picking a favorite author. Will the system ever prefer short, hard-hitting sentences? Sure enough, convergence is a noted behavior.

> Also, not sure how you can judge a style to be clearly better than another.

This one is actually easy: the writing style used for a horror story is different from what you'd use for a romance novel. Example: if you give it a prompt that asks the AI to generate something in the style of a romance author, but the rest of the prompt describes a horror or sci-fi story, you'll end up with something that most people would objectively decide "ain't right."

> Those examples seem quite unrelated to one another. The first reads as admitting intentional fraud and deceit, the second reads like dealing with imposter syndrome. I'd love to know the prompt.

Yeah. And to read the rest of each of the stories it generated...

Both paragraphs are simply short excerpts which involve no actual narrative, never mind the stuff that LLMs are typically weak at (maintaining consistency, intricate plotting and pacing, subtlety in world and character building), which in the context of stories is far more important to improve than the phrasing.

The "improvement" apparently eliminates a flaw in the first passage ("gentle vibrations that vibrated through my very being" is a pretty clunky description unlikely to be written by a native human; both paragraphs are otherwise passable and equally mediocre writing), but it does so by implying completely different (and frankly less interesting) character motivations. That makes me doubt that it's actually iteratively improving stories rather than just spitting out significant rewrites which incidentally eliminate glaring prose issues.

  • Yeah, as we mention in the blog, it's really hard to eval on short passages. If you go to the GitHub, you can see longer stories where the change is more noticeable. Both of those stories are from the same prompt.