Comment by mncharity

4 hours ago

Oh, awesome. On my doables list was to try combining text tokens with "scent" embeddings, to give LLMs a higher-dimensional reading experience. In a file listing, larger files might smell "heavy" or "large". Recently modified files "untried" or "freshly disturbed". Files with a history of bugs, "worrisome". Complex files might smell of "be cautious here - fragile". Smelly `ls`.

Or, you might save token sampling telemetry (perplexity, etc.) alongside a CoT and result. So when read, it's like a captured performance - this sentence smells "hesitant", that one "confused". Poetry vs prose. Or, a consistency checker might add smells of "something's not right here". Or... emojis that emote.
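A toy version of the telemetry idea, assuming per-token logprobs were logged during sampling. The `sentence_scent` name and the `-2.0` threshold are invented for the sketch; calibration is exactly the open question.

```python
import math

def sentence_scent(logprobs: list[float], hesitant_below: float = -2.0) -> str:
    """Label a sentence by its mean token logprob - a crude 'performance' scent.

    `hesitant_below` is an uncalibrated, made-up threshold; a real system
    would need tuning per model and domain."""
    mean_lp = sum(logprobs) / len(logprobs)
    perplexity = math.exp(-mean_lp)  # equivalent signal, in perplexity form
    return "hesitant" if mean_lp < hesitant_below else "confident"
```

Reading the CoT back, each sentence would carry its scent alongside the text, so "hesitant" stretches stand out the way a shaky passage does in a recorded performance.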

For a dog, that's not merely a lamppost, it's richly evocative local history. To a dev long experienced with some codebase, that's not merely a filename, it's that nasty file that bites.

One open question is whether you can find and calibrate embeddings that provide an informative whiff without badly degrading reasoning. Another is whether a model can be cautious of, and suspicious of changes to, a scary file without becoming too avoidant. Also, salience bias. Also, imagine debugging scent hallucinations.

Activation-rich text - auxiliary non-linguistic embeddings as meta-signals... the random silliness local LLMs encourage.