
Comment by smallmancontrov

16 hours ago

Does anyone know where that style came from? Did it become popular in listicles or on github or something? Or is there one person deep inside OpenAI or Anthropic who built the synthetic data pipeline and one day made the decision on a whim to doom us to an eternity of emoji bullet points?

I think it likely performed well in A/B preference tests with chat users.

I've noticed Claude does far fewer listicles than ChatGPT. I suspect Anthropic doesn't blindly follow supervised-learning feedback from chats as much as OpenAI does. I get an Apple-vs-Google design vibe from those two companies: Apple tends not to obsess over interaction data, instead relying on design principles, while Google just tests everything and has very little "taste."

In general I feel like the data approach really blinds people to the obvious problem that "a little" of something can be preferable while "a lot" of the same is not. I don't mind some bullet points here and there but when literally everything is in bullet points or pull quotes it's very annoying. I prefer Claude's paragraph style.

I suppose the downside is that using "taste" like Apple does can lead a product's design far, far away from what people want (macOS 26), more so than a data approach, whereas a data approach will never get it so drastically wrong but will also never feel great.

  • I’m given to understand that Anthropic uses something called Constitutional AI, where a central document of desirable and undesirable qualities guides training (alongside reinforcement learning), whereas OpenAI relies more heavily on direct human feedback and rating: human trainers evaluate responses and the model conforms to those preferences.

    I also much prefer the output of Claude at present.

    • Yeah, and much of the HN crowd aspires to have better taste than average. So if the supervised learning uses average human trainers, its output will most likely be seen as having poor taste by much of HN.


  • Eh, Facebook today is farther from what anybody "wants" than macOS 26, and Facebook is about as blindly data-driven as they come.

    Turns out you can get away with a lot when you have a quasi-monopoly on an addictive product, and you buy out your realistic competitors...

  • There was a time when Claude, too, would absolutely fill code with emojis, which is why their system prompt now has

    > Claude does not use emojis unless the person in the conversation asks it to

    • I think it's funny how we are all tweaking LLM output by adding instructional tokens instead of, say, finding a vector that indicates "user asked for emojis", and forbidding emoji tokens in the sampling unless that vector passes a threshold.
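The sampling-time idea in that last comment can be sketched in a few lines. This is a toy illustration, not any real model's API: the vocabulary, the `emoji_signal` score, and the threshold are all made up. The point is just that setting a token's logit to negative infinity forbids it outright during sampling, rather than asking nicely in the prompt.

```python
# Toy sketch: forbid emoji tokens at sampling time unless a
# (hypothetical) "user asked for emojis" signal passes a threshold.
import math
import random

# Hypothetical tiny vocabulary mixing word tokens and emoji tokens.
VOCAB = ["fast", "rust", "library", "❤️", "🦀", "🚀"]

def is_emoji(token: str) -> bool:
    # Crude heuristic: treat tokens containing codepoints in/above the
    # symbol and emoji ranges as emoji. Real detection would use the
    # Unicode emoji data files.
    return any(ord(ch) > 0x2600 for ch in token)

def sample_next_token(logits, emoji_signal, threshold=0.5, rng=random):
    """Sample one token from `logits`; emoji tokens are masked out
    entirely when `emoji_signal` is below `threshold`."""
    masked = list(logits)
    if emoji_signal < threshold:
        for i, tok in enumerate(VOCAB):
            if is_emoji(tok):
                masked[i] = float("-inf")  # probability becomes exactly 0
    # Softmax over the masked logits.
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw from the resulting categorical distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return VOCAB[i]
    return VOCAB[-1]
```

In practice this is the shape of a logits-processor hook; the hard part, as the comment notes, is deriving a reliable "user asked for emojis" signal from the model's activations rather than from instruction text.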

I first noticed it when Notion became popular.

All of the PMs I interacted with across companies started using Notion for everything at the same time. Filling Notion documents with emojis was the style of the time.

This slightly pre-dated AI tools becoming entirely usable for me.

It's the style of "blazing fast library made with :heart: in rust :crab:" that was popular in GitHub README.md files. My guess is that because the models are told to output Markdown, they overfit to the style of Markdown documents too.

Both predate common use of LLMs, unless my memory is even more shaky than usual on this. I'm sure I saw them appear a fair amount on GitHub and related project pages, but I couldn't tell you more specifically how they started & grew.

Somehow they must have been over-represented in the training data (or something in the tokenising/training/other processes magnifies the effective presence of punctuation), because I don't remember them being that common, yet LLMs seem to love spewing them out. Or perhaps it's a sign of the Habsburg problem: people asked LLMs to produce README files in that style because they'd seen it elsewhere, it having spread more organically at first, and the timing was just right for lots of those early examples to get fed back into the training data for subsequent models.

It was an annoying way of writing in places like LinkedIn and marketing copy for three or four years before LLMs appeared on the scene. I remember realising, well before AI appeared, that I can't read them: my brain jumps between the words and the pictures, making it hard to focus on the content.