Comment by alex43578

7 days ago

To massively oversimplify, they are all boxes that predict the next token based on material they’ve seen before, plus human feedback training that steers them toward desirable responses.

You’d have to have a very poorly RLHF’d model (or a very weird system prompt) for it to draw you a Terminator, pastoral scene, or pelican riding a bicycle as its self image :)

I think that’s what made Grok’s MechaHitler glitch interesting: it showed how far astray the model can go if you mess with things.

> You’d have to have a very poorly RLHF’d model (or a very weird system prompt) for it to draw you a Terminator, pastoral scene, or pelican riding a bicycle as its self image :)

How about a pastoral scene with a Terminator pelican riding a bike? Jokes aside, I get what you're saying, and it makes total sense.