Yes, that is a serious skill. So many of the woes we see stem from people not knowing what they want, or being unable to describe it in a way that others understand.
I believe "prompt engineer" properly conveys how complex communication can be when you are interacting with a multitude of perspectives, world views, assumptions, presumptions, etc.
It also works well to counter the over-confidence people have from not paying attention to the gaps between what is said and what is meant.
Yes, obviously a role involving complex communication while interacting with a multitude of perspectives, world views, assumptions, presumptions, etc. needs to be called "engineer."
That is why I always call technical writers "documentation engineers," why I call diplomats "international engineers," why I call managers "team engineers," and why I call historians "hindsight engineers."
I believe you're joking here, but I do think it'd be useful to have some engineering background in each of these domains.
The number of miscommunications that happen in any domain, due to oversight, presumption, and assumption, is vast.
At the very least the terminology will shape how we engage with it, so an aspirational title like "prompt engineer" may influence the level of rigor we apply to it.
Most designers can't, either. Defining a spec is a skill.
It's actually fairly difficult to put into words a vision specific enough to be understood outside your own head. This goes for pretty much anything, too.
… sure … but also no. For example, say I have an image with 3 people in it; there is a speech bubble above the person on the right that reads "I'A'T AY RO HERT YOU THE SAP!"¹
I give it,
Reposition the text bubble to be coming from the middle character.
DO NOT modify the poses or features of the actual characters.
Now sure, specs are hard. Gemini removed the text bubble entirely. Whatever, let's just try again:
Place a speech bubble on the image. The "tail" of the bubble should make it appear that the middle (red-headed) girl is talking. The speech bubble should read "Hide the vodka." Use a Comic Sans like font. DO NOT place the bubble on the right.
DO NOT modify the characters in the image.
There's only one red-head in the image; she's the middle character. We get a speech bubble, correctly positioned, but with a sans-serif, Arial-ish font, not Comic Sans. It reads "Hide the vokda" (sic). The facial expression of the middle character has changed.
Yes, specs are hard. Defining a spec is hard. But Gemini struggles to follow the specification given. Whole sessions are like this, an absolute struggle to get basic directions followed.
You can even see here that I & the author have started to learn the SHOUT AT IT rule. I suppose I should try more bulleted lists. Someone might learn, through experimentation, "okay, the AI has these hidden idiosyncrasies that I can abuse to get what I want" but … that's not a good thing; that's just an undocumented API with a terrible UX.
(¹Because that is what the AI generated on a previous step. No, that's not what was asked for. This is why I am astounded TFA generated an NYT logo.)
Case in point: the final image in this post (the IP bonanza) took 28 iterations of the prompt text to get something maximally interesting, which is why that one is very particular about the constraints it invokes, such as specifying "distinct" characters and specifying they are present from "left to right," because the model kept exploiting that ambiguity.
Hey! Author, thank you for this post! Quick question: any idea roughly how much this experimentation cost you? I'm having trouble parsing their image-generation pricing; I may just not be finding the right table. I'm just trying to understand: if I do ~50 iterations at the quality in the post, how much is that going to cost me?
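The back-of-the-envelope math is simple once you know the per-image rate. A minimal sketch, where `PRICE_PER_IMAGE` is an assumed placeholder, not a published rate; substitute the figure from the provider's current pricing table:

```python
# Rough cost estimate for iterative image generation.
# PRICE_PER_IMAGE is an assumed placeholder, not a published rate;
# check the provider's current pricing table for the real figure.
PRICE_PER_IMAGE = 0.04  # USD per generated image (assumption)

def iteration_cost(iterations: int, images_per_iteration: int = 1) -> float:
    """Total spend if every iteration renders this many candidate images."""
    return iterations * images_per_iteration * PRICE_PER_IMAGE

print(f"${iteration_cost(50):.2f}")     # 50 single-image iterations
print(f"${iteration_cost(50, 4):.2f}")  # 50 iterations, 4 candidates each
```

At the assumed $0.04/image, 50 single-image iterations come to $2.00, and generating 4 candidates per iteration raises that to $8.00; the structure of the estimate holds whatever the real rate turns out to be.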
It IS a skill. And most often it is disregarded by those who have not yet mastered it …
We now understand that we interface with LLMs using natural (and unnatural) language as the user interface.
This is a very different fuzzy interface compared to programming languages.
There will be techniques better or worse at interfacing.
This is what the term "prompt engineering" is alluding to, since we don't have the full vocabulary to describe this yet.
Not all models can actually do that if your prompt is particular.
https://habitatchronicles.com/2004/04/you-cant-tell-people-a...
Yep, knowing how and what to ask is a skill.
For anything, even back in the "classical" search days.
It used to be called Google-fu.
... and then iterating on that prompt many times, based on your accumulated knowledge of how best to prompt that particular model.
Right? 15 months ago, in image models, you used to have to designate rendering specifications and know the art of negative prompting.
Now you can really use natural language, and people want to debate you about how poor they are at articulating shared concepts. Amazing.
It's like the people are regressing and the AI is improving.
"amenable to highly specific and granular instruction"