Comment by heyitsguay

1 day ago

The two outputs (an image and a string sequence) are learned from entirely different data, even when they're generated with the same model, which is not always the case. There's nothing intrinsic to the learning process that connects image generation and procedural graphics code.

If one wanted to try that, it would likely take the standard LLM finetuning route - scrape a bunch of data pairs at some expense, write some prompts, and have at it.