Something that may be interesting for readers of this thread: this project became possible only once I started telling Opus to keep a file with all the implementation notes, accumulating everything we discovered during the development process. The file also had clear instructions that it must be kept updated and re-read ASAP after context compaction. This is what enabled Opus to do such a big coding task in a reasonable amount of time without losing track. Check the file IMPLEMENTATION_NOTES.md in the GitHub repo for more info.
Very cool!
Yep, a constantly updated spec is the key. Wrote about this here:
https://lukebechtel.com/blog/vibe-speccing
I've also found it's helpful to have it keep an "experiment log" at the bottom of the original spec, or in another document, which it must update whenever things take "a surprising turn"
Honest question: what do you do when your spec has grown to over a megabyte?
Some things I've been doing:
- Move as much actual data into YML as possible.
- Use CEL?
- Ask Claude to rewrite pseudocode in specs into RFC-style constrained language?
How do you sync your spec and code in both directions? I have some slash commands that do this, but I'm not thrilled with them.
I tend to have to use Gemini for actually juggling the whole spec. Of course it's nicely chunked as much as it can be, but still. There's gonna need to be a whole new way of doing this.
If programming languages can have spooky language at a distance wait until we get into "but paragraph 7, subsection 5 of section G clearly defines asshole as..."
What does a structured language look like when it doesn't need mechanical sympathy? YML + CEL is really powerful and underexplored but it's still just ... not what I'm actually wanting.
Looks like default OpenCode / Claude Code behavior with Claude models. Why the extra prompt?
Salvatore - this is cool. I am a fan of using Steve Yegge's beads for this - it generally cuts the markdown file cruft significantly.
Did you run any benchmarking? I'm curious whether the Python stack is faster or slower than a pure-C vibe-coded inference tool.
Do you plan on writing about the other lessons you learned, which you mentioned in the README? As a big fan of your software and writing for many years, I would deeply appreciate your perspective using these tools!
There are multiple task-tracking solutions for Claude or other LLMs that let it define tasks, add implementation notes and (crucially) add sub-tasks and dependencies. I'm using Beads (https://github.com/steveyegge/beads) and I think it really improves the outcome, especially for larger projects.
Was the LLM using vision capabilities to verify the correctness of its work? If so, how was that verification method guided by you?
Yes, Opus could check the image to see if it matched the prompt, but I advised the model to stop and ask the human for a better check and for a description of what could be causing a corrupted image. Still, the fact that it could catch obvious regressions was good.
It's funny watching people rediscover well-established paradigms. Suddenly everyone's recreating software design documents [0].
People can say what they want about LLMs reducing intelligence/ability; the trend has clearly been that people are beginning to get more organized, document things better, enforce constraints, and think in higher-level patterns. And there's renewed interest in formal verification.
LLMs will force the skilled, employable engineer to chase both maintainability and productivity from the start, in order to maintain a competitive edge with these tools. At least until robots replace us completely.
[0] https://www.atlassian.com/work-management/knowledge-sharing/...
So Codex would do that task with a regular spec and no recompacting?
This development workcycle pattern lends itself nicely to Antigravity, which does about 80% of this out of the box and can be nudged to do the rest with a little bit of prompting.
A suggestion born of experience: besides printing the seed for an image, add it to the image file as metadata. Otherwise, if you're me, you'll lose it.
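For what it's worth, if the output is a PNG (an assumption on my part about this project), the chunk format makes this easy: you can slot a tEXt chunk in right before IEND. A minimal sketch, assuming zlib is linked for crc32(); png_embed_seed() and put_u32_be() are hypothetical helper names, not anything from the repo:

    /* Minimal sketch: insert a tEXt chunk carrying the seed right before the
     * final IEND chunk of an already-written PNG. Assumes zlib for crc32(). */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <zlib.h>

    static void put_u32_be(FILE *f, uint32_t v) {
        unsigned char b[4] = { (unsigned char)(v >> 24), (unsigned char)(v >> 16),
                               (unsigned char)(v >> 8),  (unsigned char)v };
        fwrite(b, 1, 4, f);
    }

    static int png_embed_seed(const char *path, uint64_t seed) {
        FILE *f = fopen(path, "r+b");
        if (!f) return -1;
        fseek(f, 0, SEEK_END);
        long iend_off = ftell(f) - 12;      /* IEND is always the last 12 bytes */

        char text[64];
        memcpy(text, "seed", 5);            /* keyword + NUL separator */
        int vlen = snprintf(text + 5, sizeof(text) - 5, "%llu",
                            (unsigned long long)seed);
        uint32_t textlen = 5 + (uint32_t)vlen;

        uLong crc = crc32(0L, Z_NULL, 0);   /* PNG chunk CRC covers type + data */
        crc = crc32(crc, (const Bytef *)"tEXt", 4);
        crc = crc32(crc, (const Bytef *)text, textlen);

        fseek(f, iend_off, SEEK_SET);       /* overwrite IEND, then re-append it */
        put_u32_be(f, textlen);
        fwrite("tEXt", 1, 4, f);
        fwrite(text, 1, textlen, f);
        put_u32_be(f, (uint32_t)crc);
        static const unsigned char iend[12] =
            {0,0,0,0, 'I','E','N','D', 0xAE,0x42,0x60,0x82};
        fwrite(iend, 1, 12, f);
        return fclose(f) == 0 ? 0 : -1;
    }

Any PNG metadata reader (exiftool, for example) can then recover the seed from the file later, even after it has been renamed or moved.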
Thanks for sharing this — I appreciate your motivation in the README.
One suggestion, which I have been trying to do myself, is to include a PROMPTS.md file. Since your purpose is sharing and educating, it helps others see what approaches an experienced developer is using, even if you are just figuring it out.
One can use a Claude hook to maintain this deterministically. I instruct in AGENTS.md that they can read but not write it. It’s also been helpful for jumping between LLMs, to give them some background on what you’ve been doing.
In this case, instead of a prompt I wrote a specification, but later I had to steer the models for hours. So basically the prompt is the sum of all such interactions: incredibly hard to reconstruct into something meaningful.
I've only just started using it, but the Ralph Wiggum / Ralph loop plugin seems like it could be useful here.
If the spec and/or tests are sufficiently detailed maybe you can step back and let it churn until it satisfies the spec.
Isn't the "steering" in the form of prompts? You note "Even if the code was generated using AI, my help in steering towards the right design, implementation choices, and correctness has been vital during the development." You are a master of this, let others see how you cook, not just taste the sauce!
I only say this as it seems one of your motivations is education. I'm also noting it for others to consider. Much appreciation either way, thanks for sharing what you did.
This steering is the main "source code" of the program that you wrote, isn't it? Why throw it away? It's like deleting the .c once you have obtained the .exe.
Doesn't Claude Code allow you to just dump entire conversations, with everything that happened in them?
Regarding the meta experiment of using LLMs to transpile to a different language, how did you feel about the outcome / process, and would you do the same process again in the future?
I've had some moments recently in my own projects, as I worked through some bottlenecks, where I took a whole section of a project, told Claude "rewrite in Rust", and got massive speedups from a zero-shot rewrite (most recently some video recovery programs). But I then had an output product I wouldn't feel comfortable vouching for outside of my homelab setup.
It depends on the situation. In this case the agent worked only from the reference code provided by Black Forest Labs for FLUX, which is basically just the pipeline implemented as a showcase. The fundamental requirement for this process to work is that the agent can get feedback to understand whether it is really making progress, and can debug failures against a reference implementation. But then all the code was implemented with many implementation hints about what I wanted to obtain, and without any reference to other minimal inference libraries or kernels. So I believe this is just the effect of putting together known facts about how Transformer inference works plus a higher-level idea of how software should appear to the final user.

Btw, today somebody took my HNSW implementation for vector sets and translated it to Swift (https://github.com/jkrukowski/swift-hnsw). I'm OK with that, nor do I care whether this result was obtained with AI or not. However, it is nice that the target license is the same, given that the implementation is so similar to the C one.
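To make that feedback loop concrete: the kind of check that lets an agent debug against a reference implementation can be as simple as dumping the same activation tensor from both pipelines and comparing. A minimal sketch, assuming raw float32 dumps; the file format and names are illustrative, not the project's actual tooling:

    /* Compare a float32 tensor dumped by the C code against the same tensor
     * dumped by the reference PyTorch pipeline, reporting error statistics. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    static float *load_f32(const char *path, size_t *count) {
        FILE *f = fopen(path, "rb");
        if (!f) { perror(path); exit(1); }
        fseek(f, 0, SEEK_END);
        long bytes = ftell(f);
        fseek(f, 0, SEEK_SET);
        float *buf = malloc(bytes);
        if (fread(buf, 1, bytes, f) != (size_t)bytes) { perror("read"); exit(1); }
        fclose(f);
        *count = (size_t)bytes / sizeof(float);
        return buf;
    }

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s ours.f32 reference.f32\n", argv[0]);
            return 1;
        }
        size_t n1, n2;
        float *a = load_f32(argv[1], &n1);
        float *b = load_f32(argv[2], &n2);
        if (n1 != n2) { fprintf(stderr, "size mismatch\n"); return 1; }

        double max_abs = 0, sum_abs = 0;
        for (size_t i = 0; i < n1; i++) {
            double d = fabs((double)a[i] - (double)b[i]);
            if (d > max_abs) max_abs = d;
            sum_abs += d;
        }
        printf("max abs err: %g, mean abs err: %g over %zu values\n",
               max_abs, sum_abs / (double)n1, n1);
        free(a); free(b);
        return 0;
    }

Anything that turns "the image looks wrong" into a number per layer gives the agent something concrete to bisect against.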
This is pretty great. I’ve gone and hacked your GTE C inference project to Go purely for kicks, but this one I will look at for possible compiler optimizations and building a Mac CLI for scripting…
This repo has Swift wrappers, not a rewrite of hnsw.c, which apparently you weren't the only author of.
I have a set of prompts that are essentially “audit the current code changes for logic errors” (plus linting and testing, including double checking test conditions) and I run them using GPT-5.x-Codex on Claude generated code.
It’s surprising how much even Opus 4.5 still trips itself up with things like off-by-one or logic boundaries, so another model (preferably with a fresh session) can be a very effective peer reviewer.
So my checks are typically lint -> test -> other model -> me, and relatively few things get to me in simple code. Contrived logic or maths, though, needs to be all me.
I ran a similar experiment last month and ported Qwen 3 Omni to llama.cpp. I was able to get GGUF conversion, quantization, and all input and output modalities working in less than a week. I submitted the work as a PR to the codebase and, understandably, it was rejected.
https://github.com/ggml-org/llama.cpp/pull/18404
https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF
The refusal on the grounds that AI often writes suboptimal GGML kernels looks very odd to me. It means that whoever usually writes GGML kernels by hand could very easily steer the model into writing excellent kernels, and you could even compile a document for the agents with instructions on how to do great work. If they continue this way, soon a llama.cpp fork will emerge that is developed much faster and potentially even better: it is unavoidable.
The refusal is probably because OP said "100% written by AI" and didn't indicate an interest in actually reviewing or maintaining the code. In fact, a later PR comment suggests that the AI's approach was needlessly complicated.
I wonder if some of the docs from https://app.wafer.ai/docs could be used to make the model better at writing GGML kernels. Interesting use case.
Some projects refuse for copyright reasons. Back when GPT-4 was new, I dug into pretraining reports for nearly all models.
Every one (IIRC) was breaking copyright by sharing third-party works in data sets without permission. Some were trained on patent filings, which makes patent infringement highly likely. Many were breaking EULAs (contract law) by scraping them. Some outputs were verbatim reproductions of copyrighted works, too, which could get someone sued if they published them.
So, I warned people to stay away from AI until (a) training on copyrighted/patented works was legal in all those circumstances, (b) the outputs had no liability, and (c) users of a model could know this by looking at the pretraining data. There's no GPT3- or Claude-level models produced that way.
On a personal level, I follow Jesus Christ who paid for my sins with His life. We're to be obedient to God's law. One is to submit to authority (aka don't break man's law). I don't know that I can use AI outputs if they were illegally trained or like fencing stolen goods. Another reason I want the pretraining to be legal either by mandate or using only permissible works.
Note: If your country is in the Berne Convention, it might apply to you, too.
If I asked Claude to do the same can I also just put MIT license on it with my name? https://github.com/black-forest-labs/flux2 uses Apache License apparently. I know it doesn't matter that much and as long as it's permissive and openly available people don't care it's just pedantics but still.
The reference code shows how to set up the inference pipeline. It does not implement 99% of what the C code does, that is, the inference kernels, the transformer, and so forth.
I would love it if you took the time to instruct Claude to re-implement inference in C/C++ and put an MIT license on it. It would be huge, but only if it actually works.
FWIW stable-diffusion.cpp[0] (which implements a lot more than just stable diffusion, despite the name) is already a MIT licensed C++ library.
[0] https://github.com/leejet/stable-diffusion.cpp/
As someone who doesn't code in C and does more analytics work (SQL), is the code generated here "production grade"? One of the major criticisms I hear about LLMs is that they tend to generate code that you wouldn't want to maintain. Is that the case here?
It's not bad. Skimming the code I'd say it's not enterprise quality but it's definitely better than an amateur throwaway project.
Those statements are mostly out of date and symptomatic of pre-agent-optimized LLMs. Opus 4.5 with clarifying rules in the CLAUDE.md does a good job at following idiomatic best practices in my experience.
That said, I'm mixed on agentic performance for data science work but it does a good job if you clearly give it the information it needs to solve the problem (e.g. for SQL, table schema and example data)
Not my experience. All frontier models I constantly test, agentic or not, produce code less maintainable than my (very good) peers and myself (on a decent day).
Plus they continue to introduce performance blunders.
Crying wolf; one day maybe there will be a wolf, and I may be the last of us to check whether that's true.
> I believe that inference systems not using the Python stack (which I do not appreciate) are a way to free open models usage and make AI more accessible.
What you're saying here is that you do not appreciate systems not using the Python stack, which I think is the opposite of what you wanted to say.
I am an ESL speaker but I don't see why the sentence fragment in parentheses couldn't be parsed as relating only to "Python stack" as opposed to "systems not using the Python stack". I read it that way, but again, as an ESL speaker, I might be missing intuition or actual grammatical knowledge that would tick off a native speaker such as, presumably, yourself.
It is based upon context; you are correct that it is ambiguous, which is the problem with most natural language.
-I believe that <inference systems not using the Python stack> (which I do not appreciate) are a way to free open models usage and make AI more accessible.
This reading of the text would lead one to believe they don't appreciate inference systems not written in Python. Given that the inference system produced by the author is also not using the Python stack (it is in C), we can assume this is not the correct reading.
-I believe that inference systems not using the <Python stack> (which I do not appreciate) are a way to free open models usage and make AI more accessible.
This reading says that the author does not like the Python stack for inference, which, given that the author has produced this inference engine in C, would support the statement.
That is, we have to take both readings and think about which one fits the surrounding context. Hopefully this helps :)
> I wanted to see if, with the assistance of modern AI, I could reproduce this work in a more concise way, from scratch, in a weekend.
I don't think it counts as recreating a project "from scratch" if the model that you're using was trained against it. Claude Opus 4.5 is aware of the stable-diffusion.cpp project and can answer some questions about it and its code-base (with mixed accuracy) with web search turned off.
The two projects have literally nothing in common. Not a line of code, not the approach, nor the design. Nothing. LLMs are not memorization machines that recall every project in the cut & paste terms you could think of.
I supported Redis against Valkey because I felt software should not be appropriated like that.
Now that the Redis author supports broad copyright violations and has turned into an LLM influencer, I regret having ever supported Redis. I have watched many open source authors, who have positioned themselves as rebels and open source populists, go fully corporate. This is the latest instance.
One of the most important things to do right now to redistribute something to society is to use AI to write free software: more free software than ever. If AI becomes hard to access in the future, the more software that is released free, the better. If instead things go well (as I hope), there will simply be a multiplication of the effect of the OSS written with today's and tomorrow's AI. Either way, writing free software using AI is a good idea, IMHO. I believe LLMs are the incarnation of software democratization, which aligns very well with why I used to write OSS. LLMs "steal" ideas, not verbatim code: you can force them to regurgitate some verbatim stuff, but mostly they take ideas, and we humans also re-elaborate things we have seen and manage (like LLMs do) to avoid emitting the same stuff verbatim. But software can't be patented, for very good reasons, and LLMs capture all this value that is not subject to intellectual property and provide it to people who don't have the right tools and knowledge. And they allow people who can code to code 100x more.
> redistribute something to society
With a proprietary black-box tool you pay a subscription for? That's nonsense.
I don't understand: so it's just generating a picture using a model? Isn't that trivial? What's the advantage of doing it in C? Is the model running in C? The README is overly verbose, and it seems like a project that just does one task and cost the author $80.
I don't do AI coding at that level, so anyone please correct me.
Usually the bulk of AI coding is done via Python libs built on PyTorch. So if you release anything, you want to get it running on that ecosystem. But the whole ecosystem is really heavy. My Python folder for ComfyUI, a generative image AI web GUI, is almost 6 GB. That also has some models in it, so it's not just raw Python code, and it's much more than you need for raw inference, but you can see that it's messy and huge. Maintaining these installs usually comes with quite some complaints.
The model itself also cannot generate images without some code: you need to plug a few individual components together, like text encoding, a VAE, and the model, to get everything working. Most of the components will live on the GPU, but you need code to prepare the data and send it to the GPU to process it.

PyTorch is fairly efficient, but Python as an ecosystem can have some blind spots or bottlenecks, so if you want to min-max efficiency you'd prefer native solutions.
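As a rough picture of how those components fit together, here is a compilable sketch with placeholder stubs; every name and signature here is hypothetical, for illustration only, and is not the actual API of the project under discussion:

    /* Rough shape of the pieces described above:
     * text encoder -> denoising loop -> VAE decode. */
    #include <stdio.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { float *data; size_t n; } tensor_t;

    /* Stubs standing in for the real components. */
    static tensor_t text_encode(const char *prompt) { (void)prompt; tensor_t t = {0}; return t; }
    static tensor_t random_latent(uint64_t seed) { (void)seed; tensor_t t = {0}; return t; }
    static tensor_t denoise_step(tensor_t latent, tensor_t cond, int step) { (void)cond; (void)step; return latent; }
    static void vae_decode_to_png(tensor_t latent, const char *path) { (void)latent; printf("writing %s\n", path); }

    int main(void) {
        const char *prompt = "a cat in a spacesuit";
        uint64_t seed = 42;
        int steps = 28;

        tensor_t cond   = text_encode(prompt);   /* text encoder turns the prompt into conditioning */
        tensor_t latent = random_latent(seed);   /* seeded noise in latent space, for reproducibility */

        for (int s = 0; s < steps; s++)          /* iterative denoising with the diffusion transformer */
            latent = denoise_step(latent, cond, s);

        vae_decode_to_png(latent, "out.png");    /* VAE maps latents back to pixels */
        return 0;
    }

The point of a C implementation is that all of this, including the kernels inside each stub, lives in one small dependency-free binary instead of a multi-gigabyte Python environment.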
In case you didn't know, the author is the guy who made Redis. Even if it's an AI project, you can expect some baseline quality.
Yes, the model runs in C, you just provide the model weights to the program.
The main advantage is that you don't need the Python interpreter to run the program.
While not revolutionary, it is definitely not trivial, and its main purpose is to demonstrate Claude Code's abilities on a low-level, non-trivial task.
The author of this project is also the author of Redis. He knows what he is doing.
Running inference for a model, even when you have all the weights, is not trivial.
Because of the principle that you only understand what you can create. You think you know something until you have to re-create it from scratch.
This is both awesome and scary. Yes, now we can embed image gen in things like game engines and photoshop or build our own apps. On the other hand, we can include image gen in anything…
This was possible before, though
Yes, it was always possible.
It's almost as if this is the first time many have seen something built in C with zero dependencies which makes this easily possible.
Since they are used to languages with package managers adding 30 packages and pulling in 50-100+ other dependencies before the project is even able to build.
How fast is this compared to the Python-based stack?
Very slow currently; I added the benchmarks to the README. To go faster, it needs kernels faster than the current float32-only ones.
PyTorch MPS is about 10x faster per the README.md.
The Python libraries are themselves written in C/C++, so what this does performance-wise is, at best, cutting through some glue. Don't think about this as a performance-driven implementation.
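For context on what "float32-only kernels" means in practice: essentially all the inference time goes into matrix multiplications, and a naive loop like the sketch below (illustrative, not code from the repo) is the baseline that BLAS libraries, SIMD, and quantized kernels improve on by one or two orders of magnitude:

    /* A naive float32 matmul: C[m x n] = A[m x k] * B[k x n], all row-major. */
    #include <stdio.h>
    #include <stddef.h>

    static void matmul_f32(const float *A, const float *B, float *C,
                           size_t m, size_t n, size_t k) {
        for (size_t i = 0; i < m; i++) {
            for (size_t j = 0; j < n; j++) {
                float acc = 0.0f;
                for (size_t p = 0; p < k; p++)
                    acc += A[i * k + p] * B[p * n + j];
                C[i * n + j] = acc;
            }
        }
    }

    int main(void) {
        float A[4] = {1, 2, 3, 4}, B[4] = {5, 6, 7, 8}, C[4];
        matmul_f32(A, B, C, 2, 2, 2);
        printf("%g %g %g %g\n", C[0], C[1], C[2], C[3]);  /* prints 19 22 43 50 */
        return 0;
    }

Swapping loops like this for tuned BLAS calls or lower-precision (quantized) kernels is where most of the gap with the PyTorch numbers would close.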
No cuBLAS?
Related:
FLUX.2 [Klein]: Towards Interactive Visual Intelligence
https://news.ycombinator.com/item?id=46653721