Comment by pavelstoev
2 days ago
The hallucinated code was reusing memory buffers filled with previous results so not performing the actual computations. When this was fixed the AI generated code was like 0.3x of the baseline.
2 days ago
The hallucinated code was reusing memory buffers filled with previous results so not performing the actual computations. When this was fixed the AI generated code was like 0.3x of the baseline.
It is mentioned on section "Limitations and Bloopers" of the page [0]:
> Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have Twitter user @main_horse help test our CUDA kernels, to identify that The AI CUDA Engineer had found a way to “cheat”. The system had found a memory exploit in the evaluation code which, in a small percentage of cases, allowed it to avoid checking for correctness (...)
0. https://sakana.ai/ai-cuda-engineer
As I write this (after the updates to the evaluation code), https://pub.sakana.ai/ai-cuda-engineer/kernel/2/23/optimize-... is on their top of their list of speedups, with a claim of 128x speed up on a fused 3D convolution + groupnorm + mean.
The generated implementation doesn’t do a convolution.
The 2nd kernel on the leaderboard also appears to be incorrect, with a bunch of dead code computing a convolution and then not using it and writing tanhf(1.0f) * scaling_factor for every output.