They don't verify the correctness of their kernels. They expect you to pick the working ones from their kernel junkyard yourself.
The very idea is also dumb as hell. They could have done CUDA -> HIP/oneAPI/Metal/Vulkan/SYCL/OpenCL. Then they wouldn't need to beat the performance of anything, just the automatic porting would be worth an acquisition by AMD or Intel.
The hallucinated code was reusing memory buffers filled with previous results so not performing the actual computations. When this was fixed the AI generated code was like 0.3x of the baseline.
It is mentioned on section "Limitations and Bloopers" of the page [0]:
> Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have Twitter user @main_horse help test our CUDA kernels, to identify that The AI CUDA Engineer had found a way to “cheat”. The system had found a memory exploit in the evaluation code which, in a small percentage of cases, allowed it to avoid checking for correctness (...)
What do you mean?
They don't verify the correctness of their kernels. They expect you to pick the working ones from their kernel junkyard yourself.
The very idea is also dumb as hell. They could have done CUDA -> HIP/oneAPI/Metal/Vulkan/SYCL/OpenCL. Then they wouldn't need to beat the performance of anything, just the automatic porting would be worth an acquisition by AMD or Intel.
Problem with startups like Devin (AI sw engineer) and Sakana (AI research scientist) is that they are full of hot-air.
They get caught up in the hype, and focus on the marketing and not the essential engineering.
The hallucinated code was reusing memory buffers filled with previous results so not performing the actual computations. When this was fixed the AI generated code was like 0.3x of the baseline.
It is mentioned on section "Limitations and Bloopers" of the page [0]:
> Combining evolutionary optimization with LLMs is powerful but can also find ways to trick the verification sandbox. We are fortunate to have Twitter user @main_horse help test our CUDA kernels, to identify that The AI CUDA Engineer had found a way to “cheat”. The system had found a memory exploit in the evaluation code which, in a small percentage of cases, allowed it to avoid checking for correctness (...)
0. https://sakana.ai/ai-cuda-engineer
1 reply →
[dead]