Comment by menaerus

24 days ago

Do you use agentic AI for this type of optimization work yet, or not?

For my own work, agentic AI usage is pretty standard SWE fare (Cursor/Claude Code). Even within the engine, optimizations often center on things like increasing communication/compute overlap (vLLM calls this Dual-Batch Overlap).
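To make the overlap idea concrete, here is a toy sketch. It is not vLLM's implementation: `communicate`, `compute`, and `run_overlapped` are hypothetical names, the "collective" is just a sleep, and the "kernel" is plain Python. It only shows the scheduling pattern, where the transfer for the next micro-batch is in flight while the current one is being computed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def communicate(batch, delay=0.01):
    # Hypothetical stand-in for a collective (e.g. an all-to-all).
    # time.sleep releases the GIL, so it can overlap with compute below.
    time.sleep(delay)
    return list(batch)


def compute(batch):
    # Hypothetical stand-in for a compute kernel.
    return [x * x for x in batch]


def run_sequential(micro_batches):
    # Baseline: each micro-batch waits for its transfer, then computes.
    return [compute(communicate(b)) for b in micro_batches]


def run_overlapped(micro_batches):
    # Start the transfer for micro-batch i+1 before computing micro-batch i.
    results = []
    if not micro_batches:
        return results
    with ThreadPoolExecutor(max_workers=1) as comm_pool:
        pending = comm_pool.submit(communicate, micro_batches[0])
        for nxt in micro_batches[1:]:
            ready = pending.result()                      # wait for current data
            pending = comm_pool.submit(communicate, nxt)  # kick off next transfer
            results.append(compute(ready))                # compute while it runs
        results.append(compute(pending.result()))
    return results


if __name__ == "__main__":
    batches = [[1, 2], [3, 4], [5]]
    print(run_overlapped(batches) == run_sequential(batches))
```

Both functions return the same results; the overlapped version just hides part of the transfer time behind compute. On a real GPU the same pattern would use separate streams and async collectives rather than a thread pool.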

There are probably more interesting, easily verifiable agent loops you could try for kernel optimization. At this point, though, the best kernels are still written by hand, e.g. the DeepEP kernels: https://github.com/deepseek-ai/DeepEP
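By "easily verifiable" I mean something like the loop below: a rough sketch, assuming you have a trusted reference implementation to check candidates against. All names here (`reference`, `evaluate`, `select_best`) are made up for illustration, and the "kernel" is a trivial sum of squares. A candidate is only considered if it matches the reference on every input, and among correct candidates the fastest wins.

```python
import math
import time


def reference(xs):
    # Trusted (slow but obviously correct) implementation: sum of squares.
    return sum(x * x for x in xs)


def evaluate(candidate, inputs, rel_tol=1e-9):
    # Return (is_correct, elapsed_seconds) for one candidate over all inputs.
    start = time.perf_counter()
    outputs = [candidate(xs) for xs in inputs]
    elapsed = time.perf_counter() - start
    correct = all(
        math.isclose(out, reference(xs), rel_tol=rel_tol)
        for out, xs in zip(outputs, inputs)
    )
    return correct, elapsed


def select_best(candidates, inputs):
    # Keep the fastest candidate that passes the correctness check.
    # Returns None if no candidate is correct.
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        correct, elapsed = evaluate(fn, inputs)
        if correct and elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    inputs = [[1.0, 2.0, 3.0], [0.5, 0.25]]
    candidates = {
        "wrong": lambda xs: sum(xs),  # fast but incorrect, gets rejected
        "fsum": lambda xs: math.fsum(x * x for x in xs),
    }
    print(select_best(candidates, inputs))
```

In an agent loop, the agent would propose the candidates and the harness would be the thing it cannot argue with. For real kernels you would also want warmup runs, repeated timings, and a tolerance suited to the dtype.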