For my own work, agentic AI usage is pretty standard SWE fare (Cursor/Claude Code). Even within the engine, optimizations often center on things like increasing communication/compute overlap (vLLM calls this Dual-Batch Overlap).
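To make the overlap idea concrete, here is a toy sketch in plain Python. It is not vLLM's implementation: `communicate` and `compute` are hypothetical stand-ins that just sleep, and a background thread plays the role of the communication stream. The point is only the scheduling pattern, where the transfer for the next micro-batch is started before computing on the current one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def communicate(batch, delay=0.01):
    # Hypothetical stand-in for a dispatch/all-to-all step.
    # time.sleep releases the GIL, so it can run alongside compute().
    time.sleep(delay)
    return batch


def compute(batch, delay=0.01):
    # Hypothetical stand-in for the GPU work on one micro-batch.
    time.sleep(delay)
    return [x * x for x in batch]


def run_sequential(micro_batches):
    # Baseline: each micro-batch waits for its transfer, then computes.
    return [compute(communicate(mb)) for mb in micro_batches]


def run_overlapped(micro_batches):
    # Kick off the transfer for micro-batch i+1, then compute on
    # micro-batch i while that transfer is in flight.
    if not micro_batches:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = comm.submit(communicate, micro_batches[0])
        for nxt in micro_batches[1:]:
            ready = pending.result()
            pending = comm.submit(communicate, nxt)
            results.append(compute(ready))
        results.append(compute(pending.result()))
    return results
```

Both functions return the same results; timing each with `time.perf_counter()` on a handful of micro-batches should show the overlapped version hiding most of the simulated transfer time.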
There are probably more interesting, easily verifiable agent loops you could try for kernel optimization. For now, though, the best kernels are still written by hand, e.g. the DeepEP kernels: https://github.com/deepseek-ai/DeepEP
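By "easily verifiable" I mean something like the loop below: check each candidate against a trusted reference first, and only time the ones that pass. This is a minimal pure-Python sketch under my own assumptions. The `reference` function, the candidate names, and the tolerances are all made up for illustration, and a real setup would compare GPU kernels rather than Python functions.

```python
import math
import random
import time


def reference(xs):
    # Trusted but slow baseline: sum of squares.
    total = 0.0
    for x in xs:
        total += x * x
    return total


def is_correct(candidate, trials=20, n=256, rel_tol=1e-9):
    # Compare the candidate with the reference on seeded random inputs.
    rng = random.Random(0)
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if not math.isclose(candidate(xs), reference(xs),
                            rel_tol=rel_tol, abs_tol=1e-12):
            return False
    return True


def best_time(fn, xs, repeats=5):
    # Best-of-N wall-clock time, to reduce noise a little.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(xs)
        best = min(best, time.perf_counter() - start)
    return best


def score_candidates(candidates, n=4096):
    # Only candidates that pass the correctness gate get a timing.
    rng = random.Random(1)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return {name: best_time(fn, xs)
            for name, fn in candidates.items() if is_correct(fn)}


# Hypothetical candidates an agent might propose.
candidates = {
    "generator_sum": lambda xs: sum(x * x for x in xs),
    "abs_sum": lambda xs: sum(abs(x) for x in xs),  # fast but wrong
}
scores = score_candidates(candidates)
```

The correctness gate is what keeps the loop honest: a candidate that is fast but wrong, like `abs_sum` here, never reaches the timing step.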