Comment by megadragon9
1 day ago
I'm continuing to expand my own deep learning library [1], built from NumPy primitives, to support LLM post-training techniques like supervised fine-tuning (SFT) and reinforcement learning with GRPO. Working without the high-level abstractions is a good learning experience: you "build a wheel" and then "use that wheel to build a car".
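For readers unfamiliar with GRPO, its core step is small enough to write in plain NumPy. The sketch below is illustrative only and is not code from the library in [1]; the function name `group_relative_advantages` and the `eps` parameter are hypothetical. It assumes one scalar reward per sampled completion and standardizes rewards within each prompt's group, so no learned value function is needed.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within each group of completions.

    rewards: array of shape (num_prompts, group_size), one scalar
    reward per completion sampled for the same prompt.
    Hypothetical helper for illustration, not the API of [1].
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=1, keepdims=True)  # per-prompt baseline
    std = rewards.std(axis=1, keepdims=True)    # per-prompt scale
    # eps keeps the division finite when every reward in a group is equal.
    return (rewards - mean) / (std + eps)

rewards = np.array([
    [1.0, 0.0, 0.0, 1.0],  # mixed outcomes: useful learning signal
    [0.0, 0.0, 0.0, 0.0],  # all completions failed: no signal
])
advantages = group_relative_advantages(rewards)
```

A group whose completions all receive the same reward yields zero advantages, so that prompt contributes no gradient for the step.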
I'm also looking into coding-harness self-improvement [2]. An inner LLM (a raw LLM request) plus a harness solves coding tasks, while an outer agent like Claude or Codex proposes harness changes. After experimenting with many things over the past few months, I realized that the self-improvement everyone is talking about is really an experiment design problem; I wrote about it here [3]. I'm continuing to improve the infra around the self-improvement loop to increase the signal-to-noise ratio per experiment. I'm also generalizing the infra beyond Terminal-Bench tasks and collecting data across different models (harness-bound vs. model-bound).
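One way to make "signal-to-noise ratio per experiment" concrete is sketched below. This is an assumption about how such a check could look, not the method used in [2] or [3]: the function name, the scores, and the acceptance threshold are all hypothetical. It treats each harness variant as a set of repeated runs over the same task set and divides the difference in mean pass rate by its standard error.

```python
import math
import statistics

def signal_to_noise(baseline, candidate):
    """Difference in mean score divided by its standard error.

    baseline, candidate: per-run pass rates from repeated runs of
    the same task set (at least two runs each).
    Hypothetical helper for illustration, not code from [2].
    """
    diff = statistics.mean(candidate) - statistics.mean(baseline)
    # Unpooled (Welch-style) standard error of the difference in means.
    se = math.sqrt(
        statistics.variance(baseline) / len(baseline)
        + statistics.variance(candidate) / len(candidate)
    )
    if se == 0:
        # Runs were perfectly repeatable: any difference is pure signal.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

# Made-up pass rates from five repeated runs of each harness variant.
baseline_runs = [0.40, 0.44, 0.42, 0.38, 0.46]
candidate_runs = [0.50, 0.47, 0.52, 0.49, 0.52]

snr = signal_to_noise(baseline_runs, candidate_runs)
ACCEPT_THRESHOLD = 2.0  # hypothetical cutoff, not taken from the source
accept_change = snr > ACCEPT_THRESHOLD
```

Under this framing, lowering run-to-run variance or adding runs per variant raises the ratio for the same underlying improvement.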
[1] https://github.com/workofart/ml-by-hand
[2] https://github.com/workofart/harness-experiment
[3] https://www.henrypan.com/blog/2026-05-25-self-improvement-ha...