Comment by meander_water

1 day ago

Agree with the other commenters here that this doesn't feel like engineering.

However, Anthropic has done some cool work on model interpretability [0]. If that tooling were exposed through the public API, we could at least start to build a feedback loop: compare the model's internal states under different prompts and try to tune them systematically.
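
To make that concrete, here's a rough sketch of what such a loop might look like. To be clear, this is entirely hypothetical: no such interpretability endpoint exists today, and the URL, feature names, and response shape below are all made up for illustration.

    # Hypothetical only: Anthropic's public API does not expose
    # interpretability data. Imagine an endpoint returning per-prompt
    # feature-activation scores (all names here are invented).
    import requests

    API_URL = "https://api.anthropic.com/v1/interpretability/features"  # made up

    def get_feature_activations(prompt: str, api_key: str) -> dict[str, float]:
        """Fetch imagined feature-activation scores for a prompt."""
        resp = requests.post(
            API_URL,
            headers={"x-api-key": api_key},
            json={"model": "claude-sonnet", "prompt": prompt},
        )
        resp.raise_for_status()
        # e.g. {"planning_ahead": 0.82, "refusal": 0.03, ...}
        return resp.json()["features"]

    def compare_prompts(a: str, b: str, api_key: str) -> dict[str, float]:
        """Diff activations between two prompt variants to guide tuning."""
        fa = get_feature_activations(a, api_key)
        fb = get_feature_activations(b, api_key)
        return {k: fb.get(k, 0.0) - fa.get(k, 0.0) for k in set(fa) | set(fb)}

Even something this crude would let you A/B prompt variants against internal signals rather than just eyeballing outputs.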

[0] https://www.anthropic.com/research/tracing-thoughts-language...