Comment by benob

14 days ago

A variant I have been thinking of: each parameter matrix (or block) is the sum of a random matrix (generated from a seed) and a low rank matrix (a LoRA). I'd like to experiment training from scratch in that setting.