
Comment by dmichulke

1 month ago

Forgive my ignorance, but aren't they already on Hugging Face?

I assumed TurboQuant optimizations were already everywhere: in llama-cpp, or in the quantization machinery of Unsloth and the like.

I forked it to also add rotorquant. This is a specific optimization that uses Clifford rotors, instead of a static, compile-time random permutation, when storing the activations. It reduces both the storage space and the parameter count.
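For anyone unfamiliar with rotors, here is a minimal Python sketch of rotating a vector with Clifford rotors. It is not the rotorquant code. It assumes, hypothetically, that the activation vector is split into 3-element chunks and each chunk is rotated by its own Cl(3,0) rotor before quantized storage. A unit rotor in Cl(3,0) has one scalar and three bivector components and rotates 3-vectors exactly as a unit quaternion does, so the sketch stores each rotor as four numbers and uses quaternion arithmetic. The helper names (`random_rotor`, `apply_rotor`, `rotate_chunks`) are illustrative only.

```python
import math
import random
from typing import List, Tuple

# A unit Cl(3,0) rotor: one scalar + three bivector components. For rotating
# 3-vectors it behaves like a unit quaternion, so it is stored as (w, x, y, z).
Rotor = Tuple[float, float, float, float]


def random_rotor(rng: random.Random) -> Rotor:
    """Draw a uniformly random unit rotor by normalizing a 4D Gaussian sample."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    return (w / n, x / n, y / n, z / n)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def apply_rotor(r: Rotor, v) -> Tuple[float, float, float]:
    """Sandwich product R v R~, written in its efficient quaternion form."""
    w, u = r[0], r[1:]
    t = tuple(2.0 * c for c in cross(u, v))
    c = cross(u, t)
    return tuple(v[i] + w * t[i] + c[i] for i in range(3))


def reverse(r: Rotor) -> Rotor:
    """Reverse R~ of a rotor; for a unit rotor this is its inverse."""
    return (r[0], -r[1], -r[2], -r[3])


def rotate_chunks(vec: List[float], rotors: List[Rotor],
                  inverse: bool = False) -> List[float]:
    """Rotate consecutive 3-element chunks of vec, one rotor per chunk.

    The chunk layout is a hypothetical choice for this sketch.
    """
    if len(vec) != 3 * len(rotors):
        raise ValueError("expected exactly one rotor per 3-element chunk")
    out: List[float] = []
    for k, r in enumerate(rotors):
        chunk = tuple(vec[3 * k: 3 * k + 3])
        out.extend(apply_rotor(reverse(r) if inverse else r, chunk))
    return out


# Usage: rotate a hypothetical 12-dim activation, then undo the rotation.
rng = random.Random(0)
activation = [rng.gauss(0.0, 1.0) for _ in range(12)]
rotors = [random_rotor(rng) for _ in range(len(activation) // 3)]
rotated = rotate_chunks(activation, rotors)          # what would be quantized
restored = rotate_chunks(rotated, rotors, inverse=True)
```

Because every rotor is a rotation, the transform preserves vector norms and is undone exactly by applying the reversed rotors, so nothing beyond the rotors themselves needs to be stored to invert it.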