Comment by ipsum2
2 days ago
GGUF is easy to implement, but you'd probably get better performance on mobile with tflite, thanks to its custom XNNPACK kernels. Performance is critical on low-power devices.
2 days ago
We are writing our own backend. When we tested, tflite (now called LiteRT) was not faster than GGML, and GGML is already well supported. But we are moving away completely anyway.