Comment by deoxykev

1 year ago

Yeah, there is a clear bottleneck somewhere in llama.cpp. Even high end hardware is struggling to get good numbers. The theoretical limit should be higher, but it's not yet.