danielabinav160 3 days ago Would love to see these numbers reproduced on consumer GPUs, not just A100s.
wolttam 3 days ago This is an efficiency improvement that significantly lowers the amount of RAM you have to read, on average, during decode. It should improve performance on most hardware, because most LLMs are memory-bandwidth-bound during decode.
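A rough back-of-envelope sketch of that argument, assuming decode is purely bandwidth-bound: the ceiling on tokens per second is roughly memory bandwidth divided by the bytes read per generated token. The function name and every number below are made-up placeholders for illustration, not measurements from the linked work or from any particular GPU.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the only bottleneck.

    Hypothetical helper: ignores compute, caches, and kernel launch overhead.
    """
    return bandwidth_gb_s / gb_read_per_token


# Placeholder figures, chosen only to make the arithmetic easy to follow.
bandwidth_gb_s = 1000.0      # assumed memory bandwidth of the device
baseline_gb_per_token = 16.0  # assumed bytes touched per token before the change
reduced_gb_per_token = 8.0    # assumed bytes touched per token after the change

baseline = decode_tokens_per_second(bandwidth_gb_s, baseline_gb_per_token)
reduced = decode_tokens_per_second(bandwidth_gb_s, reduced_gb_per_token)

print(f"baseline ceiling: {baseline:.1f} tok/s")
print(f"reduced ceiling:  {reduced:.1f} tok/s")
print(f"speedup:          {reduced / baseline:.1f}x")
```

Under this simple model, halving the bytes read per token doubles the ceiling whatever the bandwidth figure is, which is why the gain should carry over to consumer cards as long as decode stays bandwidth-bound there.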
tommica 3 days ago Maybe someday an 8 GB video card can be used for coding...
romanusrome 3 days ago [dead]