bodegajed 3 hours ago
1.5B models can run on CPU inference at around 12 tokens per second, if I remember correctly.

moffkalast 3 hours ago
Ingesting multiple code files will take forever in prompt processing without a GPU though; token generation (tg) will be the least of your worries. Especially when you don't append to the prompt but change it in random places, so caching doesn't work.
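For anyone wondering why mid-prompt edits are the killer: prefix (KV) caching in local inference engines only reuses state up to the first token that differs from what was cached, so appending is cheap but an edit in the middle forces everything after it to be reprocessed. A minimal sketch of the effect, with a hypothetical helper rather than any engine's actual API:

```python
# Sketch of prefix-cache reuse: the cache survives only for the
# longest common prefix between the cached and the new token sequence.

def reusable_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    """Return how many leading tokens match, i.e. how much KV cache survives."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

cached = [1, 2, 3, 4, 5, 6, 7, 8]    # tokens already processed once
appended = cached + [9, 10]          # append-only: entire prefix reused
edited = [1, 2, 99, 4, 5, 6, 7, 8]   # mid-prompt edit at position 2

print(reusable_prefix_len(cached, appended))  # 8 -> only 2 new tokens to process
print(reusable_prefix_len(cached, edited))    # 2 -> 6 tokens must be reprocessed
```

On CPU, where prompt processing might run at a few dozen tokens per second, reprocessing thousands of tokens after every scattered edit adds up fast.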