Comment by rs38
1 month ago
My latest experiments with local LLMs (Mistral coder variants) that fit on an older 6 GB GTX 1060 were disappointing once you try to hook Copilot (CLI or VS Code) up to them and are used to providing a lot of tooling. The tooling seems to bloat the initial prompt to 20k tokens and more, which appears to be the bottleneck, unless I completely misconfigured things. Output tokens/s are more than fine, but PP (prompt processing) is frustratingly slow, to the point of being unusable.
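To put rough numbers on why a bloated prompt hurts even when generation is fast, here is a back-of-the-envelope sketch. The function names and both rates (100 tokens/s for prompt processing, 20 tokens/s for generation) are made-up placeholders, not measurements from the GTX 1060, and the model ignores prompt caching and any other overhead.

```python
def time_to_first_token(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    """Seconds spent processing the prompt before the first output token."""
    if pp_tokens_per_s <= 0:
        raise ValueError("pp_tokens_per_s must be positive")
    return prompt_tokens / pp_tokens_per_s


def total_latency(prompt_tokens: int, output_tokens: int,
                  pp_tokens_per_s: float, gen_tokens_per_s: float) -> float:
    """Prompt-processing time plus generation time, in seconds."""
    if gen_tokens_per_s <= 0:
        raise ValueError("gen_tokens_per_s must be positive")
    prefill = time_to_first_token(prompt_tokens, pp_tokens_per_s)
    return prefill + output_tokens / gen_tokens_per_s


# Hypothetical rates, chosen only to illustrate the imbalance.
PP_RATE = 100.0   # prompt tokens processed per second (assumed)
GEN_RATE = 20.0   # output tokens generated per second (assumed)

wait = time_to_first_token(20_000, PP_RATE)
total = total_latency(20_000, 200, PP_RATE, GEN_RATE)
print(f"wait before first token: {wait:.0f} s, total: {total:.0f} s")
```

With these assumed rates, nearly all of the wait is spent on the 20k-token prompt, so trimming the tool definitions sent up front would likely matter far more than a faster output rate.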