Comment by rs38

1 month ago

My latest experiments with local LLMs (Mistral coder variants) that fit on an older 6 GB GTX 1060 were disappointing once you hook Copilot (CLI or VS Code) up to them and are used to providing a lot of tooling. That seems to bloat the initial prompt to 20k tokens or more, which seems to be the bottleneck, unless I completely misconfigured things. Output tokens/s is more than fine, but prompt processing (PP) is frustrating to the point of being unusable.
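
To put rough numbers on why a 20k-token prompt hurts even when generation speed is fine, here is a back-of-the-envelope sketch. The throughput figures are hypothetical placeholders, not measurements from the GTX 1060, and the function names are made up for illustration; plug in whatever your own benchmark reports.

```python
def time_to_first_token(prompt_tokens: int, pp_tokens_per_s: float) -> float:
    """Seconds spent processing the prompt before the first output token appears."""
    if pp_tokens_per_s <= 0:
        raise ValueError("pp_tokens_per_s must be positive")
    return prompt_tokens / pp_tokens_per_s


def total_latency(prompt_tokens: int, output_tokens: int,
                  pp_tokens_per_s: float, gen_tokens_per_s: float) -> float:
    """Prompt-processing time plus generation time, in seconds."""
    if gen_tokens_per_s <= 0:
        raise ValueError("gen_tokens_per_s must be positive")
    return (time_to_first_token(prompt_tokens, pp_tokens_per_s)
            + output_tokens / gen_tokens_per_s)


if __name__ == "__main__":
    # Hypothetical speeds chosen only to illustrate the ratio, not measured values.
    PP_SPEED = 100.0   # assumed prompt-processing tokens/s
    GEN_SPEED = 30.0   # assumed generation tokens/s

    wait = time_to_first_token(20_000, PP_SPEED)
    total = total_latency(20_000, 300, PP_SPEED, GEN_SPEED)
    print(f"wait before first token: {wait:.0f} s, total: {total:.0f} s")
```

With these assumed speeds, almost all of the wall-clock time goes to the prompt, so trimming the tool definitions sent up front would matter far more than faster generation.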