I think my ideal might be as simple as a few people who spend a lot of time with various models describing their experiences in separate blog posts.
I see.
I can't give any anecdotal evidence on ChatGPT/Gemini/Bard, but I've been running small LLMs locally over the past few months and have had an amazing experience with these two models:
- https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B (general usage)
- https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instr... (coding)
OpenChat 3.5 is also very good for general usage, but IMO NeuralHermes surpasses it significantly, so I switched a few days ago.
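In case it helps anyone reading along, here's a minimal sketch of how I'd load NeuralHermes with Hugging Face transformers. My assumptions: bitsandbytes is installed for 4-bit quantization (which fits a 7B model in roughly 5 GB of VRAM), and the prompt and generation settings are just placeholders; for real chat use you'd want to apply the model's ChatML template rather than a raw prompt:

    # Minimal sketch: load a 7B model in 4-bit and generate a completion.
    # Assumes a GPU plus the bitsandbytes package; drop load_in_4bit on
    # CPU-only setups (it will be much slower and need more RAM).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mlabonne/NeuralHermes-2.5-Mistral-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # place layers on GPU/CPU automatically
        load_in_4bit=True,   # 4-bit quantization via bitsandbytes
    )

    prompt = "Explain the difference between a process and a thread."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The same pattern works for the deepseek-coder model; just swap in its repo id.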
Thank you for the suggestions – really helpful for my hobby project. Can't run anything bigger than 7B on my local setup, which is a fun constraint to play with.
Thanks! I’ve had a good experience with deepseek-coder:33b, so maybe they’re on to something.