Comment by rusk
1 day ago
Spent a week trying to get sensible results out of Llama 3.3. At one point it even simulated doing the work, log output and everything, and when I challenged it about the missing artefacts it actually started questioning my intelligence. Seems appropriate for a Zuck enterprise.
Qwen, on the other hand, got straight to work on the same system with astonishing competence.
From what I've read, Llama 3 needs beefier compute to invoke tools reliably, which I presume is because it focuses more on simulating AGI than on being a useful tool.
You might find this helpful. Llama is nowhere near the Pareto frontier (performance vs. cost):
https://arena.ai/leaderboard/code/webdev/pareto?license=open...
https://arena.ai/leaderboard/text/pareto?license=open-source
Llama 3.1 Instruct seems to be doing okay on that page, mostly because it's dirt cheap.
Llama 3? Are you from 2023?