Comment by lllllm
10 months ago
benchmarks: we provide plenty in the 100+ page tech report here: https://github.com/swiss-ai/apertus-tech-report/blob/main/Ap...
quantizations: available now in MLX https://github.com/ml-explore/mlx-lm (GGUF coming soon, not trivial due to the new architecture)
model sizes: many good dense models today still fall in the range between our small and large sizes
Thank you! Why are the comparisons to llama3.1 era models?
We compared to GPT-OSS-20B, Llama 4, and Qwen 3, among many others. Which models do you think are missing, among open-weights and fully open models?
Note that we have a specific focus on multilinguality (over 1000 languages supported), not only on English.
How did it compare with Gemma 3 models? I’ve been impressed with Gemma 27B, but I try out local models frequently, and I’m excited to boot up your 70B model on my 128GB MacBook Pro when I get home!
Ah, I'm sorry, I missed that. I'm not usually that blind.