Comment by shoffmeister

3 months ago

Switzerland, through EPFL, ETH Zurich, and the Swiss National Supercomputing Centre, has released a complete pipeline with all training data. To my understanding, that counts as "fully open".

See https://www.swiss-ai.org/apertus for details.

https://ethz.ch/en/news-and-events/eth-news/news/2025/07/a-l... was the press release.


YetAnotherNick 3 months ago

All the data used by Apertus is just data processed or generated by American companies (mostly NVIDIA, Apple, and Hugging Face). They didn't release any new data.

Olmo and HF not only processed the data to address language bias, they also published a lot of data augmentation results, including European-language performance. European LLMs just claim that language bias is the motivator.