A language model built for the public good
ETH Zurich and EPFL are releasing a fully open, AI-Act-compliant large language model:
The model will be fully open: source code and weights will be publicly available, and the training data will be transparent and reproducible, supporting adoption across science, government, education, and the private sector. This approach is designed to foster both innovation and accountability.
A distinctive feature of the model is its capability in over 1000 languages. [...]
The LLM is being developed with due consideration of Swiss data protection law, Swiss copyright law, and the transparency obligations under the EU AI Act. In a recent study, the project leaders demonstrated that, for most everyday tasks and general knowledge acquisition, respecting web-crawling opt-outs during data acquisition produces virtually no performance degradation.
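To illustrate the kind of opt-out the study refers to, here is a minimal sketch of a crawler honoring a site's robots.txt before collecting a page for a training corpus. It uses Python's standard urllib.robotparser; the crawler name is a hypothetical placeholder, and this is not the project's actual data pipeline.

```python
# Minimal sketch: respect robots.txt opt-outs before collecting a page.
# Not the project's pipeline; the crawler name below is a made-up placeholder.
from urllib import robotparser
from urllib.parse import urlsplit, urlunsplit

USER_AGENT = "ExampleTrainingCrawler"  # hypothetical crawler identifier

def may_collect(page_url: str) -> bool:
    """Return True only if the site's robots.txt allows this crawler to fetch the page."""
    parts = urlsplit(page_url)
    robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
    rp = robotparser.RobotFileParser(robots_url)
    try:
        rp.read()  # fetch and parse the site's robots.txt
    except OSError:
        return False  # if robots.txt is unreachable, err on the side of not collecting
    return rp.can_fetch(USER_AGENT, page_url)

if __name__ == "__main__":
    print(may_collect("https://example.com/some/article"))
```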
In late summer, the LLM will be released under the Apache 2.0 License. Accompanying documentation will detail the model architecture, training methods, and usage guidelines to enable transparent reuse and further development.
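Once the weights are published under Apache 2.0, reuse could look roughly like the following Hugging Face transformers sketch. The model identifier is a placeholder invented for illustration; the actual name and loading instructions will come from the accompanying documentation.

```python
# Hypothetical sketch of loading an openly released checkpoint with Hugging Face
# transformers once the weights are public. "swiss-ai/open-llm" is a placeholder,
# not the model's actual identifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/open-llm"  # placeholder; replace with the released model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Grüezi! Wie funktioniert ein Sprachmodell?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```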
“As scientists from public institutions, we aim to advance open models and enable organizations to build on them for their own applications,” says Antoine Bosselut.
“By embracing full openness — unlike commercial models that are developed behind closed doors — we hope that our approach will drive innovation in Switzerland, across Europe, and through multinational collaborations. Furthermore, it is a key factor in attracting and nurturing top talent,” says EPFL professor Martin Jaggi.
Tags: switzerland transparency llm opensource llms ml ai open-source open-data models data-protection scraping