In a significant move for Europe’s digital sovereignty, large language models (LLMs) have taken center stage with the announcement of OpenEuroLLM. This initiative aims to develop a series of “truly” open-source LLMs that will support all official languages of the European Union, currently numbering 24, as well as languages from countries like Albania that are in the process of negotiating EU membership. The overarching goal? To future-proof Europe’s digital landscape.
A Collaborative Venture
OpenEuroLLM is the result of a collaborative effort involving approximately 20 organizations, co-led by Jan Hajič, a computational linguist from Charles University in Prague, and Peter Sarlin, CEO of Finnish AI lab Silo AI, which was acquired by AMD for $665 million last year. This project is part of a broader European strategy to enhance digital sovereignty, bringing critical infrastructure and tools closer to home. Major cloud providers are investing heavily in local infrastructure to ensure that EU data remains within the region, while AI leader OpenAI has recently introduced offerings that allow customers to process and store data in Europe.
In a parallel effort, the EU has signed an $11 billion deal to establish a sovereign satellite constellation to compete with Elon Musk’s Starlink, further underscoring the region’s commitment to digital independence.
Funding and Feasibility Challenges
The budget allocated for the development of these models stands at €37.4 million, with around €20 million sourced from the EU’s Digital Europe Programme. While this may seem modest compared to the vast sums invested by corporate giants in AI, the effective budget is larger once funding for related initiatives is taken into account. OpenEuroLLM partners with EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands, and the broader EuroHPC project has a budget of around €7 billion.
However, the diverse array of participating organizations—spanning academia, research, and industry—raises questions about the feasibility of achieving the project’s ambitious goals. Anastasia Stasenko, co-founder of LLM company Pleias, expressed skepticism about whether a large consortium could maintain the focused approach of smaller, agile AI firms like Mistral AI and LightOn.
Building on Existing Foundations
Whether OpenEuroLLM is starting from scratch or building on existing work is a matter of perspective. Since 2022, Hajič has been coordinating the High Performance Language Technologies (HPLT) project, which aims to develop free and reusable datasets, models, and workflows using high-performance computing (HPC). This project, set to conclude in late 2025, can be viewed as a precursor to OpenEuroLLM, as many partners from HPLT are also involved in this new initiative.
Hajič anticipates that the first versions of the models will be released by mid-2026, with final iterations expected by 2028. However, the project has only recently begun and does not yet have substantial public-facing outputs.
Diverse Participation
The OpenEuroLLM consortium includes organizations from Czechia, the Netherlands, Germany, Sweden, Finland, and Norway, alongside corporate partners like Silo AI, Aleph Alpha, Ellamind, Prompsit Language Engineering, and LightOn. Notably absent is French AI unicorn Mistral, which has positioned itself as an open-source alternative to established players like OpenAI. Hajič attempted to engage Mistral in discussions, but those efforts did not yield results.
The project may still attract new participants, but only EU organizations can join due to funding restrictions, excluding entities from the U.K. and Switzerland.
Goals and Deliverables
The primary objective of OpenEuroLLM is to create “a series of foundation models for transparent AI in Europe,” while preserving the linguistic and cultural diversity of all EU languages. The deliverables are still being finalized, but the project aims to produce a core multilingual LLM for general-purpose tasks, along with smaller, efficient versions for edge applications.
Hajič emphasized the importance of quality, stating, “We don’t want to release something which is half-baked, because from the European point-of-view this is high-stakes, with lots of money coming from the European Commission — public money.”
Navigating Open Source Challenges
The OpenEuroLLM initiative faces the ongoing debate surrounding the definition of “open source” in AI. While the goal is to make everything open, Hajič acknowledged that some limitations may arise due to copyright restrictions. The project may need to keep certain training data confidential while ensuring compliance with EU AI regulations.
Collaboration and Competition
The emergence of OpenEuroLLM has sparked discussions about its similarities to another recent initiative, EuroLLM, which launched its first model in September 2024. EuroLLM, co-funded by the EU, shares the goal of developing an open-source European LLM. Andre Martins, head of research at Unbabel, highlighted these parallels on social media, urging the different communities to collaborate and share expertise rather than reinventing the wheel with each new project.
Hajič described the situation as “unfortunate,” expressing hope for potential cooperation, although he noted that OpenEuroLLM’s funding from the EU restricts collaborations with non-EU entities, including U.K. universities.
Funding Gaps and Future Prospects
The arrival of China’s DeepSeek, which promises an impressive cost-to-performance ratio, has raised questions about the true costs involved in building competitive AI systems. Peter Sarlin, technical co-lead on the OpenEuroLLM project, acknowledged the uncertainty surrounding DeepSeek’s development but remains confident that OpenEuroLLM’s funding is sufficient, since it primarily covers personnel costs. Computing resources, which account for a significant portion of AI system development costs, will largely be provided through partnerships with EuroHPC centers.
Sarlin emphasized that OpenEuroLLM is not aimed at creating consumer or enterprise-grade products. Instead, it focuses on developing foundational models that can serve as the AI infrastructure for European companies. “What we’re contributing is an open-source foundation model that functions as the AI infrastructure for companies in Europe to build upon,” he stated.
A Vision for Digital Sovereignty
As critics have noted, OpenEuroLLM involves numerous moving parts, a fact that Hajič acknowledges but views positively. “I’ve been involved in many collaborative projects, and I believe it has its advantages versus a single company,” he remarked. While acknowledging the successes of companies like OpenAI and Mistral, he hopes that the combination of academic expertise and corporate focus can yield innovative results.
Ultimately, the goal of OpenEuroLLM is not merely to compete with Big Tech or billion-dollar AI startups; it is about achieving digital sovereignty—creating open foundation LLMs that are built by and for Europe. Hajič concluded, “If, in the end, we are not the number one model, and we have a ‘good’ model, then we will still have a model with all the components based in Europe. This will be a positive result.”
Conclusion
OpenEuroLLM represents a significant step towards establishing a robust AI ecosystem in Europe, one that prioritizes transparency, linguistic diversity, and digital sovereignty. As the project unfolds, it will be crucial to monitor its progress, the challenges it faces, and the collaborative spirit it fosters among the diverse stakeholders involved. The success of OpenEuroLLM could set a precedent for future AI initiatives in Europe, paving the way for a more independent and innovative digital landscape.