OpenEuroLLM: Open LLMs for Transparent AI in Europe

19 days ago (openeurollm.eu)

I'm HIGHLY sceptical. The academics will love it, because they get money. But look at that list of parties involved: more than twenty parties supplying people, and none of them will have this initiative at the top of their list of loyalties and priorities.

Meaning: everyone will talk, no one will take charge, some millions will change hands, and we continue with business as usual.

Instead, this should have been a single new non-profit (or whatever) with deep pockets that convinces smart people to give their 100% for a while.

Death by committee. And I say this as someone who was in a multi-million research program across ~8 universities that was going to do "groundbreaking" research. After a few months everyone was back to pushing their own lines of research; there was almost zero collaboration, let alone a common language or goal-setting.

  • I can see that you're unfamiliar with how EU grants and these project consortia work, but I don't have much time to address this in great detail.

    As someone who has been in these types of projects for a long time, what I can say is "it works", because people do not compete with each other; they build it together.

    What I can say is: if they have come this far, there are already plans for what to do and how to do it, and none of the parties are inexperienced in these kinds of things.

    • "It works!" is the only thing that will be visible on web page after hundreds of milions will be burned. I’m observing few of such „unprecedented” cooperation projects from EU funds. A lot of meetings, a lot of managers, plenty of very unskilled people creating mess and few names doing presentations so companies will believe everybody know what are they doing. Same from company side - they need being in those projects to comply with stupid EU rules about being eco.

      Ball of mud.

      37 replies →

    • As someone who has lived through Eurostars and Horizon 2020, and who has participated as both a researcher and a corporate partner, I can say: it does not work.

      Unless by "work" you mean "successfully passed the post-project review by non-experts based on a bunch of slides".

      Point at a single project of this sort that had any tangible output that's still in use.

      7 replies →

    • My experience from these projects is the opposite. The projects are always secondary priorities for the participants, and the difficulty of coordinating a dozen entirely separate organisations towards something actually productive is immense. In practice each participant independently spends the money they get on something loosely relevant, and the occasional coordination meetings are spent planning how to fulfill the reporting requirements of the grant.

      Business and research are difficult enough even when done by tightly knit teams and constantly tested against real-world systems and customer feedback. The idea that a hodgepodge of organisations can achieve poorly defined yet aspirational goals on a low budget is massively misguided.

    • > I can see that you're unfamiliar with how EU grants and these project consortia work, but I don't have much time to address this in great detail.

      This is a take that can only come from someone who is dependent on Horizon, because I don't think any independent observer could look at Horizon projects and say they just work.

      8 replies →

    • Having worked on an FP7 programme myself and having a family member involved in project audits, I’d say some skepticism is warranted—particularly regarding the incentives that attract private sector partners and the talent they actually allocate once funds are secured.

      Funding is tied to employee qualifications and effectively subsidises salaries, which creates room for misalignment. No-shows of allocated employees were not uncommon, since a company willing to accept lower-quality deliverables can assign junior employees to do the work at a fraction of the cost, while the salary difference for their PhDs simply becomes added margin.

    • Can you tell me please what you worked on and where I can see the output? I've been adjacent to these kinds of efforts, and the only thing I can say is that I'm highly skeptical of your claims.

    • While I do think EU grants are a good thing, I'm sceptical about these too-big-to-fail multinational projects. I still remember the Human Brain Project.

    • I largely see this type of collaboration as a very inefficient form of a distributed company (team) whose members have no incentive other than to (mostly) collect points from research papers. There is no incentive to actually build a product in such a setting, and there is no incentive to remain competitive, since you cannot be fired or penalized in any other form. And generally speaking, as an individual you don't care about the industry (market) competition, since you mostly care about remaining relevant within the very narrow scope of your research topic. So this is why it doesn't work. There is no coherent mass moving toward the same goal. Seemingly there is, but there isn't.

      16 replies →

    • When has it ever worked?

      Remember the EU Search Engine project, Quaero, and its equally failed successor, Theseus? No? I thought so.

  • Ah, so you’ve been an academic before, then.

    The problem is academic culture is corrupt, and it’s very hard to reverse the decay.

    Simple example: one Russell Group UK university (like many others) was admitting students who couldn’t speak English. A lecturer on a technical subject found they were struggling to understand his course, in part due to the language barrier. Come the exam, most of the students failed. He was told to make the exam easier so they would pass. The lecturer involved is a well meaning kindly man who would consider himself very ethical. But he did what he was told and the students passed.

    In such a system it’s hard to see how an individual can fix it. If he had protested, he’d have been gently moved aside and the exam would have been rewritten by someone else.

    Research is similarly corrupt. Grants are written to match a call, and they promise the earth. Friends review them and score them highly. Pals on the grant committee favour their friends. And it's implicitly agreed that the outcomes don't have to be achieved. You go back to doing your original research, or not doing much at all, or, more likely, figuring out how to get some papers published and writing more grant proposals.

    The idealistic, those actually interested in progressing the field, leave or are squeezed out, passed over for lectureships in favour of folks who bring in grants via BS and politics.

    Choose a topic you know about. Go to the EPSRC website. Look at grants from ten years ago and see what their promised outcomes were.

    My only answer is that a project like this must be done by people hired from outside of academia, which at this point is probably corrupt beyond repair. I look back at previous generations and wonder how the hell so much advancement was achieved.

  • > The models will be developed within Europe's robust regulatory framework, ensuring alignment with European values while maintaining technological excellence.

    They may release something, but I doubt it will be more useful than what already exists.

    • > They may release something, but I doubt it will be more useful than what already exists.

      I wouldn't bring such prejudice to this. I'm not implying that you're wrong, but I'm highly skeptical that the model will turn out incompetent or inferior.

      Also, don't forget: they'll open-source it end to end, from data to training/testing code and everything in between.

      2 replies →

  • As someone who worked in several Eurescom research projects back in the early 90s and watched it all get steamrolled by actual pragmatic work done in telcos and by US manufacturers, I have zero faith in this even as a political/independence gesture.

    There are loads of people who think "there is no moat and Europe can do this" (including the Portuguese government, which announced a Portuguese LLM at Web Summit that, hilariously, is being trained on a research "supercomputer" in Spain), and they have no idea how far (politically, economically and pragmatically) Europe's tech scene is from the US. Other than Mistral, of course.

  • > Death by committee.

    This is how the EU works. It's the reason the EU has very little innovation compared to the USA.

  • I'm involved with IMI-BIGPICTURE, a similarly sized EU initiative (~€70M funding). It's not that bad. Things will take a while to start moving, but as long as all the players stay on the same page, shit will get done. 10x slower than with a small team, but some things can't be done in small teams.

    • > The project aims to create a repository of digital copies of around 3 million slides covering a range of disease areas. This repository will then be used to develop artificial intelligence tools that could aid in the analysis of slides.

      €70M to get digital copies of 3 million slides. Speaks for itself.

  • Can't talk specifics, but I worked with a perpetually failing startup that spun out of a very prestigious university. The company was lined with way too many professors. Their burn rate must have been incredible, based on the huge investments they got. Their product was already "meh" before the AI boom made it utterly obsolete. They made huge promises but delivered poor results (in an area where 90% accuracy was basically useless). They never seemed to iterate on the product. Suddenly (almost overnight) we got word that they were out of money and were likely to cease operating. At the 11th hour some idiot bailed them out, likely because of their academic credentials. (Certainly not because of their IP, product or output capability.) Or maybe it was the sunk cost fallacy. Idk.

    Anyway, they're still flailing along, burning through a seemingly infinite runway. Academia FTW!

As someone who is in general skeptical of programs like this (and a European), I see two remarkable/timely things about it:

- This project doesn't just allocate money to universities or one large company, but includes top research institutions as well as startups and GPU time on supercomputing clusters. The participants are very well connected (e.g. also supported by HF, Together and the like, with European roots).

- DeepSeek has just shown that you probably can't beat the big labs with these resources, but you can stay sufficiently close to the frontier to make a dent.

Europe needs to try this. Will this close the gap to the US/China? Probably not. But it could be a catalyst for competitive open-source models and partially revitalize AI in Europe. Let's see...

PS: On Twitter there was a screenshot yesterday showing that in a new EU draft, "accelerate" was used six times. Maybe times are changing a little bit.

Disclaimer: Our company is part of this project, so I might be biased.

  • I wish you the best of luck. However, this is basically still just a European joint research project (admittedly comparatively well funded) with similar partners that have also been connected before in other research projects. To really compete in this space, it will require new ideas, great talent and good leadership towards a common goal. I have myself been part of many EU-funded projects and know the difficulty of realizing this within such a project. Public funding sadly sometimes has adverse effects.

    As for computing costs: since EuroHPC gives resources to research for free, there can be more budget for compute. The EuroHPC Joint Undertaking has just decided to invest hundreds of millions of euros in new AI clusters and supporting services, so this can come on top. Actually, projects like this are much needed to make good use of that money.

    Disclaimer: my lab is involved in one of the new AI Factories.

    • So, if one has a well-thought-through idea, what is the process for getting the resources ($$$) from OpenEuroLLM and the compute from EuroHPC? How do I become a partner as a long-standing engineer with plenty of industry practice in research and development?

      I am asking because I never really understood how EU funds work; they always seemed to me to involve a lot of gatekeeping.

      3 replies →

  • The problem is that:

    - These are not really supercomputing clusters in LLM terms. Leonardo is a 250 PFlops cluster. That is really not much at all (rough arithmetic below).

    - If the people in charge of this project actually believe R1 costs $5.5M to build from scratch, it's already over.
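
    A back-of-envelope sketch of the first point, taking the 250 PFLOPS figure at face value and using Meta's published ~3.8e25 FLOPs training budget for Llama 3 405B as a yardstick. The 40% utilization is my own assumption, and mixed-precision training throughput would be higher than this FP64-class peak figure, so treat it as an order-of-magnitude illustration only:

        # Hypothetical numbers, mine rather than the project's
        peak_flops = 250e15    # Leonardo, FLOP/s (the figure cited above)
        utilization = 0.40     # assumed sustained utilization
        train_flops = 3.8e25   # Llama 3 405B training compute (Meta's paper)

        days = train_flops / (peak_flops * utilization) / 86400
        print(f"~{days:,.0f} days")  # roughly 4,400 days on the whole machine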

    • I think no one believes that R1 cost $5.5M from scratch. The people in this project (most, not all) are very aware of the realities of training and are very well connected in the US as well. Besides Leonardo there are JUWELS, LUMI & others which can be used for ablations and so on.

      This will never compete with what the frontier labs have (+ are building) but might be just enough for something that is close enough to be a useful alternative :).

      PS: Huge fan of Latent Space :)

      1 reply →

    • > If people in charge of this project actually believe R1 costs $5.5M to build from scratch, it's already over.

      wdym?

  • The money doesn’t matter.

    The goals don’t matter.

    The people don’t matter.

    The only thing that matters is how much regulatory red tape is involved.

    My guess is that the paperwork will kill this. Read the announcement: too much discussion of the regulatory framework. In the US or China, all you need is some money and smart people. That's a very low barrier to getting moving.

    • In other words, to be successful you need to be able to break the law and lobby the government? That is indeed the USA mindset, or should I say the United Corporations of America? I'm happy the EU is not the USA.

      6 replies →

    • I agree that the announcement should've talked more about goals and performance than regulatory stuff ;-).

      But I think there is a new understanding among the bureaucracy that regulation (alone, without innovation) will kill Europe's competitiveness and that some acceleration and cutting of red tape is necessary.

      Can't say with certainty that this will be successful. But that we, as a very young startup that is barely known outside of our AI open-source niche, are part of this is already a sign in itself; a year ago I'd have never believed that this might be an option (and I'd also probably have declined if someone had asked us to join an EU-funded project).

      We will have engineers without a degree (but hundreds of thousands of HF downloads) working side-by-side with some of the top researchers + HPC centers.

      1 reply →

  • What I don't understand is the big plan. Say you manage to bring about something that works in the lab on par with DeepSeek R1. What happens with it next? In the market, LLMs are improved continuously based on feedback (usage data, etc.), and new versions are released multiple times a year. If we want to stay sovereign, we need a similar engine started in Europe, but I can't see how a research project relying on a walled-garden system of supercomputing centres can start it.

  • What route(s) did you go through for funding? As an outsider, the bureaucracy fascinates me; I trust it's all open and transparent, like the EU?

  • > Deepseek has just shown that you probably can't beat the big labs with these resources

    Is that a new take? Because so far DeepSeek was considered proof that small companies are able to compete with big players like OpenAI...

    • Might be debatable, but I tend to agree with Dario Amodei on this; my guess is that R1 is 7-10 months behind the internal frontier at the big labs, while having a few small novel tricks. (But I might err; it will be interesting to see the development going forward.)

      1 reply →

They allocated €37.4 million [1]. As a European, I truly don’t understand why they keep ignoring that the money required for such projects is at least an order of magnitude more.

[1] https://digital-strategy.ec.europa.eu/en/news/pioneering-ai-...

  • DeepSeek's release has shown that there's no great risk of getting left behind. All the info is out there, people with skills are readily available, and creating a model that will match whatever current model is considered frontier level is not that hard for an entity like the EU.

    For everyone here shouting that the EU needs to do something, be a leader, what have they lost so far by choosing to lead in legislation instead of development?

    They've lost nothing. They've gained a lot.

    They can use the same frontier level open source model as everyone else, and meanwhile, they can stay on top of harmful uses like social or credit scoring.

    Also speaking as a European, legislation is kind of the point of a government in the first place. I do think the EU goes too far in many cases. But I haven't seen anything that makes me think they're dealing with this particular hype train badly so far. Play the safe long game, let everyone else spend all the money, see what works, focus on legislation of potentially dangerous technology.

    • > legislation is kind of the point of a government in the first place

      I would personally consider legislation to be but one means to an end, with the point of a (democratic) government actually being to ensure stability and prosperity for its citizens.

      In that framework, "leading with legislation" doesn't make any sense—you can lead with results, but the legislation is not itself a result! Lead with development or lead with standard of living or lead with civil rights, but don't lead with legislation.

      Your formulation sounds like politician's logic: "something must be done, this is something, therefore we must do it". Legislation as an end in itself. Very interesting.

      https://www.youtube.com/watch?v=vidzkYnaf6Y

      1 reply →

    • > They can use the same frontier level open source model as everyone else, and meanwhile, they can stay on top of harmful uses like social or credit scoring.

      We are dependent on models created by US and Chinese companies for access to the technology that seems to be the next internet, while the entire world is accelerating hard towards protectionism and tariff wars.

      What could possibly go wrong

      1 reply →

    • I partially agree with you. The only problem is that these markets are highly monopolistic, and we will be creating another technological dependency on the US.

    • DeepSeek didn't show anything except the compute cost of the final model. We don't know how much the data collection cost, how much unethical data (copyrighted material or OpenAI outputs) was needed, the cost of experiments, etc. (see the breakdown below).

      > Creating a model that will match whatever current model is considered frontier level is not that hard for an entity like the EU.

      If they had this as their top priority and allotted a few billion dollars, then sure. Not in the current form, where the people involved are there only for publications, not for the hard engineering that takes months or years, and where they could do the same thing at OpenAI or DeepSeek for something like a $1 million salary, which both of them pay.
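
      For context, the headline number traces to the DeepSeek-V3 technical report: 2.788M H800 GPU-hours for the final training run, priced at an assumed $2 per GPU-hour, and explicitly excluding prior research and ablation experiments:

          # Figures as stated in the DeepSeek-V3 technical report; final run only
          gpu_hours = 2.788e6     # H800 GPU-hours for V3's full training run
          usd_per_gpu_hour = 2.0  # rental rate assumed in the report
          print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.2f}M")  # -> $5.58M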

    • > lead in legislation

      > legislation is kind of the point of a government

      As an American, most of this post reads like doublespeak satire. I guess it's not, but just to put a transatlantic pov here.

      I'll add a sports metaphor for good measure: in order to become expert football players, we'll get tickets to watch the best teams play.

      19 replies →

  • Personally, I'm rather happy that the allocation was not too large at first; even that is quite a sizeable sum. The EU is great at kickstarting projects that sound like a panacea but end up not leading to anything. Once they have something to show, by all means, throw more money at them.

    • The trap that these EU projects typically fall into is that they burn all of the grant funding on paying politically connected consultants to write reports. No one gets around to building an MVP.

      1 reply →

  • As said in another comment, the project can likely make use of 'free' EuroHPC resources, which will simultaneously be funded with hundreds of millions. Still not Stargate, but if they can actually innovate something beyond the obvious (like R1 did), I think the money is still useful.

    • On what basis are you stating this? I'm asking because I have been involved in another project like these (€15M budget), and the main issue was the lack of allocated computing resources, because no one thought about it (true story).

      1 reply →

  • Because Europe does not have enough money. This comes from taxes, i.e. as an EU citizen you pay for the fun.

    The private sector often does not fund projects like these, as they have a bad return on investment.

    • > Because Europe does not have enough money.

      They seem to have enough to send overseas and to spend on illegal economic migrants.

      > The private sector often does not fund projects like these, as they have a bad return on investment.

      Then why does the private sector in the US fund projects like these?

      7 replies →

  • These millions were allocated for business-class tickets and accommodations for the talking members of this task force. So, €37.4M is plenty.

    I have zero doubt that nothing else will come out of this.

    Source: have been working with major UN and international bodies on the software side.

  • Not anymore, with DeepSeek's stuff, right? Which is open.

    • DeepSeek had plenty of R&D expenses that were not included in the (declared) model training cost. Here we are talking about building something nearly from scratch; even if there is an open-source starting point, you still need the infrastructure, expertise and people to make it work, which with that budget are going to be hard to secure. Moreover, these projects take months and months to get approved, meaning that this one was conceived long before DeepSeek, thus highlighting the original misalignment between the goal and the budget. DeepSeek might have changed the scenario (I hope so), but it would be just a lucky ex-post event... not a conscious choice behind that budget.

      1 reply →

  • I mean, I get that the current strategy of most participants seems to be burning billions on models which are almost immediately obsoleted, but it's... unclear whether this is a _good_ strategy. _Especially_ after DeepSeek has just shown that there _are_ approaches other than just "throw infinite GPUs at it".

    Like, insofar as any of this is useful, working on, say, more techniques for reducing cost feels a lot more valuable than cranking out yet another frontier model which will be superseded within months.

So for €52M you'll get... a worse Llama? But don't worry, it'll be "transparent and compliant", which will make people want to use it. Very European.

  • > a worse Llama?

    As someone who lives here, I'd actually be surprised if we even got that. I expect lots of taxpayer funded websites, manifestos, PowerPoints and numerous discussions and ultimately nothing.

  • That'd be very, very good, actually. I'd be happy if institutions used something where one could TECHNICALLY (maybe just a minuscule number of people would do that) verify the data end to end, instead of some "open" model that is actually not open at all. A little worse performance is a good trade-off, imo.

  • What's with this American mentality that everything always needs to be the best, and if it isn't, it shouldn't even exist? I know the USA is alright with breaking the law, invading people's privacy and lobbying its government to the point where it's really the corporations that elect politicians into power, but why do you also need Europe to be the same way? I thought us Europeans had made it pretty clear we don't like your way of governing, so stop forcing it on us. I'd much rather use a less capable LLM if it meant that the LLM isn't built on top of mountains of illegally collected data.

  • you get money to sustain a bunch of academics and startups past their sell-by date

    the eu gets some publicity

    and the public gets nothing but another bite out of their taxes

The actual top EU AI labs, like Mistral, Black Forest Labs, or Stability AI, are nowhere to be seen. The same goes for potent, established companies like SAP, Schwarz Group and the like. They likely made the right move here, as this is doomed to fail, as correctly elaborated in the top comments.

Better late than never. I can't wait to get my hands on a mediocre AI that's 2 generations behind!

  • 2 is generous. 5 is more likely.

    Also, I can't wait to get bombarded with a cookie popup, an AI bias popup, then an AI accuracy popup, etc.

    • Anything more than a PowerPoint coming out of this would be a generous expectation.

    • > ai bias popup

      It's all fun and games until AI models decide your type of people (blonde/brown/from that zip code/with that type of last name/went to that school/worked there in the past/have those facial features) are "bad" or "untrustworthy" or don't deserve healthcare or to be hired for that job or get a mortgage.

      "AI" bias has existed for as long as we have had "AI" in its various forms. Remember ML algorithms classifying black people as monkeys? And the "solution" was to make them unable to find monkeys or primates. That one got big because of the implication.. when it's "people with the last name Smith being dumb", nobody will care

      1 reply →

    • The alternative is businesses not being held to account. I'd much rather have cookie pop-ups and GDPR notices than businesses with no guardrails against moves that are not in the interest of the user/customer.

      3 replies →

> The models will be developed within Europe's robust regulatory framework

I'm sure that all AI research needs is "robust regulation".

As a European, it annoys me to no end that Brussels bureaucrats think they know and understand everything and can regulate everything. The only thing they are achieving is making sure that AI companies avoid forming in the EU, because nobody wants to be at a disadvantage compared to the rest of the world. Sure, eventually those companies will provide service to EU countries, but we will never have our own industry.

The EU needs to stop having pencil pushers make decisions on things they have no clue about and somehow get people who know what they are talking about to make the choices.

I see plenty of pessimism in the comments, & talk about how unqualified the organizations & people involved are expected to be, without addressing how important this initiative is to the EU & how necessary it is that they succeed.

The USA has just proven they are economically unpredictable & so unstable they have become fiscally volatile, with control in the hands of lobbyists. This is why OpenEuroLLM already has the starting support it does, & for sovereign nations it is seen as mission-critical, so as to avoid long-term digital-services taxation being leveraged like tariffs against anyone who does not cooperate with whoever is leading the USA.

So to me it feels as though the project is impressive, & quite likely to succeed where others have failed, because so few understand the technology well enough to get in the way of progress towards openly standardizing the decentralization of AI compute across sovereign cloud infrastructures. Even if Aleph Alpha is not able to lead development fast enough, organizations such as OUMI (Open Universal Machine Intelligence) will be working alongside them in attempts to build out the Linuxes of AI frontier modelling.

If nothing else, OpenEuroLLM guarantees a rise in the social standards of what it takes to succeed in AI long term. At worst it provides a measuring bar against which the success of global AI initiatives can be compared, while introducing new organizations & people to the open-source ecosystem who would never have otherwise invested in it without the EU stamp of approval.

> The models will be developed within Europe's robust regulatory framework, ensuring alignment with European values while maintaining technological excellence

As a European, that's practically an oxymoron. The more one limits oneself to legally clean data, the worse the models will be.

I hate to be pessimistic from the get-go, but it doesn't sound like anything useful will be produced by this, and we'll have to keep relying on Google to do proper multilinguality in open models, because Mistral can't be arsed to bother beyond French and German.

  • I've been using Mistral this past week due to changes in geopolitics, and it works absolutely great in English. I haven't bothered with my native language yet, but in English it worked great. Better than my first experience with ChatGPT (GPT-3.5), actually.

    • Update: tried a couple of Dutch (my native language) queries, and it worked well. No issues whatsoever. Which is no surprise, given that Dutch <-> English translations often work very well.

    • OK, I see we're very far from being on the same page.

      Multilingualism in the context of language models means something more than English, because English is what every model trained on the internet already knows. There aren't any I'm aware of that don't know it, since it would be exceedingly hard to exclude English from the dataset even if you wanted to for some reason. This is like the "what about men's rights" response when talking about women's rights... yes, we know; it's already entirely ubiquitous.

      But more properly, I would consider LLM multilingualism to mean straight-up knowing all languages. We benchmark models on MMLU and similar collections that contain all fields of knowledge known to man, so I would say it's reasonable to expect fluency in all languages as well.

    • I've been using Mistral for most of January at the same rate as ChatGPT before. I decided to pay for it, as it's per token (in and out), and the bill came yesterday... a whopping 1 cent. That's probably rounded up.

      1 reply →

  • > As a European, that's practically an oxymoron. The more one limits oneself to legally clean data, the worse the models will be.

    Train an LLM on textbooks and other legally clean books; you do not need to train it on pop culture to make it intelligent.

    For face generation you might need to be more creative, but you should not need millions of images stolen from social media to train your model.

    But it makes sense that the tech giants do not want to share their datasets and be transparent about this stuff.

  • > Mistral can't be arsed to bother beyond French and German.

    Any more details here or a writeup you can link to?

    • Mainly my own experience: only Gemma seems to have been any good for Slavic languages so far, and only the 27B, unquantized, is reliable enough to be in any way usable.

      Ravenwolf posts tests of his German benchmarks every so often in LocalLLaMA, and most models seem to do well enough, but I've heard claims from people about Mistral models being their favorites in German anyhow. And I think Mistral-Large scores higher than Llama-405B in French on lmsys, and that's at least something one would expect from a French company.

      1 reply →

  • What do you mean by relying on Google?

    Llama 3.1 and the largest DeepSeek V3/R1 models are rather good even at a niche language like Finnish. The performance does plummet in the smaller versions, and even quantization may harm multilinguality disproportionately.

    Something like deliberately distilling specific languages from the largest models could work well (rough sketch below). Starting from scratch with a "legal" dataset will most likely fail, as you say.

    Silo AI (co-lead of this model) already tried Finnish and Scandinavian/Nordic models with the from-scratch strategy, and the results are not too encouraging.

    https://huggingface.co/LumiOpen
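
    A minimal sketch of what that language-targeted distillation could look like: standard temperature-scaled KL distillation in PyTorch. This assumes teacher and student share a tokenizer/vocabulary, and it's an illustration of the general technique, not anything this project has announced:

        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, T=2.0):
            # Flatten (batch, seq, vocab) -> (tokens, vocab) so "batchmean"
            # averages the loss per token
            s = student_logits.reshape(-1, student_logits.size(-1))
            t = teacher_logits.reshape(-1, teacher_logits.size(-1))
            # KL(teacher || student) on temperature-softened distributions;
            # the T^2 factor keeps gradient scale comparable across temperatures
            return F.kl_div(
                F.log_softmax(s / T, dim=-1),
                F.softmax(t / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)

        # Idea: freeze the large multilingual teacher, feed both models the same
        # target-language text, and optimize only the student:
        #   loss = distillation_loss(student_out.logits, teacher_out.logits.detach())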

    • Yes, I think small languages with a total corpus of maybe a few hundred million tokens have no chance of producing a coherent model without synthetic data. And using synthetic data from existing models trained on all public (and less public) data is enough of a legal gray area that I wouldn't expect this project to consider it, so it's doomed before it even starts.

      Something like 4o is so perfect in most languages that one could just make an infinite dataset from it and be done with it. I'm not sure how OAI managed it tbh.
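
      Purely as a hypothetical sketch of that bootstrapping (OpenAI Python SDK; the prompt, topics and output file are made up, and the terms-of-service/legal questions above still apply):

          from openai import OpenAI

          client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

          topics = ["sää", "ruoanlaitto", "jalkapallo"]  # placeholder Finnish topics
          with open("synthetic_fi.txt", "a", encoding="utf-8") as f:
              for topic in topics:
                  resp = client.chat.completions.create(
                      model="gpt-4o",
                      # "Write a short, natural paragraph about: <topic>" in Finnish
                      messages=[{"role": "user",
                                 "content": f"Kirjoita lyhyt, luonteva kappale aiheesta: {topic}"}],
                  )
                  f.write(resp.choices[0].message.content.strip() + "\n\n")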

This article stayed on the front page of HN for a couple of days: https://timsh.org/tracking-myself-down-through-in-app-ads/

The author was in Europe.

Apparently, all the rules protecting the privacy of European citizens make no difference in practice.

I wonder why, but I believe the EU will look into this soon, since it would be so uncomfortable if the king were bad.

  • Or it just takes time to enforce the regulations. As an EU citizen, the recent regulations have already helped me a lot: many companies provide data takeout now, it has become easier to remove accounts, many more websites ask for specific consent, etc. Or even small things: our daughter's school has to ask for specific consent about whether they can take photos and where they can post them (of group activities, etc.). Does everyone play according to the rules? Not yet, but we will get there.

    • > our daughter's school has to ask for specific consent about whether they can take photos and where they can post them

      The result of this is that we don't see anything our daughter does in school, because the school decides to comply with draconian regulation by saying "fuck it". The same applies to having parents present daily: we don't touch the grounds of the school unless we make a formal request, we don't see the teacher every day, we don't hear how the day went from the professionals who actually spent time with them. This is all 100% the opposite of our experience outside of Europe before moving, and I'm comparing a public school system in a third-world country to a European one. It's just an anecdote, but nothing has made it clearer to me how much of a death spiral the EU is in than the experience we are currently having.

      3 replies →

  • Enforcement of the GDPR has been _grindingly_ slow; the first really significant fines weren't issued until 2022, when the Irish regulator finally pulled the finger out.

    Presumably based on this experience, more recent internet-y laws (DMA, DSA, AI Act) are _not_ dependent on national regulators, and enforcement is getting off the ground more or less immediately. I'd expect that when the GDPR's successor shows up it'll follow suit.

In the EU, we need some social contracts for these things. Multiple EU-funded projects are launched, consortia between universities and private-sector companies are created, deliverables are delivered, and grants are allocated, but the cumulative results haven't been that great, have they? That has happened with every "trend", from nuclear physics to "expert systems" in the 1990s to green tech, and now AI.

  • Here's the simple question: who gets fired when the deliverables aren't met?

    This is the single greatest motivator for American companies in our exhausting capitalistic society.

> Europe's leading AI companies

Followed by a list of five EU AI companies I have never heard of before and which seem to have little to no market penetration.

Answering some questions:

- Good data is already available for the project.

- There are already previously existing models; it is not starting from scratch.

- Companies like Red Hat, Volvo and Saab are part of the initiative through partners like AI Sweden. So this is an initiative to support the public sector and universities, but it will also have commercial output.

All that is public information.

> The project, which has been awarded the STEP (Strategic Technologies for Europe Platform) seal, leverages support from previous European projects and the experience of the partners and their results, including large repositories of high-quality data and pilot LLMs developed previously. The consortium commences its work on February 1st, 2025, with funding from the European Commission under the Digital Europe Programme. - https://sciencebusiness.net/network-updates/charles-universi...

United we stand, divided we fall.

More than LLMs, Europe needs chip autonomy. ASAP.

Own fabs, own IP.

  • We don't need "total autonomy" when we have friends that already produce chips, for example.

    Rather, invest in defense and protect trade with Taiwan.

    We can do what we are best at and produce lithography equipment at ASML and the Taiwanese at TSMC produce the chips with it and send them back to us.

  • Europe as a project is over; it's all downhill from here. You can't regulate yourself into economic growth, and there is nothing else the Eurocrats care to do. There is still lots of fat in the land that they can leech off before Europe kicks the bucket.

    • If I got a euro for every time someone expressed this opinion, I'd be a rich man. Save it until the EU really doesn't exist.

    • I mean, people have been saying this about the EU and its predecessor organisations since about 1950. If you believe the right-wing media, the EU should collapse a few times a year from various causes; it's as bad as the Roman Empire!

  • There are companies like Infineon and Elmos in Germany (and I think a handful more). So it is happening, just not (yet) with consumer products.

  • Europe has ARM, AMD and ASML, but everything is produced elsewhere. Time to build giant automated chip fabs.

    • "automated" is the issue when producing chips. You can automate some to a certain level of nm. Have you seen the labor needed for maintaing a single modern ASML machine? Also we got a lot, but Japan has the tools to verify results.

Academia in Europe is a black hole which turns money into pipe dreams that have no practical application. It's really sad and frustrating, because so many of the pipe dreams sound cool, but it's clear that the people working in academia have no sense of what industry needs. The quality of the software they write is shockingly bad, and they have no interest in improving it because they only care about using it for academia anyway. It's like a walled garden where people get fat off the poor people who keep everything going, just as it has always been.

Even without the problem of insufficient financing, you probably need to bend some rules to be at the top of the LLM game (and not just in how you collect the training data; we all sort of forgot that Altman was kicked out of OpenAI because the board thought he was prioritizing features over safety).

With all the regulations and paperwork around EU projects, I can't really see them competing against the private sector.

  • The losers in the "rule bending" are humans, whose creative works are turned into weights, whose livelihoods are diminished by indiscriminate greed.

    I'm all for the advancement of AI, but not at the cost of humility and compassion for those enabling the models to be built.

    Model building is already a community project; you just weren't asked if you wanted to contribute. You just did. Without compensation.

    "rule bending" is putting it lightly.

    • Data is getting scraped, models are being built, nobody is really going to stop that now.

      You may wish it wasn't like that (not just you; all of us do), but there's no way China or the USA will block their companies from developing key technology like this, and I think we (EU countries) should act the same way.

Do they have something tangible now? At least PWr (from Poland) has fine-tuned Mistral and has a lot of language data, AFAIK: https://bielik.ai/

Anyway, I think some novel training method is needed, and I'm with Yann on this. They could luck into a good idea, but with this budget the chances are < 1%.

> will ensure that the models, software, data and evaluation will be fully open

I am VERY curious about that. Will they open up ALL the training data? That would be a massive amount, I'd guess, but I'd be curious where they got it and how they got it. Inb4 they just take some Meta model, retrain it, and then only publish the data from the fine-tuning.

Just the extensive list of academic partners is a major difference from any existing effort on LLMs.

They want transparency, but they're giving peanuts. Are they thinking they can just take Llama, distill other models into it, and call it transparent? I'm not sure they understand how that works.

> will build a family of performant, multilingual, large language foundation models for commercial, industrial and public services.

The commercial aspect must be implemented ASAP, or it's going to flop.

With this kind of money, I have a feeling we'll only come up with a good dataset. An LLM will be a completely different beast with completely different requirements.

This money would be better spent on producing good training data. But quality datasets don't generate the same administrative overhead or consulting fees.

> The models will be developed within Europe's robust regulatory framework, ensuring alignment with European values while maintaining technological excellence.

Translation: the weakest competitor in the contest enters the fight with both hands tied behind its back and a budget akin to what OpenAI spends on compute in a week.

But hey, more power to you Europe: the more models around, the better.

Eventually, we'll be able to bind all those censored models worldwide into one giant mixture of experts to get rid of the built-in censorship of each individual component.

No model yet, so this is irrelevant; the EU is extremely late to the party. And I say that as someone from the EU.

"Europe's leading AI companies and research institutions" -- order should be reversed, if the EU is serious about AI. It should say: "Europe's leading research institutions and AI companies..."

Nonsense. Got something? Then just share a link to eurogpt. Fed up with this academic BS.