
Comment by Wikipedianon

9 days ago

Some English Wikipedia (enwiki) editors are striking. They are predominantly non-technical users who are forced to maintain their own shadow-IT-style infrastructure that Wikimedia (the nonprofit that owns Wikipedia) doesn't provide. At this point, it is very difficult to be a productive editor without custom tooling.

The reason is that the laid-off team maintained the Community Wishlist, the main way for editors to submit feature requests for "professional" solutions.

The Wikimedia Foundation also de-emphasized popularity as a criterion for prioritizing feature requests on the Community Wishlist. This pisses off enwiki, which has the largest editor base.

From the WMF's perspective, though, enwiki is a cash cow on the BCG matrix.[1] It has been in seemingly terminal decline for over a decade[2], a decline accelerated by LLMs, yet it still drives the majority of donations/clicks.

As a result, WMF prioritizes investing in emerging markets over enwiki. This means outreach to indigenous languages in the Global South and developing supporting infrastructure, e.g. "Abstract Wikipedia", which aims to use a language-neutral syntax that can be automatically translated into any language.

These currently form a tiny segment of the editor population but have much larger potential TAM and are growing. So it's the correct strategy even if it pisses off editors.

[1] https://en.wikipedia.org/wiki/Growth%E2%80%93share_matrix

[2] https://en.wikipedia.org/wiki/Wikipedia:Why_is_Wikipedia_los...

> As a result, WMF prioritizes investing in emerging markets over enwiki. This means outreach to indigenous languages in the Global South and developing supporting infrastructure, e.g. "Abstract Wikipedia", which aims to use a language-neutral syntax that can be automatically translated into any language.

I'd disagree that there is a causal relationship here. I think most of the outreach to indigenous languages has more to do with politics and ideology than anything else (Wikimedia sees itself as a global movement to collect all knowledge; it can't exactly claim that if it's all English).

As for Abstract Wikipedia, I think it is more of a moonshot project driven by people wanting to build the next Wikidata. I suspect a major part of its support comes from the fact that it can draw on alternative sources of funding (grants).

  • "Abstract Wikipedia" just seems like a problem LLMs have already solved.

    However sceptical of "AI" you are, "give me the information on this page in my preferred language" is the kind of task they excel at. (I won't use the word "translate".) It wouldn't even require prioritising the English Wikipedia: any agent today could one-shot a task like "check the Wikipedia pages in all languages for X, summarize the results and note any disagreements between them".

    • Abstract Wikipedia is taking a symbolic AI approach instead of an LLM or other statistical approach. The hope (as I understand it) is that this will provide reliability and predictability, and extend better to languages that don't have a large corpus of text to train on.

      Personally, I think it's a bit of a wild bet, which seems especially surprising in the modern context. I guess we'll have to see if it pans out.


    • > However sceptical of "AI" you are, "give me the information on this page in my preferred language" is the kind of task they excel at.

      Except for the 90% or more of the world's 7000-ish languages which have barely any data online.

      E.g. the huge CommonCrawl corpus has stats (https://commoncrawl.github.io/cc-crawl-statistics/plots/lang...) for only 160 languages. English takes up nearly half the corpus; after the top 16 or so, every language has <1% of the corpus, over half of those 160 have <0.1%, and the other 6000+ languages are lumped into the <unknown> category. The long tail is very long.

      (You'll see people use the term "low-resource language" and then talk about Finnish or Macedonian – if you're not a linguist and you've heard of the language, it's most likely not low-resource ;-))


    • > give me the information on this page in my preferred language

      I'm sure that works great for European languages and other languages with huge corpora. Those are not the target languages of the program in question.


    • It's not a good idea for common languages like German or English or French.

      But it is a great idea for indigenous languages that many people speak but that aren't in the training data, which was the original purpose.

      I am hopeful that it'll create synthetic training data for those groups.

    • > "give me the information on this page in my preferred language" is the kind of task they excel at.

      ...So long as you don't mind it introducing random hallucinations into the information.


> It is very difficult to be a productive editor without custom tooling at this point

This is extremely reminiscent of the Stack Exchange situation.

>> Why is Wikipedia losing contributors

Perhaps because their message to new contributors is a consistent "stop trying to make corrections, and go away"?

  • I've made a significant number of edits to Wikipedia over the years. I probably have an account, but I generally don't bother to sign in: I don't care about credit, and I don't mind a dynamic IP that will change within a week being recorded in the edit history (which they've apparently stopped doing anyway).

    My most recent edit (a minor addition to a technical article) was instantly reverted as "suspected vandalism" by a bot, an unambiguous false positive. The bot seemed to expect that, if I thought it was a false positive, I would follow its instructions to report it, rather than find that irritating and conclude that I should stop making edits if getting them through requires fighting a broken AI.

    • You don't have to report the false positive; the reporting link is included in the explanatory edit summary just so you can conveniently use it if you want to. (Reported edits are eventually reviewed by multiple other editors and then, whether true or false positives, are added to the training data to improve accuracy.) The retrained bot is measured against the human-verified vandalism and non-vandalism data so that it is expected to produce 1% false positives among all the reverts it makes.

      By the way, the bot will only revert an edit once, so you can undo that revert and the edit goes back in (at least until a human editor decides it should be reverted). The bot considers not just the changed text and its placement in the existing article, but also metadata such as the editor's account information (and I believe logged-out edits get dinged more often simply because they are the main source of vandalism).
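      A rough sketch of the calibration described above, assuming the bot assigns each edit a vandalism score and that "1% false positives of all the reverts" means at most 1% of reverted edits are good ones. The function name and the (score, is_vandalism) data layout are hypothetical; this is not the bot's actual code.

```python
from itertools import groupby


def pick_threshold(scored_edits, target_fp_share=0.01):
    """Hypothetical helper: return the lowest score cutoff at which at most
    `target_fp_share` of the edits that would be reverted are good edits.

    scored_edits: list of (score, is_vandalism) pairs with human-verified labels.
    Returns None if no cutoff meets the target.
    """
    ranked = sorted(scored_edits, key=lambda e: e[0], reverse=True)
    reverts = 0
    false_positives = 0
    best = None
    # Walk cutoffs from strictest to loosest. Edits with equal scores are
    # reverted (or not) together, so evaluate the share once per score group.
    for score, group in groupby(ranked, key=lambda e: e[0]):
        for _, is_vandalism in group:
            reverts += 1
            if not is_vandalism:
                false_positives += 1
        if false_positives / reverts <= target_fp_share:
            best = score
    return best


# Usage: 99 true vandalism edits, 1 good edit scored almost as high,
# and 50 good edits with low scores.
edits = [(0.99, True)] * 99 + [(0.98, False)] + [(0.5, False)] * 50
print(pick_threshold(edits))
```

      Lowering the cutoff catches more vandalism but lets more good edits into the reverted set, which is why the target is stated as a share of reverts rather than as a fixed score.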

  • Maybe I'm special, but as someone who doesn't have an account and just occasionally fixes errors or adds more context, I've never had that happen to me. Actually, once I corrected a fact to something that wasn't obvious and it got reverted, but when I added it back with a long explanation and references, it stuck. Ever since then, I've written good "commit messages", just as I would for code, and made sure to have references to back up my claims, and it works.

    To be fair I try to stay away from pop culture and politically sensitive topics.

  • That's the English Wikipedia community in a nutshell. The WMF knows it's an issue but can't do anything about it.

    There isn't enough work anymore in a monopolized but declining market. A shrinking pie forces cliquey political slugfests. It happened to IBM and can happen to StackOverflow/Wikipedia.

    I hate it now. There's so much doxxing and meanness. There are also sizable contingents of propagandists in anything controversial. Most famously, the pro-Israel Icewhiz, who created hundreds of sockpuppets and harassed people IRL, and more recently r/Palestine's sock farm. There are similar farms around trans issues and India-Pakistan.

    The saddest part is that Wikipedia's original purpose was unbiased copyleft-style free knowledge.

    LLMs have the potential to democratize access to knowledge more than any other technology. But they are an existential threat to editors that previously did this deep research manually and served as gatekeepers with the attendant social status.

    As a result, there's a vitriolic hatred of any attempt to integrate LLMs into Wikipedia. Even if it's open-weights stuff running locally.

    So, Google will continue to eat Wikipedia alive with AI summaries.

    I hope Wikipedia is replaced by something AI-native run by a non-profit that has the interests of readers at heart.

    • > There isn't enough work anymore in a monopolized but declining market.

      What's the relevance? Wikipedia contributors aren't employed by Wikipedia. Their work is volunteered, and nobody asks them to do it.

      A lot of people do ask them not to do it.
