Comment by bbor
6 months ago
Random unprompted fun fact: articles are the main type of "page" on Wikipedia, but not the only type! Buried deep in their docs is the full list of "namespaces", which you need in order to parse their XML dumps:
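from enum import IntEnum

class Namespace(IntEnum):
    """Namespace IDs as used on English Wikipedia, sorted numerically."""
    # Negative IDs are virtual namespaces with no stored pages.
    MEDIA = -2
    SPECIAL = -1
    # Each even ID is a subject namespace; the next odd ID is its talk namespace.
    ARTICLE = 0
    TALK = 1
    USER = 2
    USER_TALK = 3
    WIKIPEDIA = 4
    WIKIPEDIA_TALK = 5
    FILE = 6
    FILE_TALK = 7
    MEDIAWIKI = 8
    MEDIAWIKI_TALK = 9
    TEMPLATE = 10
    TEMPLATE_TALK = 11
    HELP = 12
    HELP_TALK = 13
    CATEGORY = 14
    CATEGORY_TALK = 15
    PORTAL = 100
    PORTAL_TALK = 101
    DRAFT = 118
    DRAFT_TALK = 119
    MOS = 126
    MOS_TALK = 127
    TIMEDTEXT = 710
    TIMEDTEXT_TALK = 711
    MODULE = 828
    MODULE_TALK = 829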
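For anyone wondering where these numbers show up in practice: each <page> element in a dump carries an <ns> child holding one of the IDs above. Here is a rough sketch of filtering a dump down to articles with only the standard library. The helper names (local_name, iter_pages) and the tiny sample document are made up for illustration, and the export schema version in the xmlns URI is an assumption that varies between dumps, which is why the sketch strips it rather than matching on it.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple

ARTICLE_NS = 0  # same value as ARTICLE in the namespace list above


def local_name(tag: str) -> str:
    """Drop the '{uri}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: BinaryIO) -> Iterator[Tuple[int, str]]:
    """Yield (namespace id, title) for each <page> in a dump stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        # Collect the text of the page's direct children (title, ns, id, ...).
        fields = {local_name(child.tag): child.text for child in elem}
        yield int(fields["ns"]), fields["title"]
        elem.clear()  # drop the finished page's children to limit memory use


# Made-up miniature dump; the schema version in the URI is an assumption.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Python</title><ns>0</ns><id>1</id></page>
  <page><title>Talk:Python</title><ns>1</ns><id>2</id></page>
  <page><title>Template:Infobox</title><ns>10</ns><id>3</id></page>
</mediawiki>"""

articles = [
    title for ns, title in iter_pages(io.BytesIO(SAMPLE)) if ns == ARTICLE_NS
]
print(articles)  # ['Python']
```

For a real dump you would pass an open (probably decompressed) file object instead of the BytesIO sample; iterparse streams it, so the whole file never has to sit in memory at once.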
Wikipedia is a downright fascinating technical environment once you find the rabbit hole. Shoutout to their purpose-built version control site[1] and their brand-new SWE-focused project "Wikifunctions"[2], the first new Wikimedia project in a decade!
...which, while we're at it, brings the total to 18: wikipedia, wikibooks, wikinews, wikisource, wiktionary, wikiquote, wikiversity, wikivoyage, wikidata, wikifunctions, mediawiki, commons, species, foundation, meta, incubator, and phabricator. Ok I'm done with fun facts, I swear!
Phabricator was built by Facebook, not Wikimedia.
https://en.wikipedia.org/wiki/Phabricator