Mid-year, we passed the major milestone of 100,000 video game (Q7889) items − exactly two years after hitting 50,000. As of February 1st 2025, we stand at 110K − a 12,5% growth (12,3K items) over the year.
As always, let’s have a look at how well these items are described (using, as always, integraality dashboards). The numbers are not great compared to last year − neither in absolute terms, nor in proportion:
The trend is somewhat expected for country and genre, but is more concerning for platform and publication date: it means that even for the most basic properties, we have not been able to keep up with the additions. However, after digging further into it, this seems mostly accounted for by a handful of batch-creations − one of ~3,500 items with only a handful of statements each, another of ~140 basically empty items.




Regarding external identifiers, the number I have been tracking is the count of items without any identifier property maintained by WikiProject (P6104): WikiProject Video games (Q8485882) − excluding properties which use Wikidata itself, like vglist (P8351). The QLever query I used last year now returns 2,329 (2,1%) − but that did not exclude GamerProfiles (P12001), also sourced from Wikidata. The new, actual figure is thus 6,261 (5,7%), which is still fairly good. Let’s keep it up!
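For the curious, here is a rough sketch of what such a count can look like in SPARQL. This is an illustrative reconstruction, not the exact QLever query mentioned above − in particular, the exclusion list is hardcoded here:

```sparql
# Sketch (illustrative): count video game items bearing none of the
# external-ID properties maintained by WikiProject Video games,
# excluding the two Wikidata-sourced ones (vglist, GamerProfiles).
SELECT (COUNT(DISTINCT ?game) AS ?count) WHERE {
  ?game wdt:P31 wd:Q7889 .                        # instance of: video game
  FILTER NOT EXISTS {
    ?prop wdt:P6104 wd:Q8485882 ;                 # maintained by the WikiProject
          wikibase:propertyType wikibase:ExternalId ;
          wikibase:directClaim ?claim .
    FILTER (?prop NOT IN (wd:P8351, wd:P12001))   # vglist, GamerProfiles
    ?game ?claim [] .                              # item has some value for it
  }
}
```

On the full Wikidata graph, a query like this tends to time out on the Wikidata Query Service − hence QLever.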
We have now reached 533 video-game related external identifiers − compared to 462 last year, so 71 more identifiers − and this year, it’s mostly not me, as I only proposed 2 properties − thanks to Matthias M., Lewis Hulbert, Trade and Kirilloparma for leading the charge here!
The additions are not very diverse language-wise this year, with a lot of English; but also quite a few in German (PCGames.de product ID (P13165) or Atarimuseum ID (P12743)), Russian (KupiGolos game ID (P12580)) and one in French (Génération Nintendo game ID (P12626)).
These new identifiers specialize in various ways:
That’s for games; but we also have new identifiers covering other entity types:
We continued making new Mix’n’match catalogues, which we use to align an external database with Wikidata: companies (26→39), genres (14→21), platforms (24→27), series (13→16), sources (7→11) and the default/misc/games (263→301). Overall, that adds up to 423 catalogues (+19%, 68 new catalogues).
Wikis were a highlight last year, and again this year: a specialized wiki with DOS Game Modding Wiki article ID (P12960), many (independent) wikis dedicated to particular series/franchises (Doom Wiki ID (P12547), Portal Wiki ID (P12543), Sonic Retro ID (P12664)…), one wiki network (Weird Gloop article ID (P12473)) and one emulator compatibility wiki (Cemu Wiki article ID (P12625)).
The VG Resource is a network of websites “dedicated to the collection, archival, and appreciation of materials from video games” − with separate websites for different asset types. We now have properties for several of these: The Spriters Resource game ID (P12624), The Models Resource entity ID (P12373), and The Sounds Resource game ID (P12698).
I have mentioned several times before how I believe in creating Wikidata properties for discontinued databases − well, we had three this year: TheLegacy game ID (P12709) and TheLegacy company ID (P12734) from TheLegacy.de (gone in 2018 − thanks Matthias for proposing them) and AllGame style ID (P12810) from AllGame.com (defunct in 2014).
Looking at which identifiers are used the most, IGDB game ID (P5794) continues to stand proudly at the top, being now present on 74% of our Q7889 items. Lutris game ID (P7597) follows with 59,6% (makes sense, as the Lutris database is seeded with IGDB). Steam application ID (P1733) completes the podium with 57,2%.
New entrants IsThereAnyDeal ID (P12570) and SteamGridDB ID (P12561) shoot straight into the top five with 56,2% and 52,4%, respectively. RAWG game ID (P9968) and MobyGames game ID (P11688) are just behind at ~50%. The one-third area follows with HowLongToBeat ID (P2816) and PCGamingWiki ID (P6337) (both 36%), and Giant Bomb ID (P5247) (31%).

We gained two new properties for video game staff: creative director (P12617) and game designer (P12969). Neither is used much at the moment − the latter in particular could use a concerted action to move away from designed by (P287).
Throughout the year, we had some data-modelling discussions on the project talk page, which in some cases led to decisions and changes.
We successfully refined some of our instance of (P31), deprecating video game remaster and video game remake (moved to based on (P144)), as well as free and open-source video game and fan game (moved to has characteristic (P1552)).
We discussed business model (P7936), and decided to move crowdfunding (Q348303) to funding scheme (P6195) and Steam Greenlight (Q61905393) to approved by (P790).
Some discussions have not (yet) led to action: we debated moving away from author (P50) in favour of the unfortunately-labelled screenwriter (P58); and had long discussions on the current use of distributed by (P750) for online stores (e.g. Steam or GOG).
In previous year-in-reviews, I have often celebrated data imports. This year is a bit different, as I have noticed 3 imports that have left things somewhat for the worse: only a handful of statements added (so few that the gap is visible in the global numbers), lots of duplicates… There’s also borderline notability: one import was based on itch.io (Q22905933), and another one on the Flashpoint database (Q120096663) − literally the two examples I had used a few months earlier when saying that we perhaps do not need such bulk imports.
I am not linking to these batches because I’m sure the authors were well-intentioned, and I don’t want to throw them under the bus. But I think we need, as a project, to come up with ideas and strategies to better accompany imports.
Also, the two aforementioned batches were done as part of a datathon supported by a Wikimedia organization. I am not against such projects (I myself have helped organize some, like the 2022 DACH Culture Contest), but it does raise questions about the kind of incentives we are setting − and whether, ultimately, we are helping the regular domain editors or just giving them more work. (None of that is particularly new: it has come up here and there over the years of Wiki Loves Monuments.)
Browsing through Facenapalm’s WikidataBot repository, many more scripts were added: match SteamGridDB ID (P12561) (a contribution by Lewis Hulbert), IsThereAnyDeal ID (P12570) and VGTimes ID (P10453) based on Steam, match PlayGround.ru ID (P10354) based on Epic Games store & PlayStation Store.
Matthias M. also continued developing in his wikidata-scripts repository, with too many new features to list:
The highlight of the year, to me, was our work on video game genres.
We created 7 new external identifiers: AllGame style ID (P12810), GameFAQs genre ID (P12947), VideoGameGeek genre ID (P12957), GameSpot genre ID (P12958), Steam tag ID (P13084), GamersGlobal genre ID (P13120), Play:Right genre ID (P13199). We expanded RetroAchievements ID (P11393) and PCGamingWiki ID (P6337) to also cover their highly detailed genre ontologies. As mentioned earlier, we added 7 Mix’n’match catalogues for genres.
I’m also happy that we are bringing some multilingual perspective: out of these seven new properties, we have one in German and one in Danish; and two of the new Mix’n’match catalogues are in Czech (Databáze her and VisionGame.cz).
On the technical side, Facenapalm imported genres from PCGamingWiki ID (P6337) for some 4–5K items (example); and I updated my ExLudo user-script to display subgenres on genre pages (example).
According to Solidest’s new “Genres by external identifier” dashboard, we went from 324 to 394 video game genre (Q659563) items, and from 38 to 58 video game theme (Q42907216) items. There is also some usage to show for it: according to this “by genre” integraality dashboard, the count of values used as genre on at least 5 video games went from 318 to 360. And that’s after some clean-up, such as moving dōjin game (Q906556) and indie game (Q2762504) to has characteristic (P1552).
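The “used as genre on at least 5 video games” metric can be approximated with a simple aggregate query − a hedged sketch, not necessarily the dashboard’s exact logic:

```sparql
# Sketch: genre values attached to at least 5 video game items.
SELECT ?genre ?genreLabel (COUNT(?game) AS ?uses) WHERE {
  ?game wdt:P31 wd:Q7889 ;     # instance of: video game
        wdt:P136 ?genre .      # genre
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?genre ?genreLabel
HAVING (COUNT(?game) >= 5)
ORDER BY DESC(?uses)
```

Counting the result rows of this query over time gives the 318 → 360 trend.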
Here is a SPARQL query for these new genres − for example action roguelike (Q125693245) or arena fighter (Q130270654), management simulation game (Q124695018) or horde shooter (Q129695809), the split of 2D fighting game (Q127602751) & 3D fighting game (Q130270628)… Some of this was Wikipedia-driven, following the creation of articles (e.g. cozy game (Q124973056)) or categories (physics puzzle video game (Q130404370)). Not all of these are used anywhere (yet) though, so there is some work to do there.
Three years ago, I was fondly remembering the days of the collaboration of the month/quarter on French Wikipedia, feeling that there was little focused collaboration for video games on Wikidata. But this time it did feel to me like a wide collaborative effort, with various people contributing in different ways (research, item creation, edits, dashboard-making, automation coding…) on the same topic.
My usual Google Scholar search on Wikidata + Video games returned something interesting: Observing the Coming of Age of Video Game Graphics (Q133294477). Adrian Demleitner (whom you may remember from last year) built a dataset of >100K screenshots (of games published between 1960–1990) from MobyGames, and analysed it using computational methods. On the method of building the dataset, Adrian explains:
While browsing [MobyGames] is free of charge, some advanced features, such as exporting search results as structured data files or API access, are only available to paying members. Further, the API imposes rate limits and does not allow filtering by date of publication. To circumvent these issues, I instead queried Wikidata for video games with a MobyGames ID […] Although taking the route via Wikidata enabled easier access to the screenshots and profiting from Wikidata’s metadata, it also limited the number of video games in the dataset. Whereas MobyGames returned roughly 22’200 video games for the chosen timespan, Wikidata only little over 4300, amounting to only a fifth of what would be available.
I am delighted to see such a reuse. It vindicates our approach of linking together databases, and more generally, this idea of distributed data − no database has all the answers, and some questions require a collaboration of several databases − and of Wikidata as the entry-point. At the same time, it shows our limits: having to truncate the dataset to a fifth of what it could have been highlights how far we still have to go in terms of breadth of coverage.
This year-in-review is again published in March, booo ;-þ. While I keep accumulating unfinished drafts, I did celebrate last June our 100,000-item milestone; and in October I published my SNAIL manifesto for “small data, slow data”, of which I am quite proud.
I hoped so last year, and it did come to pass: as detailed above, we did have more discussions − and some decisions − on some finer data modelling points, whether on the project’s talk page (see our project’s log for a decision record) or our Telegram group. That was great, let’s have more of that!
I had some plans to prepare the conversation for modelling things like art styles, perspective, camera… Inspired by Abigail Chapman’s paper “Trials of Metadata: Emerging Schemas for Videogame Cataloguing” (Q117071360), I have been working on a vocabulary crosswalk from various databases, which I never finished (here’s the raw data, if that interests you); meanwhile, Glitchwave published this year their “Descriptors” ontology, a great input for that discussion. Now that we are done with genres and themes (sure, totally ⸮), that sounds like a great 2025 project!
Would you like to join us?
This work is licensed under a Creative Commons Attribution 4.0 International License.
To be fair, that was always the case with Wikimedia projects − Wikipedias have always celebrated their millionth article, or Wikimedia Commons its milestones.
And it’s something I have been prone to just as much as anyone: one of my first contributions to the Wikimédia France blog, ages ago, was for Wikimedia Commons’ eight-millionth file; and in recent times I have celebrated on this blog 50,000 and then 100,000 video game items on Wikidata.
Is there such a thing as too big? Big numbers are often “dizzying” − but sometimes, I find them verging on existential dread: can a handful of editors really maintain, say, 300,000 music albums on Wikidata? How can we possibly tackle the two-thirds without a language? Only 7000 matches left to do in that Mix’n’match catalogue? Sounds tractable − if I did 10 every day, it would only take me two years.
To deal with big, we needed to go fast. Developing, using and advertising highly powerful editing tools that bring the bot to the masses (I once read this referred to as “cyborgs” − human editors augmented by machines − how poetic, very cyberpunk).
This mood got to my thinking too: when I was regularly presenting Wikidata at events in Vienna, I eventually caught myself telling people that, sure, editing via the website is fine, but if you want to get Really Serious then you use QuickStatements or OpenRefine − ignoring the fact that most of my own editing is done with the good old website.
Is there such a thing as too fast? While Listening to Wikipedia is a relaxing symphony, I find listening to Wikidata a stressful cacophony.
As the saying goes, we move fast and break things: Wikidata is getting too big, too fast. The Wikidata Query Service graph got so huge it will be split, and the growth of the core database is deemed unsustainable.
The thing is − we don’t always have to go big. Yes, we imported twenty thousand games from Steam − this was a good one. What’s next? How can we top that? 200,000 Flash games, one million from itch.io, even more from Google Play / App Store?
My answer is − we don’t really need a next big one. We don’t need to top that. We can rather go small. For example, by working on more modest platforms − 388 games on Nintendo 64, 251 on 3DO − or even niche ones − 22 games on Virtual Boy, 28 on Vectrex, 15 on Gizmondo? I’m in!
A good way to go small is to go local. The 2022 DACH Culture Contest got me working on games from Austria and from Switzerland (an interest later boosted by my curiosity for the CH Ludens research project) − currently some 200 games each.
With smaller sets, you can see the end. You don’t have that existential dread. You can afford to be thorough, to really flesh out items, to go deep. You can afford to take your time.
We can go slow.
The key ideas of the slow movement include prioritizing quality over quantity […]. It encourages a more intentional approach to daily activities, promoting sustainable practices and mindfulness. The movement spans various domains such as food, cities, education, fashion, and more.
English Wikipedia article on the Slow movement
Let’s hear it for slow data. Let’s make edits that take minutes rather than racking up tens per second. (And sure, “It’s not the quantity, it’s the quality” is an old Wikipedia trope, at times ill-used, but not wrong per se.)
As an example, I have recently enjoyed working on video game genres. I mentioned before our genre (P136) problem: missing on over 60% of our video game items, and no way in sight to automatically enrich these. Looking away from the abyss, I have found it rather pleasant to work on smaller genres like collect-a-thon platformer (Q104819482) or bullet heaven (Q122791012) − or even straight-up niche, like one-move game (Q112278052): adding these statements one by one, with a solid reference from a reliable source. It ain’t much, but it’s honest work. It will make the tiniest, most beautiful dent in the mountain.
This whole philosophy is built into my Integraality tool. For the most part, the numbers it presents are relative, not absolute − as the colours are based on percentages, not counts. The end-game, ostensibly, is to get the whole table dark blue; but you do that cell by cell. It encourages you to work piecemeal, to pick scopes as narrow as you feel like. Sometimes, all it takes to turn a cell blue is to edit a handful of items.


This does not mean not creating new items. But it does mean creating them more intentionally. The old, somewhat hyperbolic jest is that Wikipedia is full of articles that “no-one will ever read” (and I have done my fair share of that!); but on Wikidata, we sure have items that were not even created by someone, merely by something.
Does such an approach mean we will never be complete? And if we are not complete, are we useful at all? What’s the point of a video game database if we don’t have all the video games?
There’s an answer that touches on the Wikibase ecosystem, federation, etc. There’s another one which would debate what a video game is. For now, I’ll just say that we can have the most complete dataset of Austrian one-move games for the Vectrex. That’d be something! But joking aside, there is tremendous value in smaller but in-depth datasets. The CH Ludens project shows how a complete dataset of all Swiss video games would be highly useful in its own right − and that’s already ambitious enough (if very tractable).
There’s space for big and fast, for the QuickStatements batches and the OpenRefine imports, for bots running all day long enriching data − it’s an important and necessary part of Wikidata.
And there’s time for small and slow, making one thorough edit at a time on narrower scopes.
I like a good acronym so let’s play a bit with Slow, NArrow, In-depth, Local − and call it SNAIL.
When your scope is getting too big for you, then consider re-scoping it down.
When it’s so big that you don’t know where to start, start small.
When you are tempted to “go big or go home” with your next QS batch, perhaps go home and sleep on it.
Let it SNAIL.

(The examples in that post were taken from my current main topic of interest − video games. I’m sure you’ll find your own examples in your own area of predilection.)
Let’s look at how these items are described along some basic properties − asking the Wikidata Query Service for some pretty graphs, and using my trusted inteGraality for some more advanced statistics.
Over 89% of the items have a platform (P400) statement (which does not mean that we have 89% completion on that topic, since many games are published on several platforms, and we may only have recorded one or a couple of them).
85% of the items have a publication date (P577).
37% have a genre (P136) − we have a very long tail of 730 distinct values as genres, which we should still clean up (minus indie game (Q2762504), which we recently moved to has characteristic (P1552)).
Almost 33% have a country of origin (P495).
About 34% of the items have a developer (P178) and 37% a publisher (P123).
42% of the items are linked to an article in at least one language-version of Wikipedia − English comes first (27%), then French (15%), Ladin (14%) and Japanese (13%).
What I also find interesting is to look at items linked to only one Wikipedia language version: some 5K (5%) only have an article in the English-language or Japanese-language Wikipedia; then come the French-language and Ladin-language Wikipedias with 1K (1%) of items each.
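As a sketch of how such a “single sitelink” figure can be obtained − assuming the standard WDQS sitelink model, and not necessarily how the numbers above were actually computed:

```sparql
# Sketch: video game items whose only Wikipedia sitelink is the
# English-language one.
SELECT (COUNT(DISTINCT ?game) AS ?count) WHERE {
  ?game wdt:P31 wd:Q7889 .
  ?article schema:about ?game ;
           schema:isPartOf <https://en.wikipedia.org/> .
  FILTER NOT EXISTS {
    ?other schema:about ?game ;
           schema:isPartOf ?site .
    ?site wikibase:wikiGroup "wikipedia" .    # ignore Commons, Wikisource, etc.
    FILTER (?site != <https://en.wikipedia.org/>)
  }
}
```

Swap the site URL to compute the same figure for any other language version.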
Over at Wikidata we link to hundreds of other video game databases.
On top is still Internet Game Database game ID (P5794), used on 74% of items. Lutris game ID (P7597) follows with 60% (makes sense, as the Lutris database is seeded with IGDB). Steam application ID (P1733) completes the podium with 56%. The new entrant SteamGridDB ID (P12561) snatches the fourth place in barely 3 months, with 56%. RAWG game ID (P9968) and MobyGames game ID (P11688) stand at 51%. PCGamingWiki ID (P6337) is at 38%. Both Giant Bomb ID (P5247) and HowLongToBeat ID (P2816) at 33%. OGDB game title ID (P7564) and GameFAQs game ID (P4769) at 15%, speedrun.com game ID (P6783) and Mod DB game ID (P6774) at 13%. StopGame ID (P10030) and myabandonware.com game ID (P12652) at 11%… and a very very long tail of over 360, sometimes highly specialized, databases.
(The most represented are English-language databases, but the list above includes one database in German and two in Russian).
1/ By the time of writing this, we had already reached 102,577 items.
2/ Last time, I had cautioned that looking strictly at instance of (P31)=video game (Q7889) items does not tell the full story, as we have a long tail of subclasses also used as P31: some refer to distinct concepts (the 956 DLCs or 3242 expansion packs), while others are indeed games. I’m happy that we have successfully culled out the hundreds of instances of video game remaster and video game remake (moved to based on (P144)) ; as well as free and open-source video game (moved to has characteristic (P1552)). Still some work to do to refine our P31s, but going in the right direction.
3/ With 100,000 items, we are going somewhere (surpassing the 85,000 of GiantBomb or the 63,000 of OGDB, for example). But this is still under the 153,000 games in Metacritic, far from the 278,000 entries in Mobygames or the 281,000 entries in IGDB (and dwarfed by the 868K entries in RAWG).
4/ The astute reader may have noticed that some data points went down compared to two years ago: a developer (P178), publisher (P123) or country of origin (P495) on a third of items (down from half), and a genre (P136) from two-thirds to a mere one-third. So while we should celebrate this significant milestone in breadth of coverage, we should keep in mind the depth of coverage on the road ahead of us.
We close the year just shy of the 100K mark: as of February 1st 2024, we stand at 97.7K − a whopping 75% growth (+41,6K items) over the year.
As always, let’s have a look at how well these items are described (using, as always, integraality dashboards): 7K have no platform (P400), 11,3K no publication date (P577) − while higher numbers than last year in absolute terms, the proportions are better: 12% → 7,2% and 19% → 11,7%. Conversely, 64.9K have no country of origin (P495), and 57,5K have no genre (P136), which is a less good trend (respectively 47% → 66,8% and 34% → 59,2%).




Regarding external identifiers, the number I have been tracking is the count of items without any identifier property maintained by WikiProject (P6104): WikiProject Video games (Q8485882). The Wikidata Query Service gives up trying to pull that number, but QLever helpfully tells me it’s 2,341 (2,4%) at the time of writing. Down from 11%, this is amazing progress − getting to zero unidentified games is in sight!
Data-model wise, we gained one new property: demo of (P12050), used to link 37 game demos (Q1755420) to 36 games.
In other news, since August we have a “Wikidata WikiProject Video Games” Telegram group! Thanks to Nicereddy and Facenapalm for getting it started. There are now 12 of us in there − it’s a nice place to discuss things.
We have now reached 462 video-game related external identifiers − compared to 356 last year, so more than 100 new identifiers. (I may have contributed quite a bit to that, especially as I made myself − again − an Advent Calendar of property proposals.)
Again, the additions cover various languages: a lot of English, but also German (Gameswelt ID (P11877), GamersGlobal game ID (P11771)…), Japanese (GameGear.jp ID (P12246)), Czech (Visiongame.cz game ID (P11991)), Spanish (DoblajeVideojuegos game ID (P12290)), French (HistoriaGames game ID (P12281)) − and two brand new languages: Dutch with GamesMeter game ID (P12245) and Hebrew with old-games.org ID (P12134).
These new identifiers specialize in various ways:
That’s for games; but we also have new identifiers covering other entity types:
As always, we have the usual mix of community-driven databases and commercial/news websites; but also one scholarly database with Black Games Archive ID (P12231).
We continued making new Mix’n’match catalogues, which we use to align external databases with Wikidata: companies (20→26), genres (10→14), platforms (23→24), series (9→13), sources (6→7), the default/misc/games (236→263), and a new category for people (8). Overall, that adds up to 355 catalogues (+15%, 50 new catalogues).
There are a lot of wikis out there related to video games, and we got quite a few new properties for them. That includes specialized wikis like Chess Programming Wiki ID (P11768), wikis dedicated to a game/series, like Fallout Wiki (P11589) − but also Wiki farms/networks: Gaming Wiki Network (P12143), Nintendo Independent Wiki Alliance (P12253), Paradox Wikis (P12189), wiki.gg (P11988) − kudos to Shisma who drove a lot of that work.
Some emulators also maintain a so-called compatibility database − indexing how well games run under the emulator. I believe emulators are a key part of video game preservation − and these databases are also often one of the best resources for the particular platform they emulate.
So when Matthias M. opened the year with DOSBox (P11572), I thought it would be fun to follow suit with all the ones I could find: Xemu (P12244) for Xbox, Flashpoint (P11875) for Flash games and ScummVM (P12255) for classic graphical adventure games (although not an emulator strictly speaking). And following on the wikis of the above point, PCSX2 Wiki (P12227) for PS2 and RPCS3 Wiki (P12230) for PS3.
We also gained several external properties related to the demoscene (Q824340) − demos (Q5610543) (Pouët demo ID (P11974), Demozoo demo ID (P12013)), demogroups (Q5256141) (Pouët group ID (P11972)) and their members (Pouët group member ID (P11973), Demozoo group member ID (P12014)) − thanks to YotaMoteuchi for the work here!
In August, Sanqui joined Wikidata, with an interest in multi-user dungeons (MUDs) (Q751424). We did not have any identifiers specialized in MUDs, and Sotho Tal Ker found some: MUDlistings ID (P11989) and The Mud Connector ID (P11990).
Looking at which identifiers are used the most, IGDB game ID (P5794) continues to stand proudly at the top, being now present on 76,4% of our Q7889 items. Lutris game ID (P7597) follows with 62,5% (makes sense, as the Lutris database is seeded with IGDB). Steam application ID (P1733) completes the podium with 58,5% − keep reading for the explanation. RAWG game ID (P9968) and MobyGames game ID (P11688) complete the top-five with ~50%.

We got two examples of identifier migration this year.
In February, MobyGames switched to their long-worked-on redesign − which also finally moved from URLs with slugs to numeric identifiers! (The identifiers had long been used in the database and available via the API, but were not easily accessible in the UI nor resolvable by URL, as far as I know.) We had to migrate, and I like how this was a real team effort: Kirilloparma started and led the whole discussion, I made the proposal for new properties, Facenapalm imported the new-style IDs… We also talked to the MobyGames folks on their Discord, and they were kind enough to guide and help us.
In September, Metacritic removed the platform prefix from their URLs, and also switched to numeric identifiers. Up until that point, we had a single Metacritic ID (P1712) property; we decided to split out games to Metacritic game ID (P12054) and added Metacritic numeric game ID (P12078). Publications were also spun out to Metacritic publication ID (P12079). Metacritic also discontinued their company pages − but we had some 472 such links, and they were reasonably well-archived, so we created the posthumous Metacritic company ID (P12080), pointing to the Wayback Machine. I have long mused on creating external IDs for discontinued databases, and that’s a first example. Kirilloparma did pretty much all the work here: researching the topic, proposing the property, migrating the identifiers, and updating the Wikipedia templates. Bravo!
In September, Connor Shea (aka Nicereddy) and Facenapalm collaborated to create 20,800 new video game items sourced from Steam application ID (P1733) and Internet Game Database (Q20056333). Connor tells the story in his blog-post Mass-importing games from Steam into Wikidata, which I invite you to read. In particular, the “Further Data Enrichment” section details how a dozen other datapoints were further inferred, adding close to 100K identifiers.
As of today, we have 43,944 Steam IDs on Wikidata. Out of the 71,845 games on Steam, that makes for 61% coverage of the games on the Steam store. We’ve still got a ways to go, but this import added more than 28% of the entire storefront to Wikidata.
Connor Shea, Mass-importing games from Steam into Wikidata,
In the weeks and months that followed, the pair further imported 15,200 items from Steam. This work certainly explains why 60% of Wikidata’s video games are available on Steam 
Browsing through Facenapalm’s WikidataBot repository, many more scripts were added: adding platform (P400) qualifiers to TheGamesDB game ID (P7622), SMS Power ID (P5585) and Gaming-History ID (P4806); and matching identifiers for StopGame ID (P10030), IndieMag game ID (P9870), tuxDB game ID (P11307), Adventure Gamers video game ID (P7005), UVL game ID (P7555) and PCGamingWiki ID (P6337).
Matthias M. has also been developing some scripts, available in his wikidata-scripts repository − such as matching Schnittberichte.com based on OGDB, Adventure-Treff based on stores, or Kultboy based on Hall of Light.
End of June, I was invited to give a presentation at the 2023 worklab organized by mur.at, in their “Make Archiving (un)sexy again?” track. In my talk ‘Describing video games, the Wikidata way’, I approached the topic of archiving video games through the lens of cataloguing and metadata − a perhaps narrow aspect, but I believe an important one nonetheless.
In May, the WikiProject was approached by Alex Jung (University of Toronto Libraries) to present at one of the biweekly LD4 Wikidata Affinity Group Calls − I thought this could be fun so I volunteered. I had to call in sick and cancel, but not before Alex’s suggestion to also present at the associated LD4 Conference 2023 (Q117466379) made its way…
So in July, I gave a talk titled “Wikidata and the sum of all video games: putting the « linked » in video game metadata”. The slides are available on Wikimedia Commons, and the talk was recorded and is published on YouTube. I’m pretty happy with the turn-out and the lively reactions on the conference’s Slack channel. I have a draft blog post retelling the talk, which I hope to finish One Day Soon.
It had been a while since I had given any talks or presentations, and while it was a fair amount of work to prepare, I enjoyed developing and sharpening my thoughts, communicating them to an audience, designing slides… I had missed all that 
You may remember the Pixelvetica project from last year? Well, cool things continue to happen in Switzerland: the research project “Confoederatio Ludens: Swiss History of Games, Play and Game Design 1968-2000” (or CH Ludens) started in February 2023. As part of this multi-faceted project, Eugen Pfister and his colleagues built and published a database of video games from the DACH region (Germany, Austria, Switzerland) − noting how there is no database that can be filtered by country (and which would have had decent DACH coverage). They had in mind to eventually publish that data in Wikidata − which was done and continuously enriched over the year by Adrian Demleitner (I was happy to help him here and there). For more information, the interested reader may refer to their presentation “Why we thought it was a good idea to build a DACH games database” at the DLA Marbach, and the paper (in German) Warum wir es für eine gute Idee gehalten haben, eine DACH-Spieledatenbank aufzubauen (Q124326253) (which quotes my humble underground mushroom metaphor).
We had one example of external reuse: GamerProfiles (Q124398839), “a social media platform for gamers”, has launched this year, and it turns out that “the games were originally exported from Wikidata”. I don’t endorse the service (which I have not used nor plan to), but I’m pleased to see that Wikidata can underpin such an undertaking.
I had one goal for the year, which was to publish this year-in-review before March, which obviously did not happen :-þ. I also wanted to blog more − I did start several drafts, but finished none. To a degree, I think the presentations I gave scratched that itch of developing and sharing ideas (and also sucked some time away).
Putting that aside − I know I have said that most past years, but this time I do believe we will have more discussions − and decisions − on data modelling. It has already started these past weeks on the project’s talk page (see our project’s log for a decision record). I think this is fostered in part by the new Telegram group, which makes it easy and quick to share thoughts and gather feedback − I’m already very grateful for my talks there with Solidest (of WikiProject Music’s ‘herder of genres’ fame). I look forward to many more inspiring discussions in 2024! Feel free to join!
This work is licensed under a Creative Commons Attribution 4.0 International License.
]]>I must admit we went pretty far on the organisation front: video-call progress meetings, a contribution log, a watch on new source publications in the field. We are also quite productive, both on the selected articles and on related work. My goal here is to share some keys to how I got to this point.
It is September 2009. For the older sages among us, that is the moment when Pitbull’s I Know You Want Me (Calle Ocho) hands over its hit-of-the-moment crown to David Guetta and Akon’s Sexy Bitch. It is my Buffy the Vampire Slayer period on Wikipedia, and I feel very confident on the subject: I own quite a few published sources (some of which are listed on my inventaire.io), and there is an online journal gathering publications about the series, Slayage; I have just brought the article on the character Faith to “Bon article” (Good Article) status, and the vote on the episode Un silence de mort, started by Mérôme Jardin and in which I also took part, is underway.
And yet, our attempt to de-stub as many articles on the series as possible fails. I take “failure” here not as an absolute, objective measure of what makes a good or a bad Wikiconcours, but as my personal judgement of my own contribution against the expectations set at the start of the contest. We select about twenty articles (seasons, episodes, comics, characters) and only manage to work on two of them: the one on season 6 and the episode Cauchemar. I say “we”, but the truth is that I do not even manage to work on that selection during the two weeks of the contest.
I try the experiment again the following year, this time in my field of academic expertise, as Pierre-Selim and I set out to improve a selection of articles on constraint programming. I would not say it failed, because to fail you first have to start: I only make a handful of minor edits on Wikipedia during the contest period, and none related to the topic. We do not even create the team’s talk page.
To nobody’s surprise, I do not touch the Wikiconcours again for years after these two failures, convinced that this model of contribution is not for me. It is also the period when Wikipedia stops being the only form of my contribution to the Wikimedia projects, as I get involved in parallel in Commons and Wikimédia France, particularly with the cultural institutions of Toulouse.
I try the de-stubbing experiment again in October 2015, this time alone. Once again, out of my selection I only manage to improve two articles, and only one of them enough to be de-stubbed.
It is March 2018 and I badly need to relive positive experiences on Wikipedia after the trauma that the late 2016 – 2017 period was for me. I no longer remember exactly how, but Exilexi and I end up working together on the 2005 riots and the history of immigration in France. And this time, it works. I do not make many edits, but I grapple with a complex, controversial article with many viewpoints and elements to reconcile, and even if I “only” write 15,000 bytes in two months, it is an experience that will serve me well later on.
Six months later, I follow up with a selection format on the École de Nancy: by my own count, I manage to de-stub 15 articles, but the jury only validates 8 of them. I am happy with this work, even if the rhythm of regularly switching articles and not “wasting” time developing a single one too far is a bit frustrating.
7 September 2019; Brussels. First edition of the Wikiconvention francophone outside France, and I attend “Labels et wikiconcours : la carotte pour une démarche wiki-qualité ?”. One point strikes me: the Wikiconcours is a contest (take a moment to digest that revelation), hence “by nature” competitive; and yet the presenters keep talking about collaboration and mutual help.
This point nags at me so much that, two weeks later, I throw myself into a Wikiconcours on the theme of tea, with Exilexi again but also Harmonia Amanda and Zunkir, with a single goal in mind: to collaborate. With Exilexi and Harmonia, we meet to discuss the outline of the thé article as well as how to structure the per-country articles. That session was inspired by the editing workshop held five years earlier at the Festival d’Avignon: we had devoted part of the day to defining the outline of the festival’s article, before dividing up the work among ourselves. That session, where the three of us talked without adding a single line to the encyclopedia, is in my opinion the most productive moment of my entire contribution to Wikipedia. And the results are there: we more than double the size of thé, going from 55 kB to 123 kB; thé en Russie and thé au Japon, which did not exist at the start of the contest, are created − with thé au Japon becoming a “Bon article” two weeks after the contest ended − and thé en Chine goes from 14 kB to 105 kB. We win the first prize by team, and thé au Japon is recognised as the second-best article of that Wikiconcours. Above all, this team lays the foundations of what works for me in a Wikiconcours: video-call discussions, sharing bibliographies, and systematically going through the team’s talk page whenever I am unsure I am making the right choice in an edit.

There we go, we have found the winning formula! The following year, in 2020, we do another selection format on the theme of tea, with Heriboux replacing Harmonia. And the result is… not great. Exilexi manages to make progress on thé au Maroc, I contribute to thé oolong and… that is about it. Is there a particular reason for that? Not really, all the more so as in 2021 we take on tea again, this time just the three of us (Heriboux not renewing the experience), and we manage to de-stub 12 articles and create 6 more, all within three weeks, winning the best-theme prize along the way.
Having the feeling of having covered the subject of tea, or at least of having brought the overall level of the tea articles up to a decent standard, the urgency to contribute on these topics fades. That does not stop me from creating thé à Taïwan and thé en Nouvelle-Zélande in 2022, but the subject is no longer at the forefront of my contributions.
I have written specifically about my experience of March 2023, undertaken alone and for which I won the Wikimédia France prize; it had been preceded by a team effort with Nattes à chat and Skimel, in which we improved 10 articles and received the “émulation collective” prize.
I mentioned an urgency to contribute, and that is the feeling that drives me when facing the state of Transidentité and Transition de genre at the time; these subjects are far too present in the news to be left in such a rough state. But going it alone is out of the question for me: too much to do, too many decisions, too ambitious. I assemble a crack team (Exilexi, again, and La Grande Feutrelle) and off we go.
A small aside: what pays my rent and groceries is my job as a software developer. My work is to organise information and the understanding of a subject into a form readable by others. Any resemblance to editing Wikipedia is intentional.
Since the covid-19 pandemic, remote work has spread massively through software companies. In parallel, the practice of software craftsmanship has developed, which puts particular emphasis on knowledge sharing between developers. Concretely, in my professional environment, I constantly interact with remote collaboration tools, whether synchronously, such as two people working on the same code while talking it through, each one’s changes visible live to the other (pair programming over Live Share, to use the technical terms), or asynchronously, with the review of changes where one can propose suggestions or comments (code review).
One can only note the poverty of collaboration tools on Wikipedia, which offers barely the equivalent of what the software industry had more than 15 years ago, let alone what is standard today. You cannot start a discussion on a portion of an article; it has to happen on the talk page. You cannot simply find the author of a given passage of an article. There is no tool integrated into Wikipedia for a voice conversation, or for truly working on the same edit as a pair.
The first thing we did was to work together on the outline of Transidentité. That work had already been done in 2017 during a previous Wikiconcours, but both the article and the way trans identity is thought about have evolved since. It was a long discussion (more than three hours), and it was only possible because we were together on the same sticky-note tool, in this case Miro (I would welcome free-software alternatives in the comments!).

Of course, it is not exactly the outline we end the Wikiconcours with, but it was a very valuable foundation, both for moving out of the article the developments that did not belong in the general article (such as a history of the inclusion or exclusion of trans women in the Miss France pageant, moved to Transidentité en France) and for always having in front of us, through the empty sections, a list of tasks to do.
Keeping a week-by-week log is also very useful: it lets us see the work progress, which is a great motivator, and it keeps us from scattering ourselves too much. Given the subject, we need to go back and forth quite a lot between the main article and the detailed articles, and putting things in writing helps us stay focused.
We also hold a mid-point review (over 8 weeks, I think we could have aimed for two check-ins instead), which likewise lasts around 3 hours and allows us both to reassure ourselves, to ask for help, and to decide to launch a label nomination for Transition de genre.
Unfortunately, the eight weeks of the contest plus the workload of the label nomination mean we end up exhausted, with a generalised wikibreak/wikislow for the penultimate week. Personally, I very much want both to “finish” the article Lesbianisme, that is, at least to have no more empty sections, but above all to contribute differently: on other subjects, flitting about wherever the sources take me, without a goal, in short without pressure, and I have already decided not to take part in the next Wikiconcours, so as to recharge.
What I will miss, though, is that collective energy, where paradoxically, in a group of three, I have never felt so surrounded. Apart from les sans pagEs, the Wikipedia projects are dead, their talk pages reduced to a list of deletion debates. Doing Wikiconcours is good, but collaboration should be at the heart of our everyday contribution practices, and that is still far from being the case.
]]>
Minima Gesté at a drag bingo. Photograph by Xavier Héraud, CC-by-SA
Amusingly, my participation in the Wikiconcours and my contribution to LGBT+ topics on Wikipedia follow the same trajectory: a very early interest (September 2009, to be exact), followed by a long period of inactivity until the mid-2010s, then a string of more or less successful attempts, before finally finding a cruising rhythm and a sense of legitimacy − and therefore of confidence − from 2019 onwards.
So it is full of naivety and enthusiasm that I embark, for March and April 2023, on a very ambitious project: improving the articles about LGBT culture. I feel confident: the eponymous article, which I created in August 2021, is fairly well structured; I have spent two years working on the structure of the various LGBT articles on Wikipedia, notably on the articulation between the local (particularly French) and the global; and my personal library of reference works has grown a lot since the Covid-19 pandemic made me flee my 30 m² Paris flat for the joys of a balcony and some space in Essonne.
At first, I am very ambitious: I tell myself I will work not only on culture LGBT, but also on other articles such as littérature lesbienne. I start by mapping the correspondences between the various reference works and the articles to enrich or create, and flit happily between the different levels of focus: culture LGBT, littérature lesbienne, littérature LGBT, littérature trans, LGBT dans les littératures de l’imaginaire…
First obstacle: the line between what counts as LGBT literature and what counts as the representation of LGBT themes in literature is blurry, and shifts from author to author. This difficulty is all the more complex because one of the main characteristics of LGBT culture is its capacity to remix, parody, and subvert the whole of cultural production: it is impossible to understand the place of vampires in 20th- and 21st-century queer and lesbian culture without knowing the history of the lesbophobic representations of sapphic vampires in the 19th century. The boundaries are always blurry and shifting, which makes for a fascinating subject… and a very hard one to carve up on Wikipedia.

Second obstacle: the very concept of culture. The Wikipedia article on “culture” itself shows this complexity well, and reading the sources reveals a very important aspect of LGBT culture: while cultural productions (novels, visual works, etc.) have the double advantage of being both easily identifiable and structurable by type, things get complicated as soon as we talk about other elements, such as butch identity, camp, or “AIDS culture”. That is what leads me, for example, to create (in 2023!) the article lesbianisme séparatiste, because it is impossible to understand what lesbian culture is without knowing what separatism is.
At this point, I realise I am at serious risk of scattering myself, adding a paragraph or two to dozens of articles, when the whole point of the March 2023 Wikiconcours format is to focus on a selection of articles. Just as I am about to refocus solely on the main article, Culture LGBT, the article LGBT dans les littératures de l’imaginaire is taken to a deletion debate. That discussion, and in particular how close we came to the article being deleted, throws me. Since 2005, I have been used to disagreements on the encyclopedia, and I know roughly which of my various opinions are consensual and which are minority views. What is, in my eyes, an obviously admissible subject − especially given the many sources covering it, of which I provide a partial list − is seen by many others as lacking references and therefore as not having enough material for an article.

At my advanced age, I do not take it very well: if even a subject so obvious to me can so easily end up deleted, does that mean all the rest of my work on Wikipedia is just as vain? Meanwhile, a debate among administrators is taking place on the application of one of Wikipedia’s founding principles, respect for the rules of etiquette, where I again find myself defending a minority position I had thought was very consensual, and, by ricochet, going through a mini crisis of faith in the encyclopedia.
I scale down my contributions for a while (yes, it is Wikiconcours season, but in truth there is no urgency) to devote myself, ironically, to speculative fiction with LGBT themes (I recommend Celle qui devint le soleil, Iron Widow as well as Gideon la Neuvième, and advise against Le Prioré de l’Oranger). Coming back to culture LGBT, I run into my fourth obstacle: the awareness of writing something mediocre. I am not being falsely modest; I know the work done over the past two years is enormous, and that the article brings a lot of information to the people who read it. I am simply aware that, however experienced I may be, a Wikipedian will never be able to sum up LGBT fashion in four paragraphs and a weekend of reading sources.
The result, which of course has the merit of existing, can only suffer from the inadequacy of the means relative to a subject that would actually need to be a thesis, or even a research theme spanning multiple academic careers. What I write will always suffer from my biases − in particular, that of only frequenting LGBT bookshops in France and only reading English and French − as well as from those of my sources, which were written at specific moments in time (some are 20 years old and, for LGBT topics, that is enormous) and tend to mistake France and the United States for the universal.
One of the solutions would be not to write alone: for while Wikipedia claims to be the collaborative encyclopedia, the extent of collaboration is in reality very limited compared to what exists in other fields, such as (my experience of) software development. The reality − and it seems paradoxical when you see the kilometres of talk pages − is that we talk very little about the articles themselves. When we do, it is most of the time with a problem-solving mindset: typically, someone rereads an article, finds it too (activist, unbalanced, one-sided, promotional, France-centric, etc.) and makes a correction; someone else disagrees with the change, and that creates a talk-page debate aiming, at best, at a compromise halfway between the two positions and, at worst, at convincing the opposing camp that it is wrong, then voting to select the proposal with the most supporters. We sorely lack discussions where we work together to create, with no preconceived idea − for instance asking ourselves, about a very short article, what the best outline would be, what illustrations would ideally be needed, where to find sources, and so on. It is no coincidence that the one time I was able to experience that kind of collaboration − when, with team 7 of the autumn 2019 Wikiconcours, we sat down around a table to think about how to write about tea − we won the contest.
In conclusion: despite its more than 20 years of existence, there remain very many subjects on Wikipedia where everything is still to be done. After two months of work, many sections of culture LGBT remain empty, incomplete, or superficial. You can never have too many volunteers, and the work site is immense!
]]>Mid-year, we passed the major milestone of 50,000 video game (Q7889). As of February 1st 2023, we stand at 55.5K − a whopping 22.5% growth (10.2K items) over the year.
As always, let’s have a look at how well these items are described (using, as always, integraality dashboards): 6.6K have no platform (P400), 10.6K no publication date (P577): while higher than last year in absolute numbers, the proportions are better: 14% → 12% and 23% → 19%. 23.4K (47%) have no country of origin (P495), which is stable. Conversely, 19K have no genre (P136), a worsening trend (27% → 34%).



Regarding external identifiers: only 570 items do not have any (1%, down from 1.8%). We know by now that this number is a bit meaningless − and so is the figure of 1.32K items excluding vglist video game ID (P8351) (2.4%, down from 2.6%).
The number I will be tracking from now on is the count of items without any identifier property maintained by WikiProject (P6104): WikiProject Video games (Q8485882) − which is 6,003 (10.8%) at the time of writing. Compared to the fairly comparable 15% of last year (same idea, but slightly different methodology), this is a good trend.
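For the curious, the idea behind that count can be sketched as a SPARQL query. This is a simplified sketch, not the exact query behind the figure above (which also needs a few refinements, such as excluding identifiers sourced from Wikidata itself); a query like this may well time out on the Wikidata Query Service, hence tools like QLever:

```sparql
# Sketch: count video game items that use none of the identifier
# properties maintained by WikiProject Video games (Q8485882).
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P31 wd:Q7889 .              # instance of: video game
  FILTER NOT EXISTS {
    ?prop wdt:P6104 wd:Q8485882 ;       # property maintained by the WikiProject…
          wikibase:directClaim ?claim .
    ?item ?claim [] .                   # …and used on the item
  }
}
```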
We have now reached 356 video-game related external identifiers, compared to 274 last year.
Again, the additions cover various languages: English of course, but also Japanese (Tagoo video game ID (P10368), Refuge.tokyo video game ID (P10424)), French (JeuxActu ID (P10455)), many in Russian (LKI ID (P10309) or Cybersport.ru ID (P10501)), quite a few in German (ntower ID (P11340) or Kultboy video game ID (P10850)), Italian (Adventure’s Planet ID (P11361)), Spanish (amstrad.es ID (P11426)) − and some in languages so far barely represented (or not at all): Chinese (A9VG game ID (P10371), TGbus ID (P10996)) and Korean (Naver game lounge ID (P11058)).
These new identifiers specialize in various ways:
That’s for games; but we also have new identifiers covering other entity types:
In terms of origin, we have the usual mix of fan databases, commercial/news websites and online stores; but also one institutional database with International Computer Game Collection work ID (P11295).
Mix’n’match catalogues, which we use to align external databases with Wikidata, boomed again, going from 235 to 305 − so much so that I split the collection into 6: companies (20), genres (10), platforms (23), series (9), sources (6) and the default/misc/games (236). If the Mix’n’match categories are anything to go by, video games are by far the most represented domain on the tool.
Looking at which identifiers are used the most, the situation has changed since June: MobyGames game ID (P1933), with 48.6% of our Q7889 items, is dethroned by IGDB game ID (P5794), now standing at the top with 58.6%. Lutris game ID (P7597) joins the podium with a whopping 45%. While only created at the end of 2021, RAWG game ID (P9968) climbs to 6th place with 29%. (This progress can be attributed in large part to automation, which will be discussed later.)

I continued my interest in discontinued databases, creating Mix’n’match catalogues for a couple of them − as long as they were reasonably well indexed in the Internet Archive’s Wayback Machine: HChistory.de, Personal Computer Museum magazines, LGDB, CoCo Site, CPC-Zone. I still have not taken the step of proposing properties for these − perhaps when they reach decent matching coverage.
On a sad note, in August the Japan PlayStation Software Database ID (P9636), which covers all games released in Japan on PlayStation systems, vanished from Sony’s website. I had never found a good way to index it in Mix’n’match, so our coverage is pretty low; and I have since discovered that the Wayback Machine only has a partial snapshot of it (I noticed several pages were not archived). I had (unfinished) plans to turn the Wayback Machine dump into a Mix’n’match catalogue; I should get to it.
I feel a breakthrough was made with content rating systems and their databases.
First, the American Entertainment Software Rating Board (ESRB): ESRB game ID (P8303) had been around since 2020, but [[User:Nicereddy]] created a Mix’n’match catalogue for it in August − since then, its usage has gone from 789 to 7,300. NicereddyBot would then come along to add the ESRB rating (P852) (example).
Second, the German Unterhaltungssoftware Selbstkontrolle (USK): I finally figured out the resolvable IDs of its database, and thus was born USK ID (P11063). Kirilloparma compiled a Mix’n’match catalogue, with already close to 700 matches.
In previous year-in-reviews, I have often showcased bulk data imports − QuickStatements batches or bot runs that populated a bunch of data points (often identifiers) in one go. There have been some of these, some of which will be listed on the project activity log.
But I feel like a shift was made this year from one-off imports to sustained data-enrichment:
(These are only examples; WikiProject users compiled a more comprehensive list. I started to map them in a diagram, but gave up for now, faced with the complex web they draw ^_^)
This can lead to very elegant dances of bots and humans passing the ball to each other − see for example the edit history of The Last Hero of Nostalgaia (Q114772057) or Cat Cafe Manager (Q111602956).
Some of these were ideas (identifier annotation & addition) I was toying with 4 years ago already, in my very first year-in-review blogpost − ideas I never followed up on, for lack of time and skill. I am very happy to see others independently formulate similar ideas, and more importantly execute on them. A big big thank you to Facenapalm, Nicereddy and Josh404 here! The interested reader can learn more by browsing their programs on GitHub: Facenapalm’s WikidataBot, Nicereddy’s random-scripts repo, Josh404’s P444_Q21039459.py.
Also worthy of note is Facenapalm’s script to easily create items based on a Steam ID: created in September, it has been used by its author to create over 3,300 items − and it also piqued the interest of Nicereddy and Poslovitch, who created another 700 items (see this database query). (EDIT: the author corrected me that the tool has existed in an earlier form since March; that accounts for another 2,000 item creations.)
On Wikidata, we often establish relationships one-way: for example, we link expansion packs to the main game using expansion of (P8646), and not the other way around. That means that by default, on the StarCraft (Q165929) item page, you would not see any mention of Brood War (Q840409).
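For queries, this one-way modelling is no problem, since SPARQL can traverse a link in either direction − a sketch listing the expansions of StarCraft by following P8646 backwards:

```sparql
# Sketch: items declaring themselves an "expansion of" (P8646)
# StarCraft (Q165929) − traversing the one-way link in reverse.
SELECT ?expansion ?expansionLabel WHERE {
  ?expansion wdt:P8646 wd:Q165929 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```

But of course, casual readers browsing an item page do not run SPARQL queries.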
There are generic solutions for that, such as the RelatedItems gadget, but I wanted something tailored to our domain. Jealous of − I mean, inspired by − the ExMusica.js UI-enhancement script made by [[User:Nikki]] for WikiProject Music, I wrote ExLudo.js: a user-script that enhances the display of video-game related item pages:
I’m pretty proud of it, even though all the work had really been done by Nikki and I was merely tweaking it here and there. I see a lot of potential for WikiProjects developing their own domain-specific UI-enhancers.




In April, Twitter user Catel69 published “the first version of his complete list of French adventure games since 1982” (on Google Sheets). The list is impressive in its exhaustiveness, and I strongly believe such extensive data truly shines in an open, connected database (like Wikidata) rather than in a closed system like Google Docs. Folks like Catel should of course use the tools they prefer, and it’s up to us (me) to then bridge the result over to Wikidata. I thus loaded the Google Sheet into Mix’n’match to further our own coverage.
As part of the celebrations around Wikidata’s tenth birthday in October 2022, Wikimedia Austria organized the “DACH Culture Contest” to add and improve data about culture in Austria, Germany and Switzerland. I modestly contributed a few thousand edits on the topic of DACH video games, improving the coverage nicely.
In June, I was invited by the German Literature Archive Marbach (Q1205813) to moderate a panel about video game metadata at their workshop “Games: Collecting, archiving, accessibility”. The speakers were Malina Riedl and Winfried Bergmeyer from the Stiftung Digitale Spielekultur (Q76632568) and Tracy Arndt and Tobias Steinke from the Deutsche Nationalbibliothek (Q27302) (I had met Winfried at a workshop in 2020, and have collaborated with Tracy many times over the last years). It was my first time moderating a panel, and I hope to do a better job next time :-þ, but I am happy to have Wikidata be part of such institutional and academic discussions.
In August, the paper A practice of cataloging based on community-generated data as authorities: A case of a video game catalog (Q116918759) by Kazufumi Fukuda was published. My Japanese is non-existent, so I cannot really process what it says, but it sure mentions Wikidata a lot :-þ. This appears to be a follow-up to the 2019 paper Using Wikidata as Work Authority for Video Games (Q70467546) by the same author, which I mentioned a few years back.
In April, I was interviewed by Magalie Vetter from Pixelvetica (Q116739051), a pilot project on video game preservation in Switzerland. This meeting came out of first contacts made back in 2021 with the Lausanne-based Gamelab UNIL-EPFL. The project report, “Sauvegarder le jeu vidéo suisse: État des lieux de la préservation du jeu vidéo en Suisse et dans le monde” (Q116770055), was published end of December.
The document is dense and exhaustive. It draws on in-depth interviews to establish the state of the art and current challenges of video game preservation and documentation; presents an overview of the place of video games in Swiss cultural institutions based on a wide survey; and concludes with recommendations (to policy makers, institutions, creators…) to develop video game preservation in Switzerland.
Close to my interests, chapter 2.1.2 is dedicated to metadata and description of games − both as artefact and as creative work (here called “panorama”). That section singles out Wikidata as “an interesting resource in which to invest”, emphasizing its openness, interoperability and durability. It points out how “linking one’s database to Wikidata” makes it possible “to benefit from its multilingualism” (echoing what the ICS does) and to leverage “the research work already done elsewhere”; while cautioning that this implies having to “revamp the structure of one’s database” and to “take part in the life and discussions of the community”.
The appendices are also well worth a read. Appendix 2 is a deeper recounting of the 10 interviews underpinning the report, each with a section discussing the metadata model. Appendix 3 discusses community-driven preservation efforts, including metadata preservation. Appendix 5 is a deep dive through four metadata models for video games.
Finally, one of the closing recommendations to archivists and librarians on archiving video games (section 3.4.2) reads:
Regarding the description of [the document as creative work], we recommend that institutions pool their efforts via collective structures that make it possible to share the workload, either through participation in Wikidata or through a common unified catalog at the national or international level.
I am delighted to see Wikidata mentioned in a report that reads like a Who’s Who of video game preservation − sharing the pages with organisations as established as MO5.com, institutions as prestigious as the French National Library, initiatives as hype as the Embracer Games Archive, and tools as ubiquitous as KryoFlux. I hope we can live up to it, and I certainly look forward to working together with Swiss institutions.
My other take-away from this story is that it’s good to make contacts, even if not much happens at first: like seeds thrown in the wind, it may take years for them to sprout − and bear fruit.
This is my fifth year-in-review, so I know better by now than to commit to lead any big data model developments − although I have a couple of ideas of course ;-þ.
But what I will aim to do is pen more of these ideas down on this blog. This year has shown me like no other the power of long-form writing:
And if anything, I can try at least to write the next year in review before March 2024 :-þ
This work is licensed under a Creative Commons Attribution 4.0 International License.
Let’s look at how these items are described along some basic properties − asking the Wikidata Query Service for some pretty graphs, and using my trusty inteGraality for some more advanced statistics.
Over 85% of the items have a platform (P400) statement (which does not mean that we have 85% completion on that topic, since many games are published on several platforms, and we may only have recorded one or a couple of them).
78% of the items have a publication date (P577)
67% have a genre (P136) − we have a very long tail of 600 distinct values as genres (some of which could use a clean-up, granted).
Just above 53% have a country of origin (P495)
Just under 50% of the items have a developer (P178) or a publisher (P123).
77% of the items are linked to an article in at least one language-version of Wikipedia − English comes first (52%), then French (30%) and then Japanese (25%).
What I also find interesting is to look at items linked to only one Wikipedia language version: some 13% only have an article in the English-language Wikipedia, almost 10% only in the Japanese-language Wikipedia, and then comes the French-language Wikipedia with 3% of items.
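Coverage numbers like those above can be obtained from the Wikidata Query Service. As an illustrative sketch (not the exact queries behind the graphs), the platform (P400) figure could be computed like this − run once with and once without the P400 triple, and divide:

```sparql
# Count video game items that have at least one platform (P400) statement;
# dividing by the total count of Q7889 items gives the coverage percentage.
SELECT (COUNT(DISTINCT ?game) AS ?count) WHERE {
  ?game wdt:P31 wd:Q7889 ;
        wdt:P400 [] .
}
```

The same shape works for publication date (P577), genre (P136), country of origin (P495), developer (P178) and publisher (P123).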
Over at Wikidata we link to hundreds of other video game databases.
The king here is MobyGames game ID (P1933), used on over 50% of our Q7889 items. Then come the 34% of Internet Game Database game ID (P5794), 27% of GameFAQs game ID (P4769), 20% of PCGamingWiki ID (P6337), 19% of speedrun.com game ID (P6783), 17,4% of the Media Arts Database ID (P7886), 16% of Giant Bomb ID (P5247), 15,2% of OGDB game title ID (P7564), 14,8% of Igromania ID (P6827)… and a very very long tail of sometimes highly specialized databases.
(The most represented are English-language databases, but the list above includes one Japanese, one German and one Russian database.)
1/ By the time of writing this, we had already reached 50,444 items. Ah well 
2/ We had actually passed the milestone of “50K games” on Wikidata before. Looking strictly at instance of (P31)=video game (Q7889) items does not tell the full story, as we have a long tail of subclasses also used as P31: some refer to distinct concepts (the 850 DLCs or 587 expansion packs), while others are indeed games (192 mobile game, 120 video game remaster, 102 browser game, 100 video game remake…)
Both raise questions on our modelling − which we shall leave for another day and another post.
3/ 50,000 is definitely something to be proud of, but is still far from the almost 300,000 entries in MobyGames, the 80,000 of Giant Bomb, the 63,000 of OGDB… and as such, is indeed a milestone on the road we have ahead of us.
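The subclass tally in footnote 2 can be reproduced with a query along these lines (an illustrative sketch, not necessarily the query used at the time):

```sparql
# Tally items whose instance of (P31) is a strict subclass of
# video game (Q7889) — DLCs, expansion packs, mobile games, etc.
SELECT ?class ?classLabel (COUNT(?item) AS ?count) WHERE {
  ?class wdt:P279+ wd:Q7889 .
  ?item wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?class ?classLabel
ORDER BY DESC(?count)
```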
As of January 29th 2021, there are 45.3K video game (Q7889) items on Wikidata − a 7.8% growth (3.3K items) over the year.
As always, let’s have a look at how well these items are described: 6.4K have no platform (P400), 10.5K no publication date (P577), 12.3K no genre (P136) − somewhat worse numbers than last year: again, the overall proportion of well-described items went down as we added more items.
Regarding external identifiers: only 800 items do not have any (1.8%). Like last year, this is still slightly misleading, as this figure includes items with vglist video game ID (P8351), which is itself based on Wikidata. Excluding vglist, we arrive at 1.2K items (2.6%). Down from 12.5% a year ago, 22% the year before and 40% the year before that − we are still on a good trend 
…or so I thought: by improving the query to only count properties related to video games (Q28147643), the number goes up to 6,557 items. After a couple of spot checks, my guess was that the main contributor to this discrepancy was Freebase ID (P646) (and a quick query does seem to indicate this accounts for over 4.3K items). So I guess we are back at 15% 
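The improved count can be sketched as follows − selecting identifier properties via Q28147643 and still excluding vglist, which is itself derived from Wikidata (an illustrative reconstruction, not the exact query):

```sparql
# Video game items lacking any identifier from a property related to
# video games (Q28147643), excluding vglist (P8351).
SELECT (COUNT(?game) AS ?count) WHERE {
  ?game wdt:P31 wd:Q7889 .
  FILTER NOT EXISTS {
    ?game ?id [] .
    ?property wikibase:directClaim ?id ;
              wdt:P31/wdt:P279* wd:Q28147643 .
    FILTER (?property != wd:P8351)
  }
}
```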
We have now reached 274 video-game related external identifiers (compared to 240 last year). Again, the additions cover various languages: Japanese (Japan PlayStation Software Database ID (P9636) or Famitsu game ID (P10144)), Czech (Databáze her ID (P10096)), Italian (everyeye.it ID (P10248)), Spanish (AbandonSocios ID (P9987)), French (IndieMag game ID (P9870) or Gameblog.fr game ID (P9702)), German (WiiG game ID (P9806) or Adventure Corner video game ID (P9747)) and quite a few in Russian (StopGame ID (P10030), [email protected] ID (P9697) etc.) taken from the Russian-language-Wikipedia list of videogame-related reliable sources − an amazing (and ongoing) work from Russian-language editors!
In terms of specialization, most of this year’s databases are fairly generic ones, besides the occasional platform specialization (Nintendo64EVER ID (P10137) & N64-Database ID (P10169) for the Nintendo 64); IndieMag game ID (P9870) focuses on indie games and Adventure Corner video game ID (P9747) on adventure games. I’m however pretty pleased with the inclusion of Games und Erinnerungskultur Datenbank ID (P9709), I believe the first and only scholarly database of video games we have so far.
Most of the identifiers are about games; but we also got a new identifier for genres (Glitchwave genre ID (P10049), courtesy of WikiProject Music “herder of genres” [[User:Solidest]]) and for series (Igromania series ID (P9835) − together with some nice OpenRefine batches by [[User:Kirilloparma]]).
Mix’n’match catalogues, which we use to align external databases with Wikidata, went from 209 to 235. Among these are the platforms and genres from the late AllGame (Q18984) database, continuing my interest in defunct databases.

I don’t believe our data model evolved at all this year − no new properties, and no newly documented modeling. The only instance I can think of was the failed proposal for an “alternative title” property, which more or less enshrined the use of multiple title (P1476) statements (with relevant qualifiers).
EDIT: meanwhile, Kirilloparma reminded me of announced at (P9731), which, as a qualifier to announcement date (P6949), specifies at which important event a game or piece of hardware was announced. And here is a little query to sort events by number of video games announced.
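A query along these lines (an illustrative reconstruction, not necessarily the one linked) ranks events by the number of video games announced there:

```sparql
# Events sorted by the number of video games announced there, using the
# announced at (P9731) qualifier on announcement date (P6949) statements.
SELECT ?event ?eventLabel (COUNT(DISTINCT ?game) AS ?games) WHERE {
  ?game wdt:P31 wd:Q7889 ;
        p:P6949 ?statement .
  ?statement pq:P9731 ?event .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?event ?eventLabel
ORDER BY DESC(?games)
```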
Dare I say it? We still did not make progress on whether we want to implement a more sophisticated data model for work/releases. Perhaps that will not happen after all, and we will stick to a single, Work-level data model (which is, after all, what made sense to start with for Wikipedia articles). However, I do think we must eventually expand our grammar and vocabulary to model things like art styles, perspective/viewpoint, pacing, narrative style or gameplay features. Maybe one day…
Besides that, and while I try to keep my ambitions contained − there is one area I would like to facilitate: while the project is active, with several contributors, it often feels that each one of us works in isolation on their own topics. I miss the days of the collaboration of the month/quarter on the French Wikipedia; perhaps I will try to facilitate a focused effort − it could be on a genre, or a platform, or anything really: just something to make greater strides and see more visible results. Let’s see!
Yet, to me there seems to be a major blindspot when it comes to technical contributors.
By technical contributors, I encompass all people contributing to server-side scripting (templates and modules), client-side scripting (user-scripts and gadgets), autonomous editing programs (bots), desktop and web applications, etc.
Here I will try to draw a (likely non-exhaustive) map of the existing support.
The Wikimedia Foundation (WMF) provides technical infrastructure:
While these platforms have their shortcomings (indeed, many volunteer developers avoid them altogether), I believe they are an amazing proposition. I also believe they are well supported: hang out on IRC asking for help for your Toolforge tool, and you will most certainly get it, whether from WMF staff, volunteers, or WMF staff with their volunteer hat.
The WMF, generally in partnership with an affiliate, has long been organizing hackathons − the Wikimedia hackathon around May, and the Wikimania one around July/August (other such events exist, such as the Dutch TechStorm).
I think hackathons are amazing − as I wrote before, they are great at creating an atmosphere suited to get work done.
A key tenet of the volunteer support, this has seen some welcome progress with the notable creation of the Coolest Tool Award.
Ah, this is often where it ends: a question “What about VeryImportantTool” is likely to elicit an answer along the lines of “This tool is volunteer-developed, and not supported by BigOrganization. We would be happy to consider a grant request to work on it though!”
While grants are useful, and have certainly helped the creation and enhancement of great tools (both in the Wikiverse and in other open movements like OpenStreetMap¹), I think this answer is reductive and inappropriate in many cases. The main reason I see is that, most often, when time or motivation are lacking, money will not buy either. Some tool developers may be self-employed freelancers, or between jobs, with the flexibility to set aside a few weeks or months supported by a grant; not so much if one is in regular employment. Timing can also be tricky: some grants have cycles, which may not fit the (work-)life of people.
I believe that what is missing in tool development is more collaboration. It takes a village to raise a tool − and various specialties ranging from product ownership, design, development, operations, testing, QA, security, documentation… − yet more often than not, a single person is behind a tool.
It has been suggested before that WMF should take over part of the duties related to a tool − typically the operations (of tools like PetScan) but UX/design had also been suggested.
Historically, I think it fair to say WMF has not been keen on doing so. I can see good reasons to avoid that (different priorities, limited resources, not so easy to adopt foreign and potentially messy codebases, perhaps fear of cannibalizing volunteer efforts), but also bad ones (wrong sense of priorities, Not-Invented-Here syndrome).
I can see how staff taking over operations of a major tool, or having a UX review queue, might not be sustainable. But I am convinced that there is space for a technical counterpart to the Volunteer Support for content contributors.
This crystallized for me (story time) with the 2018 pipenv release, where Sumana Harihareswara (who once upon a time worked at WMF, interestingly enough) stepped in to help the maintainers put together a well-overdue release of a major piece of the Python ecosystem. Sumana called it “Coaching and cheerleading”:
An external perspective can help a lot. You can be that person. Whether you call yourself a sidekick, a project manager, a cheerleader, a coach, or something else, you can be a supportive accountability partner who helps with the bits that maintainers are not great at, or don’t have time for. And you don’t have to know the project codebase to do this, or be a feature-level developer – the only pipenv code I touched was the docs.
Sumana Harihareswara
We could really use some professional coaches and cheerleaders.
(Speaking only for myself, as the developer of the moderately successful integraality (and ex-co-maintainer of the monuments database) − no smash hits, but somewhat relied-upon tools − I can see how having someone check in on me once every couple of months, to help me plan and organize work, could be very valuable).
As I outlined above, there are many parts to tool development, many of which are non-technical: handling bug reports, writing documentation, prioritizing feature requests… We are an amazing collaborative community − why isn’t there more collaboration around tools? For example, when building integraality, I experienced first-hand how delegating product decisions helped:
Making the decisions on where to take the product, and what red lines to draw scope-wise, is sometimes the hardest part. This saved me heaps of time in the couple of instances where I was unsure how to proceed. I wonder if this should be part of the hackathon setup − every hacker being the “owner” or “client” of another’s project.
This − again − crystallized with a recent story: reading Magnus Manske’s “The Buggregator” blog post. I could not help but feel deeply sad: Magnus, a prolific developer of highly popular tools, is so overwhelmed with bug reports and feature requests, raised in many different places, that he feels the need to write a tool to help manage all of these. I could not help but think − why on earth does he have to manage this communication influx all by himself? Since his tools are so popular, surely there should be many folks happy to help with this tool or that one? When someone reports a bug to Magnus on some random talk page or Telegram channel, why isn’t there a volunteer happy to take over, ask for the necessary clarifications, file it in the canonical bug-tracking place (whichever that might be), and prioritize it for later?
Of course, most of us will never churn out so many tools that we need to write a bug-aggregation platform; but I think that Magnus’ problems are everyone’s problems − just magnified (magnufied?) ten times over.
In short, there should be more of a community active around a popular tool − people helping out in particular with the non-technical aspects.
I used to think that the lack of such a community was somehow the responsibility of the developer themself − that if you were the sole maintainer on your Toolforge tool account, then you should try a bit harder.
I have come around on this: we set out to develop software, often to scratch our own itch, and not necessarily expecting success; we don’t sign up to do community building and management on top. But that community does not seem to happen on its own either: it seems that in this movement, we are more prone to asking tool developers for help than to asking whether they need help.
If so, then I think this should be the first and foremost job of the professional technical volunteer support: helping build up an active support network around a tool. This may include recruiting co-developers, but I see a real opportunity in engaging volunteers in non-technical capacities, and structuring that engagement with models and best practices. Far from cannibalizing volunteer resources, this would rather fit nicely in our goal of capacity building and increasing the sustainability of our movement.
In this post, I tried to map out the existing support for technical contributors, and have made the case for professional volunteer support, mirroring the practices and successes of support for content contributors. Beyond a direct “coaching and cheerleading”, this would entail recruiting and structuring volunteer-based support networks around each tool.
This may find an echo at the existing big software-houses that are WMF and WMDE, who may decide to create dedicated technical volunteer support roles or teams − I would certainly welcome it. But there are so many tools − I don’t think this is a problem that can be solved centrally. After all, no one expects WMF to help out individual content contributors in a particular country or focus area − this is what affiliates have excelled in.
Rather, the idea would play well with the Hubs model of support structures that has been developed as part of the 2030 strategy: a hypothetical “GLAM” thematic hub helping out on GLAM-related tools, or a “Wikisource” hub on Wikisource tools, etc.
Until then, in the spirit of decentralization and subsidiarity, I rather hope that the idea might be taken up by all affiliates: expanding their existing volunteer support to technical contributors − one tool at a time.
This piece was in the works for a while; the impetus to finish it came ahead of the “Please don’t get hit by a bus! Towards a resilient and sustainable Wikidata tool ecosystem” session at WikidataCon 2021.
¹ Pattypan (grant) immediately comes to my mind; I also think of the StreetComplete project (which I am quite fond of).