Categorization and its discontents

Wikipedia is going through one of its periodic existential crises, this time over the use of categories. A diligent editor created a category of "American women novelists" and began adding female novelists to it, and then removing them from the category "American novelists". A predictable hue and cry ensued. James Gleick tells the story in some detail in the NY Review of Books, under the title "Wikipedia's Women Problem".

Categorization is hard in a Wikipedia world. The namespace for articles is flat, so there are no implicit categories assigned simply by what a page is named. On the other hand, categories can contain subcategories, and the Wikipedia editors take advantage of this to create elaborate categorical structures. Everything is part of something else, in some extended rhetorical tangle. (The image is from the article Wikipedia:Categorization.)

Categories don't have to have subcategories, of course; that's a convention that's not universal to all wiki software. Localwiki's equivalent structure looks more like tags, and tags are in one big flat namespace with no explicit hierarchy.

Fundamentally the choice is about how best to make articles findable, and how to give a system a distinctive feel. Wikipedia has "Category:Doughnuts", with one subcategory "Doughnut shops", which in turn has a subcategory "Tim Hortons", which in turn has 12 pages, including Timbits. Arborwiki has a page Donuts, which includes pages tagged with the tag "donuts", and is in turn tagged as "Pastries that are not strange". (And yes, there is a "Pastries that are strange" tag as well.)
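For the programmers in the audience, here is a rough sketch of the two shapes, using the doughnut example. It is a toy model in Python, not how MediaWiki or Localwiki actually store anything: the dictionaries, the function names `pages_under` and `pages_tagged`, and the page "Example Donut Shop" are all made up for illustration.

```python
# Hierarchical model (Wikipedia-style): categories contain subcategories.
subcategories = {
    "Doughnuts": ["Doughnut shops"],
    "Doughnut shops": ["Tim Hortons"],
    "Tim Hortons": [],
}

# Pages filed directly in a category. Only Timbits is listed here;
# the other pages are omitted for brevity.
pages_in = {
    "Tim Hortons": ["Timbits"],
}


def pages_under(category, seen=None):
    """Collect pages in a category and in everything nested beneath it."""
    seen = set() if seen is None else seen
    if category in seen:  # guard against cycles in the category graph
        return []
    seen.add(category)
    found = list(pages_in.get(category, []))
    for child in subcategories.get(category, []):
        found.extend(pages_under(child, seen))
    return found


# Flat model (Localwiki-style): every page carries tags from one namespace.
tags_on = {
    "Donuts": {"Pastries that are not strange"},
    "Example Donut Shop": {"donuts"},  # hypothetical page, for illustration
}


def pages_tagged(tag):
    """Return every page carrying the tag; there is no hierarchy to walk."""
    return sorted(page for page, tags in tags_on.items() if tag in tags)


print(pages_under("Doughnuts"))
print(pages_tagged("donuts"))
```

Finding Timbits from "Doughnuts" means walking two levels down the tree, while the tag lookup is a single pass over one flat namespace. The tree buys you structure; the tags buy you the freedom to call something "Pastries that are not strange" without deciding where it belongs.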

Wikipedia has a problem when the technically correct task of splitting a big category into smaller parts runs smack into the political minefield of deciding which part of the category is to be primary and which is to be secondary. No one wants to be secondary, and a reasonable system should not keep producing category wars. Perhaps Wikipedia has bumped up against some size limit where it's impossible for any one person to understand the complexity of the classification system it has built.

