Category Archives: Categorization

Categories and their utter impossibility of reflecting the current real world

Open up a "New Post" window, start typing.

Look at the categories on the right side of the screen – my screen, not yours. There's an infinite scrolling list of them, from "Americana" to "zzz Draft Postings". Somehow, categories became the wrong way to describe what is in this blog; aside from "Ann Arbor", there are few of them that get routine postings. But with almost 3000 articles written, it's hard to imagine any coherent way to recategorize them.

The problem with categories, buckets, or really any hierarchical system for putting things in one place or another is that there's always the new thing that defies categorization and asks for its own category. I'm not willing to have a big "Other" category, but I am willing to have an "Oklahoma" category that only gets a few posts a year (earthquakes, tornados).

I've used systems that depend heavily on categorization to make them work. You set up some kind of ongoing task depending on how things or (worse) people are sorted into buckets. Suddenly the world needs bright divisions between one kind of thing and another kind of thing, and any ambiguity about status has to collapse into a single value to make the system make sense.

Some tools in my experience have been better than others for recategorization. I've written before about Maxthink, a 1980s era MS-DOS "idea processor" (still available in Windows) that has a concept of "binsort", which is an extraordinarily rapid and keyboard-driven way to shuffle and reshuffle items into a hierarchy. That system never really got enough traction to be copied wholesale by any other system, and so it lives on in the world only through dim memories and hardcore users.

Reshuffling a tree with lots of existing categories is really hard work, because short of wiping out all previous categorization and starting anew you are almost guaranteed to have some vestiges of the old order in place as you're trying to assert a new set of groupings. Even the infinitely flexible Wikipedia has this problem, as it inherited some parts of its classification system from older encyclopedias like Britannica.

So I struggle with sorting through the contents and give up more often than not, relying instead on search rather than careful categorization to unearth old things and to draw relationships among nominally related works. It's the problem that everything is miscellaneous and too many things are interrelated and that my knowledge of the world too often seems a mile wide and an inch deep.

Compounding the problem of miscellany is the tendency of our Internet to rot out from under us, with old sites and old links disappearing as people redesign or move on or give up. There's hardly any way to refer to something without quoting it nearly in full, for fear that when you want to go back to it there will be nothing there to see. 

I seem to be rambling; what was the point again? Oh, categories and their utter impossibility of reflecting the current real world and the horrific difficulties of rethinking them halfway through your efforts to use them. It so happens that I have a categorization category, which hopefully has some relationship to this brief essay.


Categorization and its discontents

image from upload.wikimedia.org
Wikipedia is going through one of its periodic existential crises, this time over the use of categories. A diligent editor created a category of "American women novelists" and began adding female novelists to it, and then removing them from the category "American novelists". A predictable hue and cry ensued. James Gleick tells the story in some detail in the NY Review of Books, under the title "Wikipedia's Women Problem".

Categorization is hard in a Wikipedia world. The namespace for articles is flat, so there are no implicit categories assigned simply by what a page is named. On the other hand, categories can contain subcategories, and the Wikipedia editors take advantage of this to create elaborate categorical structures. Everything is part of something else, in some extended rhetorical tangle. (The image is from the article Wikipedia:Categorization.)

Categories don't have to have subcategories, of course; that's a convention that's not universal to all wiki software. Localwiki's equivalent structure looks more like tags, and tags are in one big flat namespace with no explicit hierarchy.

Fundamentally the choice is how to best improve findability of articles, and how to incorporate a distinctive feel to a system. Wikipedia has "Category:Doughnuts", with one subcategory "Doughnut shops" which in turn has a subcategory "Tim Hortons" which in turn has 12 pages, including Timbits. Arborwiki has a page Donuts, which includes pages tagged with the tag "donuts", and is in turn tagged as "Pastries that are not strange". (And yes, there is a "Pastries that are strange" tag as well.)
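The difference between the two shapes is easy to see in a few lines of Python. This is only a sketch: the dictionaries below hold just the names mentioned above, the function names are made up for illustration, and neither Wikipedia nor Localwiki is claimed to store things this way.

```python
# Hierarchical categories: a category has subcategories and its own pages.
# Only the pieces named in the text are filled in; the real categories hold more.
subcategories = {
    "Doughnuts": ["Doughnut shops"],
    "Doughnut shops": ["Tim Hortons"],
    "Tim Hortons": [],
}
pages_in = {
    "Tim Hortons": ["Timbits"],
}

def pages_under(category):
    """Collect pages in a category and everything nested below it.

    Assumes the category graph has no cycles, which a real wiki
    does not guarantee.
    """
    found = list(pages_in.get(category, []))
    for sub in subcategories.get(category, []):
        found.extend(pages_under(sub))
    return found

# Flat tags: every page carries a set of tags, and no tag sits above another.
tags = {
    "Donuts": {"Pastries that are not strange"},
}

def pages_tagged(tag):
    """All pages carrying the tag, found with one flat scan and no tree to walk."""
    return sorted(page for page, page_tags in tags.items() if tag in page_tags)
```

Finding Timbits from "Doughnuts" means walking down three levels of the tree, while the tag lookup is a single pass with no notion of primary or secondary.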

Wikipedia has a problem when the technically correct task of splitting a big category into smaller parts runs smack into the political minefield of deciding which part of the category is to be primary and which is to be secondary. No one wants to be secondary, and no reasonable system should routinely end in category warring. Perhaps Wikipedia has bumped up against some size limit where it's impossible for any one person to understand the complexity of the classification system it has built.

Related articles

Wikipedia bumps women from 'American novelists' category
Wikipedia working to get rid of women in category: American novelists
What's In A Category? 'Women Novelists' Sparks Wiki-Controversy
American novelists are dudes, according to wikipedia
Is Wikipedia Ghettoizing Female Writers?
Wikipedia's Sexism – NYTimes.com
[eim][misc] Too big to categorize
Losing the categorical imperative

Typepad on iPad, edited elsewhere

No support for rich text editing in Safari, but otherwise performant. Kind of nice.

I’ll need to really learn Markdown for a couple of reasons, not the least of which is that it descends from Setext.

Ann Arbor

A2B3 lunch is Thursday as always.
Ann Arbor Parks did trick or treat today, Sunday, noon to 4pm on the Huron River.

Arborwiki makes a good companion as you go for errands around town.
Ann Arbor City Council elections and a millage are coming up. The Ann Arbor Chronicle has characteristically thorough coverage of the League of Women Voters forums.
Some project, not yet identified, has North Main torn up at Catherine. A second project has North Division down to a single lane. Expect delays.

No one was hurt in last week's fire on Harpst.
I'm trying a neighborhood LinkedIn group to see what kind of density I need to get enough people to make a group worthwhile; it might make sense to grab people closest first and then out by distance.

Metro Detroit

Tigers lost in the ALCS, and I’m looking forward to spring training.

Power outages from the Saturday windstorms were worst in Warren.

National

Occupy Chicago has seen a lot of protest, according to the Chicago Tribune, which was on the scene.

Occupy Wall Street took over Times Square.

Living

I am tracking steps with a pedometer again, thanks to Paul Resnick and a research group at UMSI.
Statler and Waldorf have taken over the Muppets Twitter account. New movie due for Thanksgiving. Cue the Muppets.

Recipes

The wind on Saturday made the farmers market blustery. Squash of all sizes and varieties were there, and there’s nothing like a big old Hubbard squash to keep the corner of a table down. A farmer was doing the frost dance but said they had none at the last full moon. Traditionally, it’s said that the best way to open a Hubbard is to take an axe to it, or to throw it down into the cellar.

Working

It’s hard to have great weird ideas when you are closing trouble tickets.
My new employer Nutshell has an office where my former employer Pure Visibility used to have its offices.

Obituaries

Steve Jobs, Dennis Ritchie, Einar Stefferud.

Books

Books moved recently include Kawabata’s Snow Country, to be shelved on the Heikki Lunta shelf to prepare me for winter.

Tech

Pinboard now supports Gopher urls in bookmarks.

Sports

Michigan football lost to State. It was as good an excuse as any to call my aunt who went to East Lansing.

Meta

Wow, I have a lot of categories.

System mapping

It's always good to collect a lot of examples as you look to map out a system. For that you want to have generative tools that suggest the kinds of examples that you want to look for.

The power outage maps I have been collecting started with the generative system of making a blank entry for each of 50 states, and looking to collect one in each state. I've done similar exercises where the generator is the alphabet, collecting one example for each first letter from A to Z. You are looking for breadth in your view, and something that constrains the search so that you don't spend too much time in one place before going on.

It helps to have some kind of rudimentary ranking algorithm when wrapping up your search, so that you can look for gaps into which another example will fit. The utility map effort looks like it's going to generate a simple checklist for each map, so that I can give each one a score on completeness (0 = no map, add 1 for county by county stats, add 1 for city by city stats, add 1 for zoom to the outage, add 1 for systemwide counts etc).
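A minimal sketch of that checklist in Python might look like the following. The `OutageMap` class and its field names are made up for illustration, and the "etc" at the end of the list is left out, so treat it as one possible reading of the scoring rule rather than the finished checklist.

```python
from dataclasses import dataclass

@dataclass
class OutageMap:
    """One utility's outage map; every field name here is a hypothetical label."""
    utility: str
    has_map: bool = False
    county_stats: bool = False
    city_stats: bool = False
    zoom_to_outage: bool = False
    systemwide_counts: bool = False

def completeness_score(m: OutageMap) -> int:
    """0 = no map; otherwise add 1 for each checklist feature present.

    Note that a map with none of the listed features also scores 0
    under this reading of the rule.
    """
    if not m.has_map:
        return 0
    return sum([m.county_stats, m.city_stats, m.zoom_to_outage, m.systemwide_counts])
```

Sorting the collected maps by this score puts the sparse ones at the bottom, which is where the gaps worth filling tend to show up.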

Put the things you collect into categories that have names, so that you can start working with abstractions instead of concrete instances. The list of categories becomes another place to generate a bounded set of additional elements.

Once you have all of these you can start to think about plausible things that might be in the system that you haven't found yet. My piece on "unknowledge management" (the phrase is from Tom Munnecke) describes this stage, where the task becomes looking for names for things that don't exist yet but that might plausibly exist given the system that you have described. This means that you end up with something peculiar to search for, and either it doesn't exist yet, it can't exist (and your models are off), or you find it.

So the iterative process looks like

describe a search space

collect examples

rank along common attributes

categorize into abstractions

synthesize the undiscovered instances

and that seems to be as good a plan as any for a survey of the field.
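The first and last steps of that loop can be sketched in Python. Everything named here is an assumption for illustration: examples are plain dictionaries with a hypothetical "slot" key for the place they fill in the search space, and "plausible" is taken to mean that each attribute value has already been seen somewhere, just not in that combination.

```python
from itertools import product

def missing_slots(search_space, examples):
    """Slots in the bounded search space that have no example collected yet."""
    found = {e["slot"] for e in examples}
    return [s for s in search_space if s not in found]

def undiscovered_combinations(examples, attributes):
    """Attribute combinations built from observed values that no example has yet."""
    seen_values = {a: sorted({e[a] for e in examples}) for a in attributes}
    seen_combos = {tuple(e[a] for a in attributes) for e in examples}
    candidates = product(*(seen_values[a] for a in attributes))
    return [c for c in candidates if c not in seen_combos]

# Hypothetical sample data, not real survey results.
examples = [
    {"slot": "MI", "granularity": "county", "format": "interactive"},
    {"slot": "OH", "granularity": "city", "format": "static"},
]
```

With this sample data, `missing_slots(["MI", "OH", "OK"], examples)` points at "OK" as the next place to look, and `undiscovered_combinations(examples, ["granularity", "format"])` names the two pairings nobody has shown you yet. Each of those is the "something peculiar to search for": it may not exist yet, it may be impossible, or you find it.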

This food system map of Baltimore looks like an instance of the structure of this work. In academia, the survey article (Google Scholar) is structured in much the same way.

Holdridge vegetation classification system

Another triangle, this one found in a University of Michigan Global Change lecture on the Tropical Rain Forest, by George Kling. 

The triangle is the "Holdridge triangle", dating to 1947. 

  • Holdridge, L.R. (1947) Determination of world plant formations from simple climatic data. Science, 105, 367–368.

image from www.globalchange.umich.edu

All materials © the Regents of the University of Michigan unless noted otherwise.

Alas, poor delicious, you knew us well

The scuttlebutt is that delicious is going to be axed by Yahoo. 

Once upon a time, I bookmarked everything interesting that came across my path to delicious (back when it was del.icio.us).  It was part of my routine, and a daily summary was posted through to this blog.

Delicious was from the tags era of the Internet, where in addition to noting that a thing existed you could add your own tags to describe it. Sometimes these were straightforward tags, like the 1533 pages I marked as "annarbor". Others were idiosyncratic, like the 5 pages I marked as "attention-to-irrelevant-details".

There are other, better ways to bookmark things so that lots of people see them. The Facebook "like" button gets more page views without consuming any cognitive overhead about how to tag, whether to tag, and what you've tagged before. Actually writing about something is quite a bit better than just bookmarking it, because you get to be yourself for a little while and not just an automaton forwarding on links automatically.

Every bookmark I ever did on delicious, up until now, is archived for posterity here: vielmetti.typepad.com/vacuum/delicious-20101216.htm . As I review it, there really should have been more of them marked "attention to irrelevant details". It's hours of reading, though some of it goes by fast because the site that was linked to has disappeared, leaving only the bookmark and whatever clipping I managed to care about.

edit: now with more cognitive overhead

more delicious:

Les Orchard, Let a million bookmarks bloom. "Use the web. Host your own, pay for it, or find someone who values your data."

Stephen Hood, We can save Delicious, but probably not in the way you think. "The Delicious user community could organize to save the data themselves via a coordinated harvesting project." 

Edward Vielmetti is http://www.delicious.com/vielmetti.

finding buckets to refill regularly

Much of the writing I do these days has tags but not much in the way of categories. There are a few very broad categories that things can go into, but not much in the way of narrow ones, and the world of tags means that you can always think up one more new tag to apply rather than revisiting an old one.

In some ways I miss the regular reinforcement that comes from being reminded that I've written on items in a category before. This blog has a "garlic mustard" tag, which signals that I should write about that at least once a year; without that reminder, I might miss the chance to refresh that annual recipe. The category list is long, but finite, and it's idiosyncratic to reflect some peculiar sense of evolved focus.

Tags are much harder, especially in a world of tags shared across a team.  There are enough variant spellings and punctuation in an 8000 unit tag cloud to make it tough to see which of those tags are most useful.  Perhaps I can carve out 20 or 30 to make my own, items to revisit repeatedly, but the system as a system does not constrain me as such.
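One way to tame the variant spellings is to fold each tag to a canonical key before counting. The sketch below is an assumption about how that could be done, not a description of any particular tagging system; the function names are hypothetical, and the normalization is crude (it lowercases and drops anything that is not an ASCII letter or digit, so accented letters are lost).

```python
import re
from collections import defaultdict

def normalize_tag(tag: str) -> str:
    """Lowercase the tag and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())

def group_variants(tag_counts):
    """Map each canonical key to its spellings, keeping only keys with 2+ spellings."""
    groups = defaultdict(dict)
    for tag, count in tag_counts.items():
        groups[normalize_tag(tag)][tag] = count
    return {key: variants for key, variants in groups.items() if len(variants) > 1}

def top_tags(tag_counts, n=20):
    """The n most-used canonical tags, with counts merged across spellings."""
    totals = defaultdict(int)
    for tag, count in tag_counts.items():
        totals[normalize_tag(tag)] += count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Calling `top_tags(counts, n=20)` on the team's tag counts gives a starting list of 20 candidates to adopt as your own, and `group_variants(counts)` shows which spellings are competing for the same idea.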

It's very useful to have a bucket that can be usefully refilled periodically.  At the cycle of once per year, the cumulative knowledge of paczki rituals or cudighi recipes makes each successive year easier.  Monthly rituals leave room to breathe between attempts, and weekly cycles build up a substantial body of work in just a year.  

In many ways this is the mindless mindfulness approach to sustained productivity; don't decide which of 1000s of things you might do, simply have one picked for you and then go at it with a strict deadline to get you to where you need to go until the next time.  The effort can be impromptu and doesn't require much explicit preparation, since you've primed the pump with previous work.