On summarization

I've been writing summaries of the day's Hacker News, after a challenge from Kyle Mulka to do the same. Here are some observations summarized from that effort.

Writing summaries gets easier with practice. If you know what you're looking for, you can reduce a long article to a pithy bit just by finding the one quotable sentence that stands in for the whole. If there's no pithy bit, you hope that the title or the first paragraph is good enough. You might have to stitch together bits of several sentences to get one that describes the whole thing.

Summarization is active reading, and it improves recall. There's nothing quite like looking through a piece of text for its inner core to give you a real sense of what it's all about, and if you have to synthesize something new, you recall it better than if you just hit the "like" button.

I'm skeptical of automated summarization, mostly because there seem to be enough people in the world interested in doing it by hand to make machine-aided cognition superfluous, at least for popular articles. Machine reading works, I suppose, if you have so few eyes on the text that you have to crunch on it to make sense of it. But if there's an internet crowd to do that work, the machine can simply tally human efforts instead of doing complicated automated text processing.
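To make "tally human efforts" concrete, here's a rough sketch, assuming you already have a list of the sentences that different readers quoted from the same article. The function name and the input format are made up for illustration; no real service is being called.

```python
from collections import Counter


def most_quoted(picks, top_n=1):
    """Return the sentences readers quoted most often, with their counts.

    `picks` is a hypothetical list of strings, one per reader, each being
    the sentence that reader chose to quote from the article.
    """
    # Collapse runs of whitespace and ignore case so that near-identical
    # copies of the same sentence are counted together.
    normalized = (" ".join(p.split()).lower() for p in picks if p.strip())
    return Counter(normalized).most_common(top_n)


picks = [
    "Summarization is active reading.",
    "summarization is  active reading.",
    "It turns over so quickly.",
]
print(most_quoted(picks))
```

No text processing beyond whitespace and case cleanup happens here; the readers did the hard part by choosing what to quote.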

Hacker News is a challenging news stream to summarize because it turns over so quickly. In 12 hours or less, the top of the front page is fully new, and it's time to summarize again. I've been doing it daily rather than twice daily because the yield of interestingness just isn't there to call for more. There are other tech newsfeeds and newswires that are worth digesting, but not all of them are sufficiently different from Hacker News to provide a divergent task. For example, the top "popular" links on Pinboard overlap substantially with Hacker News, as does the front page of TechMeme.

Pinboard looks like it has the right kind of infrastructure to support summarization as a part of a workflow. The path would be to identify an interesting article, clip the good bit, hit "pin", paste in the good bit, add a few tags, and save. To extract the useful stuff, you'd run a report on the tag you used that was unique to the category you were summarizing.
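The clip, tag, and report steps could be modeled something like this. It's a sketch over a plain in-memory list, not Pinboard's actual API; the `Clip` class, the `report` function, and the sample URL and tag are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """One saved bookmark: the link, its title, the clipped 'good bit', and tags."""
    url: str
    title: str
    excerpt: str
    tags: set = field(default_factory=set)


def report(clips, tag):
    """Return one summary line for every clip carrying the given tag."""
    return [
        f"{clip.title}: {clip.excerpt} ({clip.url})"
        for clip in clips
        if tag in clip.tags
    ]


clips = [
    Clip("https://example.com/a", "Article A", "The one quotable sentence.", {"hn-daily", "tools"}),
    Clip("https://example.com/b", "Article B", "Unrelated clip.", {"misc"}),
]
for line in report(clips, "hn-daily"):
    print(line)
```

The unique tag is what makes the report work: it's the only thing tying a day's clips together, so it has to be one you don't use for anything else.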

Summarization lends itself to pointless numbered bullshit (cf. #4 on this list). Lots of articles are written with teaser headlines (5 best ways to do x) and can be unpacked by undoing the tease (The 5 best ways to do x are a, b, c, d, and e). You might take as your goal producing a summary that either dispenses with the need to read the original or that points the reader towards the best of what you have gone through and digested for them.

3 thoughts on “On summarization”

  1. Anne Johnson

    News.me from Betaworks is worth a try; it gives me useful daily summaries from Twitter, including stuff that I wouldn’t otherwise have seen from ‘friends of friends’ (really, followers of people I follow). Looks like there will be a transition into a Digg Daily Digest soon. http://blog.news.me/

