Category Archives: Systems

January 23, 2014: Metro-North computer “glitch” strands NYC commuters

ABC News reports:

Metro-North, the nation’s second-busiest commuter railroad, was brought to a standstill for nearly two hours on Thursday night after computer problems caused signal issues system-wide, railroad officials said.

CBS has this diagnosis of what went wrong, and why:

The cause of the outage remained under investigation Thursday night. “We can’t say what the cause of it is. We’re still trying to figure that out,” Daniels told 1010 WINS.

For a good read on after-incident reporting, I always refer people to the Postmortems group on Google+ – it’s an excellent source of descriptions of what happened when computers “glitch” (because they rarely glitch all by themselves).

UPDATE (AP/WSJ): The Metro-North computer “glitch” is now described as human error. From the AP story:

MTA head Thomas Prendergast said Friday that computers that run Metro-North’s signal system lost power at 7:45 p.m. Thursday when one of two main power supply units was taken out of service for replacement.

He said technicians performing the work did not realize that a wire was disconnected on the other main power supply unit.

Systems attract systems people

From Systemantics, the systems bible:

While Systems-people share certain attributes in common, each specific system tends to attract people with specific sets of attributes. For example, people who are attracted to auto racing are likely to be people who enjoy driving fast, tinkering with high-powered cars, and beating other people in fierce competition. The System calls forth those attributes in its members and rewards the extreme degrees of them. But a word of warning is in order. A priori guesses as to what traits are fostered by a given system are likely to be wrong. Furthermore, those traits are not necessarily conducive to successful operation of the System itself, e.g., the qualities necessary for being elected president do not include the ability to run the country.

Systems attract not only Systems-people who have attributes for success within the system. They also attract individuals who possess specialized attributes adapted to allow them to thrive at the expense of the system, i.e., persons who parasitize them. As the barnacle attaches to the whale, these persons attach themselves to systems, getting a free ride and a free lunch as long as the system survives.

A copy of the Systemantics 1975 edition is online, at The Ohio State University, which probably doesn’t know that it’s there.

Outages, cable failures, and postmortems: sources for information

If you, like me, are interested in how systems work and when they fail, here are three good sources that look at global network behavior from the internet perspective.

The outages mailing list (outages at outages.org) describes itself as follows:

The primary goal of this mailing list ("outages") is for outages-reporting that would apply to failures of major communications infrastructure components having significant traffic-carrying capacity, similar to what FCC provided prior to 9/11 days but they seem to have pulled back due to terrorism concerns. Some also believe that LEC's and IXC's also like this model as they no longer have to air their dirty laundry. Then again, this mailing list is not about making anyone look bad, its all about information sharing and keeping network operators & end users abreast on the situation as close to real-time information as possible in order to assess and respond to major outage such as routing voice/data via different carriers which may directly or indirectly impact us and our customers. A reliable communications network is essential in times of crisis. 

There's always good information about bad news to be found here, with a typical exchange being "I lost some circuits to city X" and the reply "Company Y has a fiber cut in city Z". 

A second excellent source of global routing information about failures and reconfigurations of the global internet is the weblog that's written by the company Renesys. The Renesys Blog has as of this writing details about network connectivity problems to North and South Korea, a fiber cut in the Black Sea that disrupted traffic as far away as Oman (3000+ km away), and details of a submarine cable landing in Cuba. 

Finally – though there is not really a finally in this world where part of the Internet is always under repair – there is a "Postmortems" discussion group on Google Plus where reports come in after the fact of people describing just what went wrong with their complex system and (usually) what they plan to do next to avoid the next round of similar failures. From squirrel-induced cascading power failures to denial of service attacks to runaway email systems, there's a new lesson to be learned from each after-action review of failure.

All of these groups overlap somewhat with the RISKS Digest, one of the oldest mailing lists still around on the internet, that covers risks to the public from computing technology.

Distributed systems, and systems failure in general

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." Leslie Lamport, 1987; though the same observation would be true for cloud computing today.

 

Noted recently on this Google Plus thread; I don't think I'll quote the whole thing, but by referencing it here I'll find it again at least for a while.

Prediction markets are too easy to game, so let’s play games instead

Prediction markets provide a nice, idealized way to use market mechanisms to predict the future. They are a slightly idealized version of an office betting pool and allow small scale bets (real or virtual) on future events. Unlike a real futures market, where you can get guarantees of future performance on things like delivery of gasoline or frozen orange juice, there's no way that betting on some prospective GOP hopeful will lift your candidate to office.

Market manipulation is the bane of most prediction markets. Consider Intrade, a prediction market in which GOP hopefuls are listed. The media will report that a newcomer is trading well, but won't necessarily note that the frontrunner's place is based on less than $500 worth of daily trades. If someone wanted to bump their favorite over the top for at least a little while, it would take only pocket money from a properly hidden campaign pocket to game the system in their favor.

Gaming the system is a problem in systems that pretend to be markets, and where some ideal form of market is lionized as a way to gain enlightenment from the invisible hand of Adam Smith. Turning the system into a game to be won sullies the economist's ideals of a rational allocation of resources based on infinite economic rigor.

Why fight people's tendencies to want to subvert the intent of the system builder? The new hot trend in online environments is gamification, where gaming the system becomes the whole goal of the system. Unlike markets, which try to come to truth with some single-valued score of value, games have the opportunity to bring the world into a lovely multi-valued focus, with different people winning each at their own version of the game. The introduction of badges, short term competitions, lots of ways to score points and other fun side games overwhelms the narrow trading interests in market-based predictions.

The future of prediction markets lives inside complex predictive games. The game design has to take into account not only the rapid conclusions that you can draw when you have an efficient market for opinions, but also the continuous efforts you have to keep up to maintain people's attention on a topic for any length of time. The good games reward active participation; the challenge becomes how to draw conclusions from the work of engaged gamers working inside your system.

Thanks to Brian Kerr for the idea. Games are on my mind because of the work of the Ann Arbor District Library and Eli Neiburger to build games into the summer reading program at the library. I tried to predict library use with a prediction market in 2007, and attendance at lunch in 2009, using Inkling Markets. The game of choice at the Workantile Exchange is a version of bingo; when I bring my kid to work an alert player may win. The Intrade GOP market shows the book of available trades; at this writing, purchasing about 300 shares of Mitt Romney would push the price up 8% at a cost of less than $1200. Empire Avenue turns social media friendships into a giant game, with badges and prizes.
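The arithmetic of moving a thin market can be sketched in a few lines of Python. This is only an illustration: the order book below is hypothetical (its prices and sizes are invented to land near the Romney figures above, not copied from Intrade), and the function name cost_to_buy is an assumption of this sketch. Prices are kept in integer cents to avoid rounding surprises.

```python
def cost_to_buy(asks, shares):
    """Walk the ask side of an order book, cheapest offer first.

    asks: list of (price_in_cents, shares_offered) tuples.
    Returns (total_cost_in_cents, price_of_last_share_bought).
    """
    total = 0
    last_price = None
    remaining = shares
    for price, size in sorted(asks):
        if remaining == 0:
            break
        take = min(size, remaining)
        total += take * price
        last_price = price
        remaining -= take
    if remaining:
        raise ValueError("not enough shares on offer")
    return total, last_price


# Hypothetical book, invented for illustration only.
book = [(380, 100), (395, 120), (410, 150)]
cost, last = cost_to_buy(book, 300)
move = (last - book[0][0]) / book[0][0]
print(f"${cost / 100:.2f} moves the best ask {move:.1%}")
```

With this made-up book, 300 shares cost $1182.00 and the best available price moves from $3.80 to $4.10, a bit under 8%. The thinner the book, the cheaper the headline.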

System mapping

It's always good to collect a lot of examples as you look to map out a system. For that you want to have generative tools that suggest the kinds of examples that you want to look for.

The power outage maps I have been collecting started with the generative system of making a blank entry for each of 50 states, and looking to collect one in each state. I've done similar exercises where the generator is the alphabet, collecting one example for each first letter from A to Z. You are looking for breadth in your view, and something that constrains the search so that you don't spend too much time in one place before going on.
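A generator like this is easy to keep in code. The sketch below uses the A-Z version because it is short; the function names and the example place names are placeholders of this sketch, not part of any real collection.

```python
import string


def blank_slots(keys):
    """One empty entry per key: the 'blank entry for each state' idea."""
    return {key: [] for key in keys}


def file_example(slots, name):
    """File an example under its first letter, if that letter is a slot."""
    key = name[0].upper()
    if key in slots:
        slots[key].append(name)


def gaps(slots):
    """Keys that still have no example: where to search next."""
    return [key for key, found in slots.items() if not found]


slots = blank_slots(string.ascii_uppercase)
for name in ["Ann Arbor", "Baltimore", "Boston", "Detroit"]:  # placeholder examples
    file_example(slots, name)
print(gaps(slots)[:5])
```

Swapping string.ascii_uppercase for a list of state names gives the 50-state version. The point of the blank slots is that the gaps are visible, so a second example for B is worth less than a first example for C.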

It helps to have some kind of rudimentary ranking algorithm when wrapping up your search, so that you can look for gaps into which another example will fit. The utility map effort looks like it's going to generate a simple checklist for each map, so that I can give each one a score on completeness (0 = no map, add 1 for county by county stats, add 1 for city by city stats, add 1 for zoom to the outage, add 1 for systemwide counts etc).
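As a minimal sketch, the checklist score could look like this in Python. The four item names are shorthand of this sketch for the criteria listed above, the "etc" means the real checklist is likely longer, and the utilities are made up.

```python
# Checklist items, one point each. The names are this sketch's shorthand.
CHECKLIST = ("county_stats", "city_stats", "zoom_to_outage", "systemwide_counts")


def completeness(features):
    """Score a utility's outage map.

    features: set of checklist items the map offers, or None if
    the utility has no map at all (score 0).
    """
    if features is None:
        return 0
    return sum(1 for item in CHECKLIST if item in features)


# Made-up utilities for illustration.
maps = {
    "Utility A": {"county_stats", "city_stats", "zoom_to_outage"},
    "Utility B": None,
    "Utility C": {"systemwide_counts"},
}
ranked = sorted(maps, key=lambda name: completeness(maps[name]), reverse=True)
for name in ranked:
    print(name, completeness(maps[name]))
```

Sorting by score puts the best examples at the top and the gaps at the bottom, which is where the next round of searching should go.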

Put the things you collect into categories that have names, so that you can start working with abstractions instead of concrete instances. The list of categories becomes another place to generate a bounded set of additional elements.

Once you have all of these, you can start to think about plausible things that might be in the system that you haven't found yet. This is the subject of my piece on "unknowledge management" (the phrase is from Tom Munnecke), where the task becomes looking for names for things that don't exist yet but that might plausibly exist given the system that you have described. This means that you end up with something peculiar to search for, and either it doesn't exist yet, it can't exist (and your models are off), or you find it.
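One mechanical way to come up with things to look for, offered only as a sketch: treat each category as an axis, list every combination of values, and subtract the combinations already collected. The axis names and values below are invented for illustration.

```python
from itertools import product


def undiscovered(axes, observed):
    """Combinations of category values that have not been collected yet.

    axes: dict mapping a category name to its possible values.
    observed: set of tuples, one value per category, in the same order.
    """
    names = list(axes)
    every = product(*(axes[name] for name in names))
    return [combo for combo in every if combo not in observed]


# Invented categories, for illustration only.
axes = {"granularity": ["county", "city"], "display": ["map", "table"]}
observed = {("county", "map"), ("county", "table"), ("city", "map")}
print(undiscovered(axes, observed))
```

Here the one combination left over is a city-level table. Whether such a thing exists is exactly the question the method hands you: it may not exist yet, it may be impossible, or it may be waiting to be found.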

So the iterative process looks like

describe a search space

collect examples

rank along common attributes

categorize into abstractions

synthesize the undiscovered instances

and that seems to be as good a plan as any for survey of the field.

This food system map of Baltimore looks like an instance of the structure of this work. In academia, the survey article (Google Scholar) is structured in much the same way.

Huron River water level watching: USGS National Water Information System

Follow the level of the Huron River as it goes through Ann Arbor with the USGS National Water Information System. On March 23 the river is rising, but the level is also fluctuating enough that it's hard to tell from current trends whether there is any risk of flooding.

image from 137.227.241.67

If you were to watch this closely, what would you watch?

The USGS will give you an alert if the level of the river is above or below a certain level.

The first derivative is the rate of change of the water level. You'd want to track that to note trends. If you measured the difference between adjacent water levels, you'd identify spikes, which have been associated with work on the dam in the past; that gives you a rate of change on the order of inches per hour. If you did some smoothing, you'd find trends measured in feet per day.
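A minimal sketch of both calculations, assuming gauge readings in feet taken at a fixed interval. The function names, the one-hour default interval, and the sample readings are assumptions of this sketch, not anything the USGS provides.

```python
def hourly_change_inches(levels_ft, hours_between_readings=1.0):
    """First difference of adjacent readings, converted to inches per hour."""
    return [
        (later - earlier) * 12.0 / hours_between_readings
        for earlier, later in zip(levels_ft, levels_ft[1:])
    ]


def moving_average(values, window):
    """Simple moving average; the result is window - 1 items shorter."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up hourly gauge heights in feet.
levels = [10.0, 10.25, 10.5, 10.5]
print(hourly_change_inches(levels))   # spikes show up here
print(moving_average(levels, 2))      # the slower trend shows up here
```

The raw differences are where a sudden step from dam work would stand out. Differencing the smoothed series over a full day of readings gives the feet-per-day trend.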

The second derivative is the rate of the rate of change of the water level; you're looking for that to identify erratic behavior. As a rule of thumb a time series with large movements in both directions (up and down) over a short amount of time is a sign of river disturbance.
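The rule of thumb can be written down directly. This is a sketch under assumptions: the window size and threshold are knobs you would have to tune against real gauge data, and the names are this sketch's own.

```python
def second_difference(levels):
    """Discrete second derivative: the change in the change between readings."""
    return [c - 2 * b + a for a, b, c in zip(levels, levels[1:], levels[2:])]


def looks_erratic(levels, window, threshold):
    """True if any run of `window` consecutive readings contains both
    a rise and a fall larger than `threshold`."""
    if window < 2:
        raise ValueError("window must cover at least two readings")
    steps = [later - earlier for earlier, later in zip(levels, levels[1:])]
    span = window - 1  # number of steps inside a window of readings
    for i in range(len(steps) - span + 1):
        chunk = steps[i:i + span]
        if max(chunk) > threshold and min(chunk) < -threshold:
            return True
    return False


# Made-up readings in feet: a sawtooth versus a steady rise.
print(looks_erratic([10.0, 10.5, 10.0, 10.5], window=3, threshold=0.25))
print(looks_erratic([10.0, 10.5, 11.0, 11.5], window=3, threshold=0.25))
```

The sawtooth trips the check and the steady rise does not, even though the steady rise is the one that ends in a flood. That is why this check belongs alongside the trend, not in place of it.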

A plausible approach would be to compute a smoothed curve, and then alert on deviations from the smoothed curve. You'd love to be able to tell the difference between sudden downpour, steady rain, flash flood, work on the dam, and dam failure.
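One way to sketch the smoothed-curve alert: compare each new reading with the average of the readings just before it, and flag it when the gap exceeds a tolerance. The trailing average, the window, and the tolerance are all choices of this sketch; nothing here tells a downpour from a dam failure, it only says "look now".

```python
def deviation_alerts(levels, window, tolerance):
    """Indices of readings that differ from the mean of the preceding
    `window` readings by more than `tolerance`."""
    alerts = []
    for i in range(window, len(levels)):
        baseline = sum(levels[i - window:i]) / window
        if abs(levels[i] - baseline) > tolerance:
            alerts.append(i)
    return alerts


# Made-up readings in feet, with one sudden jump.
levels = [10.0, 10.0, 10.0, 10.0, 12.0, 10.0]
print(deviation_alerts(levels, window=4, tolerance=0.5))
```

Using only earlier readings for the baseline means the alert can fire as the data arrives, at the cost of the jump itself pulling the baseline up for the next few readings.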

Another strip chart to line up with this is rainfall (measured in convenient units) and precipitation forecast.

This forecast is from USGS Waterwatch for Michigan. Hidden behind the projected water level is a rainfall forecast for the watershed.

image from water.weather.gov

Statewide data, if you want to look for more, from Waterwatch. The black dots are flooding, blue are near flood.

image from waterwatch.usgs.gov

Edward Vielmetti watches water levels on the Huron River. Write to him at edward.vielmetti@gmail.com.