Outages, cable failures, and postmortems: sources for information

If you, like me, are interested in how systems work and when they fail, here are three good sources on global network behavior, seen from the Internet's perspective.

The outages mailing list (outages at outages.org) describes itself as follows:

The primary goal of this mailing list ("outages") is for outages-reporting that would apply to failures of major communications infrastructure components having significant traffic-carrying capacity, similar to what FCC provided prior to 9/11 days but they seem to have pulled back due to terrorism concerns. Some also believe that LEC's and IXC's also like this model as they no longer have to air their dirty laundry. Then again, this mailing list is not about making anyone look bad, its all about information sharing and keeping network operators & end users abreast on the situation as close to real-time information as possible in order to assess and respond to major outage such as routing voice/data via different carriers which may directly or indirectly impact us and our customers. A reliable communications network is essential in times of crisis. 

There's always good information about bad news to be found here, with a typical exchange being "I lost some circuits to city X" and the reply "Company Y has a fiber cut in city Z". 

A second excellent source of information about failures and reconfigurations of the global Internet is the blog written by the company Renesys. As of this writing, the Renesys Blog has details about network connectivity problems to North and South Korea, a fiber cut in the Black Sea that disrupted traffic as far away as Oman (3000+ km), and a submarine cable landing in Cuba.

Finally – though there is not really a finally in this world where part of the Internet is always under repair – there is a "Postmortems" discussion group on Google Plus, where people report after the fact just what went wrong with their complex system and (usually) what they plan to do to avoid the next round of similar failures. From squirrel-induced cascading power failures to denial-of-service attacks to runaway email systems, there's a new lesson to be learned from each after-action review of a failure.

All of these groups overlap somewhat with the RISKS Digest, one of the oldest mailing lists still around on the Internet, which covers risks to the public from computing technology.
