The City of Seattle did a lot of things right in managing last weekend’s data center outage to repair a faulty electrical bus bar. They kept their customers – the public – well informed about the outage schedule, the applications affected, and the applications that were still online. Their critical facilities team completed the maintenance earlier than expected, and they kept their critical applications online (911 services, traffic signals, etc.) throughout the maintenance period.
Seattle’s mayor, Mike McGinn, acknowledged in a press conference about the outage last week that the city’s current data center facility “is not the most reliable way for us to support the city’s operations.” Are you looking for a data center provider, especially one where you’ll never have to go on record with that statement? If so, here are a few takeaways:
A failure to plan is a plan to fail. While the City of Seattle planned to keep their emergency and safety services online, had they truly planned for the worst? I’m sure they had a backup plan if the maintenance took a turn for the worse, but did they consider the following: what if a second equipment fault had occurred during the maintenance? Traditionally, the “uptime” of an application is defined as the amount of time that a live power feed is provided to the equipment running that application. I would offer a new definition of “uptime” for mission critical applications: the time during which both a live power feed and an online, ready-to-failover redundant source of power are available to ensure zero interruptions. “Maintenance window” shouldn’t be part of your mission critical vocabulary. Which brings me to my next point . . .
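To make the difference between the two definitions of “uptime” concrete, here is a minimal sketch in Python. The `Interval` record, the function names, and the sample hours are all hypothetical and chosen only for illustration; they are not figures from the Seattle outage.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One stretch of time with a fixed power status (hypothetical record)."""
    hours: float
    primary_live: bool     # a live power feed reaches the equipment
    redundant_ready: bool  # a redundant source is online and ready to fail over


def traditional_uptime(timeline):
    """Fraction of time a live power feed was provided."""
    total = sum(i.hours for i in timeline)
    if total <= 0:
        raise ValueError("timeline must cover a positive number of hours")
    return sum(i.hours for i in timeline if i.primary_live) / total


def protected_uptime(timeline):
    """Fraction of time a live feed AND a ready redundant source were both available."""
    total = sum(i.hours for i in timeline)
    if total <= 0:
        raise ValueError("timeline must cover a positive number of hours")
    covered = sum(
        i.hours for i in timeline if i.primary_live and i.redundant_ready
    )
    return covered / total


# Assumed example: a 720-hour month with a 20-hour maintenance window
# during which the redundant path is offline but the load stays powered.
timeline = [
    Interval(hours=700.0, primary_live=True, redundant_ready=True),
    Interval(hours=20.0, primary_live=True, redundant_ready=False),
]

print(f"traditional uptime: {traditional_uptime(timeline):.4f}")
print(f"protected uptime:   {protected_uptime(timeline):.4f}")
```

Under the traditional definition the maintenance window is invisible, because the load never lost power. Under the stricter definition, those 20 hours count against you, because a single additional fault would have taken the application down.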
Concurrent maintainability and infrastructure redundancy are key. I will go one step further – concurrent maintainability AND fault tolerance are key factors in keeping your IT applications online. The requirement to perform maintenance and sustain an equipment fault at the same time isn’t paranoia – it’s sound planning. Besides, a little paranoia is a good thing when we’re talking about applications like 911 services, payment processing applications, or other business-critical applications.
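The requirement to perform maintenance and sustain a fault at the same time can be expressed as a simple capacity check. The sketch below is a deliberate simplification: it assumes power components (UPS modules, for example) are interchangeable and counts only how many remain in service. The function name and the unit counts are hypothetical.

```python
def survives_maintenance_and_fault(
    installed: int,
    required: int,
    in_maintenance: int = 1,
    faulted: int = 1,
) -> bool:
    """Return True if enough units remain to carry the load while some
    units are out for maintenance and others have failed at the same time."""
    remaining = installed - in_maintenance - faulted
    return remaining >= required


# Assumed example: the load needs 2 units to stay online.
required = 2

# N+1 (3 units): fine for maintenance alone, but not maintenance plus a fault.
print(survives_maintenance_and_fault(installed=3, required=required))

# N+2 (4 units): one unit can be serviced while another fails.
print(survives_maintenance_and_fault(installed=4, required=required))
```

A real facility also has to account for distribution paths, not just unit counts, so treat this as a way to frame the question to a provider rather than as a design tool.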
Location. Location. Location. The City of Seattle’s data center is located on the 26th story of a towering office building in downtown Seattle. The fact that they had to take down multiple applications in order to perform this maintenance implies that the electrical feed redundancy to their data center is somewhat limited. There are many competing factors in choosing a data center location: electrical feed, connectivity options, and natural hazard risk profile, to name a few. For mission critical applications, your location choice has to center on factors that will keep your systems online 100% of the time.
Flexibility and scalability give your IT department room to breathe. The City of Seattle leased their single-floor data center space before the information economy really took hold. As a result, their solution is relatively inflexible when it comes to the allowable power density of their equipment. They’re quickly outgrowing their space and already looking for an alternate solution. Look for a data center provider that plans for rapid increases in rack power draw. Do they already have a plan for future cooling capacity? How much power has the facility contracted from the local utility?