TJ Ciccone's blog

Failing Up: Stronger Data Centers through Incident Management

In the critical facilities industry, incidents are typically given a bad rap. Executives and operators view incidents – events that affect the redundancy of the data center – as bad business. So winning an award for managing incidents would seem like being recognized for your ability to bail water rather than build a sound boat. But to the right company, incidents aren’t a measure of failure; they’re challenges that improve your business process. The upside to incidents is the ability to learn from them and, more importantly, the opportunity to share those lessons with others, both internally and throughout our industry.

Bob Wichert and TJ Ciccone from RagingWire receiving the 2014 Uptime Institute Incident Management Award

At the Uptime Institute’s Critical Facilities Summit in Charlotte, NC on October 5, 2015, RagingWire Data Centers received the 2014 Uptime Incident Management Award. This award was presented in recognition of achievement in tracking and responding to – not avoiding – incidents in data center infrastructure (as determined by incident contributions to the Uptime Institute Network Abnormal Incident Report (AIR) Database).

In simplest terms, this does not mean that RagingWire experienced the most incidents. It means that, as a company, we successfully capitalized on them, helping to spread knowledge to other members of the organization. So much so, in fact, that we submitted more than three times as many lessons learned as our nearest competitor.

What does it take to win this award? It takes an operational commitment to sharing data regarding incidents at your facility and implementing changes to prevent them in the future. It is a humbling, but rewarding, task. By being active participants in the AIR database, we have been able to collectively gather statistical data that has helped shape our data center world today.

When you need to build a case for 24/7 staffing, you can access the database and track the percentage of incidents that occur during non-peak hours. If you think your site is incurring an abnormal number of faults on a piece of equipment, or an unusually low Mean Time Between Failures (MTBF), you can turn to the database and search for others who may be experiencing the same issue. Wondering if a new type of cooling solution would be a good fit? The shared data can help you make a more informed decision.
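As a rough illustration of those two queries, here is a minimal Python sketch. The incident records, equipment tags, and peak-hour window are all invented for the example; the actual AIR database schema is not described in this post. MTBF is approximated as the mean gap between consecutive failure timestamps, ignoring repair time.

```python
from datetime import datetime

# Hypothetical incident log: (timestamp, equipment tag). These records are
# invented for illustration and do not come from the AIR database.
incidents = [
    (datetime(2014, 1, 6, 2, 15), "CRAH-07"),
    (datetime(2014, 2, 11, 14, 40), "UPS-2A"),
    (datetime(2014, 3, 3, 23, 5), "CRAH-07"),
    (datetime(2014, 5, 19, 10, 30), "GEN-01"),
    (datetime(2014, 8, 27, 4, 50), "CRAH-07"),
]


def off_peak_share(records, peak_start=8, peak_end=18):
    """Fraction of incidents whose hour falls outside [peak_start, peak_end)."""
    off_peak = [ts for ts, _ in records if not (peak_start <= ts.hour < peak_end)]
    return len(off_peak) / len(records)


def mtbf_hours(records, tag):
    """Mean hours between consecutive failures of one piece of equipment."""
    times = sorted(ts for ts, t in records if t == tag)
    if len(times) < 2:
        return None  # fewer than two failures: no interval to measure
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


print(f"Off-peak share: {off_peak_share(incidents):.0%}")
print(f"CRAH-07 MTBF: {mtbf_hours(incidents, 'CRAH-07'):.0f} hours")
```

A low MTBF for one unit compared with similar equipment elsewhere would be the cue to go looking for others reporting the same fault.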

We all have incidents; let’s just admit that together. They are an unavoidable side effect of what we do, and certainly a smart data center strives not to make the same mistake twice. But what defines your business is not the ability to never have an incident – which would require some tricky bookkeeping and diligent rug-sweeping – but the ability to learn from them and come out stronger as a company, and ultimately as an industry.

Are traditional power protection metrics good enough?

On a rainy Wednesday morning, at the stunning GE building in Washington, DC, I was treated to an in-depth panel discussion regarding the validity of traditional power protection metrics in the data center. The discussion centered on the idea of total cost of ownership (TCO) for data center equipment. In a classic battle of old versus new, the group was tasked with discussing whether the metrics and standards used today to measure TCO are still valid, considering the significant changes in data center topology over the past decade. The five-member panel consisted of some of the heaviest hitters in the regional data center industry.

The panel discussed six specific issues:

  1. A comparison of current versus best practice ownership models
  2. How to design 'right size' data center power protection for the right application?
  3. How to determine what exactly needs to be included in life cycle costs and how to quantify these?
  4. How does service response time affect TCO?
  5. How can initial design and specification decisions affect downstream costs?
  6. Do alternative financing approaches change TCO metrics?

With each of these being noteworthy enough to warrant its own panel discussion, one can only imagine the plethora of information uncovered in the three-hour session. The most revealing portion of the discussion was how data gathering has changed the landscape of the TCO outlook. Obsolete methods include putting faith in tribal knowledge and manufacturer claims instead of what is actually being seen in the field. The data center world is evolving away from asking, "How did we do it before?" and toward asking, "How can we do it better in the future?" In my current role, I was extremely interested in how what we do at RagingWire Data Centers fits into the latest trends in the industry. As it turns out, with the incorporation of our N-Matrix DCIM system, we are at the forefront of the latest technological advances the industry has to offer regarding total cost of ownership.

RagingWire N-Matrix DCIM

Many of the players in the data center world are relying on either outdated methods or third-party software to produce data that they should be producing in-house. By producing that data yourself, you can use cross-departmental collaboration to make the best decisions when purchasing equipment to suit your facility’s needs. The lesson to be learned here is that by bringing this analysis in-house, you can minimize downtime, reduce total capital expenditures, and provide the best value for clients, present and future.

Is it hot in here?

During a recent RagingWire data center tour, a potential client asked, “Is it hot in here?” Much to everyone’s surprise, the tour director smiled as he answered, “Yes, yes it is.” The reason behind the tour director’s happiness goes much deeper than you might think.

Water Cooled Chillers - RagingWire Data Centers

Walking into a RagingWire Data Center, you may notice something unlike most other data centers: it’s warm in certain spots. By utilizing extensive air flow analysis, employing a top-notch operations team, and adopting the 2011 ASHRAE TC9.9 guidelines for higher-end temperatures, RagingWire is leading the way in creating a more energy-efficient data center environment. It’s still a comfortable place to work. It’s just more energy efficient than data centers were 5-10 years ago.

Though no global data center temperature standard exists, in 2011 ASHRAE published an update to its whitepaper titled “Thermal Guidelines for Data Center Processing Environments.” This guideline raised the high end of the recommended temperature range from 77°F to 80.6°F, and raised the high end of the allowable range to 89.6°F. Still, many data center operators have failed to embrace the broader, more environmentally friendly guidelines. Why?

Server and other electronic equipment suppliers have embraced the TC9.9 guidelines for years, and most warrant their equipment to meet the new specifications. The problem lies with outdated data centers or vintage computing equipment that require lower temperatures, and with a fear of changing current operating parameters. According to a 2013 Uptime Institute Survey of more than 1,000 data centers globally, nearly half of all data centers reported operating at 71-75°F. The next largest segment, from 65-70°F, accounted for 37% of all data centers surveyed!

Why does RagingWire operate at these higher temperatures? It all comes down to one small, three-letter acronym: PUE. PUE, or Power Usage Effectiveness, is the ratio of the data center’s total power consumption, including mechanical and electrical load, to its IT load. The closer the ratio is to 1.0, the less power is spent on anything other than the IT equipment.
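To make the ratio concrete, here is a minimal Python sketch. The function name and the load figures are hypothetical, chosen only to produce a round number; they are not RagingWire measurements.

```python
def pue(it_kw, cooling_kw, electrical_loss_kw, other_kw=0.0):
    """Power Usage Effectiveness: total facility power divided by IT power."""
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    total_kw = it_kw + cooling_kw + electrical_loss_kw + other_kw
    return total_kw / it_kw


# Hypothetical readings: 1,000 kW of IT load, 300 kW of cooling,
# and 100 kW of electrical distribution losses.
print(pue(1000, 300, 100))
```

With these assumed readings the facility draws 1,400 kW to deliver 1,000 kW to IT equipment, a PUE of 1.4. Trimming the cooling term is the most direct way to move that number.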

With cooling accounting for up to 50% of the data center load in some cases, reducing cooling consumption will improve the PUE of a critical facility. By some estimates, every 1°F increase in server inlet temperature can lead to a 4-5% savings in energy costs.

But let’s put some money where our math is: If you operate a facility with a PUE of 1.4, and your total IT load is 1 MW, increasing your server inlet temperature just 1°F can lower your annual energy consumption by over 600,000 kWh!
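The arithmetic behind that figure can be sketched in a few lines of Python. This assumes the 4-5% estimate applies to total facility energy (IT load times PUE, running all 8,760 hours of the year); the post does not spell out that basis, so treat it as an assumption, and the function names as hypothetical.

```python
HOURS_PER_YEAR = 8760


def annual_facility_kwh(it_load_kw, pue):
    """Total facility energy per year: IT load scaled by PUE, running 24/7."""
    return it_load_kw * pue * HOURS_PER_YEAR


def savings_range_kwh(it_load_kw, pue, low=0.04, high=0.05):
    """Savings from a 1°F inlet increase, ASSUMING the 4-5% estimate
    applies to total facility energy rather than cooling energy alone."""
    total = annual_facility_kwh(it_load_kw, pue)
    return total * low, total * high


low, high = savings_range_kwh(it_load_kw=1000, pue=1.4)
print(f"{low:,.0f} to {high:,.0f} kWh per year")
```

Under that assumption, a 1 MW IT load at a PUE of 1.4 consumes 12,264,000 kWh a year, so the savings land between roughly 490,000 and 613,000 kWh. The "over 600,000 kWh" figure corresponds to the 5% end of the estimate.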

By achieving a lower design PUE, RagingWire Data Centers captures significant savings and is able to pass these savings on to its customers. This gives retail and wholesale data center clients the ability to operate in a world-class facility with a small-world footprint. Lowering operating costs and resource consumption without a reduction in service is usually the kind of undertaking that makes a Board of Directors stand up and applaud. And it can be as simple as ticking up that thermostat.
