Data center power efficiency increases, but so do power outages
A survey from the Uptime Institute found that while data centers are managing power better than ever before, the rate of failures has also increased, and whether the two trends are causally linked remains an open question.
The Global Data Center Survey report from Uptime Institute gathered responses from nearly 900 data center operators and IT practitioners, both from major data center providers and from private, company-owned data centers.
It found that the power usage effectiveness (PUE) of data centers has hit an all-time low of 1.58. By way of contrast, the average PUE in 2007 was 2.5, then dropped to 1.98 in 2011, and to 1.65 in the 2013 survey.
PUE is the ratio of the total power a data center draws to the power that reaches its IT equipment. A PUE of 2 means that for every watt delivered to the IT systems, another watt goes to cooling and other facility overhead. A PUE of 1.5 means that for every watt into the IT systems, half a watt goes to overhead. So, lowering PUE is something of an obsession among data center operators.
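To make the arithmetic concrete, here is a minimal Python sketch of the PUE calculation. It assumes the standard definition (total facility power divided by IT equipment power). The function names are illustrative, not taken from the Uptime report, and the figures are the survey averages quoted above.

```python
def pue(total_facility_watts: float, it_equipment_watts: float) -> float:
    """PUE = total facility power / power delivered to IT equipment."""
    if it_equipment_watts <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_watts / it_equipment_watts


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts of non-IT overhead (cooling, power distribution, etc.) per IT watt."""
    return pue_value - 1.0


# Survey averages quoted in the article (hypothetical dict name).
survey_pue = {"2007": 2.5, "2011": 1.98, "2013": 1.65, "latest": 1.58}

for label, value in survey_pue.items():
    overhead = overhead_per_it_watt(value)
    print(f"{label}: PUE {value:.2f} -> {overhead:.2f} W of overhead per IT watt")
```

Read this way, the drop from 2.5 to 1.58 means overhead per IT watt fell from about 1.5 watts to roughly 0.58 watts.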
However, Uptime also found a negative trend: The share of respondents reporting infrastructure outages or “severe service degradation” incidents rose to 31 percent, up 6 percentage points from last year’s 25 percent. Over the past three years, nearly half had experienced an outage at their own site or a service provider’s site.
This raises the question: Is one causing the other? Is the obsession with lower PUE somehow causing more and bigger outages? Rhonda Ascierto, vice president of research with the Uptime Institute, says the survey does not show that.
“We can’t determine that,” she told me. “Some in the media have made that connection, but correlation is not causation. It’s certainly possible they are linked and some findings around efficiency are related, but we did not link those together.”
Most downtime incidents lasted one to four hours. Uptime asked people who suffered an outage what they estimated the cost to be, but 43 percent didn’t calculate the cost of an outage. That’s because too many of the factors in determining the cost were outside that person’s specialty. Half of those who did make an estimate put the cost at less than $100,000, but 3 percent said costs were over $10 million.
What causes data center outages?
The leading causes of data center outages are power outages (33 percent), network failures (30 percent), IT staff or software errors (28 percent), on-premises non-power failure (12 percent), and third-party service provider outages (31 percent).
To err is human, and this survey showed it. Nearly 80 percent said their most recent outage could have been prevented. And that human error extends to management decisions, Ascierto said.
“Oftentimes, people talk about human error being the cause of outages, but it can include management errors, like poorly maintained or derated equipment that may not match runtime requirements,” she said. “The human error comes down to management responsibility.”
She added that another cause of failures is the trend toward data center consolidation, with firms moving workloads from secondary data centers to primary ones. This takes time, and since the secondary site is being decommissioned, the owner doesn’t invest in it. So wear and neglect creep into a doomed data center, making it more likely to fail.
Another cause for problems is the cascading effect of one data center taking down others. That could be either two private data centers or a hybrid situation where an on-premises center is connected to a third-party provider such as Amazon or Microsoft. If one goes down, it has a greater chance of taking down the other(s).
Uptime found 24 percent of those surveyed said they were impacted by outages across multiple data centers. “Five years ago it would be a much lower number,” said Ascierto. She expects more outages caused by cascading failures between multiple sites as more companies adopt strategies built on multiple cloud services and as IT services grow more interdependent.
“There is this belief that having a hybrid architecture makes you more resilient, but visibility and accountability is more difficult and the rate of outage is high,” she said.