Infrastructure Masons

Is Software Resiliency the New Data Center Redundancy?

Sep 4, 2018

Digital infrastructure leaders used to live and die by availability. Uptime was everything. But as slow becomes the new down, leaders are looking outside the traditional 2N+1 box to deliver.

This is the fifth in a series of five blog posts reflecting the top-of-mind issues discussed during the most recent Infrastructure Masons Advisory Council meeting.

It used to be that redundancy – that is, having a backup system in case the main system goes down – was the way to maintain availability. And availability was everything. Resiliency – the ability of the system to recover from a fault – was less important. Bounce back at least before the generators run out of fuel, and you’re good.

Not so anymore. Availability still matters, to be sure, but digital infrastructure leaders are thinking outside the traditional 2N+1 redundancy box. Given the unrelenting and exponential growth of data – that is, of demand on the infrastructure – digital infrastructure leaders are looking for more efficient and sustainable ways to deliver availability without building two (or three) of everything.

Resiliency is the new redundancy

Some are finding their answers in software. As one Advisory Council member, an end user, explained, “When we had a data center fault, it directly impacted our business – we couldn’t serve our customers. That was huge and real and revenue impacting. It caused major red flags.”

The solution was not another 2N data center.

“Because of the threat,” the end user continued, “the engineering team built in software resiliency. Then as software resiliency went up, my need for data center resiliency went down.” In response to a question about resiliency versus redundancy, he clarified, “More redundancy in the portfolio, less resiliency in the elements.”

The end user added, “With zones I can fail over between those constantly – and we do, several times a week. That evolution of the software changed my data center requirements.”

“As the software resiliency went up, my need for data center resiliency went down.”

In response to a question about whether buying more systems is cheaper than building a more redundant facility, the end user said, “Yes, if the software is aligned to do it.” In other words, it only works if resiliency is built into the software. Then “the more availability zones I have the less equipment I need.”

The end user articulated the math: “If I have two availability zones it requires a replication factor of 2.4 (100% in each zone, plus 20% overhead each as additional buffer for when things spike during failover). If I go to three availability zones whereby I can actually distribute workloads, the replication factor goes from 2.4 to 1.9. That’s 1.9 times the equipment instead of 2.4. If I go to four availability zones the replication factor goes to 1.8. If I have 20 zones I can have a replication factor of 1.2. Then I need a total of 20% extra equipment (1.2) to run the workload instead of 140% extra equipment (2.4) to run that load.”
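The arithmetic behind that trend can be sketched in a few lines. This is only an illustration under a simple assumed model, not the end user's actual capacity formula: the fleet must survive the loss of one zone, the surviving zones must carry the full workload, and every zone keeps a 20% buffer. The function name `replication_factor` and the `overhead` parameter are hypothetical. The model reproduces the 2.4 figure for two zones and the downward trend, but it yields roughly 1.8, 1.6, and 1.26 for 3, 4, and 20 zones, so the quoted 1.9, 1.8, and 1.2 evidently rest on somewhat different assumptions that the post does not spell out.

```python
def replication_factor(zones: int, overhead: float = 0.20) -> float:
    """Total equipment needed, as a multiple of the base workload.

    Assumed model (not taken from the article): the workload must keep
    running after one zone fails, so the remaining (zones - 1) zones
    together hold 100% of it, and every zone carries `overhead` extra
    capacity as a buffer for failover spikes.
    """
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    # Each zone is sized at 1 / (zones - 1) of the workload, times the buffer.
    per_zone = (1 + overhead) / (zones - 1)
    return zones * per_zone


for zones in (2, 3, 4, 20):
    factor = replication_factor(zones)
    extra_pct = (factor - 1) * 100
    print(f"{zones:>2} zones: factor {factor:.2f} ({extra_pct:.0f}% extra equipment)")
```

Whatever the exact model, the point from the quote holds: spreading the failover burden across more zones shrinks the spare capacity each one has to carry.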

It’s all about the workload

But what level of redundancy and resiliency is appropriate? One end user stressed what felt like a consensus: “It depends on what the workload is.” He shared an example: “Take the case of a very large manufacturing company running its entire manufacturing execution system out of a data center. It’s a very highly transactional database that has to be in sync real time because it’s driving the whole supply chain, all the factory utilization – the whole business.”

“In that case you want to have A and B in full synchronous replication at all times,” the end user explained. “Then maybe the network costs start to trump whether it’s more expensive to build in higher resiliency” – or redundancy. So it’s not necessarily about infrastructure redundancy being more effective at delivering availability than software resiliency, but about the cost of ensuring availability via software or infrastructure.

“What level of redundancy and resiliency is appropriate? It depends on the workload.”

“It depends on what the application is.”

One end user said that because resiliency versus redundancy decisions are so dependent on the application, “The business will drive us from the cost perspective to more compute-specific architectures.” For example, he said, “with machine learning we can go straight power. We don’t have to worry about persistent data to be restored. And we don’t have to be in major metro areas because latency isn’t an issue.” In contrast, “These are different requirements than, say, virtual machines for public clouds where we need ultra-resiliency.”

In the past, the end user continued, the economics clearly favored application-agnostic architectures that served the majority of line-of-business needs – because that approach pooled the risk of any single business unit being wildly off in its capacity projections. Today, he said, “It may be that cost drivers [of having application-specific architecture] are now substantial enough that business units are willing to take on the risk of getting their predictions wrong [and either overbuilding or not having enough capacity].”

Slow is the new down

One partner predicted that, all things considered, “The more critical the data is – the bigger the impact the data will have on the profitability or survivability of the business – the more it will still tend to be in redundant environments and require some level of structural support. I think it will be that way for a long time.”

“The more critical the data is, the more it will still tend to be in redundant environments and require some level of structural support.”

Even when data isn’t objectively ‘critical’, if it’s the reason the business exists (think of any of the social media platforms, for example), availability is as important for that business as it is for the manufacturer in the earlier example.

“Look at it from the business’s side,” suggested one partner. “If you know that if you don’t respond to customer requests in a certain number of seconds they’ll go to another app, then you’re going to do everything you can to make sure you can respond within that certain number of seconds.”

“Slow is the new down,” said another partner.
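One way to make that idea concrete is to count a response that arrives too late as a failure when measuring availability. The sketch below is a hypothetical illustration, not a method described by the Advisory Council: the `Request` type, the `effective_availability` function, and the two-second threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    succeeded: bool    # did the service return a valid response?
    latency_s: float   # how long the response took, in seconds


def effective_availability(requests: list[Request], max_latency_s: float) -> float:
    """Fraction of requests that succeeded AND answered within max_latency_s."""
    if not requests:
        raise ValueError("no requests to evaluate")
    good = sum(1 for r in requests if r.succeeded and r.latency_s <= max_latency_s)
    return good / len(requests)


# Hypothetical sample: three successes (one of them slow) and one outright failure.
sample = [
    Request(True, 0.2),
    Request(True, 0.4),
    Request(True, 3.5),   # succeeded, but too slow for the user
    Request(False, 0.1),
]

# Classic uptime view: only outright failures count against availability.
print(effective_availability(sample, max_latency_s=float("inf")))
# "Slow is the new down" view: responses slower than 2 seconds also count as down.
print(effective_availability(sample, max_latency_s=2.0))
```

Under the first view the sample looks 75% available; once the slow response is treated as down, it drops to 50%, which is closer to what the customer in the partner's example would experience.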

But the way the hyperscale companies are achieving availability, in many cases, is different. They’re doing it through software resiliency rather than infrastructure redundancy.

“Criticality is in the eye of the business. Slow is the new down.”

Bottom line: Availability still matters, to be sure. So does speed. But the answer to the question of how to deliver availability and speed is changing. It’s not always 2N+1.

For more insights into what’s top-of-mind in 2018 for digital infrastructure leaders, check out the previous posts in our series.

Previous posts in the 2018 top-of-mind series:

  • Alignment: Business, Infrastructure, App & User
  • Cloud versus Colo or Cloud and Colo?
  • Serving the Next Billion People
  • Managing Unpredictability Without Breaking the Bank