The tremendous demand on the digital infrastructure industry to provide more capacity more quickly creates challenges across the supply chain. Among the challenges most top-of-mind for industry leaders are safety, quality, and security.
This is the third in our End User Summit Top-of-Mind Blog Series, which reflects the top issues discussed during the Infrastructure Masons October 2019 End User Summit in San Jose. Attendees included senior-level end user leaders from across the globe, IM Foundation Partner executives, and members of the Diversity & Inclusion (D&I) Committee.
Attendees at the End User Summit were split into six groups: data center, hardware, network, infrastructure management, diversity & inclusion, and partners. In all six, there was deep discussion about scale and its ramifications. “Most of the large data center operators have reached a scale where issues that were minor are becoming more than minor,” explained one partner. “At the scale we are at right now, failures create problems at a level that is not manageable.”
Dealing with it is not optional. Said one end user, “We have to grow. We don’t have an option. It’s not that we can say ‘Oh we’re becoming too big. We can’t handle it.’ We have to grow and we need to figure out how to do it.”
“We’re all on the hook for not killing anybody,” said one partner. “But how often do we share our safety procedures across the industry? There’s no secret sauce in saving lives. There’s a set of standards and procedures around safety, be it in construction and/or operations, that, quite frankly, we should be standardizing.”
A lack of standardized operating procedures is more significant given the industry’s talent constraints. One partner explained, “Many of us get into a situation where we’re using partners or subcontractors to do some of the work. And then how do you make sure that your culture of quality and safety resonate through your entire supply chain to your subcontract labor force?”
Some end users talked about quality problems in the context of poor quality control in manufacturing. One said, “The quality of the products across multiple vendors is really poor these days. Generator problems, UPS problems, battery problems. We’re spending a significant amount of labor dealing with quality problems and dealing with procedures to work around those quality problems.”
Other attendees talked about the challenge of managing failures at scale. One end user explained, “When I started at my company, we had 14,000 servers. We’ve grown to almost 200,000. The team that managed the day-to-day repairs didn’t have a concept of failure at scale. So while the failure rate stayed the same, the number of failures increased 14-fold. We need to start thinking in a different way. How do we actually manage failures on any level? As we scale, we actually need to get better.”
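The arithmetic behind that quote can be sketched in a few lines. The fleet sizes come from the end user's account (with "almost 200,000" rounded up to 200,000); the 5% annual failure rate is a hypothetical placeholder, not a figure shared at the summit. It cancels out of the ratio, so the growth factor does not depend on it.

```python
def expected_failures(fleet_size: int, annual_failure_rate: float) -> float:
    """Expected failures per year, assuming a constant per-server failure rate."""
    return fleet_size * annual_failure_rate


# Hypothetical rate, chosen only for illustration (not from the summit).
ASSUMED_ANNUAL_FAILURE_RATE = 0.05

before = expected_failures(14_000, ASSUMED_ANNUAL_FAILURE_RATE)
after = expected_failures(200_000, ASSUMED_ANNUAL_FAILURE_RATE)
growth = after / before  # the assumed rate cancels out here

print(f"{before:.0f} -> {after:.0f} failures/year ({growth:.1f}x)")
```

With the rate unchanged, the repair workload grows in direct proportion to the fleet, roughly 14 times over, which is why a repair process sized for the smaller fleet stops keeping up.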
As with quality, managing security becomes far more difficult at scale. One end user explained, “Supply chain security is a challenge both on the incoming hardware as well as actually maintaining the hardware. It creates a bottleneck if we go to high scale. So how do we actually feed data centers with hardware either on prem or off prem? This has to be addressed in the near future because we are already reaching the breaking point. A lot of very large environments are breaking.”
Resolving security challenges has to include planning for what to do when you have a breach. “It’s not like you can prevent every breach,” explained one end user. “So you have to actually plan and build your environments to sustain compromise on the hardware. I’m not necessarily trying to protect against every use case but I plan for the most common ones. Because no matter how good you are, someone will find a way in eventually.”
Solution: Transparency across the supply chain
Attendees agreed that transparency and communication across the supply chain are critical to improving safety, quality, and security. As one end user explained, “When a vendor becomes aware of a deficiency, how fast does that get communicated to the end user? In some instances it is very quick. If we’re the first to know or second to know, that’s the best we can ask.”
Communication in the form of industry-wide data sharing could also help. One partner said, “There’s an opportunity for vendors to step up and share data. It can be anonymous. But do we ever collectively take failure data and feed them into an overall algorithm? My guess is we may find recurring themes.”
Of course, being transparent without giving away competitive advantage is tricky. “Obviously, sharing information from an owner’s side out is desirable, but from the vendor side, that can be very dicey, very challenging,” said one partner. Attendees suggested that a neutral organization like Infrastructure Masons could be a clearinghouse for information that impacts scale, safety, quality, and security.
“I think this venue is a great chance for us to get together and talk about what we could be comfortable in sharing for the benefit of the industry,” one partner said. An end user added, “I like the idea of aligning to a methodology where you could report back reliability in a consistent way that makes it easy for our vendors to understand how they’re performing.”
In addition to improving quality and security, sharing data could also help improve safety. As one end user suggested, “If we’re more transparent about the data [around failures], we’re actually inherently making ourselves a safer industry.” And that, he said, could go a long way toward keeping the regulators at bay. “I came from the nuclear industry. Yes, you could argue that what drove the regulators was preventing stuff from getting into the environment. But ultimately, I think regulatory issues were driven by safety.”
Figuring out how to scale 39x or even 6x without compromising safety, quality, or security is an all-hands-on-deck endeavor. As one partner said to the end users in the room, “The forecasting volatility and pressure you face to keep up with demand translates down to pressure on suppliers to deliver faster in markets that we’ve never been to. Could we have some shared responsibility to solve this? It’s not just my problem. It’s your problem, too.”