A lesson from network outages: Redundancy matters

Online outages are serious. Vendors lose money for every minute their users can't reach their web services, and business productivity tanks when employees can't access the web applications they rely on to get their jobs done. People can be convinced to forgive the occasional blip, but full-blown outages reinforce the impression that nothing truly critical should be entrusted to the internet.

A look at some of the outages over the past year reveals a disturbing pattern. While the move to cloud-based architecture and applications has reduced complexity in IT infrastructure, that has come at the cost of resiliency. IT has to regularly balance redundancy -- which improves resiliency -- with complexity, and recent outages show that redundancy keeps getting left behind. Taking the time to assess potential "what if" scenarios and plan for the worst-case scenario could have, if not prevented, at least minimized the effects of these outages.

"IT needs to plan for redundancy on critical services," said Nick Kephart, a senior director at network infrastructure monitoring company ThousandEyes.

Department of Redundancy Department

Redundancy is a basic IT tenet. Whether it's multiple backend servers running the same web applications or setting up disk drives in RAID arrays, IT regularly ensures availability even in the case of a failure. Yet the massive DDoS attack against DNS (Domain Name System) service provider Dyn showed that many organizations failed to think about redundancy on their critical infrastructure.

The attack overwhelmed Dyn's servers with enough junk traffic that legitimate DNS requests were no longer being answered. Web properties that had relied on Dyn to direct traffic to their servers realized too late that not having a backup DNS provider meant they were, for all intents and purposes, cut off from the rest of the internet during that period.

Organizations that load-balanced their DNS name servers across multiple providers -- such as Amazon, which used both UltraDNS and Dyn -- were able to switch during the outage and remain unaffected.

The internet usually hums along without any major issues, but the growing intensity and frequency of DDoS attacks prove that DNS needs to be treated as critical internet infrastructure and protected as such. The attack against Dyn wasn't an aberration -- cloud-based DNS provider NS1 was hit earlier in the year, and there was also the June attack that targeted all 13 of the DNS root servers. "It was a large-scale attack on the most critical part of the internet infrastructure and resulted in roughly three hours of performance issues," said Archana Kesavan, a manager at network infrastructure monitoring company ThousandEyes.

For many enterprises, Dyn seemed like the logical way to address redundancy for DNS services because Dyn already provides a distributed architecture. IT teams don't want to have multiple DNS providers because doing so adds complexity to the network infrastructure, but DNS outages can and do happen, so IT teams need to double or even triple up on their DNS providers. IT should also lower the time-to-live (TTL) settings on their DNS records so that traffic can be redirected faster to the backup provider in case of an outage at the primary one.
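As a rough illustration of that advice, the Python sketch below checks whether a zone's name servers span more than one provider and estimates how long a cached record can keep pointing at a failed provider. Everything in it is an assumption made for illustration: the host names are hypothetical, the provider is guessed naively from the last two labels of each name server's host name (which misfires on suffixes such as co.uk), and the failover estimate simply adds detection time to the TTL.

```python
from collections import defaultdict


def provider_of(nameserver):
    """Guess the provider from the last two labels of a name server host name.

    Naive on purpose: it does not consult a public-suffix list.
    """
    labels = nameserver.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(nameservers):
    """Map each guessed provider to the name servers it operates."""
    groups = defaultdict(list)
    for ns in nameservers:
        groups[provider_of(ns)].append(ns)
    return dict(groups)


def has_provider_redundancy(nameservers, min_providers=2):
    """True when the name servers are spread over at least min_providers providers."""
    return len(group_by_provider(nameservers)) >= min_providers


def worst_case_failover_seconds(ttl_seconds, detection_seconds=0):
    """Upper bound on how long a resolver may keep using a stale cached record.

    Simplification: assumes resolvers honor the TTL exactly.
    """
    return detection_seconds + ttl_seconds


if __name__ == "__main__":
    # Hypothetical name servers for a zone hosted with a single provider.
    single = ["ns1.provider-a.example", "ns2.provider-a.example"]
    # The same zone after adding a second, independent provider.
    dual = single + ["ns1.provider-b.example"]

    print(has_provider_redundancy(single))
    print(has_provider_redundancy(dual))
    # Compare a one-day TTL with a five-minute TTL, assuming one minute to detect.
    print(worst_case_failover_seconds(86400, 60))
    print(worst_case_failover_seconds(300, 60))
```

The TTL comparison shows the trade-off: a shorter TTL shrinks the window during which cached answers still point at the failed provider, at the price of more queries reaching the authoritative servers.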

Popularity can hurt, too

Outages aren't just the result of malicious activity or equipment failure. Popularity can be just as damaging in the absence of proper network and capacity planning. There is no such thing as too many visitors, and a hit application everyone is clamoring for is fantastic -- at least until the increased traffic melts down the servers and the network collapses under the load. Then everyone loses.

Lack of a CDN (content delivery network) front end can be costly if traffic bursts aren't factored into the network architecture, Kephart said.

January had one of the largest lottery jackpots in recent history, but Powerball couldn't keep up with the frenzy surrounding the mega-million payout. Neither the application nor the network could handle the uptick in traffic, leading to increased packet loss and extended page load times. Powerball avoided complete meltdown by distributing traffic across Verizon's Edgecast CDN, Microsoft's data center, and the Multi-State Lottery Association data center just before the drawing. "The damage was already done, and user experience to the website was sub-standard," Kesavan said.

Pokémon Go's servers experienced similar outages when the combination of network architecture and overloaded target servers prevented users from playing the game. Apple's servers struggled to handle the much-anticipated launch of Nintendo's Super Mario Run, with sporadic outages affecting all its online stores, including the iOS App Store, Mac App Store, Apple TV, and Apple Music.

Benchmarking and capacity planning are critical, especially before software updates and large-scale events. No matter how well the network architecture is designed, CDNs and anycast servers can support the network and maximize user experience.
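A back-of-the-envelope version of that capacity planning might look like the following Python sketch. It is only an illustration: the function name, the headroom fraction, the spare-server count, and all the traffic figures are hypothetical, and the model assumes load spreads evenly across identical servers.

```python
import math


def servers_needed(peak_rps, per_server_rps, headroom=0.5, spare=1):
    """Estimate how many servers cover a traffic peak.

    peak_rps       -- expected peak requests per second (for example, launch day)
    per_server_rps -- benchmarked requests per second one server can sustain
    headroom       -- fraction of each server's capacity kept in reserve for bursts
    spare          -- extra servers so that one failure does not cause an outage
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = per_server_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_rps) + spare


if __name__ == "__main__":
    # Hypothetical numbers: normal traffic versus a tenfold event-day surge.
    print(servers_needed(peak_rps=1_000, per_server_rps=500))
    print(servers_needed(peak_rps=10_000, per_server_rps=500))
```

Running the estimate with both the everyday peak and the event-day peak makes the gap visible before the event rather than during it.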

Did we say redundancy yet?

Don't forget about infrastructure redundancy, either. It's tempting for IT teams to think, "My ISP can handle this, I don't need to do anything else," but even upstream providers can have outages, whether because of a mistaken configuration, hardware failure, or a security incident, Kephart said. Networks by nature will have outages and face security threats, so IT needs to design into the network architecture the flexibility to react when something fails. Enterprises generally do a good job of building redundancy within their own data centers, but they overlook doing the same for third-party infrastructure providers.

Don't rely on a single provider, because that becomes a single point of failure. Distribute dependencies across ISPs, DNS providers, and hosting companies.
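One simple way to audit for this is to list each external dependency with the providers behind it and flag any category served by only one. The Python sketch below does that; the category names and provider names are hypothetical placeholders, not a real inventory.

```python
def single_points_of_failure(dependencies, min_providers=2):
    """Return the sorted categories backed by fewer than min_providers distinct providers.

    dependencies maps a category (such as "dns") to a list of provider names.
    Names are compared case-insensitively so duplicates are not counted twice.
    """
    return sorted(
        category
        for category, providers in dependencies.items()
        if len({p.strip().lower() for p in providers}) < min_providers
    )


if __name__ == "__main__":
    # Hypothetical inventory of third-party infrastructure providers.
    inventory = {
        "isp": ["Carrier A", "Carrier B"],
        "dns": ["DNS Provider X"],
        "hosting": ["Host Y", "host y"],  # same provider listed twice
    }
    print(single_points_of_failure(inventory))
```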

It is hard to justify security decisions when the only way to tell if they worked is to be able to say, "Hey, we didn't get hacked," or, "We didn't have an outage," at the end of the year. Those are great goals, but when there are competing demands, it's hard to justify the extra expenses or added complexity on the possibility that bad things won't happen. But that's the kind of calculus IT needs to be doing every day.
