Cyber Monday: Do You Know the Cost of Your System’s Downtime?

As Black Friday and Cyber Monday loom over eCommerce, threatening to take down your website with legions of bargain shoppers, chaos engineering firm Gremlin has calculated the cost of not preparing for this four-day shopping cart battle. Gremlin’s platform, created by former engineers at Amazon and Netflix, helps companies run chaos engineering experiments to avoid downtime and outages. The company has now created a nerve-wracking eCommerce cost of downtime calculator that tallies the mounting cost of downtime at the top 25 U.S. online retailers for as long as you stay on the page.

If Amazon were down for a whole ten minutes, it’d lose about $2 million. If the website of Walmart, America’s favorite brick-and-mortar retailer, were down that long, it’d be out $400,000 in online sales.

Why such a terrifying message at a time of seasonal joy? Well, if you work on site reliability engineering, technical support or QA on an eCommerce site, you’re going to be on-call this time of year anyway.

According to Deloitte, online sales will increase between 17 and 22 percent this holiday season. That’s a lot of extra traffic straining your servers.

For an eCommerce site, downtime equates directly to lost revenue, and never more so than during peak holiday traffic.

According to IDC, for the Fortune 1000, the average total cost of unplanned application downtime is between $1.25 billion and $2.5 billion annually. That puts the average cost of an infrastructure failure at about $100,000 an hour, while a critical application failure costs about $500,000 to $1 million per hour.

Gremlin explained that “Enterprise commerce businesses typically rely on a complex microservices architecture, from fulfillment to website security, ability to scale with holiday traffic, and payment processing.”

This leads to a lot that can go wrong, which will have customers quickly clicking somewhere else to find another sale that loads faster or doesn’t keep crashing. This is why downtime is so, so very costly for online stores. Evidently, for Amazon, even a second of downtime costs more than all the other top 25 retailers combined.

Systems resiliency can really help you decrease your shopping cart abandonment issues.

What Are the Basics of Systems Resiliency?

As software culture moves toward ever-faster releases, and toward allowing almost anyone to build and ship, systems are increasingly made up of many small, disparate, distributed pieces of code. While this distributes the risk, it also complicates reliability.

The movement of breaking away from massive legacy beasts in favor of building with microservices and miniservices has led to the rise of whole careers that go past reactive customer support and one-way QA. Site reliability engineering and chaos engineering have become necessities in all enterprises, not just their banking origins. These are whole jobs working toward preventing downtime and preventing security breaches ― another risk that seems to peak around the holidays.

Change of some kind is the cause of most outages. Site reliability engineering, or SRE, was created as a way to fuse the speed of IT with the deliberate stability of operations and to limit the liability of change. SRE is a job dedicated to thinking about the whole lifecycle of a piece of software, from design to decommission, and all the risks to its stability.

Chaos engineering, an offshoot of SRE, is the art of breaking things intentionally by throwing a poo-filled kitchen sink at your systems before life does. Tammy Bütow, principal SRE at Gremlin, has already told The New Stack how important it is to break your systems to understand what needs to be fixed.

But SRE and chaos engineering won’t stabilize your systems alone. It’s important to have a really good incident management system in place and broadcast visibly around your company, contributing to improved communication and a shared understanding of your software. You also need really good monitoring in place, and even observability. You need your on-call rotation set up. You also need to decide which guinea pig service will be hit first by rampant chaos. And, finally, to understand the value of these tools and processes, you need to have a really good idea of the business impact of your downtime.

How Do You Measure Downtime?

It’s important to consider the cost of your downtime early and often. With these numbers, developer teams can better persuade and “sell” the need for these tools and practices to the business side, which in turn will push the DevOps, CI/CD and faster, more stable release goals of the whole business.

Start by bringing development, business and customer-facing support or sales into a room together. Note how much your service costs your users: the higher the price, the greater the assumption of reliability and uptime. Next, what is your customer acquisition cost? Marketing can probably help with this one. Factor in that it’s five times harder to get a new customer than to keep a current one, making customer retention the key to long-term success. Plus, with bad reviews online, you won’t be attracting new customers for long anyway, so don’t forget to also account for loss of reputation.

Next, factor in the cost of productivity lost and of on-call engineers focusing on fixing things already built instead of making improvements. Some enterprise surveys have put that productivity loss at about two-thirds of the total cost.

If your downtime could be penalized because it breaks a privacy or security regulation, factor that fine in too.

Most important, you need to understand what you would be losing. For eCommerce, it’s fairly easy to calculate based on average sales per minute.

Calculating the cost of your downtime isn’t precise. Gremlin offers this equation as a starting point:

R (lost revenue) + E (lost productivity) + C (what you owe to customers, like SLA) = COD (Cost of Downtime)
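
As a rough illustration, the equation above can be sketched in a few lines of Python. This is a minimal sketch, not Gremlin’s calculator: the function name, its parameters and the way each term is estimated (revenue from average sales per minute, productivity from on-call engineer time) are assumptions made for this example, and the sample numbers are hypothetical apart from the roughly $2 million per ten minutes cited for Amazon earlier.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEstimate:
    lost_revenue: float       # R in the equation
    lost_productivity: float  # E in the equation
    customer_credits: float   # C in the equation (e.g. SLA credits, fines)

    @property
    def cost_of_downtime(self) -> float:
        """COD = R + E + C."""
        return self.lost_revenue + self.lost_productivity + self.customer_credits


def estimate_downtime_cost(
    avg_sales_per_minute: float,
    outage_minutes: float,
    engineers_on_call: int,
    hourly_engineer_cost: float,
    customer_credits: float = 0.0,
) -> DowntimeEstimate:
    """Estimate the cost of a single outage (hypothetical helper).

    Assumes revenue is lost at the average sales rate for the whole outage
    and that every on-call engineer is fully occupied by the incident.
    """
    lost_revenue = avg_sales_per_minute * outage_minutes
    lost_productivity = engineers_on_call * hourly_engineer_cost * outage_minutes / 60
    return DowntimeEstimate(lost_revenue, lost_productivity, customer_credits)


if __name__ == "__main__":
    # $200,000 per minute matches "about $2 million in ten minutes";
    # the engineer count, hourly cost and credits are made-up inputs.
    estimate = estimate_downtime_cost(
        avg_sales_per_minute=200_000,
        outage_minutes=10,
        engineers_on_call=6,
        hourly_engineer_cost=100,
        customer_credits=5_000,
    )
    print(f"R = ${estimate.lost_revenue:,.0f}")
    print(f"E = ${estimate.lost_productivity:,.0f}")
    print(f"C = ${estimate.customer_credits:,.0f}")
    print(f"COD = ${estimate.cost_of_downtime:,.0f}")
```

A sketch like this leaves out the harder-to-quantify factors, such as reputation and customer retention, so treat the result as a floor for the conversation rather than a final figure.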

No matter how you calculate it, the discussion of what that downtime could cost and the importance of preventative measures like chaos engineering and site reliability engineering is essential to the future of your business ― if for nothing else than breaking down silos between business and IT and starting a conversation.
