The Tool Sprawl Problem in Monitoring

Home Blog DevOps The Tool Sprawl Problem in Monitoring One of the biggest KPIs in the DevOps space is monitoring. There are so many tools to help any organization to complete their monitoring picture, but no tool does everything and most organizations use many tools to help complete their monitoring solution. Mashing tools together often creates a problem of its own ― the tool sprawl problem.

More on the subject:

AWS GuardDuty Monitoring with Logz.io Security Analytics and the ELK Stack MongoDB Performance Monitoring Using The ELK Stack OpenStack Monitoring With Elasticsearch, Logstash, and Kibana

In modern computing, it’s not how much data you collect and report, or how efficient, or how durable your monitoring solution is. Sure, those are all important considerations, but it’s how effective and useful your monitoring is that makes the difference. It’s how much value to the business it creates, and how well the data can be exploited to identify and resolve critical issues. Monitoring is never a completed effort.

It evolves. It is enhanced by tools and by integrations. Often enough, the journey to improve monitoring is what creates and accentuates the tool sprawl problem. In this article, I’d like to examine how monitoring tool sprawl can become a serious issue for modern, engineering-driven companies.

Monitoring Challenges

The task of monitoring modern IT environments is too complex to properly handle without tools. The days of allowing logs to sit on servers and fishing through them to find answers are long gone. Alerting on an operating system issue and manually clearing out all the noise from old vendor solutions for sysadmins (think HP, Dell, IBM) no longer scales in the world of cloud computing.

Luckily, there are plenty of modern tools to solve modern issues. But like any type of software, every monitoring tool has weaknesses and strengths in their own right. Organizations will often patch together multiple monitoring tools based on their strengths and just deal with the sprawl.

So what are the modern problems to solve and tools to solve them?

Logs

Log data is considered an extremely valuable data source for monitoring and troubleshooting both applications and the infrastructure they are installed on. Most log management tools on the market provide analysis capabilities. Some provide advanced analytics such as machine learning and anomaly detection. Most of these tools now include plugins and integrations with cloud vendors to provide greater insight into cloud-based applications.

The world’s leading open source log management tool is, of course, the ELK Stack ― an extremely popular and powerful platform but one that often requires more engineering effort and expertise to scale .

Metrics

Metrics, or time-series data, is another type of telemetry data used for monitoring. Used primarily for APM (Application Performance Monitoring), ITIM (IT Infrastructure Monitoring) and NPM (Network Performance Monitoring), metrics introduce another kind of challenge being more verbose in nature and requiring more elaborate data storage and retention strategies as well as analysis features.

Open source solutions are often comprised of a time series database such as Prometheus, InfluxDB or Graphite with Grafana playing the role of the analysis and visualization layer. Plenty of SaaS vendors offer their own APM and monitoring solutions, including premade dashboards for monitoring specific services or platforms.

Security

The increase in cyber threats means organizations must operate with security in mind. A big part of security is active monitoring and reactive controls. Triggering alarms on root or administrator login is an example, or signaling a Puppet run when a security-controlled configuration is changed via an automated response to a security incident. To be able to build this kind of solution requires a very specific kind of tool, usually falling under the category of SIEM or Security Analytics. Again, there are both open source and proprietary solutions on the market but the skills gap is proving to be as big a challenge as integrating and deploying these solutions.

Compliance

SOC, PCI, HIPAA, SOX, GDPR, ISO, and CODA are just a few regulatory and compliance certifications companies must contend with to remain in business. All of them require some level of auditable data to show that their required checks and controls are being maintained. This means companies must find tools to capture, store, and retrieve data for compliance. Some tools excel at configuring controls or capturing security data but aren’t as strong at capturing application logs and transforming them into formats that mesh well with security logs to have an overlay picture.

Alerting/Reporting

Again, most tools provide canned reports, most also allow you to build your own reports. The key difference is some provider’s reports will be more relevant to an organization than others. An example of where the tool sprawl can become real is an organization with a security team that prefers the tailored security event reports from Alertlogic, an operations team that uses Datadog’s metrics for capacity planning and the developers use the ELK Stack to determine API performance issues. All three tools can create all three reports, but they do not specialize in providing all three. This key difference is what creates a tool sprawl challenge, in this case for reporting and alerting.

Multiple solutions mean what?

After reading the previous section, it is easy to see how companies choose multiple tools and vendors to solve their monitoring needs. In the following section, I’d like to examine some of issues that can result from having multiple monitoring solutions.

Multiple panes of glass

Having security data flow to one tool, systems performance data to another, and application data to a third makes correlation much more difficult. Even if you are able to have data sources feed multiple frontend tools, it still requires additional “stitching” to deliver the data in a meaningful way and the systems still present information differently. This can force the need to build translation jobs between solutions, or lengthy exports and manual correlation in spreadsheets. Nobody wants to do that.

Administration (and cost) is heavier

This means managing permissions through RBAC, customization of data feed sources, plug-in management, and supporting infrastructure must be considered. The resources and cost burden can become extremely heavy pretty quickly when designing for scale, high availability, and storage.

Additional automation Every age

The Tool Sprawl Problem in Monitoring

Trending Articles

SM3268AB 8CE三星量产无法格式化

[下载工具]Think4V utubedown(Youtube高清视频下载工具) v2.1.6 官方版2.1.3

出售: SINE Othello 電源線

博讯｜张磊帮助下，李源潮的儿子被耶鲁录取

FullEventLogView 1.73 免安裝中文版 - 事件檢視器取代工具

同門四角戀？李沛旭喇舌「小郭雪芙」曾智希，蔡淑臻拍完婚紗...怒毀婚

五代RAV4 降車身（機械車位因素）

[攻略] 《魔獸世界》6.2.2 白色魚人蛋再現！來去收編魚人寶寶特基！

jetBrains Product crack 2024 Java based

2013 KUGA 6G轉動方向盤會聽到摳摳摳的異音，有人知道原因嗎?

【豌豆字幕組】[藥屋少女的呢喃（藥師少女的獨語）/ Kusuriya no Hitorigoto][25][繁體][1080P][MP4]

好用的照片后期处理软件【DxO PhotoLab Elite 5.4.0.4765 (x64) 多语言便携版】..

出售: Thixar Silence Plus 啫喱板

df-dferh-01 中国区 Android 安装 Google Play Store 后报错的解决办法

三條崙討海人故事…重建烏倉寮憶43年前船難

致喬立建設道歉聲明

[一般] 神州全地圖掉寶資料

方易通7862 8/128G 無360 刷機

動感校園小記者・瑪利諾修院學校｜採訪王瑋駿陳晞文帶領試玩風帆

有藍電流行車紀錄器分享文嗎