Naturally, all of you have read my famous “Why Your Security Data Lake Project Will FAIL!” [note: Anton’s ego wrote this line :-)]
Today I read a great Gartner note on data lake failures in general (“How to Avoid Data Lake Failures” [Gartner access required]). Thus, I wanted to share a few bits that, in my experience, are VERY relevant to the security data lake efforts I’ve seen in recent years. So:
“Proponents of data lakes often exaggerate their benefits by promoting them as enterprisewide solutions to all data and analytics problems.” Indeed, we’ve seen the exact same thing with security data lakes! Of course, then reality hits: you build a huge pile of dirty data poo and nothing else …
“Data lakes are rarely started with a definite goal in mind, but rather with nebulous aspirations […]” The same is often seen with security data lakes.
“Avoid confusing a data lake implementation with a data and analytics strategy. A data lake is just infrastructure […]” This is pretty much what I said in the post.
“The popular view is that a data lake will be the one destination for all the data in their enterprise and the optimal platform for all their analytics.” The paper later explains that, generally speaking, this is very false, because it rests on 3 false assumptions. This is false even if scoped down to all security-relevant data.
The paper later describes several exciting FAIL scenarios, all of which I’ve seen with security data lakes. For example, “single version of the truth” as a failure scenario often means a single version of raw unusable data that nobody wants and nobody knows how to use.
Another “failway” is “Data Lake Is My Data and Analytics Strategy” with its juicy “ego-driven perspective on data lakes: they see them as means by which to be viewed as thought leaders […]” that results in an “all the useless data, none of the insight” situation.
Yet another FAIL comes from “Infinite Data Lake” confusion. Imagine lots of useless data … now imagine a lot of useless data a year later. Two years. Five years. What is worse than unusable data? OLD unusable data that has even less context.
NOW: useless. TWO YEARS LATER: that much more useless at huge hardware cost!
Finally, they close with: “The goal of gathering all data in one location was never truly achieved in the data warehousing world. It’s unlikely to be achieved in the data lake world, either […]”
Note that this post intentionally does not quote any of the recommendations from the paper. Sorry, but you have to read the paper for that (because policy).
Enjoy!
Related posts:
Why Your Security Data Lake Project Will FAIL!
Sad Hilarity of Predictive Analytics in Security?
On Unknown Operational Effectiveness of Security Analytics Tooling
Now That We Have All That Data What Do We Do, Revisited
Killed by AI Much? A Rise of Non-deterministic Security!
Security Analytics Lessons Learned ― and Ignored!
Why No Security Analytics Market? <- important read for VCs and investors! Works in 2018 too, mostly.
More On Big Data Security Analytics Readiness
9 Reasons Why Building A Big Data Security Analytics Tool Is Like Building a Flying Car
“Big Analytics” for Security: A Harbinger or An Outlier?