BIT9 PathNames

We live in a day and age when security (data, network, server, etc.) is seemingly at the forefront of the daily news. In our quest to improve security, we keep finding more and more products that are supposed to help. Sometimes, those products require a data repository, and it is not uncommon for that repository to be a database. Today, I am looking into a specific issue with the Bit9 product.

Why?

There are some very good reasons, as a matter of fact. One really big reason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent level of response, and the delays in response, that I saw from the Bit9 support channels.

In this specific case, the first response from their support channels took more than two days. Subsequent responses came more than a week later. For the client, this actually caused delays in a project they were working on. Was the issue significant? It was not an outage-causing issue, but it was one that continued to grow and cause concerns with disk space.

The Issue

It may be appropriate to discuss what Bit9 does prior to breaking into the details about the issue. If you have never had any experience with this product before, you may be running a product from a competitor such as Symantec. Bit9 is an endpoint protection tool from Carbon Black. It helps to protect against malware and endpoint attacks. In short, it is a security-based tool to protect your computing enterprise.

Visit their site if you wish to learn more about how it works. For me, the nitty-gritty specifics of how it works are a bit outside the scope of this article.

In short, Bit9 keeps a record of every filename and every file path for every client machine, and stores these in a SQL Server database. This shouldn’t be too much of an issue. For one instance serving about 300 client machines, we saw about 1.2 million paths being stored. On another instance with far fewer clients and far fewer actual paths on those clients (physically checked), we had 114 million paths stored in the database. The difference in size was 20GB vs ~160GB. When the server is installed with all defaults, you can imagine how this looked for this instance: a very bloated OS volume.

Investigating this issue from a database perspective, I looked to find what was consuming so much space in the database. To do that, I ran my tablespace script and discovered the following:
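For readers without a tablespace script of their own, here is a minimal sketch of that kind of query. It is not the script I ran; it simply sums page counts from the sys.dm_db_partition_stats DMV, assuming the standard 8 KB page size, and splits heap/clustered data from non-clustered index space.

```sql
-- Sketch only: approximate per-table data and index size, largest first.
-- Run in the context of the database being investigated.
SELECT
    s.name AS schema_name,
    t.name AS table_name,
    -- Count rows once, from the heap (0) or clustered index (1).
    SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count,
    -- Pages are 8 KB, so pages * 8 / 1024 gives megabytes.
    SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.reserved_page_count ELSE 0 END) * 8 / 1024.0 AS data_mb,
    SUM(CASE WHEN ps.index_id > 1 THEN ps.reserved_page_count ELSE 0 END) * 8 / 1024.0 AS nonclustered_index_mb
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables AS t
    ON t.object_id = ps.object_id
INNER JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY SUM(ps.reserved_page_count) DESC;
```

A table whose nonclustered_index_mb dwarfs its data_mb is worth a closer look.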

The pathnames table was a significant standout. Looking at the definition of the table, I see something far more interesting and disturbing all at once.

I see a table with three columns and two large string fields. Each of these fields has a non-clustered index on it. This may or may not be such a big problem (other than the fact that the non-clustered indexes on this table are much larger than the data) except that each of the string fields is an exact duplicate of the other. That’s right. Within this table, the data is duplicated into this second string field and each field has its own index. Not only does it appear that I have a ton of duplicated data, it appears I have entirely useless indexes (neither had been touched for a read since the server had been up).
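One way to check whether indexes have been read since the last restart is the sys.dm_db_index_usage_stats DMV. The query below is a sketch of that check, not necessarily the exact query I used; the only name taken from the Bit9 schema is dbo.pathnames.

```sql
-- Sketch only: read/write activity for every index on dbo.pathnames.
-- Counters reset when SQL Server restarts; a NULL row from the DMV
-- means the index has not been used at all since then.
SELECT
    i.name          AS index_name,
    i.type_desc     AS index_type,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates,   -- maintenance cost paid on every insert/update/delete
    us.last_user_seek,
    us.last_user_scan
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.object_id   = i.object_id
    AND us.index_id    = i.index_id
    AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.pathnames');
```

An index with growing user_updates and no seeks, scans, or lookups is being maintained without ever being read.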

I inquired about this to the folks at Bit9, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purge) of non-essential data did fetch a response, albeit a painfully slow one. The basic question is: “Is it safe to purge or archive old or unnecessary data?” The response we received was “run this and we will tell you what to do next.”

Great, they sent a script to help determine the state of data within the database. I am not posting their script here. Suffice it to say that the script they sent was not very pretty. They query about 20 tables, union the results from those tables, then perform a NOT IN operation to see how many of the pathnames are invalid. No problem. Executing the script did reveal the following:

There seems to be the problem. 95.89% of the rows being stored in the pathnames table are orphaned records! This is a bit of a problem. The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided from the results of the script. Another piece was provided in the makeup of the script. From these pieces of information, I created a purge script to help manage the orphan problem. I then put that script into an agent job and set it to run on a weekly basis.
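For anyone wanting to schedule a purge the same way, a weekly agent job can be sketched with the msdb job procedures. Everything named here is a placeholder of mine: the job name, the database name Bit9Db, and the procedure dbo.PurgeOrphanedPathnames (a hypothetical wrapper around the purge script) are assumptions, not Bit9 objects.

```sql
-- Sketch only: weekly SQL Agent job that runs a purge procedure.
-- Job name, database name, and procedure name are hypothetical placeholders.
USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = N'Purge orphaned pathnames';

EXEC dbo.sp_add_jobstep
    @job_name      = N'Purge orphaned pathnames',
    @step_name     = N'Run purge',
    @subsystem     = N'TSQL',
    @database_name = N'Bit9Db',                            -- placeholder database name
    @command       = N'EXEC dbo.PurgeOrphanedPathnames;';  -- placeholder procedure

EXEC dbo.sp_add_jobschedule
    @job_name               = N'Purge orphaned pathnames',
    @name                   = N'Weekly - Sunday 02:00',
    @freq_type              = 8,      -- weekly
    @freq_interval          = 1,      -- Sunday
    @freq_recurrence_factor = 1,      -- every 1 week
    @active_start_time      = 20000;  -- 02:00:00 (HHMMSS)

EXEC dbo.sp_add_jobserver
    @job_name    = N'Purge orphaned pathnames',
    @server_name = N'(local)';
GO
```

Pick a start time that falls inside a quiet window for the Bit9 server, since the purge touches a very large table.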

If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass amount of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract until Bit9 responds about the index requirement and the schema of the table.

-- select count(1) from [dbo].[OrphanedPathnameIds] (nolock)
CREATE TABLE #pathname
    (
      pathname_id BIGINT
    );
GO

INSERT INTO #pathname
        ( pathname_id )
SELECT pathname_id
    FROM dbo.pathnames
    WHERE pathname_id NOT IN (
        SELECT DISTINCT pathname_id FROM dbo.antibodies WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.antibody_instances WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.antibody_instances_deleted WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.antibody_instance_groups WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.temp_antibody_instances WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.processing_temp_antibody_instances1 WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.processing_temp_antibody_instances2 WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.processing_temp_antibody_instances3 WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT old_pathname_id FROM dbo.temp_antibody_instances WHERE old_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT old_pathname_id FROM dbo.processing_temp_antibody_instances1 WHERE old_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT old_pathname_id FROM dbo.processing_temp_antibody_instances2 WHERE old_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT old_pathname_id FROM dbo.processing_temp_antibody_instances3 WHERE old_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.antibody_instances_snapshots WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.approval_requests WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.internal_events WHERE pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT process_pathname_id FROM dbo.internal_events WHERE process_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT process_pathname_id FROM dbo.events WHERE process_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT process_pathname_id FROM dbo.remote_events WHERE process_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT process_pathname_id FROM dbo.remote_internal_events WHERE process_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT process_pathname_id FROM dbo.approval_requests WHERE process_pathname_id IS NOT NULL
        UNION
        SELECT DISTINCT pathname_id FROM dbo.events WHERE pathname_id IS NOT NULL
        UNION
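The recommendation to disable the two non-clustered indexes, delete, and then rebuild can be sketched as follows. This assumes #pathname has already been fully populated with the orphaned ids in the same session. The index names IX_pathnames_string1 and IX_pathnames_string2 are hypothetical placeholders (look up the real names in sys.indexes), and the batch size is an arbitrary choice of mine to keep each transaction, and the log, at a manageable size.

```sql
-- Sketch only: disable the non-clustered indexes, delete orphans in batches,
-- then rebuild. Index names below are hypothetical placeholders.
ALTER INDEX IX_pathnames_string1 ON dbo.pathnames DISABLE;
ALTER INDEX IX_pathnames_string2 ON dbo.pathnames DISABLE;

DECLARE @batch INT = 50000;  -- arbitrary batch size; tune for your log and I/O
DECLARE @rows  INT = 1;

WHILE @rows > 0
BEGIN
    -- Remove up to @batch orphaned rows per pass.
    DELETE TOP (@batch) p
    FROM dbo.pathnames AS p
    INNER JOIN #pathname AS o
        ON o.pathname_id = p.pathname_id;

    SET @rows = @@ROWCOUNT;  -- loop ends when nothing is left to delete
END;

-- Rebuilding re-enables the indexes, restoring the vendor's original schema.
ALTER INDEX IX_pathnames_string1 ON dbo.pathnames REBUILD;
ALTER INDEX IX_pathnames_string2 ON dbo.pathnames REBUILD;
```

Run it during a quiet window: rows that become referenced between building #pathname and running the delete would otherwise be removed along with the true orphans.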
