We live in a day and age when security (data, network, server, etc) is seemingly at the forefront of the daily news. In our quest to improve security we seemingly always find more and more products that are supposed to help with that security. Sometimes, those products require a data repository. As it happens, it is not uncommon for that repository to be in a database. Today, I am looking into a specific issue with the Bit9 product.
Why?There are somevery good reasons as a matter of fact. One really bigreason is that I could find no decent information about this issue. The bigger reason is the seemingly indifferent level of response and the delays I saw in response from the Bit9 support channels.
In this specific case, the first response from their support channels took more than two days. Subsequent responses were more than a week later. For the client, this actually caused delays in a project they were working on. Was the issue significant? It was not an outage causing issue, but it was one that did seem to continue to grow and cause concerns with disk space.
The IssueIt may be appropriate to discuss what Bit9 does prior to breaking into the details about the issue. If you have never had any experience with this product before, you may be running a product from a competitor such as Symantec. Bit9 is a an endpoint protection tool from Carbon Black. This helps to prevent against malware and endpoint attacks. In short it is a security based tool to protect your computing enterprise.
Go ahead and click the image to link to their site if you wish to learn more about how it works. For me, the nitty gritty specifics on how it works is a bit outside the scope of this article.
In short, Bit9 records a record of every filename and every filepath for every client machine. It stores these in a database in SQL Server. This shouldn’t be too much of an issue. For one instance serving about 300 client machines we saw about 1.2 million paths being stored. One another instance with far fewer clients and far fewer actual paths on those clients (physically checked) we had 114 million paths stored in the database. The difference in size was 20GB vs ~160GB. When the server is installed with all defaults, you can imagine how this looked for this instance a very bloated OS volume.
Investigating this issue from a database perspective, Ilooked to find what was consuming so much space in the database. To do that, I ran my tablespace script and discovered the following:
A significant standout there with the pathnames table. Looking at the definition of the table I see something far more interesting and disturbing all at once.
I see a table with three columns and two large string fields. Each of these fields has a non-clustered index on it. This may or may not be such a big problem (other than the fact that the size of the non-clustered indexes on this table are much larger than the data) except that each of the string fields is an exact duplicate of the other. That’s right. Within this table, the data is duplicated into this second string field and each field has its own index. Not only does it appear that I have a ton of duplicated data, it appears I have entirely useless indexes (neither had been touched for a read since the server had been up)
I inquired about this to the folks at Bit9, both from a design perspective and from an archival perspective. Absolute silence on the design (expected). The inquiry about archival (or purge) of non-essential data did fetch a response albeit a painfully slow response. The basic question is: “Is it safe to purge or archive old or unnecessary data.” The response we received was “run this and we will tell you what to do next.”
Great, they sent a script to help determine the state of data within the database. I am not posting their script here. Suffice it to say that the script they sent was not very pretty. They query about 20 tables, union the results from those tables, then perform a not in operation to see how many of the pathnames are invalid. No problem. Executing the script did reveal the following:
There seems to be the problem. 95.89% of the rows being stored in the pathnames table are orphaned records! This is a bit of a problem. The software does not appear to manage removal of invalid paths. From here, I knew what the course of action needed to be and acted on it. A big piece of the equation was provided from the results of the script. Another piece was provided in the makeup of the script. From these pieces of information, I created a purge script to help manage the orphan problem. I then put that script into an agent job and set it to run on a weekly basis.
If you find yourself in the same boat, here is the script needed to purge the data. If running this, I recommend disabling the two non-clustered indexes and then performing the delete (especially if you sit at 96% orphaned and over 100 million rows). After deleting the mass amount of orphans, go ahead and rebuild the indexes to stay in compliance with the software contract until Bit9 responds about the index requirement and the schema of the table.
-- select count(1) from [dbo].[OrphanedPathnameIds] (nolock) CREATE TABLE #pathname( pathname_id BIGINT ); GO INSERT INTO #pathname ( pathname_id ) SELECT pathname_id FROM dbo.pathnames WHERE pathname_id NOT IN ( SELECT DISTINCT pathname_id FROM dbo.antibodies WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.antibody_instances WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.antibody_instances_deleted WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.antibody_instance_groups WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.temp_antibody_instances WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.processing_temp_antibody_instances1 WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.processing_temp_antibody_instances2 WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.processing_temp_antibody_instances3 WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT old_pathname_id FROM dbo.temp_antibody_instances WHERE old_pathname_id IS NOT NULL UNION SELECT DISTINCT old_pathname_id FROM dbo.processing_temp_antibody_instances1 WHERE old_pathname_id IS NOT NULL UNION SELECT DISTINCT old_pathname_id FROM dbo.processing_temp_antibody_instances2 WHERE old_pathname_id IS NOT NULL UNION SELECT DISTINCT old_pathname_id FROM dbo.processing_temp_antibody_instances3 WHERE old_pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.antibody_instances_snapshots WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.approval_requests WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.internal_events WHERE pathname_id IS NOT NULL UNION SELECT DISTINCT process_pathname_id FROM dbo.internal_events WHERE process_pathname_id IS NOT NULL UNION SELECT DISTINCT process_pathname_id FROM dbo.events WHERE process_pathname_id IS NOT NULL UNION SELECT DISTINCT process_pathname_id FROM dbo.remote_events WHERE process_pathname_id IS NOT NULL UNION SELECT DISTINCT process_pathname_id FROM dbo.remote_internal_events WHERE process_pathname_id IS NOT NULL UNION SELECT DISTINCT process_pathname_id FROM dbo.approval_requests WHERE process_pathname_id IS NOT NULL UNION SELECT DISTINCT pathname_id FROM dbo.events WHERE pathname_id IS NOT NULL UNION