Twelve months ago, a team of 50 Google employees used GitHub to patch the “Apache Commons Collections Deserialization Vulnerability” (or the “Mad Gadget vulnerability” as we call it) in thousands of open source projects. We recently learned why our efforts were so important.
The San Francisco Municipal Transportation Agency had their software systems encrypted and shut down by an avaricious hacker. The hacker used that very same vulnerability, according to reports of the incident. He demanded a Bitcoin ransom from the government. He threatened to leak the private data he stole from San Francisco’s citizens if his ransom wasn’t paid. This was an attack on our most critical public infrastructure, which underpins the economy of a major US city.
Mad Gadget is one of the most pernicious vulnerabilities we’ve seen. By merely existing on the Java classpath, seven “gadget” classes in Apache Commons Collections (versions 3.0, 3.1, 3.2, 3.2.1, and 4.0) make object deserialization for the entire JVM process Turing-complete with an exec function. Since many business applications use object deserialization to send messages across the network, this is like hiring a bank teller who was trained to hand over all the money in the vault if asked to do so politely, and then entrusting that teller with the key. The only thing that would keep a bank safe in such a circumstance is that most people wouldn’t consider asking such a question.
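To make the gadget idea concrete, here is a minimal Java sketch. It is not code from Operation Rosehub or from Commons Collections: `GadgetDemo`, `HarmlessGadget`, and `AllowListObjectInputStream` are hypothetical names, and the "gadget" only flips a flag. It shows that a plain `ObjectInputStream` runs the `readObject` code of whatever serializable class the incoming bytes name, and that overriding `resolveClass` with an allow-list (one commonly used mitigation, assumed here for illustration) rejects the class before any of its code runs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.Set;

final class GadgetDemo {
    /** Harmless stand-in for a gadget: its readObject runs whenever a stream names this class. */
    static final class HarmlessGadget implements Serializable {
        private static final long serialVersionUID = 1L;
        static boolean sideEffectRan = false;

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // a real gadget chain would reach something like Runtime.exec
        }
    }

    /** Hypothetical look-ahead stream: rejects a class by name before it is loaded. */
    static final class AllowListObjectInputStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowListObjectInputStream(InputStream in, Set<String> allowed) throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "not on the allow-list");
            }
            return super.resolveClass(desc);
        }
    }

    /** Serializes a value to bytes; used only to build demo input. */
    static byte[] serialize(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads one object with a plain ObjectInputStream; returns whether the gadget code ran. */
    static boolean gadgetRunsWithPlainStream(byte[] payload) {
        HarmlessGadget.sideEffectRan = false;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(payload))) {
            in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            // Ignored for the demo; the flag reports what happened.
        }
        return HarmlessGadget.sideEffectRan;
    }

    /** Same read through the allow-list stream; returns whether the gadget code ran. */
    static boolean gadgetRunsWithAllowList(byte[] payload, Set<String> allowed) {
        HarmlessGadget.sideEffectRan = false;
        try (ObjectInputStream in =
                new AllowListObjectInputStream(new ByteArrayInputStream(payload), allowed)) {
            in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            // A rejected class surfaces as InvalidClassException, which is an IOException.
        }
        return HarmlessGadget.sideEffectRan;
    }

    public static void main(String[] args) {
        byte[] payload = serialize(new HarmlessGadget());
        System.out.println("plain stream ran gadget code: "
                + gadgetRunsWithPlainStream(payload));
        System.out.println("allow-list stream ran gadget code: "
                + gadgetRunsWithAllowList(payload, Set.of("java.util.ArrayList")));
    }
}
```

The receiving code never asks for `HarmlessGadget`; the sender's bytes choose the class. That is why removing the gadget classes from the classpath, as the patches described in this post do, closes the hole even for applications that cannot change how they deserialize.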
The announcement of Mad Gadget triggered a Cambrian explosion of enterprise security disclosures. Oracle, Cisco, Red Hat, Jenkins, VMware, IBM, Intel, Adobe, HP and SolarWinds all formally disclosed that they had been impacted by this issue.
But unlike big businesses, open source projects don’t have people on staff to read security advisories all day and instead rely on volunteers to keep them informed. It wasn’t until five months later that a Google employee noticed several prominent open source libraries had not yet heard the bad news. Those projects were still depending on vulnerable versions of Collections. So back in March 2016, she started sending pull requests to those projects updating their code. This was easy to do and usually only required a single line change. With the help of GitHub’s GUI, any individual can make such changes to anyone’s codebase in under a minute. Given how relatively easy the changes seemed, she recruited more colleagues at Google to help the cause. As more work was completed, it was apparent that the problem was bigger than we had initially realized.
For instance, when patching projects like the Spring Framework, it was clear we weren’t just patching Spring but also patching every project that depended on Spring. We were furthermore patching all the projects that depended on those projects and so forth. But even once those users upgraded, they could still be impacted by other dependencies introducing the vulnerable version of Collections. To make matters worse, build systems like Maven cannot be relied upon to evict old versions.
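One reason an old version can survive is Maven's "nearest definition" rule for conflicting versions: the declaration closest to the root of the dependency tree wins, not the newest one. The sketch below is a simplified model of that rule, not Maven's actual resolver. The class `NearestWins`, the `lib-a`, `lib-b`, and `lib-c` artifacts, and the graph itself are hypothetical, and coordinates are reduced to `artifact:version` strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NearestWins {
    /**
     * Picks one version per artifact by breadth-first traversal, so the declaration
     * closest to the root wins and declaration order breaks ties.
     * Coordinates are "artifact:version" strings (a simplification for this sketch).
     */
    static Map<String, String> resolve(List<String> rootDeps,
                                       Map<String, List<String>> transitive) {
        Map<String, String> chosen = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>(rootDeps);
        while (!queue.isEmpty()) {
            String coordinate = queue.removeFirst();
            int colon = coordinate.lastIndexOf(':');
            String artifact = coordinate.substring(0, colon);
            String version = coordinate.substring(colon + 1);
            if (chosen.containsKey(artifact)) {
                continue; // a nearer or earlier declaration already fixed this version
            }
            chosen.put(artifact, version);
            queue.addAll(transitive.getOrDefault(coordinate, List.of()));
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical graph: lib-a still pulls in 3.2.1, while the patched
        // 3.2.2 only appears one level deeper, behind lib-b and lib-c.
        Map<String, List<String>> graph = Map.of(
                "lib-a:1.0", List.of("commons-collections:3.2.1"),
                "lib-b:1.0", List.of("lib-c:1.0"),
                "lib-c:1.0", List.of("commons-collections:3.2.2"));
        System.out.println(resolve(List.of("lib-a:1.0", "lib-b:1.0"), graph));
    }
}
```

In this model the vulnerable 3.2.1 is selected even though a patched release is present in the graph, which is why patching only some of the projects in a dependency chain is not enough.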
This was when we realized the particularly viral nature of Mad Gadget. We came to the conclusion that, in order to improve the health of the global software ecosystem, the old version of Collections should be removed from as many codebases as possible.
We used BigQuery to assess the damage. It allowed us to write a SQL query with regular expressions that searched all the public code on GitHub in a couple of minutes.
SELECT pop, repo_name, path
FROM (SELECT F.id AS id, repo_name, path
      FROM (SELECT id, repo_name, path
            FROM [bigquery-public-data:github_repos.files]
            WHERE path LIKE '%pom.xml') AS F
      JOIN (SELECT id
            FROM (SELECT id, content
                  FROM (SELECT id, content
                        FROM [bigquery-public-data:github_repos.contents]
                        WHERE NOT binary)
                  WHERE content CONTAINS 'commons-collections<')
            WHERE content CONTAINS '>3.2.1<') AS C
      ON F.id = C.id) AS V
JOIN (SELECT difference.new_sha1 AS id,
             COUNT(repo_name) WITHIN RECORD AS pop
      FROM FLATTEN([bigquery-public-data:github_repos.commits], difference.new_sha1)) AS P
ON V.id = P.id
ORDER BY pop DESC;

We were alarmed when we discovered 2,600 unique open source projects that still directly referenced insecure versions of Collections. Internally at Google, we have a tool called Rosie that allows developers to make large scale changes to codebases owned by hundreds of different teams. But no such tool existed for GitHub. So we recruited even more engineers from around Google to patch the world’s code the hard way.
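The two CONTAINS filters in the query above amount to a pair of substring tests on each pom.xml. As a rough sketch, the same heuristic can be applied to a single file's contents in Java. The class and method names here are hypothetical, and like the query this is a text match rather than an XML parse, so it can flag a file where the version string belongs to a different dependency.

```java
final class PomScan {
    /** Mirrors the query's two CONTAINS filters: artifact name and version as substrings. */
    static boolean mentionsVersion(String pomContent, String version) {
        return pomContent.contains("commons-collections<")
                && pomContent.contains(">" + version + "<");
    }

    public static void main(String[] args) {
        // Hypothetical pom.xml fragment used only as demo input.
        String pom = String.join("\n",
                "<dependency>",
                "  <groupId>commons-collections</groupId>",
                "  <artifactId>commons-collections</artifactId>",
                "  <version>3.2.1</version>",
                "</dependency>");
        System.out.println("references 3.2.1: " + mentionsVersion(pom, "3.2.1"));
    }
}
```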
Ultimately, security rests within the hands of each developer. However, we felt that the severity of the vulnerability and its presence in thousands of open source projects were extenuating circumstances. We recognized that the industry best practices had failed. Action was needed to keep the open source community safe. So rather than simply posting a security advisory asking everyone to address the vulnerability, we formed a task force to update their code for them. That initiative was called Operation Rosehub.
Operation Rosehub was organized from the bottom up on company-wide mailing lists. Employees volunteered and patches were sent out in a matter of weeks. There was no mandate from management to do this, yet management was supportive. They were happy to see employees spontaneously self-organizing to put their 20% time to good use. Some of those managers even participated themselves.
Patches were sent to many projects, avoiding threats to public security for years to come. However, we were only able to patch open source projects on GitHub that directly referenced vulnerable versions of Collections. Perhaps if the SF Muni software systems had been open source, we would have been able to bring Mad Gadget to their attention too.
Going forward, we believe the best thing to do is to build awareness. We want to draw attention to the fact that the tools now exist for fixing software on a massive scale, and that it works best when that software is open.
In this case, the open source dataset on BigQuery allowed us to identify projects that still needed to be patched. When a vulnerability is discovered, any motivated team or individual who wants to help improve the security of our infrastructure can use these tools to do just that.
By Justine Tunney, Software Engineer on TensorFlow
We’d like to recognize the following people for their contributions to Operation Rosehub: Laetitia Baudoin, Chris Blume, Sven Blumenstein, James Bogosian, Phil Bordelon, Andrew Brampton, Joshua Bruning, Sergio Campamá, Kasey Carrothers, Martin Cochran, Ian Flanigan, Frank Fort, Joshua French, Christian Gils, Christian Gruber, Erik Haugen, Andrew Heiderscheit, David Kernan, Glenn Lewis, Roberto Lublinerman, Stefano Maggiolo, Remigiusz Modrzejewski, Kristian Monsen, Will Morrison, Bharadwaj Parthasarathy, Shawn Pearce, Sebastian Porst, Rodrigo Queiro, Parth Shukla, Max Sills, Josh Simmons, Stephan Somogyi, Benjamin Specht, Ben Stewart, Pascal Terjan, Justine Tunney, Daniel Van Derveer, Shannon VanWagner, and Jennifer Winer.