The Infrastructure Team at Coinbase has the goal of enabling any engineer in the company to quickly and securely access and deploy complex infrastructure. This effort started with our secure deployment pipeline Codeflow, was extended by our codification tooling GeoEngineer, and was utilized by our blockchain infrastructure project Snapchain.
Our latest project to empower engineers was to make it easy and safe to elevate their own permissions temporarily to perform complex infrastructure changes.
Everything that engineers do at Coinbase is locked down by a mechanism that implements consensus. In order to interact with any production environment you must have a quorum of engineers approve the permissions, code, and configuration. This creates strict guardrails around making changes to our production environments along with an audit trail. This also enables us to secure customers' funds with confidence.
Our philosophy of consensus also applies to access to critical services such as AWS and GitHub, since our production services depend on them. In the past we manually onboarded employees onto such services with consensus and an audit trail. Manually provisioning accounts was easy for us to do until this year. In 2018 Coinbase experienced hypergrowth, growing from 200 to almost 600 employees, so the number of employees joining per week increased dramatically. Manually provisioning accounts at that rate resulted in operational toil, which made it an obvious place to eliminate toil through automation.
To eliminate this source of toil, we built a Single Sign On (SSO) system that fulfills our consensus philosophy by protecting all changes to a user’s permissions via consensus. The system had the following requirements to meet our high security and productivity standards:
- Reduce the manual toil to maintain user accounts through centralized management
- Full codification of users’ permissions
- Audit trail of users’ permissions over time
- MFA for all authentication, ideally push based
- Highly available and 12 factor, allowing for blue/green deploys
- Minimal surface area for vulnerabilities
- Help us scale 10x more engineers to 10x more critical services with ease
- Work with our current workflows e.g. `assume-role`
To build this identity provider (a service that authenticates users on behalf of other services) we use a combination of SAML, LDAP, and consensus.
SAML (Security Assertion Markup Language) is the de facto enterprise SSO protocol. It is used to send cryptographically signed assertions about a principal (i.e. their permissions) to service providers like AWS and GitHub. Service providers use these assertions to authorize users into their platforms. SAML profiles describe the different request-response protocols that identity providers and service providers can use to communicate with each other. SAML bindings describe which lower level communication and messaging mechanisms are used in the steps of SAML profile specifications.
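To make the idea of an assertion concrete, the following is a minimal sketch, not our production code, of how an identity provider might state a principal's AWS role inside a SAML 2.0 assertion. The function name, the example subject, and the ARNs are hypothetical. The sketch also leaves out everything that makes a real assertion trustworthy, including the issuer, validity conditions, and the XML signature.

```python
import xml.etree.ElementTree as ET

# Namespace of SAML 2.0 assertions and the attribute name AWS reads roles from.
SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
AWS_ROLE_ATTR = "https://aws.amazon.com/SAML/Attributes/Role"


def build_role_assertion(subject, role_arn, provider_arn):
    """Return an UNSIGNED, incomplete SAML assertion fragment as an XML string.

    Hypothetical helper for illustration only: a real assertion also carries
    an ID, issue instant, issuer, conditions, and a cryptographic signature.
    """
    ET.register_namespace("saml", SAML_NS)
    assertion = ET.Element(f"{{{SAML_NS}}}Assertion", {"Version": "2.0"})

    # Who the assertion is about (the principal).
    subj = ET.SubElement(assertion, f"{{{SAML_NS}}}Subject")
    name_id = ET.SubElement(subj, f"{{{SAML_NS}}}NameID")
    name_id.text = subject

    # What is asserted about them: AWS expects "role_arn,provider_arn".
    stmt = ET.SubElement(assertion, f"{{{SAML_NS}}}AttributeStatement")
    attr = ET.SubElement(stmt, f"{{{SAML_NS}}}Attribute", {"Name": AWS_ROLE_ATTR})
    value = ET.SubElement(attr, f"{{{SAML_NS}}}AttributeValue")
    value.text = f"{role_arn},{provider_arn}"

    return ET.tostring(assertion, encoding="unicode")


if __name__ == "__main__":
    print(build_role_assertion(
        "alice@example.com",
        "arn:aws:iam::123456789012:role/example-role",
        "arn:aws:iam::123456789012:saml-provider/example-sso",
    ))
```

The service provider only trusts such a statement because the identity provider signs it, which is why the signing step omitted here is the security-critical part.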
LDAP is a tried-and-true directory service that is typically used to represent organizations in a tree-like structure. It also has secure native authentication mechanisms for users.
In order to understand how consensus is used to protect changes to users’ permissions, we will first explain how consensus is used at Coinbase.

Consensus at Coinbase
Software development process at Coinbase utilizing consensus. (Heimdall is licensed under CC BY-SA 3.0.)
In the software development process at Coinbase, engineers can only deploy code to production environments if it meets a specific set of checks and requirements. These checks and requirements are numerous, but one of the key requirements is that all deployed git branches must be checked via consensus by a tool we wrote called Heimdall. This tool enforces an immutable git history in which every commit has consensus.
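The kind of check described here can be sketched in a few lines. This is an illustrative model only, not Heimdall's actual implementation: the `Commit` type, the function names, and the reviewer names are all hypothetical, and the sketch assumes consensus simply means that a required number of qualified reviewers approved a commit.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical record of a commit and who approved it."""
    sha: str
    approvals: set = field(default_factory=set)


def has_consensus(commit, qualified_reviewers, required):
    """True when at least `required` qualified reviewers approved the commit."""
    # Approvals from people outside the qualified set do not count.
    valid = commit.approvals & qualified_reviewers
    return len(valid) >= required


def branch_has_consensus(commits, qualified_reviewers, required):
    """A branch is deployable only if every one of its commits has consensus."""
    return all(has_consensus(c, qualified_reviewers, required) for c in commits)


if __name__ == "__main__":
    reviewers = {"alice", "bob", "carol"}
    history = [
        Commit("abc123", approvals={"alice", "bob"}),
        Commit("def456", approvals={"alice", "mallory"}),  # only one qualified approval
    ]
    print(branch_has_consensus(history, reviewers, required=2))
```

Checking every commit rather than only the latest one is what makes the immutable history matter: a single commit without consensus anywhere in the branch blocks the deploy.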
The general software development process to deploy code to production environments is as follows:
1. An engineer creates a pull request to a protected branch with immutable history (i.e. master).
2. N qualified reviewers engage in a code review process, where N is configurable on a per-repository basis. N depends on how sensitive the repository is.
3. After all qualified reviewers ensure that the code is of high quality they may approve the pull request (ensuring consensus).
4. A webhook triggers to notify Heimdall that all commits of the pull request have consensus.
5. The engineer merges the pull request into the git branch with immutable history.
6. Heimdall marks the new merge commit with consensus. This ensures that all commits to the protected branch have consensus.
7. The engineer attempts to deploy a commit to a production environment with our secure deployment pipeline Codeflow.
8. Codeflow asks Heimdall if the commit has consensus. If and only if it has consensus the deploy initiates!

The Single Sign On System
Architecture of the Single Sign On system.
In our configuration of LDAP we have two directories: users and groups.
The groups directory describes which groups users are a part of. Service providers use this to translate into permissions specific to that service.
When an engineer would like to elevate their permissions to a service, they make a pull request to a repository that is used to build the groups directory. This repository is protected by consensus with Heimdall. Once a change is merged, the repository updates the groups directory, which is served from a read-only filesystem. The git commit history creates an audit trail, which is one of our requirements for compliance.
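As a rough sketch of what building the groups directory from a repository could look like, the function below renders codified group membership as LDIF entries. Everything specific here is an assumption for illustration: the group names, the `dc=example,dc=com` base DN, the `ou=groups` and `ou=users` layout, and the choice of the standard `groupOfNames` object class are not taken from our actual repository.

```python
def render_groups_ldif(groups, base_dn="dc=example,dc=com"):
    """Render {group_name: [usernames]} as LDIF text of groupOfNames entries.

    Hypothetical helper: the DN layout and base DN are illustrative only.
    """
    entries = []
    # Sort groups and members so the output is deterministic and diffs cleanly.
    for name in sorted(groups):
        members = sorted(groups[name])
        if not members:
            # The groupOfNames schema requires at least one member value.
            raise ValueError(f"group {name!r} has no members")
        lines = [
            f"dn: cn={name},ou=groups,{base_dn}",
            "objectClass: groupOfNames",
            f"cn: {name}",
        ]
        lines += [f"member: uid={m},ou=users,{base_dn}" for m in members]
        entries.append("\n".join(lines))
    # LDIF separates entries with a blank line.
    return "\n\n".join(entries) + "\n"


if __name__ == "__main__":
    # Adding "bob" to this mapping is the kind of change a pull request would make.
    print(render_groups_ldif({"aws-admin": ["bob", "alice"], "github-eng": ["alice"]}))
```

Because the output is deterministic, a permission change shows up as a small, reviewable diff in the pull request, which fits both the consensus review and the audit trail.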
The users directory contains information about users as well as their cryptographically hashed passwords. Users authenticate against this directory and complete MFA with a push notification from Duo Push.
To allow LDAP to be blue/green deployed in a highly available mode, fulfilling the 12 factor requirement of having stateless instances, we use the slapd-sql module for the user directory. We store the data in Postgres (Amazon’s RDS) instead of on dis