One of the headline new features in OpenAM 13 is support for Stateless Sessions, which allow for essentially unlimited horizontal scalability of your session infrastructure. This is achieved by moving session state out of the data store and placing it directly on the client as a signed and encrypted JWT. Any server in the cluster can then handle any request to validate a session token locally by simply validating the signature on the JWT and checking that the token has not yet expired (using the expiry timestamp baked into the token itself). Stateless sessions are not in themselves a new concept, and there are a handful of implementations out there. You may be thinking “Great! Where do I sign?”, but there has been an Achilles’ heel with stateless sessions that has held them back from being truly production-ready: how to handle logout.
The general advice is that stateless logout is very hard or impossible. Well, we’re not afraid of a bit of hard work at ForgeRock, so we decided to solve that problem. In this post I’ll tell you how we did it.
Why do we want to log out anyway?
Before we get into the technical details, we should step back and ask why we need to handle logout at all. Does it matter? If you care about security and usability, then the answer should be yes.
The purpose of a session cookie is to prevent a user having to re-authenticate for every single request they make to a system. Instead, the user authenticates once and we provide them with a secure, time-limited token that proves that they have authenticated. The user then simply presents this token with every request and the system checks that the token is still valid. As an added bonus, we can also associate session state with that token, but this is not the primary purpose. The drawback of this approach is that if somebody manages to steal this session token then they can act as that user until the token expires.
Assuming that we cannot completely eliminate the possibility of token hijacking (and the history of computer security suggests that we cannot in a completely foolproof way), then we should take steps to limit the possible damage that could occur. The most obvious way is to limit the time-window in which an attacker can make use of the token:
Firstly, we should require re-authentication for any operation that allows the attacker to extend the time-window. For example, changing the user’s password would allow indefinite access to their account, so we require re-authentication for that (OpenAM allows any user profile attributes to be protected in this way).
Secondly, we can shorten the expiry time on all session tokens, but this is a trade-off of security against user frustration at having to re-authenticate frequently and potentially losing work in progress.
Finally, we can allow longer session expiry times, but allow a user to explicitly indicate when they are finished working and to invalidate the session in that case. We can also trigger invalidation if the session has been idle for a certain time period.
This last case is the one that explicit logout addresses, and it is the norm for most applications as it provides a more acceptable balance of security and usability.
Stateless Logout
In a stateful session architecture, logout is straightforward: we simply remove the session from internal storage and delete the cookie from the user agent. We can do the latter in a stateless architecture, but if the cookie has already been stolen then this achieves nothing, as we cannot tell that the cookie has been stolen.
We could place restrictions on the cookie, such as tying it to a particular IP address, but in a world of mobile clients it is not unusual for the IP address to change legitimately during a session as the client connects to different networks. (OpenAM does support this mode too, but this is primarily for protecting agent sessions.)
It seems no matter what we try we end up needing some state on the server to support logout. But how much do we need? In the stateful model we store all active sessions on the server. One alternative would be to instead store all inactive sessions. Initially this may seem like a bad idea: we might expect the number of active sessions to stay roughly constant within some bounds, but surely the number of inactive sessions will grow unbounded over time? We can get around this if we make sure that our session tokens include the expiry time of the token and are tamper-proof (e.g., via a MAC or signature). Then we only need to store those sessions that have been logged out but have not yet expired, which will often be a much smaller set.
This is the approach that OpenAM takes. Logged out (but not yet expired) tokens are stored in the Core Token Service (CTS). In order to check validity of a session we validate the cookie signature, check the expiry time, and then check the session blacklist to make sure this token hasn’t been logged out.
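The three checks described above can be sketched roughly as follows. This is a minimal illustration and not OpenAM's implementation: it uses a plain HMAC-signed payload instead of a signed and encrypted JWT, an in-memory dict standing in for the CTS, and the names `issue_token`, `validate_token`, `purge_expired` and `SECRET` are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical key for illustration only


def _b64(data):
    return base64.urlsafe_b64encode(data).decode("ascii")


def _sign(payload, secret):
    return _b64(hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).digest())


def issue_token(session_id, expires_at, secret=SECRET):
    """Build 'payload.signature'; the payload carries its own expiry time."""
    payload = _b64(json.dumps({"sid": session_id, "exp": expires_at}).encode("utf-8"))
    return payload + "." + _sign(payload, secret)


def validate_token(token, blacklist, now=None, secret=SECRET):
    """Return the session id if the token is valid, otherwise None.

    `blacklist` maps logged-out session ids to their expiry times.
    """
    now = time.time() if now is None else now
    try:
        payload, signature = token.split(".")
    except ValueError:
        return None
    # 1. Signature: reject anything not signed with our key (constant-time compare).
    expected = _sign(payload, secret)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload))
    # 2. Expiry: the timestamp is inside the signed payload, so it can be trusted.
    if claims["exp"] <= now:
        return None
    # 3. Blacklist: the only step that needs shared state.
    if claims["sid"] in blacklist:
        return None
    return claims["sid"]


def purge_expired(blacklist, now=None):
    """Drop entries whose tokens have expired and would fail step 2 anyway."""
    now = time.time() if now is None else now
    return {sid: exp for sid, exp in blacklist.items() if exp > now}
```

Logging out then amounts to adding the token's session id and expiry time to the blacklist. Because expired tokens are rejected before the blacklist is consulted, entries can be purged as soon as their expiry time has passed.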
Wait? Check the CTS? Doesn’t that provide a bottleneck that will limit scalability? Yes, but in practice we can push that limit up to a really high level.
Firstly, the CTS backend, OpenDJ, can support pretty large read rates in a clustered mode. Session blacklists are monotonic (once a session is blacklisted it is never unblacklisted). This means we can use multimaster replication to scale writes and achieve strong eventual consistency. In the worst case there may be a small time delay before all servers in the cluster know of a new blacklist entry, but this is usually a very short time window. Even in the case of network partitions, all sides can continue accepting writes and reads (i.e., an AP system in terms of the CAP theorem). After the partition heals, we can simply take the union of the blacklists from each side to re-establish consistency without conflicts, and we should broadcast session logouts as far as we can in the meantime.
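To see why a monotonic blacklist merges without conflicts, here is a small sketch of the union step. It assumes each replica's blacklist is represented as a dict from session id to expiry time; the function name `merge_blacklists` is hypothetical, and this is not how OpenDJ replication is implemented.

```python
def merge_blacklists(left, right):
    """Merge two replicas' blacklists (session id -> expiry time) by union.

    Entries are only ever added, never removed by logout, so the union
    loses nothing. If both sides hold the same session id, the later
    expiry is kept so the result does not depend on argument order.
    """
    merged = dict(left)
    for session_id, expires_at in right.items():
        merged[session_id] = max(expires_at, merged.get(session_id, expires_at))
    return merged


# Two sides of a partition each accepted different logouts.
side_a = {"session-1": 2000, "session-2": 2500}
side_b = {"session-2": 2500, "session-3": 3000}
healed = merge_blacklists(side_a, side_b)
```

The merge is commutative, associative and idempotent, so replicas can exchange entries in any order, any number of times, and still converge on the same blacklist.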
The CP alternative would be for both sides to stop processing requests for the duration of the partition to avoid processing a (stolen) session token that has been logged out on the other side. This would result in total loss of availability for all applications using the session service (i.e., likely an entire organization), which would be unacceptable to m