What is Google Zanzibar?

Google Zanzibar is a white paper that describes the system Google built to handle authorization for its vast number of users and services. Today, it is still an extremely popular term in the IAM space, used almost synonymously with fine-grained authorization, much as Role-Based Access Control (RBAC) is used as shorthand for authorization systems in general.

Renowned for its distributed, scalable, and consistent architecture, Google Zanzibar allows developers to model their permissions as data in a graph and provides a natural solution for implementing Relationship-Based Access Control (ReBAC). Granular and innovative, yet resource-heavy and complex, Google Zanzibar has many pros and cons. In this blog, we’ll cover what Google Zanzibar is, and when and how you should implement it to solve your application’s authorization needs.

Let’s start by discussing how and why Google needed to develop a solution like Zanzibar.

Why did Google build Zanzibar?

Unified Identities

Google's broad, diverse ecosystem encompasses a multitude of distributed applications serving various purposes, including B2B, B2C, and advertising platforms. Think of a YouTuber who manages their own channel, has a specific type of access to ‘Google Drive’-based storage through their workplace, and can edit the reviews they posted on Google Maps, all using a single identity.

As you can probably figure, this poses a significant application-level authorization challenge. With all these applications relying on a single unified identity system, ensuring that users and services have the appropriate permissions and access privileges becomes an almost ridiculously complex endeavor.

Not Just Allow/Deny

Another problem that Google faced in the permissions field was the requirement to expand on basic allow/deny capabilities. While most authorization systems use binary decisions and manage the enforcement on top of them, Google products, by nature, require a more sophisticated way to search for resources based on permissions. Almost every touchpoint of a Google product has this aspect (Think about something as simple as checking who has access to a Google Doc).

Scaling Enormously

With these two problems in mind, Google also needed to provide a single, unified, robust solution for all development teams, with one critical requirement: working at scale.

Google processes over 10 million authorization requests per second and must do so without compromising application performance or security (if the authorization service slows an application down, developers will simply "skip" it).

This was what led Google to the decision to develop one system to rule them all.

Before we discuss Google’s solution, let's examine the two other main approaches to solving access control challenges.

ACLs vs. RBAC in Fine-Grained Authorization

For many years, Access Control Lists (ACLs) were the go-to permission management solution in distributed systems (such as network devices). With its high level of accuracy (as every permission in the system is explicitly written in a list), an ACL provided an easy way to verify user access safely.

ACLs are also great at handling highly granular authorization policies, as every single object in the system is expressed in the list along with its permissions.

The thing is, with increasing distribution and ever-growing amounts of data, ACLs became almost impossible to distribute, maintain, or compute. While it might be possible to find a model that can centralize and distribute them properly, maintaining it in a modern application, especially on the Google scale, becomes impossible.

ACLs fail to provide users and developers with a good experience. For developers, this stems from the impossibility of maintaining them in a centralized fashion. For users who need to manage a multitude of permissions (which is often the case), ACLs are at the same time too simple in functionality and too complex to configure and maintain.

The solution for this complexity? Role-Based Access Control (RBAC). With its focus on the user (or subject), RBAC makes permissions easy and straightforward to maintain and configure. What RBAC loses in the process is granularity, since granularity comes at a cost of complexity and expensive memory usage. That means Google needed a way to combine the granularity of ACLs with the simplicity of RBAC and make them work at its scale. If you are familiar with ACLs, you probably know this challenge seems basically impossible.

From List to Graph

One of the main challenges of scaling ACLs is right there in the name: it’s a list. While lists are a very reliable and correct data structure, they are highly inefficient, especially when it comes to search and traversal operations. For this reason, Google decided to go for a more search-friendly structure, choosing to store ACLs in a graph.

Instead of flat, horizontal lists of permissions per user or per object, Zanzibar creates a node for every subject and object in the system, with edges declaring their relationship.

Consider an example, shown here as a tree abstraction of a permissions graph:

User Bob and Bob's Files folder are nodes, while the Owner relationship between them is the edge, which determines the relationship (and thus the level of permissions) between the two.

With this graph in mind, Google Zanzibar also allows further utilization of these relationships between objects to derive relationships—if Bob is the owner of Bob's Files, and the folders Bob's Pics and Bob's Docs are inside that folder, Bob will be automatically designated as the owner of these two folders and the files within them as well.

This means we don’t have to directly assign 1.jpg to Bob—the traversal path will allow us to derive an Owner (or other) role for Bob based on the relationships defined in the graph.

By replacing lists with traversal graphs, we can efficiently write and search permissions data. It also allows us to derive much more complex and granular roles that are perfectly tailored to handle hierarchies. Because it is based on the relationships between objects, this approach is often described as Relationship-Based Access Control (ReBAC).
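The derivation described above can be sketched in a few lines of Python. This is only a minimal illustration, not how Zanzibar itself is implemented: the `direct_roles` and `parent_of` structures and the `has_role` function are hypothetical names, and placing 1.jpg inside Bob's Pics is an assumption made for the example.

```python
# Explicit edges between users and resources (illustrative data only).
direct_roles = {("Bob", "Bob's Files"): {"owner"}}

# Each resource points at the folder that contains it.
# Assumption: 1.jpg lives inside Bob's Pics.
parent_of = {
    "Bob's Pics": "Bob's Files",
    "Bob's Docs": "Bob's Files",
    "1.jpg": "Bob's Pics",
}


def has_role(user, role, resource):
    """Walk up the hierarchy until the role is found or the root is passed."""
    seen = set()  # guards against accidental cycles in parent_of
    current = resource
    while current is not None and current not in seen:
        seen.add(current)
        if role in direct_roles.get((user, current), set()):
            return True
        current = parent_of.get(current)
    return False
```

Calling `has_role("Bob", "owner", "1.jpg")` walks from the file to Bob's Pics and then to Bob's Files, where the explicit Owner edge is found, so no direct assignment on the file is needed.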

Google Zanzibar Standards

From the implementation and consumption perspective, the Google Zanzibar paper relies on two main building blocks that define a Zanzibar system: Functions and Tuples.

Functions

Functions are the basic, intuitive methods that a user would use in a graph. These include:

Read specific objects and relationships in the graph.
Write new nodes and edges.
Watch for changes in configuration and data.
Check whether a user (or a set of users) holds a permission, using a traversal search.
Expand a query into its traversal decision tree, which helps with auditing and monitoring.

The Google Zanzibar Check function takes the usual Subject, Action, Resource style of arguments (is Bob (subject) allowed to write (action) a document (resource)?). The main difference between this “analytical” approach and a standard check function is that Zanzibar focuses on the role (such as Owner or Viewer, which is the edge in the graph) and not on the action. This gives each check a broader context when it comes to making authorization decisions. Because the answer is derived from graph data, the same relationships can also be used to filter a list based on policy requirements rather than to return a single decision. For example, they can be used to return all the resources that Bob is allowed to access.

Some Zanzibar implementations (such as Permit) allow performing a check based on the action itself and analyzing it as a policy decision.
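To make these functions more concrete, here is a rough in-memory sketch in Python. It is not Zanzibar's API: the `RelationStore` class, its method signatures, and the sample data are all hypothetical, and `expand` returns a flattened set of users here rather than the tree a real Expand call would produce.

```python
from dataclasses import dataclass, field


@dataclass
class RelationStore:
    # Each entry is (object, relation, subject). A subject is either a user
    # name or a userset written as "object#relation".
    tuples: set = field(default_factory=set)
    changelog: list = field(default_factory=list)

    def write(self, obj, relation, subject):
        """Add an edge and record the change for watchers."""
        self.tuples.add((obj, relation, subject))
        self.changelog.append(("write", obj, relation, subject))

    def read(self, obj):
        """Return every edge stored directly on one object."""
        return sorted(t for t in self.tuples if t[0] == obj)

    def watch(self, since=0):
        """Return changes recorded at or after position `since`."""
        return self.changelog[since:]

    def expand(self, obj, relation, _seen=None):
        """Collect all users holding `relation` on `obj`, following usersets."""
        seen = _seen if _seen is not None else set()
        if (obj, relation) in seen:  # already visited, avoids cycles
            return set()
        seen.add((obj, relation))
        users = set()
        for o, r, subject in self.tuples:
            if (o, r) != (obj, relation):
                continue
            if "#" in subject:
                sub_obj, sub_rel = subject.split("#", 1)
                users |= self.expand(sub_obj, sub_rel, seen)
            else:
                users.add(subject)
        return users

    def check(self, obj, relation, user):
        """Answer whether `user` holds `relation` on `obj`."""
        return user in self.expand(obj, relation)


store = RelationStore()
store.write("doc:readme", "viewer", "group:eng#member")
store.write("group:eng", "member", "Bob")
store.write("doc:readme", "owner", "Alice")
```

With this data, Bob is a viewer of `doc:readme` only through his membership in `group:eng`. Alice is an owner but not a viewer, because this sketch does not model rules such as "owners are also viewers".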

Tuples

To efficiently describe the graph and utilize these functions, Zanzibar includes a notation called Tuples. Tuples are a convention for describing relationships at three levels of detail: resources:object, object#relation, and object#relation@user.

Using these tuples, Google Zanzibar creates a language that makes working with the policy graph easy for all its consumers. For example, here’s how we would represent Bob’s ownership of a finance_reports folder: folders:finance_report#owner@Bob. Tuples help the check function be highly readable without “breaking” each section into attributes and non-textual structures. Familiarizing yourself with tuples can help with the readability and configuration of a Zanzibar-based system.
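As a small illustration of the notation, the sketch below parses a tuple string into its parts. The `RelationTuple` type, the `parse_tuple` function, and the regular expression are hypothetical helpers written for this post, not part of any Zanzibar implementation, and the grammar they accept is a simplification.

```python
import re
from typing import NamedTuple, Optional


class RelationTuple(NamedTuple):
    namespace: str
    object_id: str
    relation: Optional[str] = None
    user: Optional[str] = None


# namespace:object, optionally followed by #relation, optionally by @user.
_TUPLE_RE = re.compile(
    r"^(?P<namespace>[^:#@]+):(?P<object_id>[^:#@]+)"
    r"(?:#(?P<relation>[^:#@]+)(?:@(?P<user>.+))?)?$"
)


def parse_tuple(text: str) -> RelationTuple:
    """Split a tuple string into namespace, object, relation, and user."""
    match = _TUPLE_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid tuple: {text!r}")
    # Parts that are absent from the string come back as None.
    return RelationTuple(**match.groupdict())


parsed = parse_tuple("folders:finance_report#owner@Bob")
```

Here `parsed.relation` is `"owner"` and `parsed.user` is `"Bob"`. The shorter forms, such as `folders:finance_report#owner`, parse too, with the missing parts left as `None`.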

Challenges of Using Google Zanzibar

By now, you should have a clear understanding of how Google Zanzibar works and the benefits Google gained from creating such a system. Despite these advantages, let’s try and understand some of the potential shortcomings of this system.

In short, deploying a Zanzibar-based graph introduces a sizable and complex system into your cloud environment, often necessitating reliance on a hosted service. This dependency can introduce latency concerns and further scaling challenges.

The Centralized Graph Problem

While using a graph is a very efficient method of querying permissions, it has one major issue: it must be centralized to provide verified, correct, and reliable responses.

According to some reports, Google distributes replicas of a graph holding around 2 trillion records across the globe. That’s something you can do when you basically "own" the internet, but for the average user, storing and maintaining multiple copies of such a large graph could be a nightmare.

Some Zanzibar implementations solve this problem by sharding the graph’s data and using event-based consistency updates to ensure the reliability of policy decisions and queries.

An Opinionated System

As you've probably gathered so far, Zanzibar is a highly opinionated permissions system that prioritizes fine-grained queries over other preferences, locking (or guiding) developers into its way of doing things. In most use cases, there are more requirements from an authorization system, ranging from the ability to analyze policy rules that consider attributes (Attribute-Based Access Control, or ABAC) to the use of simpler roles to determine coarse-grained permissions.

Another issue with Zanzibar is its lack of extensibility over existing systems. If you already have a policy-based system implemented (such as an RBAC or any Policy-as-Code system), it becomes almost impossible to implement a whole Zanzibar system into your application and combine it with the existing system.

Considering attributes (ABAC) when making authorization decisions is the simplest example that comes to mind here. It’s great that our Zanzibar-based implementation allows us to create authorization policies based on relationships, but once we need to define something like: “Bob can have access to the Expense Reports folder if he is the Owner of it, during certain hours, and from a certain geo-location”, that’s something a purely Zanzibar-based system just doesn't allow.

With this in mind, we must remember that most developers find it hard to adapt to a completely new system, and bringing such an opinionated one into our application is a decision that must be considered carefully.

A common method of dealing with this complexity is building a Zanzibar-based permission engine as part of a comprehensive authorization layer that is also aware of other policy-based permission models.
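As a rough sketch of that layering, the Python below combines a relationship check with attribute conditions for the Expense Reports example. Everything in it is an assumption made for illustration: the `RELATIONSHIPS` data, the `RequestContext` type, the function names, and the specific hours and country are hypothetical, and a real deployment would delegate the relationship check to a Zanzibar-style engine.

```python
from dataclasses import dataclass
from datetime import time

# Relationship data (the ReBAC side): (user, relation, resource).
RELATIONSHIPS = {("Bob", "owner", "Expense Reports")}


@dataclass(frozen=True)
class RequestContext:
    local_time: time  # time of the request in the user's timezone
    country: str      # country code resolved from the request


def has_relation(user, relation, resource):
    """Relationship check; stands in for a call to a Zanzibar-style engine."""
    return (user, relation, resource) in RELATIONSHIPS


def within_policy(ctx, start=time(9, 0), end=time(17, 0),
                  allowed_countries=frozenset({"US"})):
    """Attribute check (the ABAC side): allowed hours and location."""
    return start <= ctx.local_time <= end and ctx.country in allowed_countries


def is_allowed(user, resource, ctx):
    """Grant access only when both the relationship and the attributes match."""
    return has_relation(user, "owner", resource) and within_policy(ctx)
```

With these assumed values, Bob is allowed at 10:30 from the US but denied at 22:00 or from another country, even though his Owner relationship never changes.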

Let’s expand on how that can be done.

Implementing Google Zanzibar

As mentioned before, one of the main challenges of implementing a Zanzibar-based system is combining it with an existing policy-as-code-based system. Permit.io uses Zanzibar as part of its extensive policy platform, allowing you to implement and utilize the benefits of a Zanzibar-based system side by side with policy-as-code. Our implementation simplifies Zanzibar from two perspectives:

Using Permit means you don’t necessarily have to deal with building and maintaining a graph. Our combination of SDKs and a no-code UI allows you to consume a Zanzibar-based system without the hassle of managing it.
Permit users can use a combination of Zanzibar APIs with an existing role or attribute-based access control system.

Permit’s UI, combining RBAC, ABAC, and ReBAC policies all together in one interface. You can see how such policies are created in this detailed guide.

Configuring a Zanzibar-based system in Permit is relatively simple, as it follows this method of configuration:

You configure resources and their relationships. For example, a configuration of a folder resource can declare a child-parent relationship with a file resource.
Permit.io provides an abstraction layer that distinguishes between user-resource relationships and resource-resource relationships. User-resource relationships are configured as resource-level role assignments (John is assigned to the finance_folder as an owner), while resource-resource relationships use tuples to connect to each other (2024_report is child of the finance_reports folder). This distinction helps configure your graph easily while extending upon existing RBAC and ABAC models with Zanzibar-based ReBAC.
Query and policy analysis in Permit allows for the same level of analysis as actions. As check and query APIs in Permit are less opinionated than those in basic Zanzibar, they allow you to dynamically call filter queries and verification decisions in the same structure. This dynamic approach allows you to get the exact decision you need with every check function, allowing for even finer-grained permissions compared to opinionated Zanzibar implementations.

As you can see, Permit’s policy engine-agnostic structure allows you to consume graph data directly with a Zanzibar core implementation alongside a policy-as-code structure, providing a solution for RBAC, ABAC, and ReBAC. This detailed guide shows how Permit.io deals with the implementation of a Google Drive-like system.

Bottom Line: Do You Need Google Zanzibar?

While Zanzibar is sometimes considered almost synonymous with fine-grained authorization, its opinionated basic structure actually lacks granularity when it comes to handling things like attributes. ABAC is usually better for cases that need fine-grained ownership models, with the exception of those that require built-in implicit hierarchies. If hierarchies are a major part of your application’s authorization system, then Zanzibar is indeed a good choice. You can also combine the two by using tools like Permit.io, as it contains another abstraction layer that makes configuration and extensibility significantly easier.

Thus, the two main questions to ask yourself when considering Zanzibar are: