The Unspoken Tradeoffs of Fine-Grained Authorization
In a world full of data processed by distributed microservice systems, Fine-Grained Authorization (FGA) has become the go-to security approach for protecting users and their data. With FGA, we no longer just check who the user is and grant permissions by top-level role. Instead, we evaluate more complex conditions by analyzing data attributes, relationships, time, location, and so on. This lets us make multi-dimensional authorization decisions and authorize users at the finest resolution available.
As in any other domain of software engineering, understanding and examining tradeoffs is a fundamental skill that every software engineer must develop. FGA has its own tradeoffs, which need to be evaluated carefully to integrate an authorization service into our applications successfully.
In this article, we will review a case study of implementing a fine-grained authorization use case, examine its possible tradeoffs, and discuss how to overcome them in your service.
Authorization Clean Code Patterns
Before discussing the details of those tradeoffs, let's first understand the basic capability of every authorization decision and the clean code principles we need to follow when implementing any authorization service.
Keep Enforcement Functions Clean
Every design of an authorization system incorporates three main components:
- Schema/Policy - where we configure the rules we want to enforce
- Data/Engine - where the policy is evaluated with the relevant data to get a decision
- Enforcement - the code on our application that enforces the policy decision or filters data based on it.
To abstract it more, every policy decision - no matter how coarse or fine its permission - needs to answer the following question:
Can a user (or service or principal) perform a certain action on a specific portion of data or a resource?
One of the key principles of clean authorization code is keeping this check function pure and consistent. The function should take three arguments: the principal (or user), the action they want to perform, and the resource they are attempting to perform the operation on. The function should then return the result of the policy decision (allowing or denying access) without embedding any logic about what is allowed or not within the application code itself. By separating the logic from the codebase, we avoid coupling the application to the authorization rules, which helps keep the code clean and maintainable.
If we start comparing multiple check functions and embedding logic into our application, it can quickly lead to a situation where authorization rules are scattered throughout the codebase, making them difficult to track, update, or maintain. This defeats the purpose of externalizing the policy engine in the first place.
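As a minimal sketch of this principle, the check function below takes a principal, an action, and a resource, and only returns the decision. The `policyEngine` object and its rule format are hypothetical stand-ins for an external policy decision point; they are not part of any real SDK.

```javascript
// Hypothetical in-memory "policy engine". In a real system this would be an
// external service; the rule format here is an assumption for illustration.
const policyEngine = {
  // Policy data lives with the engine, not in the application code.
  rules: [
    { role: "doctor", action: "read", resourceType: "visit" },
    { role: "admin", action: "delete", resourceType: "visit" },
  ],
  decide(principal, action, resource) {
    return this.rules.some(
      (rule) =>
        principal.roles.includes(rule.role) &&
        rule.action === action &&
        rule.resourceType === resource.type
    );
  },
};

// The enforcement function: three arguments in, a boolean out, no rules inside.
function check(principal, action, resource) {
  return policyEngine.decide(principal, action, resource);
}

const doctor = { id: "u1", roles: ["doctor"] };
const visit = { type: "visit", key: "123" };

console.log(check(doctor, "read", visit)); // true
console.log(check(doctor, "delete", visit)); // false
```

Because `check` contains no rules of its own, changing who may delete a visit means editing the policy data only, not the application code.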
Simplify and Reuse Conditions
Another important principle is to keep the policy rules simple and modular. We want to create conditions that are reusable across different parts of the application. In other words, we don't want to hard-code complex conditions tightly coupled to specific use cases. Instead, we want to decouple conditions to be inherited and reused in different contexts.
For example, if we have a condition that checks whether a user can access a particular resource, we want that condition to be stateless whenever possible and not manage states as part of the condition evaluation. Stateless conditions allow us to model the system and keep our policy code clean. The goal is to reduce dependencies between conditions so that they can be combined and nested without introducing unnecessary complexity. In the following section, we will see more practical examples of this rule.
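The idea of small, stateless, reusable conditions can be sketched in plain JavaScript. The attribute names (`concealed`, `is_advertised`) follow the case study later in this article; the `allOf` combinator and the `publicListing` condition are hypothetical names used only for illustration.

```javascript
// Each condition is a pure function of its input: no stored state.
const isNotConcealed = (resource) => resource.concealed === false;
const isAdvertised = (resource) => resource.is_advertised === true;

// A combinator builds a larger condition out of smaller ones.
const allOf =
  (...conditions) =>
  (resource) =>
    conditions.every((condition) => condition(resource));

// The same building blocks are reused in different contexts.
const visibleVisit = isNotConcealed;
const publicListing = allOf(isNotConcealed, isAdvertised);

console.log(visibleVisit({ concealed: false })); // true
console.log(publicListing({ concealed: false, is_advertised: false })); // false
```

Since no condition depends on another condition's state, they can be nested or recombined without side effects.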
Minimize Data Manipulation Before Policy Evaluation
A clean authorization system also minimizes the amount of data transformation before passing data to the policy engine. If we need to synchronize data between our application and the policy engine, we want that data to remain as close to the original application data as possible. This reduces the risk of data mismatches and helps maintain consistency between the application and the policy engine.
If an event or new data enters the system, the policy engine should work with that data in its original form as much as possible. This ensures that the policy engine evaluates the correct data without needing to apply excessive transformations.
Event-Driven Synchronization
Another best practice for externalizing authorization is to use an event-driven system. With event-driven synchronization, we can ensure that the data in the policy engine is always up to date with the latest application events. For instance, when a new user is created, or a new resource is added, an event is triggered that synchronizes this data with the policy engine. This ensures that the policy engine always works with the most current information, which is essential for making accurate authorization decisions.
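Here is a rough sketch of event-driven synchronization. A tiny hand-rolled event bus stands in for a real message broker or webhook, and `policyData` is a hypothetical copy of the data held by the policy engine; the event names are assumptions.

```javascript
// Minimal synchronous event bus (assumption: stands in for a real broker).
const listeners = new Map();
const on = (type, handler) => {
  listeners.set(type, [...(listeners.get(type) ?? []), handler]);
};
const emit = (type, payload) => {
  for (const handler of listeners.get(type) ?? []) handler(payload);
};

// Hypothetical copy of application data held by the policy engine.
const policyData = { users: new Map(), resources: new Map() };

// Handlers store payloads unchanged, keeping the data in its original form.
on("user.created", (user) => policyData.users.set(user.id, user));
on("user.deleted", ({ id }) => policyData.users.delete(id));
on("resource.created", (resource) => policyData.resources.set(resource.id, resource));

// Application events keep the policy data up to date.
emit("user.created", { id: "u1", country: "US" });
emit("user.created", { id: "u2", country: "DE" });
emit("resource.created", { id: "dash-1", type: "dashboard", region: "US" });
emit("user.deleted", { id: "u2" });

console.log(policyData.users.size); // 1
console.log(policyData.resources.get("dash-1").region); // "US"
```

Note that the handlers do no transformation at all, which also follows the previous principle of minimizing data manipulation before policy evaluation.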
Adhering to these principles will allow us to implement a clean, efficient, and scalable FGA system that will allow for flexibility as the system evolves.
Conditions vs. Relationships
With these principles in mind, we can start looking at tools that help us make FGA decisions and see that they typically propose one of two main approaches to evaluating the answer to this question: conditions and relationships. Let's examine these.
Conditions, also known as attribute-based access control (ABAC), involve taking the attributes of our data and building a schema of conditions on top of them. A simple example might involve attributes such as the user's location. For instance, we might say:
"If the user is located in the US, and the dashboard shows US data, then the user is allowed to access it."
That's a straightforward condition, but real-world applications likely need more complex ones. In every proper authorization policy engine (like Open Policy Agent), we can create nested conditions, inherit them, and so on. The main advantage of using conditions is that we can model stateless authorization systems. Whenever we provide a particular set of attribute values, the policy engine evaluates those values and returns a decision: allow or deny.
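The quoted rule can be written as a stateless condition over attributes. This is only a sketch: the attribute names `location` and `region` are assumptions, and a real policy engine would express the same rule in its own policy language rather than in application code.

```javascript
// Stateless ABAC condition: the decision depends only on the attributes
// passed in, so the same input always produces the same output.
function canViewDashboard(user, dashboard) {
  return user.location === "US" && dashboard.region === "US";
}

console.log(canViewDashboard({ location: "US" }, { region: "US" })); // true
console.log(canViewDashboard({ location: "FR" }, { region: "US" })); // false
```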
But there's a downside: If we need to maintain data consistency on the authorization service, conditions can become less efficient. For example, if we don't know exactly where the user is located but know they are part of a group of employees in the US, we need to store that state. At this point, using conditions becomes less practical, making it harder to verify decisions before running them.
Relationships, on the other hand, provide a different approach to handling such data-influenced decisions. Instead of relying on conditions, we can model relationships between our data.
For example, in cases where we cannot evaluate conditions directly, we can grant permissions implicitly based on predefined relationships. This is the heart of relationship-based access control (ReBAC) and of Google's widely discussed Zanzibar paper. Here, the idea is to assume that all data entities have some relationships with each other, forming a graph where nodes represent data and edges represent the relationships between them.
However, using relationships has its limitations, notably in terms of time and complexity. Traversing these relationships requires processing the state of the data, which takes time, especially as the data grows more complex.
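To make the graph idea and its traversal cost concrete, here is a deliberately simplified sketch. It stores relationships as tuples and answers a check with a breadth-first search. Unlike real ReBAC engines, it ignores which relation types grant which permissions; all entity names are hypothetical.

```javascript
// Relationship tuples: "subject has relation to object" (edges of the graph).
const tuples = [
  { subject: "user:alice", relation: "member", object: "group:us_employees" },
  { subject: "group:us_employees", relation: "viewer", object: "dashboard:us_sales" },
  { subject: "user:bob", relation: "member", object: "group:eu_employees" },
];

// Breadth-first search: can `subject` reach `object` by following edges?
function isRelated(subject, object) {
  const visited = new Set([subject]);
  const queue = [subject];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === object) return true;
    for (const tuple of tuples) {
      if (tuple.subject === current && !visited.has(tuple.object)) {
        visited.add(tuple.object);
        queue.push(tuple.object);
      }
    }
  }
  return false;
}

console.log(isRelated("user:alice", "dashboard:us_sales")); // true
console.log(isRelated("user:bob", "dashboard:us_sales")); // false
```

Every check walks stored state, so the cost grows with the number of tuples and the depth of the graph, which is the time and complexity limitation described above.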
Understanding the advantages and limitations of conditions-based engines (such as AWS Cedar and Open Policy Agent) and relationship-based engines (such as OpenFGA and SpiceDB) is the main consideration we should take when examining tradeoffs of our authorization service.
To help us make this tradeoff easier, some tools, like OPAL (Open Policy Administration Layer), create a layer of abstraction on top of these engines and let you combine conditions and relationships in one FGA service.
Check out the OPAL project (and consider giving us a ⭐️) on GitHub: https://github.com/permitio/opal
With OPAL, you can have a data engine on top of Open Policy Agent that lets you implement both conditions and relationships to achieve verification and correctness in the same authorization service.
In the following section, we will examine a use case that needs this combination of relationship and condition for a simple policy rule to dive deeper into the tradeoffs we need to consider on FGA services.
Case Study - FGA in Healthcare
In a recent consultation meeting, one of our customers at Permit.io, a large Health Maintenance Organization (HMO) serving millions of patients and thousands of practitioners, faced a challenge in implementing FGA. The goal was to create a policy rule determining whether a user could access a single patient visit by evaluating three critical conditions. These conditions need to consider the visit, the practitioner, and the diagnoses of the visit.
// An example object of a visit that we want to filter
{
  "appointment_id": "123",
  "practitioner_id": "123456",
  "diagnosis": ["y4500"],
  "concealed": false
}
From the functional perspective, the HMO has to filter a list of visits and allow only visits that adhere to the following three conditions:
- Visit Concealment: The visit itself is not marked as "concealed" (that is, restricted from general access).
- Practitioner Concealment: The practitioner associated with the visit is advertised in the system.
- Diagnosis Concealment: None of the diagnoses associated with the visit is concealed (restricted from general access).
In short imperative code, we can describe this condition this way:
if (!visit.concealed &&
    !visit.diagnoses.some(d => d.concealed) &&
    visit.practitioner.advertised) {
  return true
}
This relatively simple policy rule highlights the complexities of FGA, as it needs to balance all three conditions to control access appropriately. Modeling this rule shows how even a straightforward requirement can reveal the three key tradeoffs involved in implementing FGA.
It is important to mention that we kept the policy rule simple: every user may view a visit as long as it meets the requirements of the policy rule. In the real world, we would also want to examine the user's role, attributes, or relationships before allowing access to visits.
Implementation Tradeoffs of Fine-Grained Authorization
The best way to understand the tradeoffs will be to demonstrate three implementation methods for the policy rule from the use case. You can find the methods we are showing here in the following Git repository: https://github.com/permitio/fga-tradeoffs
The code below uses Permit's Terraform Provider, which allows you to model Open Policy Agent policy schemas using the HCL declarative language, and Permit's JS SDK to run the check function against OPAL's fine-grained policy decision point.
Tradeoff 1: Dirty Enforcement Code
The first approach handles each of the three conditions (visit concealment, practitioner advertisement, and diagnosis concealment) as a simple flat condition. In this method, we use the policy engine to create the following three conditions, separate from each other:
// Visit condition
{ "resource.concealed" : { "equals" : false } },
// Diagnosis condition
{ "resource.concealment" : { "equals" : false } },
// Advertised condition
{ "resource.is_advertised" : { "equals" : true } },
As you can see, none of these conditions enforces the planned rule on its own. The application code has to combine them to decide whether a visit should be accessible.
On the surface, this approach seems appealing due to its performance benefits. The logic is centralized within the application, meaning decisions can be made quickly without the need to query external systems or manipulate the diagnoses and practitioner data when syncing them in advance. However, this comes at a cost: the application code becomes cluttered with authorization logic, tightly coupling the business logic with access control rules.
As you can see in the check below, we had to "dirty" our check function with another policy "rule" that allows a visit only if all three conditions evaluate to true.
// One bulk check per visit: the visit itself, each diagnosis, and the practitioner
const filterList = await Promise.all(
  Visit.map((visit) => {
    return permit.bulkCheck([
      {
        user,
        action,
        resource: {
          type: "Visit".toLowerCase(),
          attributes: { ...visit },
        },
      },
      ...visit.diagnosis.map((diagnosis) => ({
        user,
        action,
        resource: {
          type: "Diagnosis".toLowerCase(),
          key: diagnosis,
        },
      })),
      {
        user,
        action,
        resource: {
          type: "Practitioner".toLowerCase(),
          key: visit.practitioner_id,
          tenant: "default",
        },
      },
    ]);
  })
);
// The extra "rule" in application code: a visit passes only if no check failed
return filterList.map((check) => !check.some((filter) => filter === false));
Maintaining such a system over time becomes increasingly difficult, as any changes to the authorization rules would require modifying core application logic. This violates the principle of keeping the application code clean and focused, ultimately leading to bloated and difficult-to-manage software.
Tradeoff 2: Complex Policy Code
A second approach externalizes this complexity, shifting the burden of authorization decisions to the policy engine. Here, the application delegates the decision-making to the policy engine, which evaluates whether the conditions for accessing a visit are met.
Here is an example of Rego code that combines all three conditions:
package permit.custom
import data.permit.generated.conditionset
import data.permit.generated.abac.utils.attributes
import future.keywords.every
import future.keywords.in
allow {
    input.resource.type == "visit"
    is_null(object.get(input.resource,"key",null))
    allowed_visit
    allowed_visit_diagnoses
    allowed_visit_practitioner
}
default allowed_visit_practitioner := false
allowed_visit_practitioner {
    conditionset.resourceset_advertised_5fpractitioner with input.resource.type as "practitioner" with input.resource.key as attributes.resource.practitioner_id with input.resource.attributes as {}
}
default allowed_visit_diagnoses := false
allowed_visit_diagnoses {
    every diagnosis in attributes.resource.diagnosis {
        conditionset.resourceset_non_5fconsealed_5fdiagnosis with input.resource.type as "diagnosis" with input.resource.key as diagnosis
    }
}
default allowed_visit := false
allowed_visit {
    conditionset.resourceset_non_5fconsealed_5fvisit with input.resource.attributes.appointment_id as null
}
This code ensures the application code remains clean and is solely responsible for business functions, while the policy engine handles the complex authorization checks. This separation of concerns makes the application code more maintainable in the long term. However, the tradeoff lies in the complexity of modeling these conditions within the policy engine. Conditions like visit concealment, practitioner concealment, and diagnosis sensitivity must be carefully structured to ensure they work harmoniously. As the conditions become more complex and require more data or changes, maintaining and auditing the policy engine becomes more difficult.
Another key challenge with this type of condition is that we coupled data correctness (the saved status of each diagnosis) with the verified condition (all statuses must be true). This coupling could lead to problems when we need to audit and verify policy decisions in a performance-sensitive system.
Additionally, the system may face performance concerns if the policy engine needs to handle increasingly complex conditions for each authorization decision.
Tradeoff 3: Data Manipulation
Understanding the following graph example requires a basic understanding of Relationship-Based Access Control, and especially of the role derivation principle.
The third approach uses a graph to model the relationships between visits, practitioners, and diagnoses. In this graph-based model, nodes represent entities (such as visits, practitioners, or diagnoses) and edges represent the relationships between them, like concealment or advertised status. The policy engine traverses this graph and uses role derivation to determine whether access should be granted.
Using this approach, we will need to sync our data into the policy engine in advance. While this isn't bad, as we saw the advantages of relationship data correctness, it could be time-consuming and performance-sensitive.
This is an example of the graph we can build. It will use the relationships between the entities and a special BoolMark node to derive permissions only if all the related nodes allow the user.
Looking at the graph above, we can see two main challenges:
First, we must manipulate the data before sending it to the authorization service. Instead of just listening to events and storing the data, we need a transformer that assigns it the right relationships in the graph.
For example, the following transform needs to run when syncing diagnoses to the system:
for (let diagnosis of Diagnosis) {
  await permit.api.resourceInstances.create({
    resource: "diagnosis",
    key: diagnosis.id,
    tenant: "default",
    attributes: { ...diagnosis },
  });
  if (!diagnosis.concealment) {
    await permit.api.relationshipTuples.create({
      object: "diagnosis:" + diagnosis.id,
      subject: "bool_mark:1",
      relation: "non_concealed_diagnosis",
    });
  }
}
Second, the redundant BoolMark node can make this model's maintainability more challenging when we need to scale it or add more condition checks.
While this graph could be a great solution for particular checks of visits, diagnoses, or practitioners, and it is also easier to perform simple check functions in our application, the need for conditional checks and data manipulation makes it more challenging to implement.
const permissions = await permit.getUserPermissions(
  "fga_user",
  ["default"],
  Visit.map(({ appointment_id }) => `visit:${appointment_id}`)
);
This code shows how simple the check function is. We only need to pass the visit ID, without explicitly passing the other entities' IDs.
Considering the Three Tradeoffs
Each approach, whether focusing on complex check functions in the application, externalizing conditions to the policy engine, or using a graph-based model, has its advantages and drawbacks. Choosing the right approach depends on the priorities of the system. Do we prioritize performance over maintainability? Or is it more important to keep the code clean and externalized, even if that means introducing complexity into the policy engine? For the HMO, these tradeoffs highlighted the intricacies of implementing FGA, showing that even a simple access rule requires careful architectural decisions to balance flexibility, performance, and maintainability.
Solving the Tradeoffs with Foreign Keys in the HMO Case
Since the particular rule required checking permissions through relationships that are always explicit in the visit object, we decided to consider a fourth approach that addressed the tradeoffs we encountered among complexity, performance, and maintainability. We implemented a solution using foreign keys that connected the various entities in the HMO system: visits, practitioners, and diagnoses. This approach simplified the modeling of FGA while ensuring flexibility and efficiency.
Instead of relying on complex graph traversal or embedding logic in the application, we leveraged foreign key relationships within the policy engine. Each visit referenced its associated practitioner and diagnoses using foreign keys, which the policy engine evaluated to determine access. This externalized the authorization checks, keeping the application code clean and focused on business logic.
The foreign key approach had several advantages. First, it maintained flexibility by allowing us to easily modify or extend authorization rules without affecting the core system. Second, it significantly reduced the overhead associated with data synchronization, as we only needed to manage individual entity relationships rather than complex graphs. This streamlined the process and enabled real-time decision-making without performance bottlenecks.
Here, you can see how (relatively) simple it is to define the condition schema for the foreign key conditions we built into our Attribute-Based Access Control system:
{
  "allOf" : [
    { "resource.concealed" : { "equals" : false } },
    {
      "resource.practitioner_id" : {
        "object_match" : {
          "match" : {
            "is_advertised" : { "equals" : true }
          },
          "fk_resource_type" : permitio_resource.practitioner.key,
        }
      }
    },
    {
      "resource.diagnosis" : {
        "all_match" : {
          "match" : {
            "concealment" : { "equals" : false }
          },
          "fk_resource_type" : permitio_resource.diagnosis.key,
        }
      }
    },
  ],
}
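To make the semantics of this schema concrete, here is a plain JavaScript sketch of how such foreign key conditions could be evaluated. It is an illustration, not Permit's implementation: the `store` object, the `matches` helper, and `isVisitAllowed` are hypothetical, and the in-memory lookup stands in for the data synced to the policy engine.

```javascript
// Hypothetical in-memory copies of the synced entities, keyed by ID.
const store = {
  practitioner: {
    "123456": { is_advertised: true },
    "999": { is_advertised: false },
  },
  diagnosis: {
    y4500: { concealment: false },
    z1000: { concealment: true },
  },
};

// Does every `{ attribute: { equals: value } }` entry in `match` hold for `entity`?
function matches(entity, match) {
  return (
    entity !== undefined &&
    Object.entries(match).every(([attr, { equals }]) => entity[attr] === equals)
  );
}

function isVisitAllowed(visit) {
  return (
    visit.concealed === false &&
    // object_match: follow a single foreign key to the practitioner
    matches(store.practitioner[visit.practitioner_id], {
      is_advertised: { equals: true },
    }) &&
    // all_match: follow a list of foreign keys; every diagnosis must match
    visit.diagnosis.every((id) =>
      matches(store.diagnosis[id], { concealment: { equals: false } })
    )
  );
}

const visit = {
  appointment_id: "123",
  practitioner_id: "123456",
  diagnosis: ["y4500"],
  concealed: false,
};

console.log(isVisitAllowed(visit)); // true
console.log(isVisitAllowed({ ...visit, diagnosis: ["y4500", "z1000"] })); // false
```

The application passes only the visit object, and the referenced practitioner and diagnoses are resolved by ID on the engine side, so no graph has to be built in advance.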
Ultimately, the foreign key solution provided the right balance of simplicity, performance, and flexibility, enabling the HMO to securely manage sensitive data while keeping the system maintainable and scalable.
Conclusion
As seen in the HMO case, implementing FGA presents significant challenges, even with relatively straightforward rules. Modeling and enforcing a simple rule (whether a practitioner can access a visit based on three conditions) uncovered complex tradeoffs involving performance, complexity, and maintainability.
At Permit.io, we provide an authorization service aimed at delivering the best developer experience for software developers. By simplifying schemas and data sources, we offer developers who want to create better authorization in their products an efficient and comfortable way to do so. You can try Permit for yourself with our generous free tier at: https://app.permit.io
We also invite you to continue learning about Fine-Grained Authorization in our Authorization Slack Community, where fellow authorization-savvy engineers share their insights and consult with others about similar topics. Join us now at: https://io.permit.io/slack
Written by
Gabriel L. Manor
Full-Stack Software Technical Leader | Security, JavaScript, DevRel, OPA | Writer and Public Speaker