Supervised Learning-Based Approach Mining ABAC Rules from Existing RBAC Enabled Systems

Attribute-Based Access Control (ABAC) is an emerging access control model. It is more flexible and scalable than earlier models and well suited to today's large-scale, distributed, and open application environments, and it has become an active research area. However, Role-Based Access Control (RBAC) has been the most widely used general-purpose access control model so far. It is simple to administer and its policies are simple to define, but its user-to-role assignment process does not scale for large organizations with many users. To scale with a growing organization, RBAC needs to be transformed into ABAC, and transforming existing RBAC systems into ABAC is complicated and time-consuming. In this paper, we present a supervised machine learning-based approach that extracts attribute-based conditions from an existing RBAC system to construct ABAC rules at the primary level, simplifying the process of transforming an RBAC system into ABAC.


Introduction
The access control model defines how users' access to system resources is controlled and how information about who is authorized for what is maintained. An access control policy is how the system knows who is authorized to perform which operations on system resources.
ABAC, being flexible and context-sensitive, suits today's application environments [13,14,15,16]. It is based on the user, object, and environmental attributes of the system. Access to system resources is controlled by access rules defined over these attributes. Each rule is a set of conditional expressions over the user, object, and environmental attributes joined with "and" connectives. For example, designation, age, and experience can be user attributes, the department of an object can be an object attribute, and the time and location of the user can be environmental attributes.
The user, object, and environment attribute values at the time of a request must satisfy the conditions defined in the rule for the permission to be granted. For example, a policy rule can be stated as: "Permission to evaluate the grade-sheet of a student is granted to the user if and only if his Designation is Assistant Professor, Age >= 25 years, Experience >= 3 years, his Department = CS, and the time of the request is between 10 am and 5 pm". Defining an ABAC policy is a complex and time-consuming process. Constructing an ABAC policy from scratch is worthwhile for newly designed applications. However, for an existing application that already implements RBAC as its access control model, it is better to reduce the effort of constructing the policy.
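The example rule above can be read as a conjunction of attribute conditions. The following minimal sketch shows one way to encode and evaluate such a rule. The triple encoding, the names RULE, OPS, and is_permitted, and the flat attribute dictionary that merges user, object, and environment values are assumptions made for illustration and are not part of the approach described in this paper.

```python
from datetime import time

# Hypothetical encoding: each condition is (attribute, operator, value);
# the rule is the conjunction of all conditions.
RULE = [
    ("designation", "==", "Assistant Professor"),
    ("age", ">=", 25),
    ("experience", ">=", 3),
    ("department", "==", "CS"),
    ("request_time", ">=", time(10, 0)),
    ("request_time", "<=", time(17, 0)),
]

OPS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
}

def is_permitted(rule, attributes):
    """Return True only if every condition holds for the given attribute values."""
    return all(
        name in attributes and OPS[op](attributes[name], value)
        for name, op, value in rule
    )

# Illustrative request: attribute values at the time of the request.
request = {
    "designation": "Assistant Professor",
    "age": 30,
    "experience": 5,
    "department": "CS",
    "request_time": time(11, 30),
}
print(is_permitted(RULE, request))
```

A request that violates any single condition, for instance one made outside the permitted time window, is rejected because the conditions are joined with "and".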
A number of earlier approaches [17,18,19,20,21,22] to role mining have been proposed. However, they do not address the problem of ABAC policy mining. Recently, a small amount of research has been carried out on the automatic mining of ABAC policies. These approaches use access control logs and the Access Control List (ACL) of the system to mine ABAC rules, and the techniques they use are either association mining or unsupervised machine learning. Association mining can extract patterns of attribute values from the log entries and provide associations among them, but it does not extract conditional expressions with inequalities such as those in the above example. The unsupervised machine learning approach inherits the issues of clustering algorithms and also results in mining a large number of roles. All of these approaches work better for categorical data but are not applicable to discrete data.
In this paper, we propose a supervised machine learning approach, intended mainly for applications that need to transform their access control model from RBAC to ABAC.
The paper is organized as follows. Section 2 describes related work. Section 3 gives background knowledge and preliminaries for the RBAC and ABAC models. Section 4 explains our approach. Section 5 contains results and discussion. Section 6 concludes the paper.

Related Work
Recently, a small amount of research has been carried out on mining ABAC rules [19,23,24,25,26,27]. Amani et al. [28] proposed an automatic approach to extract ABAC rules from event logs of business processes. They used association rule mining to find associations among the subject, object, and environmental attributes. Association mining is applied to the event log to identify attribute relations using frequent and infrequent item sets. ABAC rules are then constructed from these attribute relations.
Zhongyuan and Scott [27] presented an ABAC policy mining algorithm that uses Access Control Lists (ACLs) and attribute data. The access control list is a set of user-permission relations expressed as tuples < u, o, op > containing a user, an object, and an operation. The algorithm iterates over these tuples, selects some of them as seeds to construct candidate ABAC rules, and attempts to generalize the candidate rules by considering additional tuples in the ACL. The generated rules are merged and simplified, and the highest-quality rules are finally selected to construct the ABAC policy.
Carlos et al. [29] proposed a model to extract ABAC rules from sparse logs. This model is also based on association rule mining. The authors highlight the fact that most logs in practice contain little information. The model invalidates overly permissive rules and reduces the excess rules generated by the mining process. Overly permissive rules are rules that assign permissions to users that are undesirable from a security perspective. Standard association mining algorithms fail to identify these rules. For this purpose, a new quality measure is defined to guide the mining process.
An automated ABAC rule mining approach proposed by Matthew W. Sanders and Chuan Yue [24] addresses the under-privilege and over-privilege issues related to access control. Over-privilege increases the security risk in the system, while under-privilege restricts users from performing their duties. The rule mining algorithm they present minimizes the privilege errors of ABAC policies. They present new algorithms, evaluation metrics, and optimization methods to optimize the large privilege space of an ABAC policy.
Karimi et al. [23] proposed a methodology for extracting ABAC policy rules. They used an unsupervised learning approach for mining policy rules from the system log. Transaction records of the system log are grouped into clusters. Each cluster corresponds to one access control rule, and the features of the cluster are used to construct the policy rule. As the approach uses a clustering algorithm, it inherits all the issues of clustering algorithms. It also fails to extract attribute conditions for non-categorical attributes.
Most of the approaches are based on association rule mining, which extracts a large number of rules and needs pruning methods to eliminate irrelevant ones. The unsupervised learning-based approach also results in inaccurate access rules due to the issues related to clustering algorithms. None of these approaches is able to extract attribute conditional relations for discrete data attributes. For example, they can extract conditions like {dept = CS} for categorical attributes, but they fail to extract conditions like {experience > 5} for discrete data attributes. The supervised learning approach we present can extract such conditions and works for both categorical and non-categorical attributes.
In the next section, we give some necessary background and definitions required to explain our approach.

Background and Preliminaries
In this section, we briefly give an overview of the concepts and definitions of the components of the RBAC and ABAC models.
RBAC works on the concept of a role. A role is analogous to a user's job profile in an organization, for example cashier or accountant. A role is the set of permissions needed to carry out the software activities required to perform the job. A permission is a pair (o, op) of an object and an operation. For example, a user with the permission (evaluate, grade-sheet) can evaluate the grade sheet of a student. The set of such permissions is assigned to a role, and a set of roles is assigned to each user. The RBAC system maintains the information about permission assignment and role assignment in the permission-to-role assignment relation and the user-to-role assignment relation, respectively.
ABAC works on the concept of user, object, and environment attributes, as described in the previous section. Table 1 gives the differences between RBAC and ABAC with respect to the access control mechanism. Table 2 gives the symbols, and their meanings, used to describe the concepts of the RBAC and ABAC models and the model we present.
With the above definitions and the RBAC and ABAC concepts discussed, the next section describes our supervised machine learning approach for constructing ABAC rules.
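As a small illustration of the RBAC side, the sketch below stores the user-to-role and permission-to-role assignment relations as sets of tuples and answers an access request by looking up the user's roles. The sample users, roles, and permissions, as well as the function names roles_of and rbac_permits, are hypothetical and chosen only for illustration.

```python
# Hypothetical sample relations; the entries are illustrative only.
UA = {("John", "Evaluator"), ("Mary", "Cashier")}      # user-to-role assignment
PA = {                                                 # permission-to-role assignment
    (("evaluate", "grade-sheet"), "Evaluator"),
    (("collect", "fee"), "Cashier"),
}

def roles_of(user):
    """Return the set of roles assigned to the user in UA."""
    return {r for (u, r) in UA if u == user}

def rbac_permits(user, permission):
    """A user holds a permission if at least one of the user's roles carries it."""
    return any((permission, r) in PA for r in roles_of(user))

print(rbac_permits("John", ("evaluate", "grade-sheet")))
print(rbac_permits("Mary", ("evaluate", "grade-sheet")))
```

Every new user must be added to UA explicitly, which is the administrative step that attribute-based rules are meant to replace.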

The Proposed Supervised Machine Learning Approach
We present a mechanism to extract, from an existing RBAC system, the attribute-based conditions that are used to construct ABAC rules. It reduces the effort of a system administrator in transforming the RBAC system into ABAC. Our approach uses the permission-to-role assignment relation, the user-to-role assignment relation, and the transaction log for this purpose. Tables 3 and 4 show sample representations of the user-to-role assignment relation and the permission-to-role assignment relation, respectively. The roles of the RBAC system classify the users into various security classes. In ABAC, attribute-based rules classify the users into one of the security classes: the attribute conditions defined in an ABAC rule help to classify the users and decide whether a given permission can be granted to them. We have used the J48 decision tree algorithm to extract the user, object, and environment attribute relation conditions required to construct the ABAC rules. Three separate decision trees are constructed to extract the user, object, and environment attribute relation conditions, respectively.

ABAC Rule Extraction Problem
The ABAC Rule extraction problem can be stated as: Given user-to-role assignment relation (UA), permission-to-role assignment relation (PA), and transaction log (L); extract attribute condition set Cua, Coa, and Cea and construct ABAC rules using them.
The data sets required for constructing the decision trees are generated from the relations UA and PA and the system log entries. The next section describes how these data sets are generated.

Generating Data Set for Machine Learning Process
Given the relation UA, the set Ur of users to whom the role r is assigned is constructed. Table 5 shows a sample of Ur for the sample UA shown in Table 3.
Given Ur, the user attribute value data of all users u ∈ Ur is extracted by using the function fu, and the data set Tu is generated, with each tuple holding the attribute values of a user followed by the assigned role.
Example 4: Suppose (John, Evaluator) ∈ Ur; then the corresponding tuple in Tu will be < Assistant Professor, Regular, 5, Evaluator >. Assume Au = {a1, a2, a3, a4} are the overall user attributes in the system. Table 6 represents the set of tuples in Tu for the users to whom role R1 is assigned (Table 5).
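The construction of Ur and Tu can be sketched as follows. This is a minimal sketch under stated assumptions: the sample users other than John, the role name "Paper Setter", the dictionary USER_DATA, and the function names f_u, users_of, and build_tu are hypothetical; only John's attribute values and the tuple layout follow Examples 2 and 4.

```python
# Hypothetical sample data; only John's values come from the examples in the text.
UA = {("John", "Evaluator"), ("Asha", "Evaluator"), ("Ravi", "Paper Setter")}
USER_ATTRS = ["designation", "appointment", "experience"]   # the attribute set Au
USER_DATA = {
    "John": {"designation": "Assistant Professor", "appointment": "Regular", "experience": 5},
    "Asha": {"designation": "Assistant Professor", "appointment": "Adhoc", "experience": 4},
    "Ravi": {"designation": "Professor", "appointment": "Regular", "experience": 9},
}

def f_u(attribute, user):
    """User attribute value function: the value of the attribute for the user."""
    return USER_DATA[user][attribute]

def users_of(role):
    """Ur: the users to whom the role is assigned in UA (sorted for stable output)."""
    return sorted(u for (u, r) in UA if r == role)

def build_tu(roles):
    """Tu: one tuple of attribute values per assigned user, with the role as class label."""
    return [
        tuple(f_u(a, u) for a in USER_ATTRS) + (role,)
        for role in roles
        for u in users_of(role)
    ]

for row in build_tu(["Evaluator", "Paper Setter"]):
    print(row)
```

The last field of each tuple is the class label on which the user classifier is trained.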

Table 6. Sample Tu Dataset
Similarly, given the relation PA, the object attribute value data for the objects o in p ∈ P is extracted using fo, and the data set To is generated, with the permission as the class attribute of each tuple.
Given the transaction log L, the environment attribute value data is extracted from each log entry, and the data set Te is generated, again with the permission as the class attribute of each tuple.
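For the environment data set, a minimal sketch is given below. It assumes that the only environment attribute is the hour of the request taken from the timestamp t, and that only permitted transactions are used for training; both choices, as well as the sample log entries and the name build_te, are assumptions made for illustration and are not stated in the text.

```python
from datetime import datetime

# Hypothetical log entries of the form <u, r, p, t, d>.
LOG = [
    ("John", "Evaluator", ("evaluate", "grade-sheet"), datetime(2023, 3, 1, 11, 30), "permit"),
    ("John", "Evaluator", ("evaluate", "grade-sheet"), datetime(2023, 3, 1, 19, 5), "deny"),
    ("Mary", "Cashier", ("collect", "fee"), datetime(2023, 3, 2, 10, 15), "permit"),
]

def build_te(log):
    """Te: (hour of request, permission) for each permitted transaction.

    Assumption: denied entries are skipped and the hour is the only
    environment attribute; a real log may carry location or other context.
    """
    return [(t.hour, p) for (_u, _r, p, t, d) in log if d == "permit"]

for row in build_te(LOG):
    print(row)
```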
Once the datasets are ready, decision trees are constructed to extract the attribute relation conditions for generating ABAC rules. The decision tree construction process is explained in the next section.

Extraction of Relation Conditions
The decision tree classifier is applied to the data sets generated above, and three separate decision trees are constructed for the user, object, and environmental attributes. The role is treated as the categorical class attribute when building the user classifier, while the permission is treated as the class attribute when building the object and environment classifiers. Relation conditions are then extracted from the decision rules provided by the decision trees. The steps of the process for extracting relation conditions are summarized in Algorithm 1.
In lines 1, 2, and 3 of the algorithm, decision trees are constructed by applying a decision tree classifier to the training datasets Tu, To, and Te, respectively. The loop in lines 4 to 6 extracts user attribute conditions from the decision tree TRu, while the loop in lines 7 to 10 extracts object and environment attribute conditions from the decision trees TRo and TRe, respectively.
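The extraction loops can be pictured as a walk over the root-to-leaf paths of a decision tree, collecting the branch tests on each path as the conditions for the class at the leaf. The sketch below is only illustrative: the nested-tuple tree representation, the hand-written tree (standing in for the output of a trained classifier), the class label EV, and the attribute values AP and A are hypothetical.

```python
# A hand-written tree standing in for a trained classifier's output.
# Internal node: (attribute, {branch_test: subtree}); leaf: class label (str).
TREE = ("DES", {
    "= P": ("APP", {
        "= R": ("EXP", {"> 6": "PS", "<= 6": "EV"}),
        "= A": "EV",
    }),
    "= AP": "EV",
})

def paths(node, prefix=()):
    """Yield (conditions, class_label) for every root-to-leaf path."""
    if isinstance(node, str):                 # leaf reached: emit the collected path
        yield list(prefix), node
        return
    attribute, branches = node
    for test, child in branches.items():
        yield from paths(child, prefix + (f"{attribute} {test}",))

def conditions_by_class(tree):
    """Group the condition lists of all paths by the class label at the leaf."""
    result = {}
    for conditions, label in paths(tree):
        result.setdefault(label, []).append(conditions)
    return result

for label, condition_sets in conditions_by_class(TREE).items():
    print(label, condition_sets)
```

A class reached by several leaves yields several alternative condition sets, each of which would give rise to its own rule.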
The extracted attribute conditions can now be used to construct ABAC rules. The next section explains the rule generation process.

Generating ABAC Rules
Given the user, object, and environment attribute relation conditions, the ABAC rules are constructed as per the steps summarized in Algorithm 2.
[Algorithm 2. ABAC rule generation: loops over all roles ri ∈ R and all permissions pi ∈ P, uses the extracted condition sets (including Cea), and returns the rule set RS.]
Algorithm 2 generates the rules at the primary level. The generated rules can then be pruned, if necessary, by the system administrator. Administrators can also eliminate unnecessary rules from the rule set.
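One possible reading of the rule generation step is sketched below: for every (permission, role) pair in PA, the user conditions of the role are combined with the object and environment conditions of the permission into one conjunctive rule. This is an assumption about the structure of the step, not a transcription of Algorithm 2; the function name generate_rules, the dictionary layout, and the sample conditions for the object and environment attributes are hypothetical.

```python
def generate_rules(PA, Cua, Coa, Cea):
    """Return a rule set RS with one conjunctive rule per (permission, role) pair.

    Assumed layout: Cua maps a role to its user conditions; Coa and Cea map a
    permission to its object and environment conditions.
    """
    RS = []
    for permission, role in sorted(PA):
        conditions = (
            Cua.get(role, []) + Coa.get(permission, []) + Cea.get(permission, [])
        )
        RS.append({"permission": permission, "conditions": conditions})
    return RS

# Illustrative inputs; only the user conditions for PS follow the text.
PA = {(("evaluate", "grade-sheet"), "PS")}
Cua = {"PS": ["DES = P", "APP = R", "EXP > 6"]}
Coa = {("evaluate", "grade-sheet"): ["dept = CS"]}
Cea = {("evaluate", "grade-sheet"): ["hour >= 10", "hour <= 17"]}

for rule in generate_rules(PA, Cua, Coa, Cea):
    print(rule)
```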

Results and Discussion
To evaluate the effectiveness of our algorithm on discrete data attributes and to show how relation conditions with relational operators are extracted, we have used synthesized data sets for UA, PA, and the transaction log. The implementation of these data structures and the data fields in the transaction log record vary from application to application, so we have used general formats to demonstrate the working of the approach. Rows in the sample user data table (Table 7) represent user information and the role assigned to each user. The set of users with the same role forms one user class. For example, all the users to whom the role evaluator has been assigned belong to the evaluator class. The user information belonging to one user class forms a particular range (or set) of values for each data attribute. For example, the values of the data field experience for users belonging to the evaluator class form one range (or set), while those of other classes form other ranges. It is difficult to extract such ranges manually from a large set of records. A decision tree helps to extract these data value ranges, or conditions, automatically.
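To illustrate how a decision tree learner can arrive at a range condition on a numeric attribute, the sketch below searches for the threshold on experience that best separates two user classes by information gain. It is a simplified stand-in, not the J48 procedure itself, which uses the gain ratio and further refinements; the sample values, the class label EV, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the best split 'value <= t' versus 'value > t'."""
    base = entropy(labels)
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:        # the largest value cannot split the data
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - weighted > best[1]:
            best = (t, base - weighted)
    return best

# Illustrative data: years of experience and the assigned user class.
EXP = [2, 4, 5, 6, 7, 9, 12]
ROLE = ["EV", "EV", "EV", "EV", "PS", "PS", "PS"]
threshold, gain = best_threshold(EXP, ROLE)
print(f"EXP > {threshold} separates the classes (gain {gain:.3f} bits)")
```

On this sample the chosen threshold separates the two classes completely, which corresponds to a condition of the form {EXP > 6}.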
The decision tree shown in Fig. 1 is constructed after the application of the decision tree classifier on the above data set with "Role" as a class index attribute.

Figure 1. Decision Tree Output
The decision rules of the tree shown in the figure provide attribute relation conditions of exactly the form required to construct ABAC rules. The output of the decision tree classifier shows how conditions with inequalities have been extracted for the continuous attribute "EXP". One of the decision tree rules for the user class PS is as follows.

{DES = P} and {APP = R} and {EXP > 6} ⟶ {userClass = PS}
From the above decision tree rule, the following user attribute relation conditions are extracted to construct the ABAC rule for all the permissions belonging to role PS.

Cua[PS] = {{DES = P}, {APP = R}, {EXP > 6}}
Thus, our supervised learning-based approach can extract relation conditions for discrete data attributes. Using the user-to-role relation for rule extraction provides a more accurate user-role relationship than approaches that use system logs for rule extraction. Because the learning is supervised, the number of rules extracted is expected to be more accurate than with the unsupervised learning-based approach. Table 8 compares our approach with the approaches discussed in the related work section.
As the supervised learning algorithm works on a data set containing class labels, its user classification is more accurate than that of unsupervised learning. Accurate classification leads to accurate attribute conditions. The information contained in the system log may be incomplete, leading to an inaccurate number of rules. Our approach is based on supervised learning and uses the user-to-role and permission-to-role relations along with system log data to generate the dataset for the learning process. Similarly, the use of a decision tree classifier makes our approach applicable to discrete data attributes.

Conclusion
We have presented a supervised learning-based approach that extracts attribute relation conditions from an existing RBAC system to construct ABAC rules at the primary level, in order to transform it into an ABAC system. We have used the user-to-role relation, the permission-to-role relation, and the system log to extract the conditions. The approach is effective and can extract attribute relation conditions for both categorical and discrete data attributes. However, our approach does not consider the multi-value attributes of the system, and it is mainly useful for transforming existing RBAC systems into ABAC systems.

Definition 1: (User-to-Role Assignment Relation). A user-to-role assignment relation is defined as a set of tuples UA = {< u, r > | u ∈ U and r ∈ R}. A tuple < u, r > means that the role r is assigned to the user u.
Definition 2: (Permission-to-Role Assignment Relation). A permission-to-role assignment relation is defined as a set of tuples PA = {< p, r > | p ∈ P and r ∈ R}. A tuple < p, r > means that the permission p is assigned to the role r.
Definition 3: (RBAC Transaction Log). A transaction log of the RBAC system is a set of tuples L = {< u, r, p, t, d >}, where t is a timestamp or any other contextual information about the transaction, such as the location of the user performing the transaction, and d is the decision for the transaction (permit/deny).
Example 1: < John, Evaluator, (evaluate, grade-sheet), t1, permitted > indicates that the user John with the role Evaluator acquired the permission (evaluate, grade-sheet) at timestamp t1.
Definition 4: (ABAC Rule). An ABAC rule is a set of conditions φ = {ci | ci = ai ⊙ vi}, where ai ∈ {Au ⋃ Ao ⋃ Ae} and ci is the relation condition for the corresponding user, object, or environmental attribute. It is interpreted as the conjunction c1 ∧ c2 ∧ … ∧ cn. A rule is satisfied if it evaluates to true.
fu and fo are the user and object attribute value functions that return values for the user attributes, Au, and the object attributes, Ao, respectively. Vua and Voa are the sets of attribute values such that Vua = { v | ∀ai ∈ Au , v = fu(ai, u) } and Voa = { v | ∀ai ∈ Ao , v = fo(ai, o) }.
Example 2: Suppose Au = {designation, appointment, experience}, fu(designation, John) = Assistant Professor, fu(appointment, John) = Regular, and fu(experience, John) = 5. Then VJohna = {Assistant Professor, Regular, 5}.
Definition 5: (User Attribute Conditions). User attribute conditions are a set of conditions Cua = {ck | ck = ak ⊙ vk, ∀ak ∈ Au}.
Example 3: Cua = {{designation ∈ [Assistant Professor, Professor]}, {appointment ∈ [Adhoc, Regular]}, {experience > 3}} indicates that the value of the user attribute designation can be either Assistant Professor or Professor, appointment can be either Adhoc or Regular, and experience must be greater than 3 years.
Definition 6: (Object Attribute Conditions). Object attribute conditions are a set of conditions Coa = {cm | cm = am ⊙ vm, ∀am ∈ Ao}.
Definition 7: (Environment Attribute Conditions). Environment attribute conditions are a set of conditions Cea = {cq | cq = aq ⊙ vq, ∀aq ∈ Ae}.

Table 2. Symbols to Describe the Concepts of RBAC and ABAC

Table 3. Sample User-to-Role Assignment Relation

Table 5. Sample Ur Relation

Table 7. Sample Training Dataset of User Attribute
Table 7 represents the sample user attribute value data we have used for experimentation. Entries in the table are the sample tuples generated by equation (2) as stated in the previous section.