Column Level Security BigQuery(GCP)
BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery column-level access control, you can create policies that check whether a user has proper access at query time.
For example, a policy can enforce access checks such as:
- You must be in group:high-access to see the columns containing TYPE_SSN.
To enhance column-level access control, you can optionally use dynamic data masking. Data masking allows you to mask sensitive data by substituting null, default, or hashed content in place of the column’s actual value.
Column-level access control workflow:
Example use case:
Consider an organization that needs to classify sensitive data into two categories: High and Medium.
To set up column-level security, a data steward with the appropriate permissions would perform the following steps to set up a hierarchy of data classification.
- The data steward creates a taxonomy named “Business criticality” in Data Catalog. The taxonomy includes the nodes or policy tags High and Medium.
- The data steward decides that the policy for the High node includes access for a group named high-tier-access.
- The data steward creates more levels of nodes in the taxonomy, under High and Medium. The lowest level node is a leaf node, such as the employee_ssn leaf node. The data steward can create a different access policy for the employee_ssn leaf node, or not.
- The data steward assigns a policy tag to specific table columns. In this example, the data steward assigns the High access policy to the employee_ssn column in a table.
- In the Current schema page of the console, the data steward can see the policy tag that governs a particular column. In this example, the employee_ssn column is under the High policy tag, so when viewing the schema for employee_ssn, the console displays the taxonomy name and the policy tag in the Policy tags field: Business criticality:High.
Roles used with column-level access control:
The following roles are used for BigQuery column-level access control.
The Data Catalog Policy Tag Admin role is required for users who need to create and manage taxonomies and policy tags.
Data Catalog Policy Tag Admin/datacatalog.policyTagAdmin
Applies at the project level.
This role grants the ability to do the following:
Exploring GCP Data Catalog Policy Tags:
This article explains the GCP feature: Data Catalog Policy Tags, and how it can help manage access control to sensitive columns in BigQuery tables based on data classes (email, financials, etc.).
Creating Policy Tags:
BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery Column-level security, you can create policies that check, at query time, whether a user has proper access.
First, go to ‘Data Catalog’ section, and click ‘Create and manage policy tags’.
Next, we need to create a Taxonomy which is essentially a logical group of tags, click ‘+CREATE’
Please note the naming convention requirement for the Taxonomy Name.
Please also note that the Taxonomy (and all tags within it) need to be specifically stored at a location (as of October 2020, only these 3 Locations are supported: US, EU and Seoul).
The Taxonomy and tags can only be applied to BigQuery tables residing in the same Location. In this case, we will create the Taxonomy and tags in the US location.
Next, we can create all the tags within this Taxonomy. The hierarchy of the tags could be very flexible, in the following example we are showing the structure of a tag like this:
Let’s have a look at a more realistic scenario of the tags Taxonomy for data classifications:
Click ‘Save’ to have the Taxonomy and tags created.
Applying tags to BigQuery data:
Let’s move to BigQuery and start tagging the data.
A table has been created with typical PII data like names, addresses, DOB etc.
Have a look at this table. Please note that it needs to reside in the same Location ‘US,’ where the Taxonomy and all policy tags were created.
A quick query reveals that there is one record.
Next, we can start tagging these fields click ‘Edit Schema.’ Check one or more fields then we will be able to click ‘Add policy tag.’ Please note that 1 field can only have 1 policy tag associated.
We can select a tag on the next page to be applied to the selected field. Again you will only be able to see the Taxonomy and its tags within the same Location.
After saving, we have successfully tagged one field.
Continue until you have tagged all fields which need to be restricted. Please note that different tags are used based on the nature of the data. This can help separate the access based on roles later on.
Now let’s try to make a query to the table again. What happened? We can still see the data!
It turns out that we haven’t enabled the Taxonomy and tags yet, let’s go back to the Data Catalog page and find the Taxonomy we have just created. Toggle ‘Enforce access control’ to be ON.
Go back to the BigQuery page and check the table schema. We can see the following notification.
In the Preview tab, we can only see the column and its content that is not restricted.
After waiting for around 30 minutes (so the Taxonomy and tags can take effect), if we now try to make a query to the table, we will see the following error message indicating that my login does not have permission to see the data which has been tagged.
To access the BigQuery data tagged, a user must be granted ‘Fine-Grained Reader’ permission on the Policy Tag.
Go to Data Catalog, then enter the Taxonomy that would like to grant permission to users.
Once the Policy Tag is selected, we could add a Member to the role ‘Data Catalog’ ->’ Fine-grained Reader.’ So now the user account ‘Leo. Zhang’ can access the BigQuery data, which was tagged with ‘C4’.
Let’s test it in BigQuery:
The error message has changed. This is because I am trying to query all the columns though I am only entitled to see the columns tagged with ‘C4’.
If we preview the table and we can see that BigQuery automatically hides the columns PostCode and Country that I am not entitled to see.
By explicitly excluding PostCode and Country columns, I can now query the table successfully!
’ Fine-grained Reader’ permission can be applied at different levels: Taxonomy, Policy Tags, and Child Policy Tags. This capability provides flexibility if we want to archive more granularity of the access control, e.g., Data Governance Admin could be granted at Taxonomy level. In contrast, Data Analyst can be granted access for data classification C1 and C2 but not C3 or C4.
Test in monitor mode
This is one of the best practices introduced by Google: https://cloud.google.com/bigquery/docs/best-practices-policy-tags#monitor_mode
Before enforcing access policies for your organization, you can run monitor-only mode. Monitor-only mode is where you are not yet implementing access control but are auditing the effects of your policy tags.
This best practice assumes:
- You already have a set of users authorized to access your data.
- You want to determine if enforcement of new column-level security changes would unexpectedly prevent those users from accessing data.
To use monitor only mode, create a taxonomy and policy tags, assign the policy tags to columns, but do not yet enforce access control. Then, have your previously authorized users continue to use the system. As they use the system, an audit trail is generated. You can scan the audit logs to view accesses to columns protected by policy tags. Determine if any of the accesses were unexpected. That is, whether the access would have resulted in a PERMISSION_DENIED error, had the policy been enforced. After you are satisfied that the column-level security is properly set up, enforce access control. Note that you won’t see PERMISSION_DENIED errors unless access control is enforced.