BigQuery Integration Patterns & Permissions
Query Execution Patterns
When BigQuery tables are exposed to the Harbr platform, different services within the asset / product usage lifecycle interact with them. The technical nature of these interactions is summarised in the table below.
| Component | At-Source Asset | On-Platform Asset | Export |
| --- | --- | --- | --- |
| Data Copy Stored | No | Yes | Yes |
| Connector Type | Trino BigQuery Connector | Spark BigQuery Connector | Spark BigQuery Connector |
| BigQuery Query API Utilised | No | No | No |
| BigQuery Storage Read API Utilised | Yes | Yes | No |
| BigQuery Storage Write API Utilised | No | No | No |
| Storage Optimisations: minimise the volume of data read via the BigQuery Storage API through SQL operations | Yes. LIMIT - returns only the number of rows specified; WHERE - returns only rows that match the where clause condition; COLUMN SELECT - reads data content only for the columns specified in a query | No | N/A |
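To make the read-side optimisations concrete, the sketch below shows how a client of the BigQuery Storage Read API can push column selection and a WHERE-style filter into a read session, so that filtered-out rows and unselected columns are never transferred. This is a minimal illustration rather than Harbr's internal implementation; the project, dataset, table, and column names are placeholders.

```python
# Minimal sketch of Storage Read API pushdown. Requires google-cloud-bigquery-storage
# (and fastavro for AVRO decoding). All resource names below are placeholders.
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/my-project/datasets/my_dataset/tables/my_table",
    data_format=types.DataFormat.AVRO,
    read_options=types.ReadSession.TableReadOptions(
        # Column selection: only these columns are read from storage.
        selected_fields=["order_id", "order_total"],
        # WHERE pushdown: rows failing this predicate are never transferred.
        row_restriction="order_total > 100",
    ),
)

session = client.create_read_session(
    parent="projects/my-project",
    read_session=requested_session,
    max_stream_count=1,
)

reader = client.read_rows(session.streams[0].name)
for count, row in enumerate(reader.rows(session)):
    if count >= 10:  # e.g. LIMIT 10, applied by stopping the stream early
        break
    print(row)
```

Note that LIMIT has no server-side field in the read options: the reading engine honours it by terminating the stream once enough rows have arrived, which is why it still reduces the volume of data read.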
Service Account Permission Requirements - Project-Level Granularity
Harbr BigQuery connectors use GCP Service Accounts to establish and authenticate connections from the platform to BigQuery.
In addition to the Harbr platform actions needed to configure the connector and assets, the Service Accounts used must hold the following minimum permissions to enable each platform journey.
Create Asset / Use Asset in Spaces Tasks
The service account must have the permissions below at the GCP Project level:
BigQuery Data Viewer
BigQuery User
Note: If you are creating assets against BigQuery external tables, the Service Account used in the connector must also be granted access to the GCS bucket location that the external table is built against.
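As an illustration, the sketch below grants a connector service account read access on such a bucket with the google-cloud-storage client. The bucket name and service account email are placeholders, and roles/storage.objectViewer is an assumption; use whichever role your external-table layout actually requires.

```python
# Hedged sketch: grant the connector service account read access to the GCS
# bucket backing a BigQuery external table. Names below are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-external-table-bucket")

# Read-modify-write on the bucket IAM policy; version 3 also covers any
# existing conditional bindings.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",  # read access to the underlying files
        "members": {"serviceAccount:connector-sa@my-project.iam.gserviceaccount.com"},
    }
)
bucket.set_iam_policy(policy)
```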
Export a Product / Asset to BigQuery
The service account must have the permissions below at the GCP Project level:
BigQuery User
To apply permissions to your service account, navigate to your project in the Google Cloud Console and select IAM -> Permissions -> View by Principals -> <your service account> -> Edit -> Assign Roles, then select the necessary roles.
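If you prefer to script the grant, the hedged sketch below performs the equivalent read-modify-write on the project IAM policy using the Resource Manager client; the project ID and service account email are placeholders.

```python
# Sketch: bind the project-level roles to the connector service account.
from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2, policy_pb2

PROJECT = "projects/my-project"  # placeholder
MEMBER = "serviceAccount:connector-sa@my-project.iam.gserviceaccount.com"  # placeholder

client = resourcemanager_v3.ProjectsClient()
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=PROJECT)
)

# One binding per role; export-only service accounts need just roles/bigquery.user.
for role in ("roles/bigquery.dataViewer", "roles/bigquery.user"):
    policy.bindings.append(policy_pb2.Binding(role=role, members=[MEMBER]))

client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=PROJECT, policy=policy)
)
```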
Service Account Permission Requirements - Dataset-Level Granularity
To permission your service account at the finer-grained dataset level, additional configuration is required on top of the default Google IAM roles. The recommended practice for this configuration is described below.
Create Asset / Use Asset in Spaces Tasks
Provide your service account the BigQuery Data Viewer and BigQuery User roles at the dataset level via either:
Sharing the dataset with the service account via the Google Console (Console -> BigQuery -> Dataset -> Sharing -> Permissions -> Add Principal)
Applying an IAM condition to the project-level BigQuery permissions of the form (see the sketch after this list):
Condition Type = name
Operator = is
Value = projects/<project-id>/datasets/<project-id>.<dataset>
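Programmatically, the console's Condition Type = name / Operator = is pair corresponds to a CEL equality test on resource.name. The hedged sketch below attaches such a condition to a project-level binding; the member and condition title are placeholders, and the resource value follows the pattern above.

```python
# Sketch: project-level BigQuery role restricted to one dataset via an IAM condition.
from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2, options_pb2, policy_pb2
from google.type import expr_pb2

PROJECT = "projects/my-project"  # placeholder
MEMBER = "serviceAccount:connector-sa@my-project.iam.gserviceaccount.com"  # placeholder

client = resourcemanager_v3.ProjectsClient()
# Conditional bindings are only visible and settable at policy version 3.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(
        resource=PROJECT,
        options=options_pb2.GetPolicyOptions(requested_policy_version=3),
    )
)

policy.bindings.append(
    policy_pb2.Binding(
        role="roles/bigquery.dataViewer",
        members=[MEMBER],
        condition=expr_pb2.Expr(
            title="restrict-to-dataset",  # placeholder title
            expression='resource.name == "projects/<project-id>/datasets/<project-id>.<dataset>"',
        ),
    )
)
policy.version = 3

client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=PROJECT, policy=policy)
)
```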
Additionally, create a custom IAM role with the permissions outlined below and apply it to your service account alongside the permissions above (example name: CustomRoleBQAssetCreate). A creation sketch follows the list.
bigquery.jobs.*
bigquery.readsessions.*
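Note that custom role definitions cannot contain wildcards, so a concrete role must enumerate the individual permissions behind the bigquery.jobs.* and bigquery.readsessions.* shorthand. The sketch below creates such a role through the IAM API; the project ID is a placeholder and the exact permission list is an assumption that you may be able to trim further.

```python
# Sketch: create the custom role via the IAM API (google-api-python-client).
from googleapiclient import discovery

service = discovery.build("iam", "v1")

service.projects().roles().create(
    parent="projects/my-project",  # placeholder
    body={
        "roleId": "CustomRoleBQAssetCreate",
        "role": {
            "title": "CustomRoleBQAssetCreate",
            "stage": "GA",
            # Wildcards expanded: job management plus Storage Read API sessions.
            "includedPermissions": [
                "bigquery.jobs.create",
                "bigquery.jobs.get",
                "bigquery.jobs.list",
                "bigquery.jobs.update",
                "bigquery.readsessions.create",
                "bigquery.readsessions.getData",
                "bigquery.readsessions.update",
            ],
        },
    },
).execute()
```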
Export a Product / Asset to BigQuery
In addition to the asset-creation permissions above, a dataset-restricted service account used to export to BigQuery needs the permissions below, added either via a new custom role (e.g. CustomRoleBQExport) or by extending the read role above (a sketch follows this list):
bigquery.tables.create
bigquery.tables.update
bigquery.tables.updateData
bigquery.tables.delete (used to remove staging tables created during the loading flow)
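One way to extend the read role, rather than create a separate CustomRoleBQExport, is to patch it in place as in the hedged sketch below; the role name is a placeholder and the pattern mirrors the creation sketch above.

```python
# Sketch: add the export permissions to the existing custom role.
from googleapiclient import discovery

service = discovery.build("iam", "v1")
role_name = "projects/my-project/roles/CustomRoleBQAssetCreate"  # placeholder

# Read-modify-write so the existing permissions (and etag) are preserved.
role = service.projects().roles().get(name=role_name).execute()
role["includedPermissions"] += [
    "bigquery.tables.create",
    "bigquery.tables.update",
    "bigquery.tables.updateData",
    "bigquery.tables.delete",  # removes staging tables created during loading
]

service.projects().roles().patch(
    name=role_name,
    updateMask="includedPermissions",
    body=role,
).execute()
```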