BigQuery
Harbr BigQuery connectors leverage GCP Service Accounts to enable and authenticate connections from the platform to BigQuery. Unlike a storage connector, which connects to a GCS or S3 bucket for example, a BigQuery connector is referred to as a database connector.
Your GCP Service Account must have the following minimum roles to enable the connector and asset configuration:
BigQuery Data Viewer
BigQuery User
If you are creating assets against BigQuery external tables, the Service Account used in the connector must also be granted access to the GCS bucket location that the external table is built against.
To apply permissions to your service account, navigate to your project in the Google Cloud Console and select IAM -> Permissions -> View by Principals -> <your service account> -> Edit -> Assign Roles.
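Once the roles are assigned, a quick way to confirm the key file works is to exercise both roles directly. Below is a minimal sketch, assuming the google-cloud-bigquery Python package is installed; the key file name is hypothetical:

```python
# Minimal check that the service account key grants the required access.
# "key.json" is a hypothetical filename for the downloaded key file.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json("key.json")

# Listing datasets exercises BigQuery Data Viewer (metadata read).
for dataset in client.list_datasets():
    print(dataset.dataset_id)

# Running a query exercises BigQuery User (job creation).
rows = client.query("SELECT 1 AS ok").result()
print([dict(row) for row in rows])
```

If either call fails with a permission error, revisit the role assignments above before creating the connector.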
The first step in creating this connector is to set up the GCP Service Account, and its key file, that will be used to access your BigQuery project(s).
Create the Connector
1. Click Manage on the Navigation bar.
2. Select Connectors to view the Manage Connectors screen.
3. Click the Create Connector button at the top right.
4. Enter a Name for your Connector and, optionally, a Description.
5. Choose Type > BigQuery.
6. Select and upload the GCP Service Account Key File that will be used to manage access to your BigQuery project(s). See here for more details on how to set one up. The Project ID will be populated automatically based on the file.
7. Add any Integration Metadata needed for programmatic integration.
8. Click Create. A connection test will run; if it succeeds, the Connection Test Status will show Successful.
9. Click Close.
Execution Patterns
When BigQuery tables are exposed to the Harbr platform, different services within the asset/product usage lifecycle interact with them. The technical nature of these interactions is summarised below.
| Component | At-Source Asset | On-Platform Asset | Export |
| --- | --- | --- | --- |
| Data Copy Stored | No | Yes | Yes |
| Connector Type | Trino BigQuery Connector | Spark BigQuery Connector | Spark BigQuery Connector |
| BigQuery Query API Utilised | No | No | No |
| BigQuery Storage Read API Utilised | Yes | Yes | No |
| BigQuery Storage Write API Utilised | No | No | No |
Storage Optimisations
| Optimisation | At-Source Asset | On-Platform Asset | Export |
| --- | --- | --- | --- |
| Minimise the volume of data read via the BigQuery Storage Read API through SQL operations | Yes | No | N/A |

For at-source assets, the following SQL operations reduce the volume of data read:
- LIMIT returns only the number of rows specified.
- WHERE returns only rows that match the where clause condition.
- Column SELECT reads data content only for the columns specified in a query.
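To make the at-source behaviour concrete, the sketch below reads a table through the BigQuery Storage Read API with column selection and a row restriction, which is how the column SELECT and WHERE pushdowns reduce bytes read. The project, dataset, table, and column names are hypothetical, and this assumes the google-cloud-bigquery-storage Python package (plus fastavro for AVRO decoding):

```python
# Hedged sketch: column and predicate pushdown via the Storage Read API.
from google.cloud import bigquery_storage_v1
from google.cloud.bigquery_storage_v1 import types

client = bigquery_storage_v1.BigQueryReadClient.from_service_account_file("key.json")

session = client.create_read_session(
    parent="projects/my-project",  # hypothetical project
    read_session=types.ReadSession(
        table="projects/my-project/datasets/sales/tables/orders",  # hypothetical table
        data_format=types.DataFormat.AVRO,
        read_options=types.ReadSession.TableReadOptions(
            selected_fields=["order_id", "order_total"],   # column SELECT: only these columns are read
            row_restriction="order_date >= '2024-01-01'",  # WHERE: only matching rows are read
        ),
    ),
    max_stream_count=1,
)

reader = client.read_rows(session.streams[0].name)
for row in reader.rows(session):
    print(row)
```

Note that LIMIT has no direct field in the read session itself; query engines such as Trino enforce it by stopping reads early, so WHERE and column selection are the operations that reduce bytes scanned at this API layer.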
Service Account Permissions - Project Level Granularity
BigQuery connectors on the Harbr platform leverage GCP Service Accounts to enable and authenticate the connections from the platform to BigQuery.
In addition to performing the Harbr platform actions to configure the connector and assets, the Service Accounts used must have the following minimum permissions to enable the platform journeys.
Create Asset / Use Asset in Spaces Tasks
The service account must have the permissions below at the GCP Project level:
BigQuery Data Viewer
BigQuery User
Note: If you are creating assets against BigQuery external tables, the Service Account used in the connector must also be granted access to the GCS bucket location that the external table is built against.
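For external tables, a quick way to confirm the service account can reach the backing bucket is to list a few objects with the google-cloud-storage package. A hedged sketch; the bucket name and prefix are hypothetical:

```python
# Confirm the connector's service account can read the GCS location
# backing the external table. Bucket name and prefix are hypothetical.
from google.cloud import storage

client = storage.Client.from_service_account_json("key.json")
bucket = client.bucket("my-external-table-bucket")
for blob in bucket.list_blobs(prefix="external/orders/", max_results=5):
    print(blob.name)
```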
Export a Product / Asset to BigQuery
The service account must have the permissions below at the GCP Project level:
BigQuery User
To apply permissions to your service account, navigate to your project in the Google Cloud Console, select IAM -> Permissions -> View by Principals -> <your service account> -> Edit -> Assign Roles, and select the necessary roles.
Service Account Permissions - Dataset Level Granularity
To permission your service account at the finer-grained dataset level, additional configuration is required on top of the default Google IAM roles. The best practice for this configuration is described below.
Create Asset / Use Asset in Spaces Tasks
Provide your service account BigQuery Data Viewer and BigQuery User permissions at the dataset level via either:
- Sharing the dataset with the service account via the Google Console (Console -> BigQuery -> Dataset -> Sharing -> Permissions -> Add Principal), as sketched after this list, or
- Applying an IAM condition on the project-level BigQuery permissions of the form:
  - Condition Type = name
  - Operator = is
  - Value = projects/<project-id>/datasets/<project-id>.<dataset>
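The console sharing flow in the first option can also be done programmatically. A sketch assuming the google-cloud-bigquery package, run with credentials that administer the dataset; the dataset ID and service account email are hypothetical, and the dataset-level READER basic role corresponds to BigQuery Data Viewer:

```python
# Share a dataset with the service account (equivalent to the console
# Sharing -> Permissions -> Add Principal flow). Names are hypothetical.
from google.cloud import bigquery
from google.cloud.bigquery import AccessEntry

client = bigquery.Client()
dataset = client.get_dataset("my-project.sales")

entries = list(dataset.access_entries)
entries.append(
    AccessEntry(
        role="READER",  # dataset basic role mapping to BigQuery Data Viewer
        entity_type="userByEmail",
        entity_id="harbr-connector@my-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```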
Additionally, create a Custom IAM role with the permissions outlined below and apply it to your service account alongside the permissions above (example name: CustomRoleBQAssetCreate); a creation sketch follows this list.
bigquery.jobs.*
bigquery.readsessions.*
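Custom roles can be created in the console (IAM -> Roles -> Create Role) or programmatically. Below is a sketch using the google-cloud-iam package; custom role definitions require individual permissions to be enumerated rather than wildcarded, so the list expands the two wildcards into the commonly needed permissions (an assumption to verify against your GCP project):

```python
# Create the CustomRoleBQAssetCreate custom role. The permission list
# expands bigquery.jobs.* and bigquery.readsessions.* into concrete
# permissions; confirm the full set required for your project.
from google.cloud import iam_admin_v1

client = iam_admin_v1.IAMClient()
role = client.create_role(
    request=iam_admin_v1.CreateRoleRequest(
        parent="projects/my-project",  # hypothetical project
        role_id="CustomRoleBQAssetCreate",
        role=iam_admin_v1.Role(
            title="CustomRoleBQAssetCreate",
            included_permissions=[
                "bigquery.jobs.create",
                "bigquery.jobs.get",
                "bigquery.jobs.list",
                "bigquery.readsessions.create",
                "bigquery.readsessions.getData",
                "bigquery.readsessions.update",
            ],
        ),
    )
)
print(role.name)
```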
Export a Product / Asset to BigQuery
In addition to the permissions required for asset creation above, to use a dataset-restricted service account to export to BigQuery, add the permissions below to the service account via a new custom role (e.g. CustomRoleBQExport) or by extending the read role above; a sketch follows this list.
bigquery.tables.create
bigquery.tables.update
bigquery.tables.updateData
bigquery.tables.delete (used to remove staging tables created during the loading flow)
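Mirroring the asset-create sketch above, the export role would enumerate the table permissions; again, the project name is hypothetical and the call should be run with admin credentials:

```python
# Create the CustomRoleBQExport custom role with the export permissions.
from google.cloud import iam_admin_v1

client = iam_admin_v1.IAMClient()
client.create_role(
    request=iam_admin_v1.CreateRoleRequest(
        parent="projects/my-project",  # hypothetical project
        role_id="CustomRoleBQExport",
        role=iam_admin_v1.Role(
            title="CustomRoleBQExport",
            included_permissions=[
                "bigquery.tables.create",
                "bigquery.tables.update",
                "bigquery.tables.updateData",
                "bigquery.tables.delete",  # removes staging tables from the loading flow
            ],
        ),
    )
)
```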