
BigQuery

Harbr BigQuery connectors leverage GCP Service Accounts to enable and authenticate connections from the platform to BigQuery. Unlike a storage connector, which connects to a GCS or S3 bucket for example, a BigQuery connector is referred to as a database connector.

Your GCP Service Account must have the following minimum permissions to enable the connector and asset configuration:

  • BigQuery Data Viewer

  • BigQuery User

If you are creating assets against BigQuery external tables, the Service Account used in the connector must also be granted access to the GCS bucket location that the external table is built against.

To apply permissions to your service account, navigate to your project in the Google Cloud Console and select IAM -> Permissions -> View by Principals -> <your service account> -> Edit -> Assign Roles.
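The same project-level roles can also be granted with the gcloud CLI. This is a sketch, not a definitive procedure; the project ID and service-account email below are placeholders to substitute with your own values.

```shell
# Placeholders - substitute your own project ID and service-account email.
PROJECT_ID="my-project"
SA_EMAIL="harbr-connector@my-project.iam.gserviceaccount.com"

# Grant the two minimum roles at the project level.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/bigquery.user"
```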

The first step in creating this connector is to set up a GCP Service Account, with the roles above, that can access the BigQuery data you intend to expose.

Create the Connector

  1. Click Manage on the Navigation bar.

  2. Select Connectors to view the Manage Connectors screen

  3. Click the Create Connector button at the top right

  4. Enter a Name for your Connector and a Description (optional)

  5. Choose Type > BigQuery

  6. Select and upload the GCP Service Account Key File that will be used to manage access to your BigQuery project(s). See here for more details on how to set one up.

  7. Project ID will be populated automatically based on the file.

  8. Add any Integration Metadata needed for programmatic integration.

  9. Click Create. A connection test will run and, if successful, the Connection Test Status will show as Successful.

  10. Click Close.
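Step 7 works because the uploaded key file itself contains the project ID: a service-account key is a JSON document with a `project_id` field, which the platform reads to populate the form. A minimal sketch with hypothetical values:

```shell
# Write an abbreviated, hypothetical service-account key file.
# A real key also contains private_key, client_id, token URIs, etc.
cat > key.json <<'EOF'
{
  "type": "service_account",
  "project_id": "my-harbr-project",
  "client_email": "harbr-connector@my-harbr-project.iam.gserviceaccount.com"
}
EOF

# Extract the project ID the same way the platform would.
python3 -c "import json; print(json.load(open('key.json'))['project_id'])"
# -> my-harbr-project
```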

Execution Patterns

When BigQuery tables are exposed to the Harbr platform, different services within the asset / product usage life-cycle interact with them. The technical nature of these interactions is summarised below.

| Component | At-Source Asset | On-Platform Asset | Export |
| --- | --- | --- | --- |
| Data Copy Stored | No | Yes | Yes |
| Connector Type | Trino BigQuery Connector | Spark BigQuery Connector | Spark BigQuery Connector |
| BigQuery Query API utilised | No | No | No |
| BigQuery Storage Read API utilised | Yes | Yes | No |
| BigQuery Storage Write API utilised | No | No | No |
| Storage optimisations - minimise the volume of data read via the BigQuery Storage API through SQL operations | Yes | No | N/A |

The SQL operations that minimise the volume of data read are:

  • LIMIT - returns only the number of rows specified

  • WHERE - returns only rows that match the WHERE clause condition

  • Column selection - reads data content only for the columns specified in a query
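As an illustration of those pushdown operations, the query below combines all three: it reads only two columns, filters rows server-side, and caps the row count. The dataset, table, and column names are hypothetical; `bq` is the BigQuery command-line tool.

```shell
# Column selection, a WHERE filter, and LIMIT together minimise
# the data read via the BigQuery Storage API for at-source assets.
bq query --use_legacy_sql=false \
  'SELECT customer_id, order_total
   FROM `my-project.sales.orders`
   WHERE order_date >= "2024-01-01"
   LIMIT 100'
```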

Service Account Permissions - Project Level Granularity

BigQuery connectors on the Harbr platform leverage GCP Service Accounts to enable and authenticate the connections from the platform to BigQuery. 

In addition to performing the Harbr platform actions to configure the connector and assets, the Service Accounts used must have the following minimum permissions to enable the platform journeys.

Create Asset / Use Asset in Spaces Tasks 

The service account must have the permissions below at the GCP Project level: 

  • BigQuery Data Viewer

  • BigQuery User

Note: If you are creating assets against BigQuery external tables, the Service Account used in the connector must also be granted access to the GCS bucket location that the external table is built against.

Export a Product / Asset to BigQuery

The service account must have the permissions below at the GCP Project level: 

  • BigQuery User 

To apply permissions to your service account, navigate to your project in the Google Cloud Console and select IAM -> Permissions -> View by Principals -> <your service account> -> Edit -> Assign Roles, and select the necessary roles.

Service Account Permissions - Dataset Level Granularity

To permission your service account at the finer-grained dataset level, additional configuration is required on top of the default Google IAM roles. The best practice for this configuration is described below.

Create Asset / Use Asset in Spaces Tasks 

Provide your service account the BigQuery Data Viewer and BigQuery User permissions at the dataset level via either:

  • Sharing the dataset with the service account via the Google Console (Console -> BigQuery -> Dataset -> Sharing -> Permissions -> Add Principal)

  • Applying an IAM condition on the project-level BigQuery permissions of the form:

    • Condition Type = name

    • Operator = is

    • Value = projects/<project-id>/datasets/<project-id>.<dataset>
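The console condition above can also be expressed with gcloud. A sketch, assuming hypothetical project and dataset names; the condition expression mirrors the Condition Type = name / Operator = is pairing, using the value format shown above.

```shell
# Grant BigQuery Data Viewer on a single dataset via an IAM condition.
# Placeholders - substitute your project ID, dataset, and service account.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:harbr-connector@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer" \
  --condition='title=dataset-only,expression=resource.name == "projects/my-project/datasets/my-project.my_dataset"'
```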

Additionally, create a custom IAM role with the permissions outlined below and apply it to your service account alongside the permissions above. Example name: CustomRoleBQAssetCreate.

  • bigquery.jobs.*

  • bigquery.readsessions.* 
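The custom role can be created with `gcloud iam roles create`. Note that gcloud does not accept wildcard permissions, so `bigquery.jobs.*` and `bigquery.readsessions.*` must be listed individually; the expansion below is an illustrative assumption - verify it against the permissions currently defined in GCP.

```shell
# Hypothetical expansion of bigquery.jobs.* and bigquery.readsessions.* -
# check the current BigQuery permission list before applying.
gcloud iam roles create CustomRoleBQAssetCreate \
  --project=my-project \
  --title="Harbr BQ Asset Create" \
  --permissions=bigquery.jobs.create,bigquery.jobs.get,bigquery.jobs.list,bigquery.jobs.listAll,bigquery.jobs.update,bigquery.jobs.delete,bigquery.readsessions.create,bigquery.readsessions.getData,bigquery.readsessions.update

# Bind the custom role to the service account.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:harbr-connector@my-project.iam.gserviceaccount.com" \
  --role="projects/my-project/roles/CustomRoleBQAssetCreate"
```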

Export a Product / Asset to BigQuery

In addition to the permissions required for asset creation above, to use a dataset-restricted service account to export to BigQuery, add the permissions below to the service account via a new custom role (e.g. CustomRoleBQExport) or by extending the read role above:

  • bigquery.tables.create

  • bigquery.tables.update

  • bigquery.tables.updateData

  • bigquery.tables.delete (used to remove staging tables created during the loading flow)
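A sketch of the export role as a gcloud command, using the exact permissions listed above; the project ID and role title are placeholders.

```shell
# Custom role for exporting a Product / Asset to BigQuery.
# bigquery.tables.delete is needed to clean up staging tables.
gcloud iam roles create CustomRoleBQExport \
  --project=my-project \
  --title="Harbr BQ Export" \
  --permissions=bigquery.tables.create,bigquery.tables.update,bigquery.tables.updateData,bigquery.tables.delete
```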
