Configure Databricks
Databricks can be used in two ways in our platform:
As an underlying data platform and cloud solution to store assets, share data through Delta Sharing, and provide processing/compute for the Export, Spaces and Query features.
Currently only Databricks on Azure is supported with Harbr for this use.
As a standalone connector to load asset data into, and export it out of, the Harbr platform. Supported Harbr platform versions are:
Harbr on Microsoft Azure
Harbr on Amazon Web Services (AWS)
Harbr on Google Cloud Platform (GCP)
Prerequisites
To connect Harbr and Databricks you will need:
a Databricks account that has the following permissions:
Permission | Status |
---|---|
Clusters | Required |
Jobs | Required |
Cluster Policies | Required |
DLT | Required |
Directories | Required |
Notebooks | Required |
ML Models | Not required |
ML Experiments | Not required |
Dashboard | Not required |
Queries | Required |
Alerts | Required |
Secrets | Required |
Token | Required |
SQL Warehouse | Required |
Repos | Required |
Pools | Required |
Note that the above requirements may change depending on which Harbr features you intend to use. We recommend you consult with your Harbr contact before configuration. A quick way to sanity-check the token's permissions is sketched after this prerequisites list.
a Harbr account with the appropriate roles:
Default user
Organisation Admin
Technician
Unity Catalog enabled on your Databricks Workspace. Unity Catalog is a unified governance solution for all data and AI assets including files, tables, machine learning models and dashboards.
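Before creating the connector, it can help to confirm that the token and workspace meet these prerequisites. The following is a minimal sketch, assuming the Databricks SDK for Python (databricks-sdk) is installed and you have your workspace URL and a personal access token; it is not part of the Harbr configuration itself, and any placeholder values shown are hypothetical.

```python
# Minimal sanity check before configuring Harbr (assumes `pip install databricks-sdk`).
from databricks.sdk import WorkspaceClient

# Placeholders: replace with your workspace URL and personal access token.
w = WorkspaceClient(
    host="https://adb-1234567890123456.7.azuredatabricks.net",
    token="dapi-your-token",
)

# Probe the APIs the Harbr connector relies on; an authorization error here means
# the token is missing one of the permissions listed above.
print("clusters:", sum(1 for _ in w.clusters.list()))
print("jobs:", sum(1 for _ in w.jobs.list()))
print("SQL warehouses:", sum(1 for _ in w.warehouses.list()))
print("secret scopes:", sum(1 for _ in w.secrets.list_scopes()))

# Listing catalogs succeeds only when Unity Catalog is enabled for the workspace.
print("catalogs:", [c.name for c in w.catalogs.list()])
```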
Configure Databricks on Azure destination
Create a connector
Go to Manage > Connectors > Create a new connector
Choose Type > Databricks.
Type in:
Your Databricks Host URL
Token
See details on how to generate a token (https://docs.databricks.com/en/dev-tools/auth/pat.html)
Currently, the only supported token type is Databricks personal access token (PAT) authentication.
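The Databricks documentation linked above describes creating the token through the workspace UI. As an alternative, here is a minimal sketch using the Databricks SDK for Python; this is an assumption for illustration, not a Harbr requirement.

```python
# Create a personal access token for the Harbr connector (assumes `pip install databricks-sdk`
# and that you are already authenticated to the workspace, e.g. via a Databricks CLI profile).
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads host/credentials from the environment or ~/.databrickscfg

# 90-day token; adjust the lifetime to match your security policy.
created = w.tokens.create(comment="harbr-connector", lifetime_seconds=90 * 24 * 3600)

# The token value is shown only once; store it securely and paste it into the connector form.
print(created.token_value)
```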
The details will be validated, and we will display which capabilities the connector can perform in the platform with the given permissions:
Capability | What it allows to do |
---|---|
catalog | |
access | |
jobs | |
clusters | |
resources | |
Configure Organisation
Each organisation can be configured to use a different Databricks warehouse to store and process its data. To do this:
Go to Organisation Administration
Go to the Metadata tab
Add an entry with the following key/value pair.
There are two main values to configure:
upload_platform: the Azure connector that will be used for file processing
processing_platform: the Databricks connector that will be used for processing
You can get the connector's unique identifier from its URL.
key: harbr.user_defaults
value:
{
  "consumption": {
    "catalogs": [
      {
        "name": "",
        "id": "",
        "connector_id": "",
        "databricks_catalog": "assets",
        "databricks_schema": "managed",
        "databricks_table_name": {
          "naming_scheme": "PESSIMISTIC"
        },
        "default": true,
        "access": {
          "share": {
            "default_ttl_minutes": "1200"
          },
          "query": {
            "default_llm": "",
            "default_engine": ""
          },
          "iam": {}
        }
      }
    ]
  },
  "upload_platform": {
    "connector_id": "yourconnectorid"
  },
  "processing_platform": {
    "connector_id": "yourconnectorid",
    "default_job_cluster_definition": {}
  }
}
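The default_job_cluster_definition object is left empty in the example above. As a rough sketch only, the snippet below assembles the same metadata value in Python and fills that object with Databricks Jobs-style new-cluster fields (spark_version, node_type_id, autoscale); whether Harbr expects exactly this schema is an assumption, so confirm the expected fields with your Harbr contact. The connector IDs are placeholders.

```python
import json

# Hypothetical placeholders: take the real IDs from the connector URLs.
UPLOAD_CONNECTOR_ID = "yourazureconnectorid"
PROCESSING_CONNECTOR_ID = "yourdatabricksconnectorid"

# Assumption: the cluster definition follows the Databricks Jobs "new_cluster" shape.
default_job_cluster_definition = {
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",  # Azure node type
    "autoscale": {"min_workers": 1, "max_workers": 4},
}

user_defaults = {
    "consumption": {
        "catalogs": [
            {
                "name": "",
                "id": "",
                "connector_id": PROCESSING_CONNECTOR_ID,
                "databricks_catalog": "assets",
                "databricks_schema": "managed",
                "databricks_table_name": {"naming_scheme": "PESSIMISTIC"},
                "default": True,
                "access": {
                    "share": {"default_ttl_minutes": "1200"},
                    "query": {"default_llm": "", "default_engine": ""},
                    "iam": {},
                },
            }
        ]
    },
    "upload_platform": {"connector_id": UPLOAD_CONNECTOR_ID},
    "processing_platform": {
        "connector_id": PROCESSING_CONNECTOR_ID,
        "default_job_cluster_definition": default_job_cluster_definition,
    },
}

# Paste the printed JSON as the value of the harbr.user_defaults metadata key.
print(json.dumps(user_defaults, indent=2))
```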
Your organisation is now set up to use Databricks.