Update an Asset

Configure an Asset Update

On-platform assets can be configured to be updated on a schedule. By default, updates are toggled off; they can be toggled on from the Updates tab on the Asset Overview page. Once turned on, the platform supplies the information needed to complete the configuration.

For cloud storage sources (e.g. AWS S3, GCP Cloud Storage, Azure Blob Storage):

  1. Select whether the update action is Append or Replace. This tells the platform how the new data is to be processed relative to the existing data. (This selection is not required for file assets, as these can only be processed with a replace update action.)

  2. Click Save and Enable Updates to proceed. Several processes will then take place:

    • A distinct data location is created on the asset connector for future data loads

    • A distinct folder is created on the asset connector for .tnf files to be added. These are the trigger files that initiate a data ingest job

    • A distinct folder is created into which update job status files will be loaded following a successful or failed data ingest

    • A polling mechanism is initiated which polls the .tnf location for the arrival of any trigger files

    • A modal containing all of the above information and instructions on how to perform updates is presented
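The folders created when updates are enabled follow the /tnfs/ and /tnfs_complete/ conventions described later in this guide. As a purely illustrative sketch (the base path here is hypothetical; the actual locations are shown in the modal), the resulting connector layout looks like this:

```python
from pathlib import Path

# Illustrative local mirror of the connector layout created when
# updates are enabled. All folder names except /tnfs/ and
# /tnfs_complete/ are assumptions for the sake of the example.
base = Path("asset_connector")

layout = [
    base / "second_load",     # future data loads go into new subfolders
    base / "tnfs",            # drop .tnf trigger files here
    base / "tnfs_complete",   # job status files are written here
]

for folder in layout:
    folder.mkdir(parents=True, exist_ok=True)

print(sorted(p.name for p in base.iterdir()))
```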

For data warehouse / database sources (e.g. Snowflake, BigQuery, Oracle), choose either to trigger a manual update or to add a schedule on which updates will then run automatically.

For a Databricks connector, see the identity configuration instructions detailed below.

Initiate an Asset Update

Prepare your data

Upload the refreshed data content to a new subdirectory beneath the base location of the asset in the object store source (e.g. AWS S3, GCP Cloud Storage, Azure Blob Storage).

Note: This folder can have any name other than 'singleload', which is a protected name reserved for the first data load performed when creating an asset in the UI.

Warning: The data loaded into the new sub-folder must follow the same structure as the original data load used to create the asset. This includes the same set of folders representing the tables in the asset, the same data format (e.g. Parquet) and the same schema for each table.
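The structure requirement in the warning above can be checked before triggering an ingest. The sketch below compares a new load folder against the original load locally (the helper name, paths, and the `.parquet` default are illustrative assumptions; it checks folder layout and file format only, not the per-table schema):

```python
import tempfile
from pathlib import Path

def check_same_structure(original, new_load, fmt=".parquet"):
    """Illustrative check that a new load folder mirrors the original
    load: same table subfolders, each containing files in the same
    data format. Does not compare per-table schemas."""
    problems = []
    orig_tables = {p.name for p in original.iterdir() if p.is_dir()}
    new_tables = {p.name for p in new_load.iterdir() if p.is_dir()}
    for missing in sorted(orig_tables - new_tables):
        problems.append(f"missing table folder: {missing}")
    for extra in sorted(new_tables - orig_tables):
        problems.append(f"unexpected table folder: {extra}")
    for table in sorted(orig_tables & new_tables):
        if not any(f.suffix == fmt for f in (new_load / table).iterdir()):
            problems.append(f"no {fmt} files in table: {table}")
    return problems

# Demo with hypothetical table folders mirroring an asset layout.
base = Path(tempfile.mkdtemp())
for load in ("singleload", "second_load"):
    (base / load / "customers").mkdir(parents=True)
    (base / load / "customers" / "part-0.parquet").touch()
(base / "singleload" / "orders").mkdir()
(base / "singleload" / "orders" / "part-0.parquet").touch()

print(check_same_structure(base / "singleload", base / "second_load"))
```

Here the second load is missing the "orders" table folder, so the check reports it before a failed ingest would.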

Trigger the ingest

Once refreshed data content has been loaded to the endpoint location in step 1, an update is initiated by copying a Transfer Notification File (TNF) into the /tnfs/ subdirectory of the asset connector location.

A TNF is a JSON text file containing the name of the newly created subdirectory that holds the updated content. An example is shown below:

CODE
{"DataFolder":"newly_created_folder"}

where newly_created_folder is replaced by the name of the newly created folder in the endpoint location (e.g. second_load).

The TNF file:

  • Must have the extension .tnf

  • Can be named using whichever convention you wish.

  • Must have the structure shown above.
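Writing and dropping a valid TNF is a one-liner in most languages. A minimal Python sketch, using a local folder as a hypothetical stand-in for the connector's /tnfs/ location (in practice you would copy the file there with your cloud SDK or CLI):

```python
import json
from pathlib import Path

# Hypothetical local stand-in for the asset connector's /tnfs/ folder.
tnfs_dir = Path("asset_connector") / "tnfs"
tnfs_dir.mkdir(parents=True, exist_ok=True)

# The body must follow the structure shown above: a single
# "DataFolder" field naming the new subdirectory in the endpoint.
tnf = {"DataFolder": "second_load"}

# Any file name is acceptable, but the .tnf extension is required.
tnf_path = tnfs_dir / "trigger_second_load.tnf"
tnf_path.write_text(json.dumps(tnf))

print(tnf_path.read_text())
```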

The Exchange monitors the /tnfs/ subdirectories of repeating data assets at a regular, configurable frequency (default: 5 minutes). When a trigger file is detected, it initiates a load of the refreshed data from the subdirectory specified in the "DataFolder" JSON field.
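The Exchange's polling is internal to the platform, but its behaviour can be sketched conceptually. The loop below is purely illustrative (the function, callback, and parameters are assumptions, not platform API): it scans a folder for .tnf files and hands the named data folder to a load handler.

```python
import json
import time
from pathlib import Path

def poll_tnfs(tnfs_dir, handle_load, interval_s=300, max_polls=None):
    """Illustrative polling loop: scan tnfs_dir for .tnf trigger files
    and invoke handle_load with the folder named in each file's
    "DataFolder" field. max_polls bounds the loop for demo purposes."""
    polls = 0
    seen = set()
    while max_polls is None or polls < max_polls:
        for tnf in sorted(tnfs_dir.glob("*.tnf")):
            if tnf not in seen:
                seen.add(tnf)
                data_folder = json.loads(tnf.read_text())["DataFolder"]
                handle_load(tnf, data_folder)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)

# Demo: one poll over a local folder containing a single trigger file.
demo = Path("demo_tnfs")
demo.mkdir(exist_ok=True)
(demo / "trigger.tnf").write_text('{"DataFolder": "second_load"}')

loads = []
poll_tnfs(demo, lambda tnf, folder: loads.append(folder),
          interval_s=0, max_polls=1)
print(loads)
```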

When the data load process is complete, the provided TNF is moved to the /tnfs_complete/ subdirectory of the data asset connector location with an additional field containing the status of the data ingest job, e.g.:

CODE
{"DataFolder":"newly_created_folder",
 "status":"success"}

Cloud storage event triggers can be set up on the /tnfs_complete/ subdirectory of a data asset to enable programmatic monitoring of data load job execution status. Additional JSON fields can be added to the provided TNF file as required; these will be persisted to the output file.
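A monitoring process reading /tnfs_complete/ only needs to parse the status field from each completed TNF. A minimal sketch, using a local folder as a hypothetical stand-in for the cloud storage location (which you would read via your cloud SDK in practice):

```python
import json
from pathlib import Path

# Hypothetical local stand-in for the /tnfs_complete/ subdirectory,
# pre-populated with one completed TNF as the platform would write it.
complete_dir = Path("demo_tnfs_complete")
complete_dir.mkdir(exist_ok=True)
(complete_dir / "trigger.tnf").write_text(
    '{"DataFolder": "second_load", "status": "success"}'
)

# Report the outcome of each completed ingest job.
for tnf in sorted(complete_dir.glob("*.tnf")):
    result = json.loads(tnf.read_text())
    outcome = "succeeded" if result["status"] == "success" else "failed"
    print(f"{result['DataFolder']}: ingest {outcome}")
```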

View Asset Updates

You can view the status of your data asset updates at any time. The history will always contain at least one line item for the initial data load that took place when the asset was first created. Any subsequent update jobs will be listed in the updates history table. This table contains:

  1. Trigger file name - the name of the .tnf file that was discovered by the polling mechanism. This is what signals that data has been loaded and is available for ingest.

  2. Time started

  3. Time completed

  4. Duration

  5. Status

    • Completed

    • Failed

    • In progress
