Azure Blob Storage

Azure Blob Storage backend for zones, enabling external tables over Azure-hosted data files

Category: cloud-storage

Description

## Overview

Azure Blob Storage serves as a storage backend for DeltaForge zones. Rather than being referenced directly in SQL statements, Azure Blob Storage is configured as the underlying storage location for a zone. Files stored in a Blob container are then accessed through external tables that reference a zone-relative or local path.

## Configuration

To use Azure Blob Storage as a storage backend, configure a zone whose storage path points to an Azure Blob location. The zone definition includes the storage account, container, optional prefix, and credentials.

Once the zone is configured, data files within that container are accessed by creating external tables:

```sql
CREATE EXTERNAL TABLE sales.transactions
USING PARQUET
LOCATION 'transactions/2024/';
```

The LOCATION path is resolved relative to the zone's storage root in the configured container. The engine reads the matching blobs, dispatching to the appropriate format handler (Parquet, CSV, JSON, Avro, ORC, or Delta) based on file extension or the explicit USING clause.

Multiple blobs are read in parallel across available CPU cores. Columnar formats such as Parquet benefit from range-read support, enabling projection and predicate pushdown to minimize data transfer.

## Authentication

Three authentication methods are supported. Use exactly one:

1. **Shared key**: Provide the storage account access_key. This grants full access to the storage account and is suitable for development or private networks.
2. **SAS token**: Provide a sas_token scoped to the target container with read and list permissions. SAS tokens are time-limited and provide the least-privilege approach for production workloads.
3. **Service principal**: Provide tenant_id, client_id, and client_secret. The service principal must be assigned the Storage Blob Data Reader role (or equivalent) on the target container. This is the recommended method for automated and production deployments because it integrates with role-based access control (RBAC).

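
The "use exactly one" rule above can be checked before a zone definition is submitted. The following is a minimal sketch, not DeltaForge's own validation: the function name `select_auth_method` and the `AUTH_METHODS` grouping are hypothetical, and only the option names (access_key, sas_token, tenant_id, client_id, client_secret) come from this page.

```python
from typing import Mapping, Optional

# Hypothetical grouping of this page's option names by authentication method.
AUTH_METHODS = {
    "shared_key": ("access_key",),
    "sas_token": ("sas_token",),
    "service_principal": ("tenant_id", "client_id", "client_secret"),
}


def select_auth_method(options: Mapping[str, Optional[str]]) -> str:
    """Return the single configured method name, or raise ValueError."""
    configured = []
    for method, keys in AUTH_METHODS.items():
        present = [key for key in keys if options.get(key)]
        if not present:
            continue
        if len(present) != len(keys):
            # A service principal needs all three values to be usable.
            missing = sorted(set(keys) - set(present))
            raise ValueError(f"{method} is missing: {', '.join(missing)}")
        configured.append(method)
    if len(configured) != 1:
        raise ValueError(
            f"expected exactly one authentication method, found {len(configured)}"
        )
    return configured[0]
```

Calling `select_auth_method({"sas_token": "<token>"})` returns `"sas_token"`, while supplying both an access_key and a sas_token, or only part of the service principal triple, raises `ValueError`. Failing early like this avoids discovering an ambiguous credential setup only when the first external table is read.
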
All secret values (access_key, sas_token, client_secret) are stored in the secure credential vault and never persisted in plaintext or included in log output.

## Key Options

- **account** (required): Azure storage account name.
- **container** (required): Blob container within the storage account.
- **prefix**: Narrows the zone root to a specific virtual directory within the container.
- **access_key**: Storage account shared key for full account-level access.
- **sas_token**: Time-limited Shared Access Signature token scoped to the container.
- **tenant_id / client_id / client_secret**: Azure AD service principal credentials for token-based authentication with RBAC integration.
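
To illustrate how the account, container, and prefix options could combine with a table's LOCATION, here is a rough sketch of the path resolution. It assumes the zone root is simply the prefix joined to the LOCATION with `/`, and it uses the public Azure endpoint pattern `https://<account>.blob.core.windows.net/<container>/...`; the helper names `resolve_blob_path` and `blob_url` are hypothetical, and the engine's actual resolution logic may differ (for example, for sovereign clouds or custom endpoints).

```python
from typing import Optional


def resolve_blob_path(prefix: Optional[str], location: str) -> str:
    """Join the optional zone prefix and a zone-relative LOCATION.

    Assumption: both are '/'-separated virtual directory paths.
    """
    parts = [p.strip("/") for p in (prefix, location) if p and p.strip("/")]
    joined = "/".join(parts)
    # Preserve a trailing slash so the result still denotes a directory.
    if joined and location.endswith("/"):
        joined += "/"
    return joined


def blob_url(account: str, container: str, prefix: Optional[str], location: str) -> str:
    """Build the blob URL prefix for the public Azure cloud endpoint."""
    path = resolve_blob_path(prefix, location)
    return f"https://{account}.blob.core.windows.net/{container}/{path}"


# Example values are placeholders, not real resources.
print(blob_url("myaccount", "datalake", "warehouse/", "transactions/2024/"))
# prints "https://myaccount.blob.core.windows.net/datalake/warehouse/transactions/2024/"
```

With no prefix configured, the same LOCATION resolves directly under the container root, which matches the description of prefix as narrowing the zone root to a virtual directory.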
