Google Cloud Storage

Google Cloud Storage backend for zones, enabling external tables over GCS-hosted data files

Category: cloud-storage

Description

## Overview

Google Cloud Storage serves as a storage backend for DeltaForge zones. Rather than being referenced directly in SQL statements, GCS is configured as the underlying storage location for a zone. Files stored in a GCS bucket are then accessed through external tables that reference a zone-relative or local path.

## Configuration

To use GCS as a storage backend, configure a zone whose storage path points to a GCS bucket location. The zone definition includes the bucket, optional prefix, project ID, and credentials.

Once the zone is configured, data files within that bucket are accessed by creating external tables:

```sql
CREATE EXTERNAL TABLE analytics.page_views
USING PARQUET
LOCATION 'page_views/2024/';
```

The LOCATION path is resolved relative to the zone's storage root. The engine reads the GCS objects matching the location, dispatching to the appropriate format handler (Parquet, CSV, JSON, Avro, ORC, or Delta) based on file extension or the explicit USING clause. Multiple objects are read in parallel across available CPU cores. GCS supports range reads, so columnar formats such as Parquet benefit from projection pushdown where only the required columns and row groups are transferred.

## Authentication

Two authentication methods are supported:

1. **Service account key file**: Provide the path to a JSON key file via the service_account_key option. The service account must be granted the Storage Object Viewer role (roles/storage.objectViewer) or equivalent on the target bucket.
2. **Application Default Credentials (ADC)**: When service_account_key is omitted, the connector uses the standard Google Cloud ADC resolution order: GOOGLE_APPLICATION_CREDENTIALS environment variable, attached service account on GCE/GKE, or the gcloud CLI default credentials.

For production workloads, a dedicated service account with least-privilege permissions is recommended. The service account key file contents are read at connection time and are not copied or cached beyond the credential resolution step.

## Key Options

- **bucket** (required): The GCS bucket name backing the zone.
- **prefix**: Narrows the zone root to a specific object name prefix within the bucket.
- **service_account_key**: File system path to a Google Cloud service account JSON key file.
- **project_id**: Google Cloud project that owns the bucket. Auto-detected from credentials when omitted.
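
To make the path resolution concrete, the following Python sketch shows one way a zone-relative LOCATION could be combined with the bucket and prefix options to form a full GCS object prefix. It is an illustration only and not DeltaForge's actual implementation: the `ZoneStorage` class, the `resolve_location` function, and the bucket name `acme-lake` are hypothetical names introduced here, and the slash-normalization behavior is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoneStorage:
    """Hypothetical stand-in for a zone's GCS settings (bucket plus optional prefix)."""
    bucket: str
    prefix: Optional[str] = None


def resolve_location(zone: ZoneStorage, location: str) -> str:
    """Join the zone root and a zone-relative LOCATION into a gs:// URI.

    Assumption: empty segments and duplicate slashes are dropped.
    """
    parts = [zone.prefix or "", location]
    # Split both pieces on "/" and keep only non-empty segments,
    # so the joined key never contains "//".
    segments = [seg for part in parts for seg in part.split("/") if seg]
    key = "/".join(segments)
    # Preserve a trailing slash: it marks the location as an object-name prefix.
    if location.endswith("/") and key:
        key += "/"
    return f"gs://{zone.bucket}/{key}"


# Example: a zone rooted at gs://acme-lake/warehouse (hypothetical names).
zone = ZoneStorage(bucket="acme-lake", prefix="warehouse")
print(resolve_location(zone, "page_views/2024/"))
```

Under these assumptions, the table definition shown under Configuration would read objects whose names begin with `warehouse/page_views/2024/` in the `acme-lake` bucket. With no prefix configured, the same LOCATION would resolve directly under the bucket root.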
