Amazon S3 storage backend for zones, enabling external tables over S3-hosted data files

## Overview

Amazon S3 serves as a storage backend for DeltaForge zones. Rather than being referenced directly in SQL statements, S3 is configured as the underlying storage location for a zone. Files stored in S3 are then accessed through external tables that reference a zone-relative or local path.

## Configuration

To use S3 as a storage backend, configure a zone whose storage path points to an S3 location. The zone definition includes the bucket, optional prefix, region, and credentials.

Once the zone is configured, data files within that S3 location are accessed by creating external tables:

```sql
CREATE EXTERNAL TABLE sales.events
USING PARQUET
LOCATION 'events/2024/';
```

The LOCATION path is resolved relative to the zone's storage root. The engine reads the S3 objects matching the location, dispatching to the appropriate format handler (Parquet, CSV, JSON, Avro, ORC, or Delta) based on file extension or the explicit USING clause. Multiple files are read in parallel across available CPU cores.

Because S3 supports range requests, columnar formats such as Parquet benefit from projection pushdown, reading only the columns and row groups required by the query.

## Authentication

Authentication follows the standard AWS SDK credential resolution order:

1. Static access keys provided via the access_key_id and secret_access_key options.
2. Temporary credentials via session_token, obtained through STS AssumeRole or federation.
3. Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN).
4. EC2/ECS instance roles attached to the compute node.

For production workloads, IAM roles attached to the compute infrastructure are recommended over static access keys. When static keys are provided, the secret_access_key is stored in the secure credential vault and never persisted in plaintext.

## Key Options

- **bucket** (required): The S3 bucket name backing the zone.
- **prefix**: Narrows the zone root to a specific key prefix within the bucket.
- **region**: AWS region of the bucket. Auto-detected when omitted.
- **access_key_id / secret_access_key**: Static credentials for authentication.
- **session_token**: Temporary STS credentials.
- **endpoint**: Custom URL for S3-compatible object stores (MinIO, Ceph, LocalStack). When a custom endpoint is specified, the connector uses path-style addressing automatically.
- **allow_http**: Permits non-TLS connections. Use only for local development endpoints.
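
The following Python sketch illustrates two behaviors described above: how a relative LOCATION might be joined to the zone's storage root, and how the credential sources could be tried in the documented order. It is not DeltaForge code. `ZoneConfig`, `resolve_location`, `credential_source`, and the bucket name `analytics-data` are hypothetical names introduced for illustration, and treating a `session_token` as an accompaniment to static keys is an assumption about how steps 1 and 2 interact.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ZoneConfig:
    """Hypothetical container mirroring a subset of the zone options above."""
    bucket: str
    prefix: str = ""
    access_key_id: Optional[str] = None
    secret_access_key: Optional[str] = None
    session_token: Optional[str] = None


def resolve_location(zone: ZoneConfig, location: str) -> str:
    """Join the zone root (bucket + optional prefix) with a relative LOCATION."""
    parts = [p.strip("/") for p in (zone.prefix, location) if p.strip("/")]
    key = "/".join(parts)
    # Keep a trailing slash so a directory-style LOCATION stays a key prefix.
    if key and location.endswith("/"):
        key += "/"
    return f"s3://{zone.bucket}/{key}"


def credential_source(zone: ZoneConfig, env: Mapping[str, str]) -> str:
    """Name the step of the documented order that would supply credentials."""
    if zone.access_key_id and zone.secret_access_key:
        # Assumption: a session token marks the supplied keys as temporary.
        return "session_token" if zone.session_token else "static_keys"
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # Nothing explicit was found, so fall back to the EC2/ECS instance role.
    return "instance_role"


if __name__ == "__main__":
    zone = ZoneConfig(bucket="analytics-data", prefix="warehouse/sales")
    print(resolve_location(zone, "events/2024/"))
    print(credential_source(zone, env={}))
```

With a zone rooted at `warehouse/sales` in the hypothetical bucket, the LOCATION from the earlier `CREATE EXTERNAL TABLE` statement maps to `s3://analytics-data/warehouse/sales/events/2024/`. Because the zone carries no keys and the environment is empty, the sketch falls through to the instance role, which matches the recommendation for production workloads.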