Read Microsoft Excel (.xls, .xlsx, .xlsm, .xlsb) and ODS spreadsheet files
## Overview The Excel data source reads Microsoft Excel spreadsheet files (.xls, .xlsx, .xlsm, .xlsb) and OpenDocument Spreadsheet (.ods) files. It supports sheet selection, header detection, cell range restriction, and schema inference from sampled rows. Excel files are registered as external tables using the `CREATE EXTERNAL TABLE` statement with `USING EXCEL`. DeltaForge reads multiple Excel files in parallel when the LOCATION contains multiple files. The `sheet_name` option selects which sheet to read from each workbook, and `file_filter` can narrow down which files to include from a directory. ## Usage Create an external table pointing to one or more Excel files. Use `sheet_name` to select a specific sheet. ```sql -- Create a zone and schema CREATE ZONE IF NOT EXISTS analytics TYPE EXTERNAL; CREATE SCHEMA IF NOT EXISTS analytics.excel_demos; -- Register Excel files with sheet selection CREATE EXTERNAL TABLE IF NOT EXISTS analytics.excel_demos.all_orders USING EXCEL LOCATION '/data/orders' OPTIONS ( sheet_name = 'Orders', has_header = 'true', infer_schema_rows = '1000', file_metadata = '{"columns":["df_file_name","df_row_number"]}' ); -- CREATE EXTERNAL TABLE auto-runs schema discovery, no separate DETECT SCHEMA needed -- Grant access GRANT ADMIN ON TABLE analytics.excel_demos.all_orders TO USER analyst; -- Query the data SELECT order_id, product_name, quantity, unit_price FROM analytics.excel_demos.all_orders; ``` Use `file_filter` to select a single file from a multi-file directory: ```sql CREATE EXTERNAL TABLE IF NOT EXISTS analytics.excel_demos.q1_report USING EXCEL LOCATION '/data/reports' OPTIONS ( file_filter = 'Q1_2024_*.xlsx', sheet_name = 'Summary', has_header = 'true' ); ``` ## Schema Detection Excel is not a self-describing tabular format in the same way as Parquet or Avro. DeltaForge infers column types by sampling the first N rows (controlled by `infer_schema_rows`, default 1000). Cell types (numeric, date, string, boolean) provide hints for inference, but mixed-type columns may require a larger sample size. CREATE EXTERNAL TABLE auto-runs schema discovery as part of registration, so column metadata appears in the catalog immediately and `information_schema.columns` is queryable right away. Re-run `DETECT SCHEMA FOR TABLE zone.schema.table` only when the spreadsheet structure changes (new columns, renamed headers, sheet replaced). ## Schema Evolution When reading multiple Excel files with different column sets, DeltaForge unifies all column schemas into a single combined schema. Columns that exist in some files but not others appear as NULL for rows from files that lack those columns. ## Key Options - **sheet_name**: Select which sheet to read by name (e.g., 'Orders'), index, 'First', or 'All'. - **has_header**: Whether the first row contains column headers (default 'true'). - **infer_schema_rows**: Number of rows sampled for type inference (default '1000'). - **file_filter**: Select a subset of Excel files from a directory using a glob pattern. - **file_metadata**: Inject system columns (df_file_name, df_row_number) for provenance tracking. - **range**: Restrict reading to a specific cell range (e.g., 'A1:Z1000'). - **empty_cell_handling**: Control how empty cells are treated ('AsNull', 'AsEmptyString', 'SkipRow').