Executes a query or scans a table, then runs the type inference engine on the result columns to detect semantic data types, report confidence scores, and generate CAST expressions for type-safe transformations.
INFERTYPE <table_ref> [LIMIT <n>]
INFERTYPE SELECT ...
## Overview INFERTYPE profiles query result columns to detect their semantic data types. Raw or imported data often arrives as STRING columns. INFERTYPE analyzes sample values to determine the actual data type (e.g., INTEGER, DATE, TIMESTAMP, BOOLEAN, DOUBLE) and generates the CAST expressions needed to convert columns to their inferred types. ## Behavior Execution proceeds in five phases: 1. **Execute Query**: The inner SQL (or generated SELECT * FROM table LIMIT n) runs through the full execution pipeline. The result set is captured. 2. **Extract Samples**: String representations of column values are extracted from the result batches. NULL values are skipped. 3. **Type Inference**: The type inference engine (from delta-forge-connectors) analyzes each column's sample values independently. It tests values against type patterns (integer, float, boolean, date, timestamp, etc.) and selects the best-fit type with a confidence score. 4. **Build Result Grid**: A per-column analysis result set is constructed. 5. **Generate Transform SQL**: A complete SELECT statement with CAST expressions is generated and included in the result message. ### Result Columns The result set contains one row per column with the following columns: - **column_name**: The original column name. - **original_type**: The Arrow data type of the column as returned by the query. - **detected_type**: The inferred semantic type (e.g., INTEGER, DATE, DOUBLE). - **confidence**: A confidence score between 0.0 and 1.0 indicating how well the sample values match the detected type. - **sample_values**: Representative sample values from the column. - **transform_expression**: The CAST expression to convert the column (e.g., CAST(col AS INTEGER)). - **column_sql**: The full column expression for inclusion in a transform SELECT. ### Transform SQL The execution result message includes a generated SELECT statement that applies all CAST expressions. When a source_ref is available (table reference shorthand), the FROM clause references the original table. This generated SQL can be used directly as the basis for a typed staging-to-silver transform. ## Compatibility INFERTYPE is a DeltaForge extension. It leverages the same type inference engine used by CSV, JSON, and Excel connectors during data import. The inference rules are consistent across all import and profiling paths.
| Name | Type | Description |
|---|---|---|
inner_sql | SELECT statement or table reference. | |
source_ref | Original table ref (set for shorthand form). |
-- Infer types for all columns in a raw table
INFERTYPE staging.raw.customers;
-- Infer types with a custom sample size
INFERTYPE staging.raw.customers LIMIT 5000;
-- Infer types from a filtered query
INFERTYPE SELECT * FROM staging.raw.events WHERE region = 'US' LIMIT 500;
-- Infer types for a specific set of columns
INFERTYPE SELECT order_date, total_amount, zip_code
FROM staging.raw.orders
LIMIT 2000;