Read proto3 SensorNetwork binaries across two factory floors and a warehouse (20 sensors, 82 readings) to practice exploding repeated readings, joining them as CSV, and preserving Timestamp well-known types.
CREATE ZONE IF NOT EXISTS external TYPE EXTERNAL;
CREATE SCHEMA IF NOT EXISTS external.protobuf_iot;
-- Exploded readings: 82 rows, one per SensorReading
CREATE EXTERNAL TABLE IF NOT EXISTS external.protobuf_iot.sensor_readings
USING PROTOBUF
LOCATION '/data/protobuf/sensors'
OPTIONS (
schema_path = '/data/protobuf/sensors/schema/sensor.proto',
message_name = 'iot.SensorNetwork',
proto_flatten_config = '{
"row_path": "sensors",
"explode_paths": ["sensors.readings"],
"include_paths": [
"sensors.sensor_id", "sensors.sensor_type",
"sensors.location", "sensors.status",
"sensors.readings.value",
"sensors.readings.recorded_at",
"sensors.readings.unit",
"sensors.installed_at"
],
"column_mappings": {
"sensors.readings.value": "reading_value",
"sensors.readings.recorded_at": "recorded_at",
"sensors.readings.unit": "unit"
},
"timestamp_format": "iso8601",
"max_depth": 10
}',
file_metadata = '{"columns":["df_file_name","df_file_modified","df_dataset","df_row_number"]}'
);
-- Summary: 20 rows, readings comma-joined per sensor
CREATE EXTERNAL TABLE IF NOT EXISTS external.protobuf_iot.sensor_summary
USING PROTOBUF
LOCATION '/data/protobuf/sensors'
OPTIONS (
schema_path = '/data/protobuf/sensors/schema/sensor.proto',
message_name = 'iot.SensorNetwork',
proto_flatten_config = '{
"row_path": "sensors",
"include_paths": [
"sensors.sensor_id", "sensors.sensor_type",
"sensors.location", "sensors.status",
"sensors.readings.value",
"sensors.readings.recorded_at",
"sensors.readings.unit",
"sensors.installed_at"
],
"default_repeat_handling": "join_comma",
"column_mappings": {
"sensors.readings.value": "reading_values",
"sensors.readings.recorded_at": "reading_times",
"sensors.readings.unit": "reading_units"
},
"timestamp_format": "iso8601",
"max_depth": 10
}'
);
-- Verify
ASSERT ROW_COUNT = 20 SELECT * FROM external.protobuf_iot.sensor_summary;
ASSERT ROW_COUNT = 82 SELECT * FROM external.protobuf_iot.sensor_readings;
ASSERT VALUE min_val = 22.5 WHERE sensor_id = 'TEMP-A001'
ASSERT VALUE max_val = 23.4 WHERE sensor_id = 'TEMP-A001'
SELECT sensor_id, MIN(reading_value) AS min_val, MAX(reading_value) AS max_val
FROM external.protobuf_iot.sensor_readings
WHERE sensor_id = 'TEMP-A001'
GROUP BY sensor_id;
-- Cleanup
DROP EXTERNAL TABLE IF EXISTS external.protobuf_iot.sensor_readings WITH FILES;
DROP EXTERNAL TABLE IF EXISTS external.protobuf_iot.sensor_summary WITH FILES;
DROP SCHEMA IF EXISTS external.protobuf_iot;
## When to Use Use this demo for IoT / telemetry style protobuf payloads: a parent `Sensor` message carries device metadata plus a `repeated SensorReading` array of time-value samples. It is the right template when you need both a "one row per sample" analytics table and a "one row per device" summary table from the same `.pb` files. ## What You Will Learn 1. Build two tables from the same set of `.pb` files - one exploded (`sensor_readings`, 82 rows) and one joined (`sensor_summary`, 20 rows) - without reshuffling the data. 2. Point `row_path` at the top-level repeated `sensors` field so every Sensor becomes a row. 3. Flatten nested `SensorReading.value`, `recorded_at`, `unit` fields into top-level columns via `include_paths` and `column_mappings`. 4. Convert `google.protobuf.Timestamp` (`installed_at`, `readings.recorded_at`) into ISO 8601 strings. 5. Track provenance across 3 facility files (`factory_floor_a.pb`, `factory_floor_b.pb`, `warehouse.pb`) using `file_metadata` columns. 6. Assert exact double-precision reading values (e.g. `22.5`, `7.8`, `0.5`) to verify `float`/`double` fidelity end-to-end. ## Prerequisites - Three `.pb` files plus `schema/sensor.proto` and `schema/timestamp.proto`. - An `EXTERNAL` zone (defaults to `external`). - Basic familiarity with `CREATE EXTERNAL TABLE ... USING PROTOBUF` options.