Demos
Runnable end-to-end demos across Delta, Iceberg, graph, healthcare, EDI, geospatial, and every supported file format.
Avro
- Avro E-Commerce Orders: Logical Types & Nullable UnionsRead 80 e-commerce orders across two Avro files with mixed codecs, exercising date/timestamp-millis logical types, nullable unions, and integer-cent...
- Avro Insurance Claims: Schema Evolution Across File VersionsRead 90 insurance claims across three Avro files where v2 adds two new fields, validating automatic NULL-filling on v1 rows, file_filter, max_rows...
- Avro IoT Sensors: Smart Building Telemetry at ScaleRead 2,500 sensor readings from 5 building-floor Avro files with schema evolution (v1→v2), mixed codecs, file_filter, max_rows, and...
CSV
- CSV Advanced Options TestbenchEight CSV external tables, each exercising one parsing option (delimiter, null_value, comment_char, skip_starting_rows, max_rows, trim_whitespace...
- Northwind Trading CompanyClassic Northwind sample database as 11 semicolon-delimited CSV external tables (830 orders, 2,155 line items) -- joins, aggregations, shipping...
- Sales Schema EvolutionSingle external table reads five quarterly CSVs (2024-Q1 to 2025-Q1) whose columns evolve over time; missing columns surface as NULL and...
- Veterinary Clinic Patient RecordsThree-branch veterinary clinic with 75 visits across branch-north/south/east subdirectories; demonstrates recursive scanning, file_filter wildcards...
Delta
- Change Data Feed -- Row-Level Change TrackingEnable Change Data Feed to emit per-row insert / update_preimage / update_postimage / delete records so downstream systems can consume only the...
- CHECK Constraints and Table Properties LifecycleDeclare CHECK constraints inside TBLPROPERTIES and prove they survive every UPDATE, DELETE, and re-INSERT -- not just the initial load.
- Delta Basics -- Create, Insert, Update, DeleteWalk through every CRUD operation on a Delta table and watch the transaction log create new versions you can time-travel back to.
- MERGE Patterns -- Slowly Changing Dimension Type 2 (SCD2)Preserve the full audit trail of dimension changes by expiring current rows with MERGE and inserting new versions in a single two-pass pattern.
- OPTIMIZE then VACUUM -- The Delta Maintenance PlaybookFix the small-files problem caused by daily micro-batch ingestion by compacting with OPTIMIZE and then purging orphaned versions with VACUUM, while...
- Partitioning and Dynamic Partition PruningPartition a Delta fact table by region and watch the engine skip entire partition directories when a JOIN or subquery filter on the dimension side...
- Schema Evolution -- Add Columns and Backfill NULLsAdd new columns to a live Delta table with zero data rewrite, rely on implicit NULL fill for historical rows, and backfill only the records that...
- Time Travel and RESTORE -- Inspect Before You RecoverRecover from an accidental bulk UPDATE by time-travelling through versions, picking the last good snapshot, and restoring the table with a single...
EDI
- Advanced JSON Extraction: On-Demand Segment Access via df_transaction_jsonExtracts deeply-nested X12 segments ad hoc using SQL JSON functions, json_typeof, json_array_length, json_extract_path_text, jsonb_pretty, and the...
- EDIFACT Basics: UNB/UNH Envelopes Across DirectoriesParses 22 UN/EDIFACT and EANCOM messages (ORDERS, ORDRSP, INVOIC, CUSCAR, BAPLIE, PAXLST, PNRGOV, DESADV, PRICAT, IFTSTA, APERAK, CONTRL) spanning...
- HIPAA Claims Financial, 837 Charges JOIN 835 RemittanceCharge-to-payment reconciliation across three 837 healthcare claims (professional, dental, institutional) and one 835 remittance advice. Two...
- Order Lifecycle: Cross-Document 850/855/856/810/861 TraceabilityTraces a purchase order through the full order-to-cash cycle: PO creation (850) to acknowledgment (855) to shipment (856/857) to invoice (810) to...
- Repeating Segments & Loops: Indexed / Concatenate / ToJson ModesThree external tables over the same 14 X12 files show how DeltaForge materializes multi-occurrence segments (N1 party loops, PO1 line items)...
- X12 Supply Chain Basics: ISA/GS/ST Envelope + Materialized FieldsParses 14 X12 EDI transactions (850/810/855/856/857/861/997/824) into two external tables, a compact view exposing the ISA/GS/ST envelope and...
Excel
FHIR
- FHIR Clinical Observations: Vital Signs & Lab ResultsFlatten HL7 FHIR R5 Observation resources, heart rate NDJSON bulk exports, vital-sign panels, lab results, and blood pressure with nested...
- FHIR Clinical Records: Conditions, Procedures & AllergiesIngest three related HL7 FHIR R5 clinical resource types (Condition, Procedure, AllergyIntolerance) from a single directory, demonstrating...
- FHIR Medications: Prescriptions & CoverageIngest HL7 FHIR R5 MedicationRequest and Coverage resources, including contained Medication definitions, dosage instructions, dispense requests, and...
- FHIR Patient Demographics: BasicsIngest HL7 FHIR R5 Patient resources from bulk NDJSON and individual JSON files, flattening nested name, telecom, and address arrays into queryable...
- FHIR XML Clinical ResourcesIngest native HL7 FHIR R4 XML resources (Patient, Observation), handling the default FHIR namespace, @value-attribute pattern, repeating elements...
Graph
- Global Shipping Routes: Weighted Shortest Path and MSTRoute optimization on 25 world ports and 55 weighted shipping lanes: Dijkstra shortest path, BFS depth layers, minimum spanning tree, and PageRank...
- GPU Banking Fraud Analytics, 10M Accounts, 48M TransactionsGPU-accelerated fraud and influence analytics on a 10M-account banking network with 48M transactions: PageRank, Louvain, betweenness, triangle...
- Hospital Referral Network: Graph DML (INSERT, UPDATE, DELETE)Mutate a live graph via standard DML on its backing Delta tables, verify changes with Cypher, and re-run algorithms on the modified topology.
- Karate Club: Graph Basics, Ingestion, and Verified AlgorithmsLoad Zachary's Karate Club into Delta tables, build an undirected graph, and run Cypher patterns plus 15 algorithms with NetworkX-verified golden...
- Research Collaboration Network: Advanced Cypher PatternsUse advanced Cypher on a university collaboration graph (40 researchers, 170 directed edges, 4 relationship types): negative patterns, aggregations...
- Sales Territory Optimization: SQL + Cypher InteropMix Cypher and SQL in a single query: run graph algorithms via the cypher() table function, persist results into Delta, then JOIN with relational...
HL7
Iceberg
JSON
ORC
- ORC Clinical Trials: NULL Handling & CASE LogicExercise ORC's NULL bitmap on a heavily-nullable patient dataset using COALESCE, NULLIF, CASE, and string predicates.
- ORC Energy Meters: Advanced Aggregation at ScaleAggregate 1,500 smart-meter readings across 3 monthly ORC files using HAVING, COUNT DISTINCT, FILTER clauses, and CTE + window ranking.
- ORC Insurance Claims: Cross-Table JOINs & SubqueriesJoin two ORC-backed external tables (policies, claims) and prove INNER/LEFT/anti-joins, EXISTS, and IN-subquery semantics across ORC data.
- ORC Server Logs: Schema Evolution & Glob FilteringRead 5 HTTP access-log ORC files with two schema versions and prove schema evolution, LOCATION glob filtering, and file_metadata traceability.
- ORC Warehouse Inventory: Window Functions on Mixed NumericsApply ROW_NUMBER, RANK, LAG/LEAD, NTILE, and running totals to 100 rows of warehouse inventory stored across 2 ORC files with int32, float64, and...
Parquet
Protobuf
Pseudonymisation
- Pseudonymisation Apply: Clinical Trial De-identificationPharma de-identification workflow: register query-time rules on a clinical-trial table, verify transforms with SHOW and SELECT, and aggregate...
- Pseudonymisation Healthcare: HL7, FHIR, and EDI De-identificationExercise all seven transform types and all three linkability scopes across three healthcare formats: HL7 v2 ADT, FHIR R4 Patient, EDI HIPAA X12...
- Pseudonymisation Lifecycle: Insurance ClaimsInsurance-claims walkthrough of the full pseudonymisation rule lifecycle: CREATE five rules with PRIORITY, SHOW them, SELECT through them, DROP a...
- Pseudonymisation Quickstart: Banking KYCRegister four query-time pseudonymisation rules on a retail-bank KYC table, redact SSN, mask phone, keyed-hash last name, generalize date of birth...
Spatial
- GIS Emergency Response Network: Multi-Step Spatial AnalyticsNYC dispatch scenario combining st_distance nearest-neighbor ranking, multi-algorithm distance comparison, navigation bearings, and WKT polygon...
- GIS Maritime Shipping: PostGIS-Compatible Geospatial FunctionsExercises all 18 PostGIS-compatible st_* functions against 5 cargo vessels navigating 10 world ports with 40 GPS position reports, distances...
- H3 + GIS Delivery Optimization: Cross-Function Spatial AnalyticsCombines H3 hexagonal indexing with GIS distance/bearing functions across 3 warehouses and 15 stores to rank delivery routes and detect suboptimal...
- H3 GPS Fleet Tracker, 10K Pings, 21 H3 Functions10,000 deterministic GPS pings across 5 world cities, indexed with H3 resolution-9 cells and joined against polyfilled city polygons for O(1)...
- H3 Point-in-Polygon, 1M Row Geofencing PerformanceRide-share geofencing across 12 pricing zones and 1,000,000 driver GPS positions in 8 world cities, converts O(n × m) polygon intersection into O(1)...
XML