DIFFERENCE

Return a 0-4 similarity score comparing the Soundex codes of two strings.

Category: stringReturns: INTEGERDialect: PostgreSql

Syntax

DIFFERENCE(str1, str2)

Description

## Overview Computes the Soundex codes of both input strings and returns an integer from 0 to 4 indicating how many of the 4 Soundex characters match. A score of 4 means the Soundex codes are identical (strong phonetic similarity); a score of 0 means none of the characters match. DIFFERENCE is useful as a quick, phonetic-based fuzzy-matching signal for English-language names. Combine DIFFERENCE with LEVENSHTEIN or a trigram-based similarity function when you need more robust matching. Soundex alone is insensitive to vowels, sensitive to the first letter, and tuned for English phonetics. ## Behavior - Returns NULL when either argument is NULL. - Returns an integer in [0, 4]. - Both inputs have their Soundex code computed first (see SOUNDEX). Non-letter characters are ignored during that step. - Empty strings produce a Soundex of '0000'; DIFFERENCE('', '') returns 4. - Case is ignored: Soundex lower-cases the input internally before encoding. - Tuned for English-language phonetics. Non-English names may produce misleading scores. ## Compatibility - Matches the common SQL DIFFERENCE semantics.

Parameters

NameTypeDescription
str1Specifies the first string for Soundex comparison.
str2Specifies the second string for Soundex comparison.

Examples

-- Identical Soundex codes produce 4
SELECT DIFFERENCE('Smith', 'Smyth');  -- 4
-- Similar-sounding names
SELECT DIFFERENCE('Robert', 'Rupert');  -- 3
-- Very different words
SELECT DIFFERENCE('apple', 'orange');  -- 1
-- Same string trivially scores 4
SELECT DIFFERENCE('hello', 'hello');  -- 4
-- NULL propagates
SELECT DIFFERENCE(CAST(NULL AS VARCHAR), 'hello');  -- NULL
-- Fuzzy-match candidate customers by name
SELECT c1.customer_id, c2.customer_id, DIFFERENCE(c1.last_name, c2.last_name) AS score
FROM retail.customers.profiles c1
JOIN retail.customers.profiles c2 ON c1.customer_id < c2.customer_id
WHERE DIFFERENCE(c1.last_name, c2.last_name) >= 3;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →