OCTET_LENGTH

Return the number of bytes in the input string's UTF-8 encoding.

Category: stringReturns: INTEGERDialect: Standard

Syntax

OCTET_LENGTH(str)

Description

## Overview Returns the number of bytes (octets) occupied by the UTF-8 encoding of the input string. Every ASCII character contributes 1 byte, Latin accented characters contribute 2 bytes, most CJK characters contribute 3 bytes, and characters outside the Basic Multilingual Plane (including most emoji) contribute 4 bytes. OCTET_LENGTH is the correct tool for estimating on-disk or on-wire size. Because CHAR_LENGTH counts code points rather than bytes, OCTET_LENGTH is always greater than or equal to CHAR_LENGTH. Equality holds only for pure ASCII strings. ## Behavior - Returns NULL when the input is NULL. - Returns 0 for an empty string. - Always equals BIT_LENGTH(str) / 8. - Identical to CHAR_LENGTH for ASCII-only strings; strictly greater for any input containing non-ASCII characters. - Applies to the UTF-8 canonical form of the string as stored by the engine. Alternative normalisation forms (NFC vs NFD) produce different byte counts. - Accepts VARBINARY and BYTEA types, in which case it simply returns the byte length of the binary value. ## Compatibility - Conforms to the SQL standard OCTET_LENGTH definition. - Assumes UTF-8 as the storage encoding. Other encodings are not supported as input strings.

Parameters

NameTypeDescription
strSpecifies the input string whose UTF-8 byte length is computed.

Examples

-- ASCII: 1 byte per character
SELECT OCTET_LENGTH('hello');  -- 5
-- Empty string
SELECT OCTET_LENGTH('');  -- 0
-- Single 2-byte UTF-8 character
SELECT OCTET_LENGTH('é');  -- 2
-- Mixed ASCII and multi-byte characters
SELECT OCTET_LENGTH('café');  -- 5
-- 4-byte emoji
SELECT OCTET_LENGTH('😀');  -- 4
-- Estimate storage in bytes for a text column
SELECT SUM(OCTET_LENGTH(description)) AS bytes_total
FROM retail.products.catalog;

Pitfalls

See Also

Open in interactive docs →   DeltaForge home →