Paradime | SUBSTRING / SPLIT / CONCAT

SQL Keywords

SUBSTRING / SPLIT / CONCAT

Feb 23, 2026

min read

Category: String Functions

Platform Support:

✅ Snowflake | ✅ BigQuery | ✅ Databricks

Description

String manipulation functions for extracting, splitting, and combining text. SUBSTRING extracts portions of strings, SPLIT divides strings into arrays, and CONCAT combines strings. Essential for text processing, parsing, and data transformation.

Syntax by Function

SUBSTRING:

SUBSTRING(string, start_position, length)
-- OR
SUBSTR(string, start_position, length)

SUBSTRING(string, start_position, length)
-- OR
SUBSTR(string, start_position, length)

SUBSTRING(string, start_position, length)
-- OR
SUBSTR(string, start_position, length)

SPLIT:

-- Snowflake
SPLIT(string, delimiter)

-- BigQuery
SPLIT(string, delimiter)

-- Databricks
SPLIT(string, delimiter)

-- Snowflake
SPLIT(string, delimiter)

-- BigQuery
SPLIT(string, delimiter)

-- Databricks
SPLIT(string, delimiter)

-- Snowflake
SPLIT(string, delimiter)

-- BigQuery
SPLIT(string, delimiter)

-- Databricks
SPLIT(string, delimiter)

CONCAT:

CONCAT(string1, string2, ...)
-- OR
string1 || string2 || ...  -- String concatenation operator

CONCAT(string1, string2, ...)
-- OR
string1 || string2 || ...  -- String concatenation operator

CONCAT(string1, string2, ...)
-- OR
string1 || string2 || ...  -- String concatenation operator

Platform-Specific Notes

Snowflake:

SUBSTRING and SUBSTR are aliases
SPLIT returns ARRAY
SPLIT_PART for specific element
CONCAT works with multiple arguments

BigQuery:

SUBSTR is primary function
SPLIT returns ARRAY
Use ARRAY subscript [OFFSET(n)] or [ORDINAL(n)]
CONCAT has max 5 arguments (use || for more)

Databricks:

Standard Spark SQL functions
SPLIT returns ARRAY
Use bracket notation for array access
CONCAT supports multiple arguments

Example 1: SUBSTRING - Extract Parts of String

All Platforms:

SELECT 
    full_name,
    email,
    -- Extract first 3 characters
    SUBSTRING(full_name, 1, 3) as name_prefix,
    -- Extract from position 5 to end
    SUBSTRING(email, 5) as email_suffix,
    -- Extract domain from email (after @)
    SUBSTRING(email, POSITION('@' IN email) + 1) as email_domain,
    -- Extract area code from phone (555) 123-4567
    SUBSTRING(phone, 2, 3) as area_code
FROM

SELECT 
    full_name,
    email,
    -- Extract first 3 characters
    SUBSTRING(full_name, 1, 3) as name_prefix,
    -- Extract from position 5 to end
    SUBSTRING(email, 5) as email_suffix,
    -- Extract domain from email (after @)
    SUBSTRING(email, POSITION('@' IN email) + 1) as email_domain,
    -- Extract area code from phone (555) 123-4567
    SUBSTRING(phone, 2, 3) as area_code
FROM

SELECT 
    full_name,
    email,
    -- Extract first 3 characters
    SUBSTRING(full_name, 1, 3) as name_prefix,
    -- Extract from position 5 to end
    SUBSTRING(email, 5) as email_suffix,
    -- Extract domain from email (after @)
    SUBSTRING(email, POSITION('@' IN email) + 1) as email_domain,
    -- Extract area code from phone (555) 123-4567
    SUBSTRING(phone, 2, 3) as area_code
FROM

Sample Data (customers table):

full_name	email	phone
John Smith	john@gmail.com	(555) 123-4567
Maria Garcia	maria@company.co.uk	(212) 555-0100
Sarah Johnson	sarah@outlook.com	(415) 555-7890

Result:

full_name	email	name_prefix	email_suffix	email_domain	area_code
John Smith	john@gmail.com	Joh	@gmail.com	gmail.com	555
Maria Garcia	maria@company.co.uk	Mar	a@company.co.uk	company.co.uk	212
Sarah Johnson	sarah@outlook.com	Sar	h@outlook.com	outlook.com	415

Example 2: SPLIT - Parse Delimited Strings

Snowflake:

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT_PART(tags, ',', 1) as first_tag,
    SPLIT_PART(tags, ',', 2) as second_tag,
    ARRAY_SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT_PART(tags, ',', 1) as first_tag,
    SPLIT_PART(tags, ',', 2) as second_tag,
    ARRAY_SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT_PART(tags, ',', 1) as first_tag,
    SPLIT_PART(tags, ',', 2) as second_tag,
    ARRAY_SIZE(SPLIT(tags, ',')) as tag_count
FROM

BigQuery:

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[OFFSET(0)] as first_tag,
    SPLIT(tags, ',')[SAFE_OFFSET(1)] as second_tag,
    ARRAY_LENGTH(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[OFFSET(0)] as first_tag,
    SPLIT(tags, ',')[SAFE_OFFSET(1)] as second_tag,
    ARRAY_LENGTH(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[OFFSET(0)] as first_tag,
    SPLIT(tags, ',')[SAFE_OFFSET(1)] as second_tag,
    ARRAY_LENGTH(SPLIT(tags, ',')) as tag_count
FROM

Databricks:

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[0] as first_tag,
    SPLIT(tags, ',')[1] as second_tag,
    SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[0] as first_tag,
    SPLIT(tags, ',')[1] as second_tag,
    SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[0] as first_tag,
    SPLIT(tags, ',')[1] as second_tag,
    SIZE(SPLIT(tags, ',')) as tag_count
FROM

Sample Data (products table):

product_id	product_name	tags
1	Laptop	electronics,computer,portable
2	Desk	furniture,office,wood
3	Mouse	electronics,accessory

Result:

tags	tags_array	first_tag	second_tag	tag_count
electronics,computer,portable	["electronics","computer","portable"]	electronics	computer	3
furniture,office,wood	["furniture","office","wood"]	furniture	office	3
electronics,accessory	["electronics","accessory"]	electronics	accessory	2

Example 3: CONCAT - Combine Strings

All Platforms:

SELECT 
    first_name,
    last_name,
    -- Using CONCAT function
    CONCAT(first_name, ' ', last_name) as full_name,
    -- Using || operator
    first_name || ' ' || last_name as full_name_alt,
    -- Email generation
    CONCAT(LOWER(first_name), '.', LOWER(last_name), '@company.com') as generated_email,
    -- Address formatting
    CONCAT(street, ', ', city, ', ', state, ' ', zip_code) as full_address
FROM

SELECT 
    first_name,
    last_name,
    -- Using CONCAT function
    CONCAT(first_name, ' ', last_name) as full_name,
    -- Using || operator
    first_name || ' ' || last_name as full_name_alt,
    -- Email generation
    CONCAT(LOWER(first_name), '.', LOWER(last_name), '@company.com') as generated_email,
    -- Address formatting
    CONCAT(street, ', ', city, ', ', state, ' ', zip_code) as full_address
FROM

SELECT 
    first_name,
    last_name,
    -- Using CONCAT function
    CONCAT(first_name, ' ', last_name) as full_name,
    -- Using || operator
    first_name || ' ' || last_name as full_name_alt,
    -- Email generation
    CONCAT(LOWER(first_name), '.', LOWER(last_name), '@company.com') as generated_email,
    -- Address formatting
    CONCAT(street, ', ', city, ', ', state, ' ', zip_code) as full_address
FROM

Sample Data (employees table):

first_name	last_name	street	city	state	zip_code
John	Smith	123 Main St	New York	NY	10001
Maria	Garcia	456 Oak Ave	Los Angeles	CA	90001
Sarah	Johnson	789 Elm Street	Chicago	IL	60601

Result:

first_name	last_name	full_name	generated_email	full_address
John	Smith	John Smith	john.smith@company.com	123 Main St, New York, NY 10001
Maria	Garcia	Maria Garcia	maria.garcia@company.com	456 Oak Ave, Los Angeles, CA 90001
Sarah	Johnson	Sarah Johnson	sarah.johnson@company.com	789 Elm Street, Chicago, IL 60601

Example 4: Combining All Three Functions

All Platforms:

SELECT 
    log_entry,
    -- Extract timestamp (first 19 characters)
    SUBSTRING(log_entry, 1, 19) as timestamp,
    -- Split by delimiter and get action
    SPLIT_PART(log_entry, '|', 2) as action,  -- Snowflake
    -- SPLIT(log_entry, '|')[OFFSET(1)] as action,  -- BigQuery
    -- SPLIT(log_entry, '|')[1] as action,  -- Databricks
    -- Extract user ID (after 'user=' and before '&')
    SUBSTRING(
        log_entry,
        POSITION('user=' IN log_entry) + 5,
        POSITION('&' IN log_entry) - POSITION('user=' IN log_entry) - 5
    ) as user_id,
    -- Reconstruct as formatted string
    CONCAT(
        '[', SUBSTRING(log_entry, 1, 19), '] ',
        SPLIT_PART(log_entry, '|', 2), ' by user ',
        SUBSTRING(log_entry, POSITION('user=' IN log_entry) + 5, 4)
    ) as formatted_log
FROM

SELECT 
    log_entry,
    -- Extract timestamp (first 19 characters)
    SUBSTRING(log_entry, 1, 19) as timestamp,
    -- Split by delimiter and get action
    SPLIT_PART(log_entry, '|', 2) as action,  -- Snowflake
    -- SPLIT(log_entry, '|')[OFFSET(1)] as action,  -- BigQuery
    -- SPLIT(log_entry, '|')[1] as action,  -- Databricks
    -- Extract user ID (after 'user=' and before '&')
    SUBSTRING(
        log_entry,
        POSITION('user=' IN log_entry) + 5,
        POSITION('&' IN log_entry) - POSITION('user=' IN log_entry) - 5
    ) as user_id,
    -- Reconstruct as formatted string
    CONCAT(
        '[', SUBSTRING(log_entry, 1, 19), '] ',
        SPLIT_PART(log_entry, '|', 2), ' by user ',
        SUBSTRING(log_entry, POSITION('user=' IN log_entry) + 5, 4)
    ) as formatted_log
FROM

SELECT 
    log_entry,
    -- Extract timestamp (first 19 characters)
    SUBSTRING(log_entry, 1, 19) as timestamp,
    -- Split by delimiter and get action
    SPLIT_PART(log_entry, '|', 2) as action,  -- Snowflake
    -- SPLIT(log_entry, '|')[OFFSET(1)] as action,  -- BigQuery
    -- SPLIT(log_entry, '|')[1] as action,  -- Databricks
    -- Extract user ID (after 'user=' and before '&')
    SUBSTRING(
        log_entry,
        POSITION('user=' IN log_entry) + 5,
        POSITION('&' IN log_entry) - POSITION('user=' IN log_entry) - 5
    ) as user_id,
    -- Reconstruct as formatted string
    CONCAT(
        '[', SUBSTRING(log_entry, 1, 19), '] ',
        SPLIT_PART(log_entry, '|', 2), ' by user ',
        SUBSTRING(log_entry, POSITION('user=' IN log_entry) + 5, 4)
    ) as formatted_log
FROM

Sample Data (system_logs table):

log_id	log_entry
1	2024-01-15 10:30:45\
2	2024-01-15 10:31:22\
3	2024-01-15 10:32:10\

Result:

log_entry	timestamp	action	user_id	formatted_log
...	2024-01-15 10:30:45	LOGIN	1001	[2024-01-15 10:30:45] LOGIN by user 1001
...	2024-01-15 10:31:22	PURCHASE	2045	[2024-01-15 10:31:22] PURCHASE by user 2045
...	2024-01-15 10:32:10	LOGOUT	1001	[2024-01-15 10:32:10] LOGOUT by user 1001

Example 5: Parsing CSV Data

All Platforms:

WITH parsed_data AS (
    SELECT 
        row_data,
        SPLIT(row_data, ',') as fields
    FROM csv_import
)
SELECT 
    fields[OFFSET(0)] as customer_id,      -- BigQuery
    -- SPLIT_PART(row_data, ',', 1) as customer_id,  -- Snowflake
    -- fields[0] as customer_id,  -- Databricks
    TRIM(fields[OFFSET(1)]) as name,
    TRIM(fields[OFFSET(2)]) as email,
    CAST(fields[OFFSET(3)] AS FLOAT64) as amount
FROM

WITH parsed_data AS (
    SELECT 
        row_data,
        SPLIT(row_data, ',') as fields
    FROM csv_import
)
SELECT 
    fields[OFFSET(0)] as customer_id,      -- BigQuery
    -- SPLIT_PART(row_data, ',', 1) as customer_id,  -- Snowflake
    -- fields[0] as customer_id,  -- Databricks
    TRIM(fields[OFFSET(1)]) as name,
    TRIM(fields[OFFSET(2)]) as email,
    CAST(fields[OFFSET(3)] AS FLOAT64) as amount
FROM

WITH parsed_data AS (
    SELECT 
        row_data,
        SPLIT(row_data, ',') as fields
    FROM csv_import
)
SELECT 
    fields[OFFSET(0)] as customer_id,      -- BigQuery
    -- SPLIT_PART(row_data, ',', 1) as customer_id,  -- Snowflake
    -- fields[0] as customer_id,  -- Databricks
    TRIM(fields[OFFSET(1)]) as name,
    TRIM(fields[OFFSET(2)]) as email,
    CAST(fields[OFFSET(3)] AS FLOAT64) as amount
FROM

Sample Data (csv_import table):

row_id	row_data
1	101, John Smith , john@email.com ,250.00
2	102, Maria Garcia , maria@email.com ,180.50
3	103, Sarah Johnson, sarah@email.com ,320.75

Result:

customer_id	name	email	amount
101	John Smith	john@email.com	250.00
102	Maria Garcia	maria@email.com	180.50
103	Sarah Johnson	sarah@email.com	320.75

Example 6: Building Dynamic URLs

All Platforms:

SELECT 
    product_id,
    product_name,
    category,
    -- Build product URL
    CONCAT(
        'https://store.com/products/',
        LOWER(REPLACE(category, ' ', '-')),
        '/',
        LOWER(REPLACE(product_name, ' ', '-')),
        '?id=',
        CAST(product_id AS STRING)
    ) as product_url,
    -- Build image URL
    CONCAT(
        'https://cdn.store.com/images/',
        product_id,
        '/thumbnail.jpg'
    ) as thumbnail_url
FROM

SELECT 
    product_id,
    product_name,
    category,
    -- Build product URL
    CONCAT(
        'https://store.com/products/',
        LOWER(REPLACE(category, ' ', '-')),
        '/',
        LOWER(REPLACE(product_name, ' ', '-')),
        '?id=',
        CAST(product_id AS STRING)
    ) as product_url,
    -- Build image URL
    CONCAT(
        'https://cdn.store.com/images/',
        product_id,
        '/thumbnail.jpg'
    ) as thumbnail_url
FROM

SELECT 
    product_id,
    product_name,
    category,
    -- Build product URL
    CONCAT(
        'https://store.com/products/',
        LOWER(REPLACE(category, ' ', '-')),
        '/',
        LOWER(REPLACE(product_name, ' ', '-')),
        '?id=',
        CAST(product_id AS STRING)
    ) as product_url,
    -- Build image URL
    CONCAT(
        'https://cdn.store.com/images/',
        product_id,
        '/thumbnail.jpg'
    ) as thumbnail_url
FROM

Sample Data:

product_id	product_name	category
1001	Wireless Mouse	Computer Accessories
2045	Standing Desk	Office Furniture
3099	USB-C Cable	Computer Accessories

Result:

product_id	product_name	product_url	thumbnail_url
1001	Wireless Mouse	https://store.com/products/computer-accessories/wireless-mouse?id=1001	https://cdn.store.com/images/1001/thumbnail.jpg
2045	Standing Desk	https://store.com/products/office-furniture/standing-desk?id=2045	https://cdn.store.com/images/2045/thumbnail.jpg
3099	USB-C Cable	https://store.com/products/computer-accessories/usb-c-cable?id=3099	https://cdn.store.com/images/3099/thumbnail.jpg

Related String Functions

Function	Purpose	Example
TRIM	Remove whitespace	`TRIM(' text ')` → `'text'`
UPPER	Convert to uppercase	`UPPER('text')` → `'TEXT'`
LOWER	Convert to lowercase	`LOWER('TEXT')` → `'text'`
REPLACE	Replace substring	`REPLACE('hello', 'l', 'L')` → `'heLLo'`
LENGTH	String length	`LENGTH('hello')` → `5`
POSITION	Find substring	`POSITION('o' IN 'hello')` → `5`
LEFT	Left n characters	`LEFT('hello', 3)` → `'hel'`
RIGHT	Right n characters	`RIGHT('hello', 3)` → `'llo'`

Common Use Cases

SUBSTRING: Extract codes, parse fixed-width data, truncate text
SPLIT: Parse CSVs, process tags, split names
CONCAT: Build full names, create URLs, format addresses
Combined: Log parsing, data transformation, ETL processes

Performance Tips

Use SPLIT_PART: Faster than SPLIT + array access in Snowflake
Avoid nested functions: Simplify complex string operations
Pre-process data: Clean strings during ingestion when possible
Use appropriate function: SUBSTRING faster than REGEXP for simple extraction
Index frequently searched strings: Improves WHERE performance

Best Practices

Handle NULLs: Use COALESCE for NULL strings
Trim inputs: Remove whitespace before processing
Validate lengths: Check string bounds before SUBSTRING
Test edge cases: Empty strings, missing delimiters
Document format: Explain expected input patterns
Use SAFE functions: BigQuery SAFE_OFFSET prevents errors

Interested to Learn More?
Try Out the Free 14-Days Trial

Start free trial

Learn

Feb 18, 2026

BigQuery Global Queries: How to Run Cross-Region SQL in 2026

Learn

Feb 18, 2026

BigQuery Global Queries: How to Run Cross-Region SQL in 2026

Learn

Feb 18, 2026

BigQuery Global Queries: How to Run Cross-Region SQL in 2026

Analytics

Feb 18, 2026

Context Engineering and AI Quality for Data Teams

Analytics

Feb 18, 2026

Context Engineering and AI Quality for Data Teams

Analytics

Feb 18, 2026

Context Engineering and AI Quality for Data Teams

Product

Feb 18, 2026

Accelerate Analytics Development with Paradime and Tableau

Product

Feb 18, 2026

Accelerate Analytics Development with Paradime and Tableau

Product

Feb 18, 2026

Accelerate Analytics Development with Paradime and Tableau

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

Start for free

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

Start for free

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

Start for free

Platform

Resources

ADD-ONs

Industries

About

Legal

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.

Start for free

Platform

Resources