SQL Keywords

SQL Keywords

SUBSTRING / SPLIT / CONCAT

Feb 23, 2026

·

5

min read

Category: String Functions

Platform Support:

✅ Snowflake | ✅ BigQuery | ✅ Databricks

Description

String manipulation functions for extracting, splitting, and combining text. SUBSTRING extracts portions of strings, SPLIT divides strings into arrays, and CONCAT combines strings. Essential for text processing, parsing, and data transformation.

Syntax by Function

SUBSTRING:

SUBSTRING(string, start_position, length)
-- OR
SUBSTR(string, start_position, length)
SUBSTRING(string, start_position, length)
-- OR
SUBSTR(string, start_position, length)
SUBSTRING(string, start_position, length)
-- OR
SUBSTR(string, start_position, length)

SPLIT:

-- Snowflake
SPLIT(string, delimiter)

-- BigQuery
SPLIT(string, delimiter)

-- Databricks
SPLIT(string, delimiter)
-- Snowflake
SPLIT(string, delimiter)

-- BigQuery
SPLIT(string, delimiter)

-- Databricks
SPLIT(string, delimiter)
-- Snowflake
SPLIT(string, delimiter)

-- BigQuery
SPLIT(string, delimiter)

-- Databricks
SPLIT(string, delimiter)

CONCAT:

CONCAT(string1, string2, ...)
-- OR
string1 || string2 || ...  -- String concatenation operator
CONCAT(string1, string2, ...)
-- OR
string1 || string2 || ...  -- String concatenation operator
CONCAT(string1, string2, ...)
-- OR
string1 || string2 || ...  -- String concatenation operator

Platform-Specific Notes

Snowflake:

  • SUBSTRING and SUBSTR are aliases

  • SPLIT returns ARRAY

  • SPLIT_PART for specific element

  • CONCAT works with multiple arguments

BigQuery:

  • SUBSTR is primary function

  • SPLIT returns ARRAY

  • Use ARRAY subscript [OFFSET(n)] or [ORDINAL(n)]

  • CONCAT has max 5 arguments (use || for more)

Databricks:

  • Standard Spark SQL functions

  • SPLIT returns ARRAY

  • Use bracket notation for array access

  • CONCAT supports multiple arguments

Example 1: SUBSTRING - Extract Parts of String

All Platforms:

SELECT 
    full_name,
    email,
    -- Extract first 3 characters
    SUBSTRING(full_name, 1, 3) as name_prefix,
    -- Extract from position 5 to end
    SUBSTRING(email, 5) as email_suffix,
    -- Extract domain from email (after @)
    SUBSTRING(email, POSITION('@' IN email) + 1) as email_domain,
    -- Extract area code from phone (555) 123-4567
    SUBSTRING(phone, 2, 3) as area_code
FROM

SELECT 
    full_name,
    email,
    -- Extract first 3 characters
    SUBSTRING(full_name, 1, 3) as name_prefix,
    -- Extract from position 5 to end
    SUBSTRING(email, 5) as email_suffix,
    -- Extract domain from email (after @)
    SUBSTRING(email, POSITION('@' IN email) + 1) as email_domain,
    -- Extract area code from phone (555) 123-4567
    SUBSTRING(phone, 2, 3) as area_code
FROM

SELECT 
    full_name,
    email,
    -- Extract first 3 characters
    SUBSTRING(full_name, 1, 3) as name_prefix,
    -- Extract from position 5 to end
    SUBSTRING(email, 5) as email_suffix,
    -- Extract domain from email (after @)
    SUBSTRING(email, POSITION('@' IN email) + 1) as email_domain,
    -- Extract area code from phone (555) 123-4567
    SUBSTRING(phone, 2, 3) as area_code
FROM

Sample Data (customers table):

full_name

email

phone

John Smith

john@gmail.com

(555) 123-4567

Maria Garcia

maria@company.co.uk

(212) 555-0100

Sarah Johnson

sarah@outlook.com

(415) 555-7890

Result:

full_name

email

name_prefix

email_suffix

email_domain

area_code

John Smith

john@gmail.com

Joh

@gmail.com

gmail.com

555

Maria Garcia

maria@company.co.uk

Mar

a@company.co.uk

company.co.uk

212

Sarah Johnson

sarah@outlook.com

Sar

h@outlook.com

outlook.com

415

Example 2: SPLIT - Parse Delimited Strings

Snowflake:

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT_PART(tags, ',', 1) as first_tag,
    SPLIT_PART(tags, ',', 2) as second_tag,
    ARRAY_SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT_PART(tags, ',', 1) as first_tag,
    SPLIT_PART(tags, ',', 2) as second_tag,
    ARRAY_SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT_PART(tags, ',', 1) as first_tag,
    SPLIT_PART(tags, ',', 2) as second_tag,
    ARRAY_SIZE(SPLIT(tags, ',')) as tag_count
FROM

BigQuery:

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[OFFSET(0)] as first_tag,
    SPLIT(tags, ',')[SAFE_OFFSET(1)] as second_tag,
    ARRAY_LENGTH(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[OFFSET(0)] as first_tag,
    SPLIT(tags, ',')[SAFE_OFFSET(1)] as second_tag,
    ARRAY_LENGTH(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[OFFSET(0)] as first_tag,
    SPLIT(tags, ',')[SAFE_OFFSET(1)] as second_tag,
    ARRAY_LENGTH(SPLIT(tags, ',')) as tag_count
FROM

Databricks:

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[0] as first_tag,
    SPLIT(tags, ',')[1] as second_tag,
    SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[0] as first_tag,
    SPLIT(tags, ',')[1] as second_tag,
    SIZE(SPLIT(tags, ',')) as tag_count
FROM

SELECT 
    tags,
    SPLIT(tags, ',') as tags_array,
    SPLIT(tags, ',')[0] as first_tag,
    SPLIT(tags, ',')[1] as second_tag,
    SIZE(SPLIT(tags, ',')) as tag_count
FROM

Sample Data (products table):

product_id

product_name

tags

1

Laptop

electronics,computer,portable

2

Desk

furniture,office,wood

3

Mouse

electronics,accessory

Result:

tags

tags_array

first_tag

second_tag

tag_count

electronics,computer,portable

["electronics","computer","portable"]

electronics

computer

3

furniture,office,wood

["furniture","office","wood"]

furniture

office

3

electronics,accessory

["electronics","accessory"]

electronics

accessory

2

Example 3: CONCAT - Combine Strings

All Platforms:

SELECT 
    first_name,
    last_name,
    -- Using CONCAT function
    CONCAT(first_name, ' ', last_name) as full_name,
    -- Using || operator
    first_name || ' ' || last_name as full_name_alt,
    -- Email generation
    CONCAT(LOWER(first_name), '.', LOWER(last_name), '@company.com') as generated_email,
    -- Address formatting
    CONCAT(street, ', ', city, ', ', state, ' ', zip_code) as full_address
FROM

SELECT 
    first_name,
    last_name,
    -- Using CONCAT function
    CONCAT(first_name, ' ', last_name) as full_name,
    -- Using || operator
    first_name || ' ' || last_name as full_name_alt,
    -- Email generation
    CONCAT(LOWER(first_name), '.', LOWER(last_name), '@company.com') as generated_email,
    -- Address formatting
    CONCAT(street, ', ', city, ', ', state, ' ', zip_code) as full_address
FROM

SELECT 
    first_name,
    last_name,
    -- Using CONCAT function
    CONCAT(first_name, ' ', last_name) as full_name,
    -- Using || operator
    first_name || ' ' || last_name as full_name_alt,
    -- Email generation
    CONCAT(LOWER(first_name), '.', LOWER(last_name), '@company.com') as generated_email,
    -- Address formatting
    CONCAT(street, ', ', city, ', ', state, ' ', zip_code) as full_address
FROM

Sample Data (employees table):

first_name

last_name

street

city

state

zip_code

John

Smith

123 Main St

New York

NY

10001

Maria

Garcia

456 Oak Ave

Los Angeles

CA

90001

Sarah

Johnson

789 Elm Street

Chicago

IL

60601

Result:

first_name

last_name

full_name

generated_email

full_address

John

Smith

John Smith

john.smith@company.com

123 Main St, New York, NY 10001

Maria

Garcia

Maria Garcia

maria.garcia@company.com

456 Oak Ave, Los Angeles, CA 90001

Sarah

Johnson

Sarah Johnson

sarah.johnson@company.com

789 Elm Street, Chicago, IL 60601

Example 4: Combining All Three Functions

All Platforms:

SELECT 
    log_entry,
    -- Extract timestamp (first 19 characters)
    SUBSTRING(log_entry, 1, 19) as timestamp,
    -- Split by delimiter and get action
    SPLIT_PART(log_entry, '|', 2) as action,  -- Snowflake
    -- SPLIT(log_entry, '|')[OFFSET(1)] as action,  -- BigQuery
    -- SPLIT(log_entry, '|')[1] as action,  -- Databricks
    -- Extract user ID (after 'user=' and before '&')
    SUBSTRING(
        log_entry,
        POSITION('user=' IN log_entry) + 5,
        POSITION('&' IN log_entry) - POSITION('user=' IN log_entry) - 5
    ) as user_id,
    -- Reconstruct as formatted string
    CONCAT(
        '[', SUBSTRING(log_entry, 1, 19), '] ',
        SPLIT_PART(log_entry, '|', 2), ' by user ',
        SUBSTRING(log_entry, POSITION('user=' IN log_entry) + 5, 4)
    ) as formatted_log
FROM

SELECT 
    log_entry,
    -- Extract timestamp (first 19 characters)
    SUBSTRING(log_entry, 1, 19) as timestamp,
    -- Split by delimiter and get action
    SPLIT_PART(log_entry, '|', 2) as action,  -- Snowflake
    -- SPLIT(log_entry, '|')[OFFSET(1)] as action,  -- BigQuery
    -- SPLIT(log_entry, '|')[1] as action,  -- Databricks
    -- Extract user ID (after 'user=' and before '&')
    SUBSTRING(
        log_entry,
        POSITION('user=' IN log_entry) + 5,
        POSITION('&' IN log_entry) - POSITION('user=' IN log_entry) - 5
    ) as user_id,
    -- Reconstruct as formatted string
    CONCAT(
        '[', SUBSTRING(log_entry, 1, 19), '] ',
        SPLIT_PART(log_entry, '|', 2), ' by user ',
        SUBSTRING(log_entry, POSITION('user=' IN log_entry) + 5, 4)
    ) as formatted_log
FROM

SELECT 
    log_entry,
    -- Extract timestamp (first 19 characters)
    SUBSTRING(log_entry, 1, 19) as timestamp,
    -- Split by delimiter and get action
    SPLIT_PART(log_entry, '|', 2) as action,  -- Snowflake
    -- SPLIT(log_entry, '|')[OFFSET(1)] as action,  -- BigQuery
    -- SPLIT(log_entry, '|')[1] as action,  -- Databricks
    -- Extract user ID (after 'user=' and before '&')
    SUBSTRING(
        log_entry,
        POSITION('user=' IN log_entry) + 5,
        POSITION('&' IN log_entry) - POSITION('user=' IN log_entry) - 5
    ) as user_id,
    -- Reconstruct as formatted string
    CONCAT(
        '[', SUBSTRING(log_entry, 1, 19), '] ',
        SPLIT_PART(log_entry, '|', 2), ' by user ',
        SUBSTRING(log_entry, POSITION('user=' IN log_entry) + 5, 4)
    ) as formatted_log
FROM

Sample Data (system_logs table):

log_id

log_entry

1

2024-01-15 10:30:45\

2

2024-01-15 10:31:22\

3

2024-01-15 10:32:10\

Result:

log_entry

timestamp

action

user_id

formatted_log

...

2024-01-15 10:30:45

LOGIN

1001

[2024-01-15 10:30:45] LOGIN by user 1001

...

2024-01-15 10:31:22

PURCHASE

2045

[2024-01-15 10:31:22] PURCHASE by user 2045

...

2024-01-15 10:32:10

LOGOUT

1001

[2024-01-15 10:32:10] LOGOUT by user 1001

Example 5: Parsing CSV Data

All Platforms:

WITH parsed_data AS (
    SELECT 
        row_data,
        SPLIT(row_data, ',') as fields
    FROM csv_import
)
SELECT 
    fields[OFFSET(0)] as customer_id,      -- BigQuery
    -- SPLIT_PART(row_data, ',', 1) as customer_id,  -- Snowflake
    -- fields[0] as customer_id,  -- Databricks
    TRIM(fields[OFFSET(1)]) as name,
    TRIM(fields[OFFSET(2)]) as email,
    CAST(fields[OFFSET(3)] AS FLOAT64) as amount
FROM

WITH parsed_data AS (
    SELECT 
        row_data,
        SPLIT(row_data, ',') as fields
    FROM csv_import
)
SELECT 
    fields[OFFSET(0)] as customer_id,      -- BigQuery
    -- SPLIT_PART(row_data, ',', 1) as customer_id,  -- Snowflake
    -- fields[0] as customer_id,  -- Databricks
    TRIM(fields[OFFSET(1)]) as name,
    TRIM(fields[OFFSET(2)]) as email,
    CAST(fields[OFFSET(3)] AS FLOAT64) as amount
FROM

WITH parsed_data AS (
    SELECT 
        row_data,
        SPLIT(row_data, ',') as fields
    FROM csv_import
)
SELECT 
    fields[OFFSET(0)] as customer_id,      -- BigQuery
    -- SPLIT_PART(row_data, ',', 1) as customer_id,  -- Snowflake
    -- fields[0] as customer_id,  -- Databricks
    TRIM(fields[OFFSET(1)]) as name,
    TRIM(fields[OFFSET(2)]) as email,
    CAST(fields[OFFSET(3)] AS FLOAT64) as amount
FROM

Sample Data (csv_import table):

row_id

row_data

1

101, John Smith , john@email.com ,250.00

2

102, Maria Garcia , maria@email.com ,180.50

3

103, Sarah Johnson, sarah@email.com ,320.75

Result:

customer_id

name

email

amount

101

John Smith

john@email.com

250.00

102

Maria Garcia

maria@email.com

180.50

103

Sarah Johnson

sarah@email.com

320.75

Example 6: Building Dynamic URLs

All Platforms:

SELECT 
    product_id,
    product_name,
    category,
    -- Build product URL
    CONCAT(
        'https://store.com/products/',
        LOWER(REPLACE(category, ' ', '-')),
        '/',
        LOWER(REPLACE(product_name, ' ', '-')),
        '?id=',
        CAST(product_id AS STRING)
    ) as product_url,
    -- Build image URL
    CONCAT(
        'https://cdn.store.com/images/',
        product_id,
        '/thumbnail.jpg'
    ) as thumbnail_url
FROM

SELECT 
    product_id,
    product_name,
    category,
    -- Build product URL
    CONCAT(
        'https://store.com/products/',
        LOWER(REPLACE(category, ' ', '-')),
        '/',
        LOWER(REPLACE(product_name, ' ', '-')),
        '?id=',
        CAST(product_id AS STRING)
    ) as product_url,
    -- Build image URL
    CONCAT(
        'https://cdn.store.com/images/',
        product_id,
        '/thumbnail.jpg'
    ) as thumbnail_url
FROM

SELECT 
    product_id,
    product_name,
    category,
    -- Build product URL
    CONCAT(
        'https://store.com/products/',
        LOWER(REPLACE(category, ' ', '-')),
        '/',
        LOWER(REPLACE(product_name, ' ', '-')),
        '?id=',
        CAST(product_id AS STRING)
    ) as product_url,
    -- Build image URL
    CONCAT(
        'https://cdn.store.com/images/',
        product_id,
        '/thumbnail.jpg'
    ) as thumbnail_url
FROM

Sample Data:

product_id

product_name

category

1001

Wireless Mouse

Computer Accessories

2045

Standing Desk

Office Furniture

3099

USB-C Cable

Computer Accessories

Result:

product_id

product_name

product_url

thumbnail_url

1001

Wireless Mouse

https://store.com/products/computer-accessories/wireless-mouse?id=1001

https://cdn.store.com/images/1001/thumbnail.jpg

2045

Standing Desk

https://store.com/products/office-furniture/standing-desk?id=2045

https://cdn.store.com/images/2045/thumbnail.jpg

3099

USB-C Cable

https://store.com/products/computer-accessories/usb-c-cable?id=3099

https://cdn.store.com/images/3099/thumbnail.jpg

Related String Functions

Function

Purpose

Example

TRIM

Remove whitespace

TRIM(' text ')'text'

UPPER

Convert to uppercase

UPPER('text')'TEXT'

LOWER

Convert to lowercase

LOWER('TEXT')'text'

REPLACE

Replace substring

REPLACE('hello', 'l', 'L')'heLLo'

LENGTH

String length

LENGTH('hello')5

POSITION

Find substring

POSITION('o' IN 'hello')5

LEFT

Left n characters

LEFT('hello', 3)'hel'

RIGHT

Right n characters

RIGHT('hello', 3)'llo'

Common Use Cases

  1. SUBSTRING: Extract codes, parse fixed-width data, truncate text

  2. SPLIT: Parse CSVs, process tags, split names

  3. CONCAT: Build full names, create URLs, format addresses

  4. Combined: Log parsing, data transformation, ETL processes

Performance Tips

  1. Use SPLIT_PART: Faster than SPLIT + array access in Snowflake

  2. Avoid nested functions: Simplify complex string operations

  3. Pre-process data: Clean strings during ingestion when possible

  4. Use appropriate function: SUBSTRING faster than REGEXP for simple extraction

  5. Index frequently searched strings: Improves WHERE performance

Best Practices

  1. Handle NULLs: Use COALESCE for NULL strings

  2. Trim inputs: Remove whitespace before processing

  3. Validate lengths: Check string bounds before SUBSTRING

  4. Test edge cases: Empty strings, missing delimiters

  5. Document format: Explain expected input patterns

  6. Use SAFE functions: BigQuery SAFE_OFFSET prevents errors

Interested to Learn More?
Try Out the Free 14-Days Trial

More Articles

decorative icon

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

decorative icon

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

decorative icon

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.