REGEXP_EXTRACT / REGEXP_SUBSTR
Feb 23, 2026
·
5
min read
Category: String Functions
Platform Support:
✅ Snowflake (REGEXP_SUBSTR) | ✅ BigQuery (REGEXP_EXTRACT) | ✅ Databricks (REGEXP_EXTRACT)
Description
Extracts substrings that match a regular expression pattern. Essential for parsing unstructured text, extracting domains from emails, parsing log files, extracting codes from descriptions, and working with semi-structured string data.
Syntax by Platform
Snowflake:
BigQuery:
Databricks:
Platform-Specific Notes
Snowflake:
Function name is
REGEXP_SUBSTR(not EXTRACT)Most flexible with position, occurrence, and parameters
Use
'e'parameter to enable capture groupsGroup 0 returns full match, 1+ returns capture groups
BigQuery:
Use raw string prefix
r'...'for regex patternsReturns only the first capturing group
REGEXP_EXTRACT_ALLreturns array of all matchesREGEXP_CONTAINSchecks if pattern exists
Databricks:
Third parameter specifies capture group (0-based)
Group 0 returns entire match
Compatible with Java regex syntax
Example 1: Extract Domain from Email
Snowflake:
BigQuery:
Databricks:
Sample Data (users table):
user_id | |
|---|---|
1 | |
2 | |
3 | |
4 |
Result (all platforms):
domain | |
|---|---|
Example 2: Extract Product Code from Description
Snowflake:
BigQuery:
Databricks:
Sample Data (products table):
product_id | description |
|---|---|
1 | Premium Laptop SKU-LAP123 with 512GB SSD |
2 | Wireless Mouse SKU-MOU456 black edition |
3 | Gaming Keyboard SKU-KEY789 RGB with 256GB storage |
4 | External Drive SKU-HDD999 2000GB capacity |
Result:
description | product_code | storage_gb |
|---|---|---|
Premium Laptop SKU-LAP123 with 512GB SSD | LAP123 | 512 |
Wireless Mouse SKU-MOU456 black edition | MOU456 | NULL |
Gaming Keyboard SKU-KEY789 RGB with 256GB storage | KEY789 | 256 |
External Drive SKU-HDD999 2000GB capacity | HDD999 | 2000 |
Example 3: Extract Phone Number Components
All Platforms:
Sample Data (contacts table):
contact_id | phone_number |
|---|---|
1 | (555) 123-4567 |
2 | 555-234-5678 |
3 | 555.345.6789 |
4 | (555) 456-7890 |
Result (BigQuery/Databricks syntax):
phone_number | area_code | exchange | line_number |
|---|---|---|---|
(555) 123-4567 | 555 | 123 | 4567 |
555-234-5678 | 555 | 234 | 5678 |
555.345.6789 | 555 | 345 | 6789 |
(555) 456-7890 | 555 | 456 | 7890 |
Example 4: Extract Multiple Matches (BigQuery)
BigQuery Only:
Sample Data (social_posts table):
post_id | text |
|---|---|
1 | Check out our new product! #tech #innovation #cloud |
2 | Great day at the office #worklife #productivity |
3 | No hashtags in this post |
Result:
text | hashtags | hashtag_count |
|---|---|---|
Check out our new product! #tech #innovation #cloud | ["tech", "innovation", "cloud"] | 3 |
Great day at the office #worklife #productivity | ["worklife", "productivity"] | 2 |
No hashtags in this post | [] | 0 |
Common Regex Patterns
Pattern | Description | Example |
|---|---|---|
| Extract domain from email | |
| First 2-3 capital letters | USA123 → USA |
| SSN components | 123-45-6789 → groups |
| Email address | text with email@test.com |
| Extract URL | link: https://example.com |
| Extract hashtags | #DataEngineering → DataEngineering |
Related Functions
Snowflake:
REGEXP_REPLACE- Replace text matching patternREGEXP_INSTR- Find position of patternREGEXP_COUNT- Count pattern occurrences
BigQuery:
REGEXP_REPLACE- Replace text matching patternREGEXP_CONTAINS- Check if pattern exists (returns boolean)REGEXP_EXTRACT_ALL- Extract all matches as array
Databricks:
REGEXP_REPLACE- Replace text matching patternREGEXP_LIKE- Check if string matches pattern
Performance Considerations
Regex operations can be expensive on large datasets
Pre-filter data before applying regex when possible
Cache commonly used patterns
Consider simpler string functions (SUBSTRING, SPLIT) when regex isn't needed
In BigQuery, use
REGEXP_CONTAINSfor existence checks (faster than EXTRACT)
Best Practices
Test patterns thoroughly - Use online regex testers
Use raw strings in BigQuery - Prefix with
r'...'Handle NULLs - Wrap in COALESCE if needed
Document complex patterns - Add comments explaining the regex
Use capture groups wisely - Only capture what you need
Consider case sensitivity - Use flags appropriately





