MERGE performs "upsert" operations - INSERT new rows or UPDATE existing rows based on a matching condition. It's essential for synchronizing tables, incremental loads, and slowly changing dimensions. MERGE is more efficient than separate INSERT and UPDATE statements.
Syntax
MERGE INTO target_table
USING source_table
ON match_condition
WHEN MATCHED THENUPDATESET column1 = value1, column2 = value2
WHENNOT MATCHED THENINSERT(column1, column2)VALUES(value1, value2)[WHENNOT MATCHED BY SOURCE THENDELETE]; -- Snowflake only
MERGE INTO target_table
USING source_table
ON match_condition
WHEN MATCHED THENUPDATESET column1 = value1, column2 = value2
WHENNOT MATCHED THENINSERT(column1, column2)VALUES(value1, value2)[WHENNOT MATCHED BY SOURCE THENDELETE]; -- Snowflake only
MERGE INTO target_table
USING source_table
ON match_condition
WHEN MATCHED THENUPDATESET column1 = value1, column2 = value2
WHENNOT MATCHED THENINSERT(column1, column2)VALUES(value1, value2)[WHENNOT MATCHED BY SOURCE THENDELETE]; -- Snowflake only
Platform-Specific Notes
Snowflake:
Supports all three clauses: MATCHED, NOT MATCHED, NOT MATCHED BY SOURCE
Can have multiple WHEN MATCHED/NOT MATCHED clauses with AND conditions
Optimized for large-scale upserts
Supports DELETE in WHEN NOT MATCHED BY SOURCE
BigQuery:
Supports MATCHED and NOT MATCHED
Does NOT support NOT MATCHED BY SOURCE natively
Requires MERGE quota (DML operations)
Can use UPDATE, INSERT, DELETE in clauses
Databricks:
Requires Delta Lake tables
Supports all MERGE operations
Optimized for large-scale data
Supports schema evolution
Example 1: Basic MERGE (Upsert)
All Platforms:
-- Update existing customers and insert new ones
MERGE INTO customers target
USING customer_updates source
ON target.customer_id = source.customer_id
WHEN MATCHED THENUPDATESET
target.name = source.name,
target.email = source.email,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, name, email, created_at)VALUES(source.customer_id, source.name, source.email,CURRENT_TIMESTAMP)
-- Update existing customers and insert new ones
MERGE INTO customers target
USING customer_updates source
ON target.customer_id = source.customer_id
WHEN MATCHED THENUPDATESET
target.name = source.name,
target.email = source.email,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, name, email, created_at)VALUES(source.customer_id, source.name, source.email,CURRENT_TIMESTAMP)
-- Update existing customers and insert new ones
MERGE INTO customers target
USING customer_updates source
ON target.customer_id = source.customer_id
WHEN MATCHED THENUPDATESET
target.name = source.name,
target.email = source.email,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, name, email, created_at)VALUES(source.customer_id, source.name, source.email,CURRENT_TIMESTAMP)
Row 1: Updated (email changed), Row 2: No change (matched but same), Row 3: Inserted (new)
Example 2: Conditional MERGE
All Platforms:
-- Update only if source is newer, insert new records
MERGE INTO product_inventory target
USING product_updates source
ON target.product_id = source.product_id
WHEN MATCHED AND source.last_updated > target.last_updated THENUPDATESET
target.quantity = source.quantity,
target.price = source.price,
target.last_updated = source.last_updated
WHENNOT MATCHED THENINSERT(product_id, product_name, quantity, price, last_updated)VALUES(source.product_id, source.product_name, source.quantity, source.price, source.last_updated)
-- Update only if source is newer, insert new records
MERGE INTO product_inventory target
USING product_updates source
ON target.product_id = source.product_id
WHEN MATCHED AND source.last_updated > target.last_updated THENUPDATESET
target.quantity = source.quantity,
target.price = source.price,
target.last_updated = source.last_updated
WHENNOT MATCHED THENINSERT(product_id, product_name, quantity, price, last_updated)VALUES(source.product_id, source.product_name, source.quantity, source.price, source.last_updated)
-- Update only if source is newer, insert new records
MERGE INTO product_inventory target
USING product_updates source
ON target.product_id = source.product_id
WHEN MATCHED AND source.last_updated > target.last_updated THENUPDATESET
target.quantity = source.quantity,
target.price = source.price,
target.last_updated = source.last_updated
WHENNOT MATCHED THENINSERT(product_id, product_name, quantity, price, last_updated)VALUES(source.product_id, source.product_name, source.quantity, source.price, source.last_updated)
Sample Data:
product_inventory (target - BEFORE):
product_id
product_name
quantity
price
last_updated
101
Laptop
50
999.99
2024-01-15
102
Mouse
200
29.99
2024-01-10
product_updates (source):
product_id
product_name
quantity
price
last_updated
101
Laptop
45
999.99
2024-01-12
102
Mouse
180
24.99
2024-01-18
103
Keyboard
150
89.99
2024-01-20
product_inventory (target - AFTER):
product_id
product_name
quantity
price
last_updated
101
Laptop
50
999.99
2024-01-15
102
Mouse
180
24.99
2024-01-18
103
Keyboard
150
89.99
2024-01-20
Product 101: Not updated (source is older), Product 102: Updated (source is newer), Product 103: Inserted
Example 3: MERGE with Aggregation Source
All Platforms:
-- Update customer summary from orders
MERGE INTO customer_summary target
USING(SELECT
customer_id,COUNT(*)as total_orders,
SUM(total)as lifetime_value,
AVG(total)as avg_order_value,
MAX(order_date)as last_order_date
FROM orders
WHERE order_date >= '2024-01-01'GROUPBY customer_id
) source
ON target.customer_id = source.customer_id
WHEN MATCHED THENUPDATESET
target.total_orders = target.total_orders + source.total_orders,
target.lifetime_value = target.lifetime_value + source.lifetime_value,
target.last_order_date = source.last_order_date,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, total_orders, lifetime_value, last_order_date, created_at)VALUES(source.customer_id, source.total_orders, source.lifetime_value, source.last_order_date,CURRENT_TIMESTAMP)
-- Update customer summary from orders
MERGE INTO customer_summary target
USING(SELECT
customer_id,COUNT(*)as total_orders,
SUM(total)as lifetime_value,
AVG(total)as avg_order_value,
MAX(order_date)as last_order_date
FROM orders
WHERE order_date >= '2024-01-01'GROUPBY customer_id
) source
ON target.customer_id = source.customer_id
WHEN MATCHED THENUPDATESET
target.total_orders = target.total_orders + source.total_orders,
target.lifetime_value = target.lifetime_value + source.lifetime_value,
target.last_order_date = source.last_order_date,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, total_orders, lifetime_value, last_order_date, created_at)VALUES(source.customer_id, source.total_orders, source.lifetime_value, source.last_order_date,CURRENT_TIMESTAMP)
-- Update customer summary from orders
MERGE INTO customer_summary target
USING(SELECT
customer_id,COUNT(*)as total_orders,
SUM(total)as lifetime_value,
AVG(total)as avg_order_value,
MAX(order_date)as last_order_date
FROM orders
WHERE order_date >= '2024-01-01'GROUPBY customer_id
) source
ON target.customer_id = source.customer_id
WHEN MATCHED THENUPDATESET
target.total_orders = target.total_orders + source.total_orders,
target.lifetime_value = target.lifetime_value + source.lifetime_value,
target.last_order_date = source.last_order_date,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, total_orders, lifetime_value, last_order_date, created_at)VALUES(source.customer_id, source.total_orders, source.lifetime_value, source.last_order_date,CURRENT_TIMESTAMP)
Result:
Existing customers: Updated with new order statistics
New customers: Inserted with initial statistics
Example 4: MERGE with DELETE (Snowflake)
Snowflake Only:
-- Sync product catalog - update, insert, and delete discontinued products
MERGE INTO product_catalog target
USING external_product_feed source
ON target.product_id = source.product_id
WHEN MATCHED THENUPDATESET
target.product_name = source.product_name,
target.price = source.price,
target.status = 'Active',
target.last_updated = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(product_id, product_name, price, status, created_at)VALUES(source.product_id, source.product_name, source.price,'Active',CURRENT_TIMESTAMP)WHENNOT MATCHED BY SOURCE THENUPDATESET
target.status = 'Discontinued',
target.discontinued_at = CURRENT_TIMESTAMP
-- Sync product catalog - update, insert, and delete discontinued products
MERGE INTO product_catalog target
USING external_product_feed source
ON target.product_id = source.product_id
WHEN MATCHED THENUPDATESET
target.product_name = source.product_name,
target.price = source.price,
target.status = 'Active',
target.last_updated = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(product_id, product_name, price, status, created_at)VALUES(source.product_id, source.product_name, source.price,'Active',CURRENT_TIMESTAMP)WHENNOT MATCHED BY SOURCE THENUPDATESET
target.status = 'Discontinued',
target.discontinued_at = CURRENT_TIMESTAMP
-- Sync product catalog - update, insert, and delete discontinued products
MERGE INTO product_catalog target
USING external_product_feed source
ON target.product_id = source.product_id
WHEN MATCHED THENUPDATESET
target.product_name = source.product_name,
target.price = source.price,
target.status = 'Active',
target.last_updated = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(product_id, product_name, price, status, created_at)VALUES(source.product_id, source.product_name, source.price,'Active',CURRENT_TIMESTAMP)WHENNOT MATCHED BY SOURCE THENUPDATESET
target.status = 'Discontinued',
target.discontinued_at = CURRENT_TIMESTAMP
BigQuery/Databricks Alternative:
-- Two-step process for BigQuery/Databricks-- Step 1: Merge
MERGE INTO product_catalog target
USING external_product_feed source
ON target.product_id = source.product_id
WHEN MATCHED THENUPDATESET ...
WHENNOT MATCHED THENINSERT ...;
-- Step 2: Mark discontinued productsUPDATE product_catalog
SET status = 'Discontinued',
discontinued_at = CURRENT_TIMESTAMPWHERE product_id NOTIN(SELECT product_id FROM external_product_feed)AND status = 'Active'
-- Two-step process for BigQuery/Databricks-- Step 1: Merge
MERGE INTO product_catalog target
USING external_product_feed source
ON target.product_id = source.product_id
WHEN MATCHED THENUPDATESET ...
WHENNOT MATCHED THENINSERT ...;
-- Step 2: Mark discontinued productsUPDATE product_catalog
SET status = 'Discontinued',
discontinued_at = CURRENT_TIMESTAMPWHERE product_id NOTIN(SELECT product_id FROM external_product_feed)AND status = 'Active'
-- Two-step process for BigQuery/Databricks-- Step 1: Merge
MERGE INTO product_catalog target
USING external_product_feed source
ON target.product_id = source.product_id
WHEN MATCHED THENUPDATESET ...
WHENNOT MATCHED THENINSERT ...;
-- Step 2: Mark discontinued productsUPDATE product_catalog
SET status = 'Discontinued',
discontinued_at = CURRENT_TIMESTAMPWHERE product_id NOTIN(SELECT product_id FROM external_product_feed)AND status = 'Active'
Example 5: Slowly Changing Dimension Type 2
All Platforms:
-- SCD Type 2: Keep history of changes
MERGE INTO customer_dimension target
USING customer_updates source
ON target.customer_id = source.customer_id AND target.is_current = TRUEWHEN MATCHED AND(
target.name != source.name OR
target.email != source.email OR
target.address != source.address
)THENUPDATESET
target.is_current = FALSE,
target.end_date = CURRENT_DATE,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, name, email, address, is_current, start_date, end_date, created_at)VALUES(source.customer_id, source.name, source.email, source.address,TRUE,CURRENT_DATE,NULL,CURRENT_TIMESTAMP);
-- Insert new versions of changed recordsINSERTINTO customer_dimension (customer_id, name, email, address, is_current, start_date, end_date, created_at)SELECT
source.customer_id,
source.name,
source.email,
source.address,TRUEas is_current,CURRENT_DATEas start_date,NULLas end_date,CURRENT_TIMESTAMPas created_at
FROM customer_updates source
INNERJOIN customer_dimension target
ON target.customer_id = source.customer_id
AND target.is_current = FALSEAND target.end_date = CURRENT_DATE
-- SCD Type 2: Keep history of changes
MERGE INTO customer_dimension target
USING customer_updates source
ON target.customer_id = source.customer_id AND target.is_current = TRUEWHEN MATCHED AND(
target.name != source.name OR
target.email != source.email OR
target.address != source.address
)THENUPDATESET
target.is_current = FALSE,
target.end_date = CURRENT_DATE,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, name, email, address, is_current, start_date, end_date, created_at)VALUES(source.customer_id, source.name, source.email, source.address,TRUE,CURRENT_DATE,NULL,CURRENT_TIMESTAMP);
-- Insert new versions of changed recordsINSERTINTO customer_dimension (customer_id, name, email, address, is_current, start_date, end_date, created_at)SELECT
source.customer_id,
source.name,
source.email,
source.address,TRUEas is_current,CURRENT_DATEas start_date,NULLas end_date,CURRENT_TIMESTAMPas created_at
FROM customer_updates source
INNERJOIN customer_dimension target
ON target.customer_id = source.customer_id
AND target.is_current = FALSEAND target.end_date = CURRENT_DATE
-- SCD Type 2: Keep history of changes
MERGE INTO customer_dimension target
USING customer_updates source
ON target.customer_id = source.customer_id AND target.is_current = TRUEWHEN MATCHED AND(
target.name != source.name OR
target.email != source.email OR
target.address != source.address
)THENUPDATESET
target.is_current = FALSE,
target.end_date = CURRENT_DATE,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(customer_id, name, email, address, is_current, start_date, end_date, created_at)VALUES(source.customer_id, source.name, source.email, source.address,TRUE,CURRENT_DATE,NULL,CURRENT_TIMESTAMP);
-- Insert new versions of changed recordsINSERTINTO customer_dimension (customer_id, name, email, address, is_current, start_date, end_date, created_at)SELECT
source.customer_id,
source.name,
source.email,
source.address,TRUEas is_current,CURRENT_DATEas start_date,NULLas end_date,CURRENT_TIMESTAMPas created_at
FROM customer_updates source
INNERJOIN customer_dimension target
ON target.customer_id = source.customer_id
AND target.is_current = FALSEAND target.end_date = CURRENT_DATE
Result:
Changed records: Old version marked as not current with end_date, new version inserted
Unchanged records: No action
New customers: Inserted as current
Example 6: MERGE with Complex Logic
All Platforms:
MERGE INTO sales_targets target
USING(SELECT
employee_id,
EXTRACT(YEARFROM sale_date)asyear,
EXTRACT(QUARTER FROM sale_date)as quarter,
SUM(amount)as actual_sales
FROM sales
GROUPBY employee_id, EXTRACT(YEARFROM sale_date), EXTRACT(QUARTER FROM sale_date)) actual
ON target.employee_id = actual.employee_id
AND target.year = actual.year
AND target.quarter = actual.quarter
WHEN MATCHED THENUPDATESET
target.actual_sales = actual.actual_sales,
target.variance = actual.actual_sales - target.target_sales,
target.variance_pct = (actual.actual_sales - target.target_sales) * 100.0 / target.target_sales,
target.status = CASEWHEN actual.actual_sales >= target.target_sales THEN'Met'WHEN actual.actual_sales >= target.target_sales * 0.9THEN'Near'ELSE'Below'END,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(employee_id,year, quarter, actual_sales, target_sales, variance, status, created_at)VALUES(actual.employee_id, actual.year, actual.quarter, actual.actual_sales,0, actual.actual_sales,'No Target',CURRENT_TIMESTAMP)
MERGE INTO sales_targets target
USING(SELECT
employee_id,
EXTRACT(YEARFROM sale_date)asyear,
EXTRACT(QUARTER FROM sale_date)as quarter,
SUM(amount)as actual_sales
FROM sales
GROUPBY employee_id, EXTRACT(YEARFROM sale_date), EXTRACT(QUARTER FROM sale_date)) actual
ON target.employee_id = actual.employee_id
AND target.year = actual.year
AND target.quarter = actual.quarter
WHEN MATCHED THENUPDATESET
target.actual_sales = actual.actual_sales,
target.variance = actual.actual_sales - target.target_sales,
target.variance_pct = (actual.actual_sales - target.target_sales) * 100.0 / target.target_sales,
target.status = CASEWHEN actual.actual_sales >= target.target_sales THEN'Met'WHEN actual.actual_sales >= target.target_sales * 0.9THEN'Near'ELSE'Below'END,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(employee_id,year, quarter, actual_sales, target_sales, variance, status, created_at)VALUES(actual.employee_id, actual.year, actual.quarter, actual.actual_sales,0, actual.actual_sales,'No Target',CURRENT_TIMESTAMP)
MERGE INTO sales_targets target
USING(SELECT
employee_id,
EXTRACT(YEARFROM sale_date)asyear,
EXTRACT(QUARTER FROM sale_date)as quarter,
SUM(amount)as actual_sales
FROM sales
GROUPBY employee_id, EXTRACT(YEARFROM sale_date), EXTRACT(QUARTER FROM sale_date)) actual
ON target.employee_id = actual.employee_id
AND target.year = actual.year
AND target.quarter = actual.quarter
WHEN MATCHED THENUPDATESET
target.actual_sales = actual.actual_sales,
target.variance = actual.actual_sales - target.target_sales,
target.variance_pct = (actual.actual_sales - target.target_sales) * 100.0 / target.target_sales,
target.status = CASEWHEN actual.actual_sales >= target.target_sales THEN'Met'WHEN actual.actual_sales >= target.target_sales * 0.9THEN'Near'ELSE'Below'END,
target.updated_at = CURRENT_TIMESTAMPWHENNOT MATCHED THENINSERT(employee_id,year, quarter, actual_sales, target_sales, variance, status, created_at)VALUES(actual.employee_id, actual.year, actual.quarter, actual.actual_sales,0, actual.actual_sales,'No Target',CURRENT_TIMESTAMP)
MERGE Performance Comparison
Approach
Operations
Efficiency
Use Case
MERGE
1 statement
High
Standard upserts
INSERT + UPDATE
2 statements
Medium
Simple scenarios
DELETE + INSERT
2 statements
Low
Full refresh
Performance Example:
MERGE: ~10 seconds for 1M rows
Separate INSERT/UPDATE: ~25 seconds for 1M rows
Common Use Cases
Data synchronization: Keep tables in sync across systems
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.