Mastering dbt Incremental Models: A Practical Guide
Feb 26, 2026
Mastering dbt Incremental Models: A Practical Guide
Every analytics engineer eventually hits the same wall: a dbt™ model that ran in two minutes suddenly takes forty-five. Source tables grow, transformations compound, and warehouse bills climb. Rebuilding millions of rows each run when only a few thousand changed is wasteful—and avoidable.
dbt™ incremental models solve this by processing only new or changed data instead of rebuilding entire tables. They are the single most impactful optimization you can make to a dbt™ pipeline once table sizes cross the threshold where full rebuilds become costly or slow.
This guide walks you through everything you need to configure, operate, and troubleshoot dbt™ incremental models—from choosing the right incremental strategy to handling schema drift with on_schema_change, applying incremental_predicates on billion-row tables, and knowing when to reach for --full-refresh.
What Are dbt Incremental Models
An incremental model is a dbt™ materialization type that appends or updates only new and changed rows in a target table, rather than dropping and recreating the entire table on every run. It is one of three core dbt™ materializations, each designed for different use cases:
Incremental materialization: Only processes new or changed rows since the last successful run. The target table persists between runs and grows over time.
Table materialization: Drops and recreates the entire table from scratch every run. Simple and accurate, but expensive for large datasets.
View materialization: Stores only the SQL query logic—no data is physically persisted. Lightweight but slow to query at scale.
The incremental materialization sits between view (no compute cost, slow queries) and table (high compute cost, fast queries). It gives you the query performance of a table with a fraction of the per-run compute.
Think of it this way: a table materialization rebuilds your entire house every morning. An incremental model only installs the new furniture that arrived overnight.
Why Use Incremental Materialization in dbt
Incremental models exist for one reason: efficiency at scale. When your source data grows into millions or billions of rows, rebuilding everything on every run becomes unsustainable.
Here are the practical benefits:
Reduced compute costs: Processing 10,000 new rows instead of 10 million means dramatically lower warehouse spend on Snowflake, BigQuery, Databricks, or Redshift.
Faster pipeline execution: Smaller data volumes mean quicker builds—turning a 45-minute model run into a 2-minute incremental update.
Lower warehouse load: Less strain on your cluster, freeing up resources for analysts running queries and dashboards.
Reduced strain on source systems: Pulling only recent data from upstream sources instead of performing full extracts.
When Incremental Models Make Sense
Incremental models pay off when:
Source tables have millions or billions of rows
Your pipeline runs frequently (hourly, every 15 minutes)
Transformations involve expensive computations (regex, UDFs, complex joins)
Warehouse costs are material and growing
When They Add Unnecessary Complexity
For small tables (under a few hundred thousand rows), table materialization rebuilds quickly and eliminates the risk of incremental edge cases. The added configuration—unique keys, filter logic, schema change handling—often isn't worth it when a full rebuild takes under a minute.
How dbt Incremental Models Work
The mechanics of an incremental model follow a straightforward two-phase pattern:
How dbt™ decides between a full build and an incremental update on each run.
Here is the step-by-step flow:
First run: dbt™ creates the target table with all source data—identical to a table materialization. No filtering is applied because there is no existing data to compare against.
Subsequent runs: dbt™ checks that the target table exists and the
-full-refreshflag was not passed. It then evaluates your filter logic (wrapped inis_incremental()) to identify only new or changed rows.Strategy application: dbt™ applies your chosen incremental strategy—merge, append, delete+insert, or insert_overwrite—to combine the new rows with the existing target table.
The filter condition that determines which rows are "new" is sometimes called a watermark. The most common watermark is a timestamp column like updated_at or created_at, where you select rows newer than the maximum value already in your target table.
dbt Incremental Strategies
An incremental strategy defines how dbt™ combines new data with existing data in the target table. The strategy you choose affects performance, correctness, and which data warehouses support it.
Strategy | How It Works | Best For |
|---|---|---|
| Inserts new rows without checking for duplicates | Event logs, immutable data |
| Upserts rows based on a unique key | Dimension tables, mutable records |
| Deletes matching rows, then inserts replacements | Large batch updates |
| Replaces entire partitions with new data | Partitioned fact tables |
| Processes data in small time-based batches | High-volume streaming data |
Strategy availability varies by adapter. Here is the support matrix:
Adapter |
|
|
|
|
|
|---|---|---|---|---|---|
dbt-snowflake | ✅ | ✅ | ✅ | ✅ | ✅ |
dbt-bigquery | ✅ | ✅ | — | ✅ | — |
dbt-databricks | ✅ | ✅ | ✅ | ✅ | ✅ |
dbt-postgres | ✅ | ✅ | ✅ | — | ✅ |
dbt-redshift | ✅ | ✅ | ✅ | — | ✅ |
Source: dbt™ incremental strategy docs
Append
The append strategy inserts new rows directly into the target table without checking for duplicates or updating existing records. It is the simplest strategy and requires no unique_key.
Best for: Immutable event streams, click logs, page views—any data where records are never updated after creation. Since there is no deduplication, ensure your source data and filter logic prevent the same event from being inserted twice.
Merge
The merge strategy performs an upsert: it inserts rows with new unique keys and updates rows with existing unique keys. This is the dbt™ merge strategy most teams reach for first.
Best for: Dimension tables, order tables, user profiles—any dataset where records can change over time and you need the target to reflect the latest state. The merge strategy implements SCD Type 1 logic (overwrite with latest values).
Performance note: Merge performs well on tables up to ~50–100M rows. Beyond that, consider switching to delete+insert with partition pruning for better performance.
Delete+Insert
The delete+insert strategy first deletes all rows in the target table that match the unique_key of incoming rows, then inserts the new rows. It separates the update into two distinct SQL operations.
Best for: Large batch updates and reprocessing scenarios where you want to fully replace matching records. Benchmarks show delete+insert can run 60–80% faster than merge on tables with 100M+ rows, especially when combined with partition pruning.
Trade-off: Unlike merge, delete+insert is not fully atomic—there is a brief intermediate state between the delete and insert operations. Most warehouses handle this within a transaction, but it is worth understanding.
Insert_Overwrite
The insert_overwrite strategy replaces entire partitions in the target table with new data. It requires the table to be partitioned (typically by date).
Best for: Date-partitioned fact tables on BigQuery, Spark, or Databricks where you want to overwrite entire days of data. Highly efficient because only affected partitions are touched—unrelated partitions remain untouched.
Microbatch
The dbt™ microbatch strategy is a newer approach that processes data in small, time-based batches. Instead of running a single query for all new data, dbt™ splits the work into multiple independent queries—one per time period (e.g., one per day or hour).
Key differences from other strategies:
You do not need to write
is_incremental()filter logic—dbt™ automatically filters based onevent_timeandbatch_size.Each batch is independent and idempotent—failed batches can be retried without affecting others.
Supports parallel batch execution for faster processing.
Use
-event-time-startand-event-time-endflags for targeted historical backfills.
Best for: High-volume time-series data, near-real-time pipelines, and scenarios where reliability is critical (failed batches don't block the entire run).
How to Configure a dbt Incremental Model
Setting up an incremental model involves five steps. Each builds on the previous one.
The five configuration steps for a dbt™ incremental model.
1. Set the Materialization to Incremental
Tell dbt™ this model should be materialized incrementally by adding the config block at the top of your model file:
You can also set this in dbt_project.yml to apply it to an entire directory of models:
2. Define a Unique Key
The unique_key tells dbt™ which column(s) identify a unique record. This enables merge/upsert behavior—without it, dbt™ can only append rows.
Single column key:
Composite key (multiple columns):
Important: Columns used as unique keys should not contain NULL values. When any unique key column is NULL, dbt™ will insert a new row instead of updating the existing one, leading to duplicates.
3. Add Incremental Logic with is_incremental()
The is_incremental() macro wraps your filter logic so it only applies during incremental runs—not on the first run or when --full-refresh is used.
The {{ this }} Jinja function refers to the current model's existing target table in the warehouse. It lets you query the already-materialized data to determine the watermark.
4. Choose an Incremental Strategy
Set the incremental_strategy in your config block. If you don't specify one, dbt™ uses the default for your adapter (typically merge for most warehouses).
Refer to the strategies comparison table above to select the right strategy for your data characteristics and warehouse.
5. Test and Validate Your Model
After configuration, validate that your incremental model works correctly:
Check that:
Row counts increase as expected (not doubling due to duplicates)
The
unique_keyconstraint holds—run adbt_utils.unique_combination_of_columnstest if using composite keysRun times decrease significantly compared to a full table rebuild
How to Use dbt is_incremental()
The is_incremental() Jinja macro is the gatekeeper of your incremental logic. It returns True only when all three conditions are met:
The model is configured with
materialized='incremental'The target table already exists in the database
The
-full-refreshflag was not passed
On the first run, is_incremental() returns False because the target table doesn't exist yet—so dbt™ builds the full table. On subsequent runs, it returns True, and your filter logic kicks in.
Common is_incremental() Patterns
Timestamp filter (most common):
ID-based filter (for auto-incrementing keys):
Lookback window (handles late-arriving data):
Timestamp with lookback buffer (recommended best practice):
The lookback buffer is strongly recommended by the dbt™ best practices guide. It re-processes the last few days of data to catch records that arrived late. Combined with a unique_key, existing rows get updated instead of duplicated—you process slightly more data but avoid silent data gaps.
Handling dbt on_schema_change
Source schemas evolve. Columns get added, renamed, or removed. The on_schema_change config option controls how dbt™ handles schema drift between your model's current SQL output and the existing target table.
Or set it in dbt_project.yml:
on_schema_change Options
Option | Behavior |
|---|---|
| Ignores schema changes (default) |
| Fails the run if schema changes are detected |
| Adds new columns, ignores removed columns |
| Adds new columns and removes missing columns |
ignore
The default behavior. If your model SQL now selects a new column, it silently won't appear in the target table. Removed columns won't cause an error. This is safe but can lead to silent data loss—you think a column is being captured, but it is not.
fail
Triggers an error when the source and target schemas diverge. This forces a manual review—you must explicitly decide how to handle the change (usually by running --full-refresh). Good for teams that want strict control over schema evolution.
append_new_columns
Automatically adds new columns to the target table. Existing rows will have NULL for the new columns (dbt™ does not backfill). Columns that no longer exist in the model SQL remain in the target table. This is a good middle ground for most teams.
sync_all_columns
The most aggressive option—adds new columns and removes columns that no longer exist in the model SQL. Use this carefully, as dropping columns from your target table is irreversible without a full refresh.
Note: None of these options backfill values in existing rows for newly added columns. To populate historical data for a new column, run
--full-refresh.
When to Run dbt --full-refresh
Incremental models inevitably drift from source data over time. Late-arriving facts, logic changes, and schema updates all create scenarios where you need to rebuild the entire table from scratch.
Scenarios Requiring Full Refresh
Model logic changed: You updated the transformation SQL—existing rows in the target table were built with the old logic and need to be reprocessed.
Backfill needed: Historical data requires reprocessing, either due to source corrections or a new column that needs historical values.
Data quality fix: Erroneous records are already in the target table and can't be corrected through incremental updates alone.
Schema sync: After adding or removing columns when using
on_schema_change: fail.Drift reset: Incremental models naturally drift from source over time. A periodic full refresh (e.g., weekly during low-activity periods) resets this drift.
Decision tree for when to use --full-refresh on a dbt™ incremental model.
The full_refresh config can also be set at the model or project level to permanently override the flag behavior:
Using dbt Incremental Predicates for Large Tables
For tables with billions of rows, even the is_incremental() filter may not prevent expensive full-table scans during the merge or delete step. The incremental_predicates config pushes additional WHERE conditions directly into the merge/delete SQL statement, enabling partition pruning on the target table.
What it does: Adds WHERE conditions to the MERGE or DELETE statement that dbt™ generates, limiting the scan on the existing target table.
Why it matters: Without predicates, a merge statement must scan the entire target table to find matching rows. With predicates, the warehouse can prune partitions and scan only relevant data—potentially reducing costs by 90%+ on large partitioned tables.
When to use: Tables with hundreds of millions to billions of rows, partitioned by date or another key column.
Configuration
The DBT_INTERNAL_DEST alias refers to the existing target table in the generated merge SQL. You can also use DBT_INTERNAL_SOURCE to filter the incoming data:
Tip: Make sure your predicate columns align with your table's partition or cluster columns. A predicate on a non-partitioned column won't enable partition pruning.
Best Practices for dbt Incremental Models
Use Partitioning and Clustering
Warehouse partitioning and clustering are force multipliers for incremental models. Partitioning divides your table into segments (typically by date), and clustering sorts data within partitions by frequently-filtered columns.
On BigQuery, define partitioning in your model config:
On Snowflake, use clustering:
When combined with incremental_predicates, partitioning ensures that merge and delete operations only scan relevant partitions—not the entire table.
Minimize Data Scanned with Targeted Filters
Write narrow, specific filters in your is_incremental() block. Avoid patterns that scan the entire source table:
The second pattern is especially useful when your source table is also large. It limits the source scan to a fixed window, regardless of what is in the target table.
Set Appropriate Unique Keys
Choose keys that genuinely identify unique records. A poorly chosen unique key is the number one cause of duplicate records in incremental models.
Use a single natural key when one exists (e.g.,
order_id,transaction_id)Use composite keys when no single column is unique:
unique_key=['user_id', 'event_date', 'event_type']Ensure key columns are never NULL—null keys cause inserts instead of updates
Test uniqueness with dbt™ tests:
Monitor Incremental Model Performance
Track run times and row counts over time. A sudden spike in run time may indicate:
Your filter logic is selecting too many rows (watermark drift)
Upstream data volumes changed unexpectedly
Partition pruning stopped working due to a config change
Paradime's Code IDE provides column-level lineage visibility, letting you trace how changes in upstream models propagate through your pipeline. This is invaluable for identifying why an incremental model's performance suddenly degraded.
Document Your Incremental Logic
Add descriptions explaining your incremental logic, lookback windows, and unique key choices. Future teammates (and future you) will need to understand why you chose a 3-day lookback versus a 7-day lookback, or why the unique key is a composite of three columns.
Troubleshooting Common dbt Incremental Model Issues
Duplicate Records After Incremental Runs
Symptom: Row counts keep growing, and you find multiple rows with the same business key.
Root causes:
Missing
unique_key—without it, dbt™ defaults to append-only behaviorunique_keycolumn contains NULL values (NULLs don't match in SQL, so dbt™ inserts instead of updates)unique_keydoesn't actually identify unique records (e.g., usinguser_idwhen multiple orders per user exist)Using
appendstrategy whenmergeis needed
Fix: Verify your unique key truly identifies unique records. Use composite keys when needed. Add not_null and unique tests. If duplicates already exist in the target table, run --full-refresh to reset.
Missing Data from Late-Arriving Records
Symptom: Some source records never appear in the target table, especially records from a few days ago.
Root cause: Records arrived in the source after your max(updated_at) watermark had already moved past their timestamp.
Fix: Add a lookback window to your filter:
This re-processes the last 3 days of data on every run. Combined with a unique_key, existing rows get updated (not duplicated), and late-arriving records get captured.
Schema Mismatch Errors
Symptom: dbt™ run fails with an error about mismatched columns between source and target.
Root cause: A column was added to or removed from your source data or model SQL, but on_schema_change is set to fail (or defaults to ignore, which can cause silent issues).
Fix: Either:
Update
on_schema_changetoappend_new_columnsorsync_all_columnsRun
dbt run --full-refresh --select my_modelto rebuild the target table with the new schema
Unexpected Full Table Scans
Symptom: Incremental runs are as slow and expensive as full table builds, even though only a small amount of new data exists.
Root causes:
incremental_predicatesnot configured—merge statement scans the entire target tablePredicate columns don't align with partition/cluster columns, so pruning doesn't occur
is_incremental()filter returns too many rows (overly wide lookback window)
Fix: Add incremental_predicates targeting your partition column. Verify that the predicate column matches your table's partitioning scheme. Narrow your lookback window.
Build Faster dbt Pipelines with Incremental Models
Incremental models are foundational to efficient, cost-effective dbt™ pipelines. They transform what would be expensive, full-table rebuilds into lightweight, targeted updates—reducing warehouse spend, accelerating pipeline execution, and enabling higher-frequency data refreshes.
The key is matching the right strategy to your data: merge for mutable dimension tables, append for immutable event streams, delete+insert for large batch updates, insert_overwrite for partitioned fact tables, and microbatch for high-volume time-series data.
Paradime's Bolt schedules incremental dbt™ runs with TurboCI for slim CI, while Paradime's Code IDE surfaces column-level lineage so teams can confidently iterate on incremental logic without breaking downstream dependencies. When you change a filter condition or update a unique key, you see exactly which downstream models and columns are affected—before you merge the PR.
Start for free to build and orchestrate incremental dbt™ models with AI-powered development tools.
FAQs about dbt Incremental Models
Can I use dbt incremental models with all data warehouses?
Most major warehouses support incremental models, but available dbt™ incremental strategies vary by adapter. Snowflake and Databricks support all five strategies (append, merge, delete+insert, insert_overwrite, microbatch). BigQuery supports append, merge, and insert_overwrite but not delete+insert. Postgres and Redshift support append, merge, delete+insert, and microbatch but not insert_overwrite. Always check the adapter-specific docs before choosing a strategy.
How do I migrate an existing dbt table model to incremental?
Follow these steps: change the materialization to incremental, add a unique_key that identifies unique records, wrap your filter logic in an {% if is_incremental() %} block, then run dbt run --full-refresh --select my_model to establish the baseline table. Subsequent runs will process only new or changed data.
What happens if my dbt incremental model fails mid-run?
Most warehouses wrap incremental operations in transactions, so a mid-run failure typically rolls back changes and leaves your target table unchanged. You can safely retry the run. The dbt™ microbatch strategy takes this further—each batch is independent, so a failure in one batch doesn't affect others, and you can use dbt retry to reprocess only the failed batches.
How do dbt incremental models affect CI/CD pipelines?
In CI environments, incremental models often run against empty or cloned schemas where the target table doesn't exist. Since is_incremental() returns False when the target table is missing, the model builds as a full table in CI. Plan your CI strategy accordingly—consider using the --empty flag in dbt™ 1.8+ to validate SQL compilation without processing data, or accept that CI runs will perform full builds.
Should I use incremental models for small dbt tables?
For small tables (under a few hundred thousand rows), the added complexity of incremental logic—unique keys, filter conditions, schema change handling, lookback windows—often isn't worth it. Table materialization rebuilds quickly, is simpler to maintain, is always accurate, and eliminates the risk of incremental edge cases like late-arriving data and watermark drift. Reserve incremental models for tables where full rebuilds are genuinely too slow or too expensive.