The Best dbt™ Test Packages for Data Teams
Feb 26, 2026
dbt™ Core ships with four generic tests—unique, not_null, accepted_values, and relationships—which cover the basics, but production data pipelines demand far more. Validating value ranges, catching schema drift, flagging volume anomalies, and evaluating AI outputs all require purpose-built extensions. That's where dbt™ test packages come in: community-built modules that plug directly into your project via packages.yml, giving your team dozens of new tests without writing custom SQL from scratch.
This guide breaks down the top dbt™ test packages available today, explains how to install and configure them, and walks through practical examples you can copy into your project right away. Whether you're building your first dbt™ project or hardening a production pipeline that runs across hundreds of models, you'll find the right testing toolkit here.
What are dbt™ test packages?
dbt™ test packages are reusable code modules that extend the testing capabilities built into dbt™ Core. Instead of limiting yourself to the four generic tests that ship out of the box, you can install community-maintained packages that add dozens—or even hundreds—of additional tests, all designed to validate data quality at different stages of your pipeline.
Before diving into specific packages, here are three core concepts that frame the discussion:
dbt™ test: A command (dbt test) that validates data quality by running SQL assertions against your models. If a test query returns rows, the test fails, signaling a data quality issue.
dbt™ packages: Reusable code modules installed from the dbt™ Package Hub or directly from Git repositories. Packages can include macros, tests, models, and more.
Generic tests: Parameterized tests you can apply to any column or model through YAML configuration. They're reusable by design—define once, apply everywhere.
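For example, the built-in generic tests are applied in a model's YAML file like this (model and column names are illustrative):

```yaml
# schema.yml — `data_tests:` is the key in dbt 1.8+;
# earlier versions use `tests:`
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
```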
Installing a test package is straightforward: you add it to your packages.yml file and run dbt deps. From there, every test in the package becomes available for use across your entire project.
Why native dbt™ tests fall short
dbt™ Core's four generic tests—unique, not_null, accepted_values, and relationships—are a solid starting point, and they are not going anywhere. But production data teams consistently run into scenarios where these four just aren't enough. This isn't a criticism of dbt™ Core; it's a recognition that data quality testing is a broad domain, and no four tests can cover it all.
Limited generic test types
The four built-in tests handle basic integrity checks: Is this column unique? Is it non-null? Does it contain only expected values? Does it reference a valid foreign key? These are necessary, but they leave significant gaps.
What if you need to validate that a numeric column stays within a specific range? Or that a date column is always after a certain cutoff? Or that a composite key made up of three columns is unique across the table? Native dbt™ doesn't provide parameterized tests for these scenarios. You'd have to write singular tests—one-off SQL files in your tests/ directory—which don't scale well when you have hundreds of models.
No built-in schema drift detection
When an upstream source table drops a column, renames a field, or changes a data type, dbt™ Core won't proactively alert you. Your models may still compile and run, but produce incorrect results—or silently drop data. Schema drift is one of the most common causes of broken data pipelines, and catching it requires tests that compare your current schema against an expected state. Native dbt™ doesn't include this capability.
Missing row count and volume anomaly checks
In production, one of the most telling signals that something has gone wrong is an unexpected change in data volume. A table that normally receives 100,000 rows per day suddenly shows 500 rows—or 10 million. Native dbt™ has no mechanism for detecting these anomalies. You'd need to manually set up singular tests with hardcoded thresholds, then update them every time your data patterns change. That's neither scalable nor maintainable.
Top dbt™ testing packages for data quality
This is where the ecosystem shines. The dbt™ community has built a rich library of testing packages, each solving a specific category of data quality challenges. Here are the five packages that cover the widest range of use cases for modern data teams:
| Package | Primary Use Case | Best For |
|---|---|---|
| dbt_utils | General-purpose testing helpers | All dbt™ projects |
| dbt_expectations | Great Expectations-style assertions | Complex validation rules |
| Elementary | Anomaly detection and observability | Production monitoring |
| dbt_constraints | Database constraint enforcement | Referential integrity |
| dbt-llm-evals | LLM output quality scoring | AI/ML pipelines |
dbt_utils
dbt_utils is the foundational package most teams install first, and for good reason. Maintained by dbt™ Labs, it includes 17 generic tests alongside a collection of macros for SQL generation, cross-database compatibility, and more.
The most-used tests from dbt_utils include:
not_null_proportion: Tests that the proportion of non-null values in a column meets a minimum threshold. Unlike the native not_null test, which fails if even a single NULL exists, this test lets you set an acceptable NULL rate—perfect for columns where partial data is expected.
unique_combination_of_columns: Validates that a composite key (made up of two or more columns) is unique across the table. This is essential for models with natural keys that span multiple fields, and it's highly performant even on large datasets.
expression_is_true: A flexible test that evaluates any SQL expression you define. This is the Swiss Army knife of dbt_utils—use it when no other test quite fits your scenario.
Additional useful tests include equal_rowcount (verifies two tables have the same number of rows), fewer_rows_than (confirms a target table has fewer rows than a source), and not_accepted_values (the inverse of accepted_values, checking that specific values are not present).
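As a quick illustration, expression_is_true can encode an arbitrary business rule. A minimal sketch, with a hypothetical model and column:

```yaml
models:
  - name: payments  # illustrative model name
    data_tests:
      - dbt_utils.expression_is_true:
          expression: "amount >= 0"  # fails on any row where this is false
```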
If you install only one testing package, make it dbt_utils.
dbt_expectations
dbt_expectations brings the expressiveness of the Great Expectations Python library into the dbt™ ecosystem. With 62 generic tests organized across seven categories—table shape, missing values, sets and ranges, string matching, aggregate functions, multi-column comparisons, and distributional functions—it's the most comprehensive testing package available.
Key tests to know:
expect_column_values_to_be_between: Validates that all values in a numeric column fall within a specified min/max range. Essential for catching outliers and data entry errors.
expect_table_row_count_to_be_between: Asserts that a table's row count falls within an expected range—a simple but effective volume check.
expect_column_to_exist: Confirms that a specific column exists in the model, providing a basic schema drift detection mechanism.
expect_column_pair_values_A_to_be_greater_than_B: Validates relationships between two columns, such as ensuring end_date is always after start_date.
The naming convention is intentional: each test reads like a sentence ("I expect column values to be between X and Y"), making your YAML configuration self-documenting. For the full list of available tests, refer to the dbt-expectations package documentation on GitHub.
Elementary
Elementary takes a fundamentally different approach to testing. Instead of requiring you to define explicit pass/fail thresholds, Elementary uses statistical methods to automatically detect anomalies in your data. It's available as both an open-source dbt™ package and a cloud product with additional automation features.
Core capabilities include:
volume_anomalies: Monitors row counts over time and flags statistically significant deviations. You can customize the test with parameters like timestamp_column, where_expression, detection_delay, and seasonality—but you don't have to set manual thresholds. Elementary's algorithms determine what's normal and what isn't.
freshness_anomalies: Detects when data stops arriving on schedule or arrives significantly later than expected.
schema_changes: A single test that detects deleted tables, deleted or added columns, and data type changes—covering multiple schema drift scenarios in one configuration.
Elementary also includes a built-in observability dashboard that visualizes test results, anomaly trends, and pipeline health over time. For teams running dbt™ in production, it's one of the most impactful packages you can add.
dbt_constraints
dbt_constraints, created by Snowflake Labs, bridges the gap between dbt™ tests and actual database constraints. While dbt™ tests validate data quality at the application level (by running queries), dbt_constraints generates real PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints at the database level.
The package provides three core tests:
primary_key: Enforces primary key constraints, supporting both single-column and multi-column composite keys.
unique_key: Enforces unique constraints at the database level.
foreign_key: Enforces referential integrity between tables, ensuring that foreign key values always reference valid primary keys.
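A sketch of a typical setup, using the package's documented pk_table_name and pk_column_name arguments (model and column names are illustrative):

```yaml
models:
  - name: dim_customers
    columns:
      - name: customer_id
        data_tests:
          - dbt_constraints.primary_key  # creates a PK constraint in the warehouse
  - name: fct_orders
    columns:
      - name: customer_id
        data_tests:
          - dbt_constraints.foreign_key:
              pk_table_name: ref('dim_customers')
              pk_column_name: customer_id
```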
This is particularly valuable for data warehouses that support constraint enforcement (Snowflake, PostgreSQL, Oracle, and Redshift), and for BI tools that leverage database constraints for query optimization.
dbt-llm-evals for AI and LLM outputs
dbt-llm-evals is Paradime's open-source package for evaluating LLM-generated content directly within your data warehouse. As AI-powered features become standard in data products, data teams need a way to measure and monitor the quality of LLM outputs—just as they test the quality of traditional data transformations.
The package uses an "LLM-as-a-judge" pattern: a judge model evaluates your AI model's outputs on criteria like accuracy, relevance, tone, and completeness, scoring each output on a 1–10 scale with detailed reasoning. Key features include:
Warehouse-native execution: All evaluations run inside Snowflake Cortex, BigQuery Vertex AI, or Databricks AI Functions—no external API calls, no data egress.
Automatic baseline detection: The package automatically establishes baselines, so you can track quality drift over time without manual configuration.
Prompt capture and evaluation: Captures prompts alongside inputs and outputs for comprehensive evaluation.
Configurable criteria: Evaluate any combination of quality dimensions, with flexible sampling and threshold settings.
If your team is building AI or LLM-powered pipelines and wants to apply the same testing rigor to model outputs that it applies to traditional data transformations, dbt-llm-evals fills that gap.
How to install dbt™ testing packages
Installing dbt™ packages follows a consistent two-step process: add the package to your packages.yml file, then run dbt deps to download and install it.
Here's an example packages.yml that installs both dbt_utils and dbt_expectations:
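(Version ranges below are illustrative; check the dbt™ Package Hub for the latest compatible releases.)

```yaml
# packages.yml
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```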
After saving this file, run:
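```shell
dbt deps
```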
dbt™ will download the specified versions of each package into your dbt_packages/ directory. From there, all tests and macros in those packages become available throughout your project.
A few important notes:
Version pinning matters. Using version ranges (as shown above) protects you from breaking changes while still allowing patch updates.
Some packages have dependencies. For example, dbt_expectations depends on dbt_utils, so you'll need both installed. If you declare dbt_expectations in your packages.yml, dbt™ will resolve the dependency automatically—but it's good practice to declare both explicitly.
Run dbt deps in CI/CD too. Your CI pipeline needs to install packages before running tests, so make sure dbt deps is part of your pipeline configuration.
dbt™ test examples by use case
Theory is useful, but copy-paste examples are better. Here are practical YAML configurations for the most common data quality scenarios.
Testing for NULL value thresholds
Not every NULL is a problem. In many datasets, a column like middle_name or phone_number will legitimately have NULL values for some percentage of rows. The dbt_utils.not_null_proportion test lets you set an acceptable threshold:
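A sketch, with an illustrative model name:

```yaml
models:
  - name: customers
    columns:
      - name: email
        data_tests:
          - dbt_utils.not_null_proportion:
              at_least: 0.95  # a fraction: require at least 95% non-null
```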
This test passes as long as at least 95% of the email column is non-null. Adjust the threshold to match your data's expected completeness.
Testing primary and natural keys
Many data models rely on composite keys—combinations of columns that together uniquely identify a row. The dbt_utils.unique_combination_of_columns test handles this cleanly:
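For example (model name is illustrative):

```yaml
models:
  - name: fct_order_lines
    data_tests:
      # model-level test: the column pair together must be unique
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - order_line_number
```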
This validates that the combination of order_id and order_line_number is unique across the entire table. Use this for natural keys (business-defined identifiers like order numbers) as opposed to surrogate keys (database-generated identifiers like auto-incrementing IDs), which typically need only a simple unique test.
Detecting schema changes
Schema drift—when upstream columns are renamed, dropped, or have their types changed—is a silent pipeline killer. You can guard against it with a combination of dbt_expectations and Elementary:
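A sketch of that combination, with illustrative model and column names:

```yaml
models:
  - name: stg_orders
    data_tests:
      # Elementary: flags added/removed columns and type changes
      - elementary.schema_changes
    columns:
      - name: order_id
        data_tests:
          # dbt_expectations: fails if this specific column disappears
          - dbt_expectations.expect_column_to_exist
      - name: order_total
        data_tests:
          - dbt_expectations.expect_column_to_exist
```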
The expect_column_to_exist tests catch specific missing columns, while Elementary's schema_changes test provides broader protection by detecting any structural change to the model.
Testing for volume anomalies
Elementary's volume_anomalies test uses statistical methods to flag unusual row counts without requiring hardcoded thresholds:
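A sketch, with illustrative model and column names:

```yaml
models:
  - name: fct_events
    data_tests:
      - elementary.volume_anomalies:
          timestamp_column: created_at
          seasonality: day_of_week  # account for weekly traffic patterns
```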
The seasonality parameter tells Elementary to account for predictable patterns—like lower traffic on weekends—when determining what counts as an anomaly. This eliminates the false positives that plague manually configured volume tests.
dbt™ testing during development
Testing shouldn't be an afterthought that only happens in production. The earlier you catch data quality issues, the cheaper and faster they are to fix. Here's how to integrate testing into your development workflow.
Running tests locally with deferred runs
When you're developing a new model or modifying an existing one, you don't want to rebuild your entire project just to run tests. Deferred execution solves this: it lets you test your changed models against the production state of your data warehouse, without rebuilding upstream dependencies.
In practice, this means you can run:
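A sketch of the deferred workflow (the artifacts path is illustrative):

```shell
# --defer resolves upstream refs against production artifacts;
# --state points at the manifest from your last production run
dbt test --select state:modified --defer --state path/to/prod-artifacts
```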
This runs tests only on the model you changed, using production data for all upstream references. Tools like Paradime's Code IDE support deferred runs natively, making this workflow seamless during development.
Enforcing test coverage standards
As your dbt™ project grows, it becomes increasingly easy for models to slip through without adequate test coverage. The dbt-meta-testing package helps by enforcing test coverage policies:
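A sketch of the package's required_tests configuration, set in dbt_project.yml (project and folder names are illustrative):

```yaml
# dbt_project.yml — the +required_tests syntax comes from
# the dbt-meta-testing package
models:
  my_project:
    marts:
      +required_tests: {"unique": 1, "not_null": 1}
```

The policy is then checked with `dbt run-operation required_tests`, which fails for any model under marts/ that lacks the required tests.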
This configuration requires that fct_revenue has at least one unique test and one not_null test. If a developer creates a model without meeting these minimums, the test coverage check fails—catching gaps before they reach production.
dbt™ testing in CI/CD pipelines
Tests that only run in production are tests that catch problems too late. The most effective data teams run dbt™ tests on every pull request, ensuring that code changes are validated before they merge.
Slim CI for faster pull request validation
Running your entire test suite on every pull request is slow and expensive. Slim CI (called TurboCI in Paradime Bolt) solves this by running tests only on models that were modified in the pull request—plus their downstream dependencies. This approach provides:
Faster feedback loops: Developers get test results in minutes, not hours.
Lower warehouse costs: You're only querying the tables that matter for the specific change.
Higher adoption: When CI is fast, developers actually wait for it to pass before merging.
The key command is:
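```shell
# state:modified+ = modified models plus everything downstream;
# --state points at artifacts from the last production run (path is illustrative)
dbt build --select state:modified+ --defer --state path/to/prod-artifacts
```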
This selects all modified models and everything downstream of them, ensuring that your changes haven't broken any dependent models or tests.
Column-level lineage for impact analysis
When a pull request modifies a model, which downstream tests are actually relevant? Column-level lineage answers this question by tracing how individual columns flow through your DAG. If you change the logic for a revenue column in a staging model, column-level lineage shows you every downstream model and test that depends on that specific column.
Paradime Bolt includes column-level lineage directly in its CI diff views, so reviewers can immediately see the blast radius of a code change and verify that the right tests are running.
Monitoring and alerting on dbt™ test failures
Running tests is only half the equation. The other half is making sure the right people find out about failures quickly enough to take action.
Centralizing test results
When you're running dbt™ tests across multiple schedules—hourly source freshness checks, daily model tests, weekly full validation runs—test results can scatter across log files, terminal outputs, and CI job histories. A centralized dashboard brings all of this together.
Elementary's open-source UI provides a test results dashboard that aggregates results across runs, highlights trends, and lets you drill into individual failures. Paradime Bolt's execution monitoring offers similar capabilities with additional features like historical run comparisons and SLA tracking.
The goal is a single pane of glass where your team can answer: "What's failing right now, and is it getting better or worse?"
Integrating with Slack, JIRA, and observability tools
Dashboards are useful, but proactive notifications are essential. The most common integration patterns for dbt™ test failures include:
Slack or Microsoft Teams: Immediate alerts when tests fail, routed to the appropriate channel based on the model owner or domain.
JIRA or Linear: Automatic ticket creation for test failures, so issues get tracked and assigned without manual intervention.
DataDog or Monte Carlo: Feeding test results into broader observability platforms for correlation with infrastructure metrics.
Paradime Bolt supports all of these integrations natively, allowing you to configure alert routing rules that match your team's incident response workflow.
dbt-expectations package documentation and configuration
The dbt_expectations package deserves a deeper dive given its breadth—62 tests across seven categories make it the most feature-rich testing package in the dbt™ ecosystem. Here's how to get it set up and configured.
Installing dbt_expectations
Add the following to your packages.yml:
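(Version ranges are illustrative; check the dbt™ Package Hub for current releases.)

```yaml
# packages.yml
packages:
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]
```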
Note that dbt_expectations depends on dbt_utils, so both packages must be present. While dbt™ can resolve this dependency automatically, explicitly declaring both gives you control over version pinning and avoids surprises.
After updating your packages.yml, run dbt deps to install.
Configuring common dbt_expectations tests
Here are the most frequently used tests from the package, with configuration examples:
expect_column_values_to_be_in_set validates that a column contains only allowed values—similar to the native accepted_values test, but with additional options:
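A sketch, with an illustrative model, column, and value set:

```yaml
models:
  - name: orders
    columns:
      - name: status
        data_tests:
          - dbt_expectations.expect_column_values_to_be_in_set:
              value_set: ['placed', 'shipped', 'completed', 'returned']
              quote_values: true  # quote values as strings in the compiled SQL
```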
expect_column_values_to_match_regex validates string patterns, making it ideal for columns like email addresses, phone numbers, or standardized codes:
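A sketch; the pattern is illustrative, and regex escaping rules vary by warehouse, so verify it on your platform:

```yaml
models:
  - name: customers
    columns:
      - name: email
        data_tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
```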
expect_table_row_count_to_be_between sets explicit bounds on expected table size—useful when you know the approximate volume of a table and want a simple guardrail:
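For example (model name and bounds are illustrative):

```yaml
models:
  - name: fct_daily_orders
    data_tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000
          max_value: 100000
```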
expect_table_row_count_to_equal_other_table verifies that two tables have identical row counts—essential for validating that a transformation hasn't accidentally filtered or duplicated rows:
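A sketch, with illustrative model names:

```yaml
models:
  - name: fct_orders
    data_tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref("stg_orders")
```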
expect_column_pair_values_to_be_equal confirms that two columns in the same table contain identical values, useful for validating join keys or redundant fields:
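A sketch, with illustrative model and column names:

```yaml
models:
  - name: fct_payments
    data_tests:
      - dbt_expectations.expect_column_pair_values_to_be_equal:
          column_A: order_id
          column_B: source_order_id
```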
How to choose the right dbt™ test packages
With multiple packages available, here's a practical decision framework based on where your team is:
Starting out? Install dbt_utils. It covers the most common testing scenarios—composite key validation, NULL proportion checks, row count comparisons—and serves as the foundation that other packages build on.
Need complex assertions? Add dbt_expectations. Its 62 tests cover value ranges, string patterns, table shape, aggregate validations, and multi-column comparisons. If you can describe what you expect from your data, there's probably a dbt_expectations test for it.
Running in production? Add Elementary for anomaly detection and observability. Its statistical approach to volume, freshness, and schema monitoring catches issues that static threshold-based tests miss.
Enforcing referential integrity? Add dbt_constraints to generate real database constraints from your dbt™ test definitions, particularly valuable if your BI tools or query optimizers leverage constraint metadata.
Building AI pipelines? Add dbt-llm-evals for LLM output quality scoring. It brings the same rigor of data testing to AI outputs, running evaluations entirely within your warehouse using native AI functions.
Most mature data teams end up using a combination of dbt_utils, dbt_expectations, and Elementary as their core testing stack, then layer in specialized packages as needed.
Run dbt™ tests at scale with Paradime Bolt
As your dbt™ project grows from tens to hundreds to thousands of models, managing test execution, CI/CD pipelines, and failure alerting becomes its own operational challenge. Paradime Bolt handles this complexity in a single platform:
TurboCI: Runs slim CI on every pull request, testing only modified models and their downstream dependencies for fast feedback loops.
Real-time failure alerts: Routes test failure notifications to Slack, Microsoft Teams, JIRA, Linear, or your observability platform of choice.
Column-level lineage in CI: Shows the exact blast radius of code changes, so reviewers know which tests matter for each pull request.
Execution monitoring: A centralized dashboard for all dbt™ runs across schedules, with historical comparisons and SLA tracking.
If your team is spending more time managing test infrastructure than writing tests, it's worth exploring what Paradime Bolt can automate for you. Start for free.
FAQs about dbt™ test packages
What is the performance impact of adding multiple dbt™ test packages?
Each test you add translates to a SQL query executed against your warehouse, so there is a direct relationship between test count and execution time. However, the impact is manageable with the right approach. Using slim CI to run only tests relevant to changed models keeps pull request validation fast. For production runs, you can use dbt™ test selectors to prioritize critical tests and run comprehensive suites on a less frequent schedule. The cost of a few extra queries is almost always worth the data quality issues they prevent.
Which dbt™ test packages work with Snowflake, BigQuery, or Databricks?
dbt_utils, dbt_expectations, and Elementary are cross-platform and work across all major data warehouses, including Snowflake, BigQuery, Databricks, Redshift, and PostgreSQL. dbt_constraints supports Snowflake, PostgreSQL, Oracle, and Redshift. dbt-llm-evals requires warehouse-native AI functions—specifically Snowflake Cortex, BigQuery Vertex AI, or Databricks AI Functions—so it's limited to those three platforms.
Can dbt_utils and dbt_expectations be used together in the same project?
Yes, and in fact this is the recommended setup. dbt_expectations actually depends on dbt_utils, so both will be installed whenever you add dbt_expectations to your project. Most teams use dbt_utils for general-purpose tests (composite key validation, NULL proportion checks) and layer in dbt_expectations tests for more specific assertions (value ranges, regex matching, row count bounds). There are no conflicts between the two packages.
How should data teams handle dbt™ test failures across hundreds of models?
At scale, you need centralized monitoring and automated triage. Tools like Paradime Bolt and Elementary aggregate test results into a single dashboard, allowing you to sort failures by severity, model, and recency. Set up automated alert routing so that failures on critical models trigger immediate Slack notifications, while lower-priority failures create JIRA tickets for next-sprint investigation. Tag models with owners and domains in your dbt_project.yml so alerts reach the right team automatically.
What is the difference between dbt™ generic tests and singular tests?
Generic tests are reusable assertions that you configure in YAML and can apply to any model or column. All four of dbt™ Core's built-in tests (unique, not_null, accepted_values, relationships) are generic tests, and every test in packages like dbt_utils and dbt_expectations follows this pattern. Singular tests, by contrast, are custom SQL queries saved as individual .sql files in your tests/ directory. They're designed for one-off validations that are too specific to generalize—like a business rule that only applies to a single model. Use generic tests as your default and singular tests as your escape hatch.
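A singular test is just a SQL file whose returned rows represent failures. A minimal sketch, with a hypothetical model and business rule:

```sql
-- tests/assert_no_negative_revenue.sql
-- Fails if any row is returned
select *
from {{ ref('fct_revenue') }}
where revenue < 0
```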