Automating Data Documentation in dbt™: Beyond the dbt™-Based Catalog (with Paradime Catalog and DinoAI)
Feb 26, 2026
The Complete dbt™ Documentation Generator Guide: From Stale Docs to Near-100% Coverage
Every analytics team has felt the sting. A stakeholder asks what fct_revenue actually measures, and the answer lives in one engineer's head — the same engineer who left last quarter. The schema.yml file says "TODO: add description", the dbt™ docs site hasn't been regenerated since January, and nobody remembers which columns in stg_payments are PII.
Stale docs. Missing context. Tribal knowledge. These aren't edge cases — they're the default state of most dbt™ projects.
This guide makes the pain tangible, then provides a concrete workflow that achieves near-100% documentation coverage using dbt™ best practices, Paradime Catalog, and AI enrichment with DinoAI. Whether you're running dbt Core™ or dbt Cloud™, the patterns here apply.
Why dbt docs generate Often Fails in Practice
The dbt docs generate command is powerful in theory. It compiles your project metadata into manifest.json and catalog.json, and dbt docs serve renders a browsable site. But in practice, most teams hit the same three walls.
It's Manual and Gets Skipped
Documentation in dbt™ depends on developers writing descriptions in schema.yml files:
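A minimal example of the hand-written pattern (model and column names are illustrative):

```yaml
# models/marts/finance/schema.yml
version: 2
models:
  - name: fct_revenue
    description: "Daily revenue facts. Grain: one row per order line."
    columns:
      - name: order_line_id
        description: "Primary key for the revenue fact."
      - name: net_amount
        description: "Line amount after discounts, in USD."
```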
This works well for five models. At 200+ models with 20+ columns each, it becomes a bottleneck. Under sprint pressure, developers skip descriptions, leave "TODO" placeholders, and the gap compounds. Nobody runs dbt docs generate in the PR review because it's not part of the CI pipeline.
Docs Drift from Reality
Even when a team writes thorough documentation, it decays the moment code changes:
Figure 1: The documentation decay timeline — docs become stale within weeks of initial creation.
The static HTML output of dbt docs serve is a snapshot in time. If your warehouse adds columns, your staging model gains a new CTE, or a business rule changes — the docs don't know. There's no automatic feedback loop between code changes and documentation freshness.
Hard to Enforce Coverage Standards
Out of the box, dbt™ has no built-in mechanism to block a PR when documentation is missing. You can measure coverage with community tools like dbt-coverage:
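Assuming the package is installed from PyPI, the basic invocation looks like this (dbt-coverage reads the artifacts in target/, so generate them first):

```shell
pip install dbt-coverage
dbt docs generate            # produce manifest.json and catalog.json in target/
dbt-coverage compute doc     # documentation coverage
dbt-coverage compute test    # test coverage
```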
This produces output like:
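Output in this spirit (the numbers and layout here are invented for illustration; the exact format depends on your dbt-coverage version):

```
Coverage report
==========================================
marts.fct_revenue              4/12   33.3%
staging.stg_payments           0/8     0.0%
...
Total                        118/402  29.4%
```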
But integrating this into CI, setting thresholds, and actually failing builds requires custom scripting that most teams never implement. The result? Documentation coverage silently erodes from 80% to 40% over a quarter, and nobody notices until an audit or an incident.
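For intuition, here is a minimal sketch of how a documentation-coverage number can be computed straight from manifest.json. The function name and the accounting are illustrative; dbt-coverage's real tallies differ in detail.

```python
import json


def doc_coverage(manifest: dict) -> float:
    """Share of models (plus their declared columns) carrying a non-empty
    description, computed from a parsed dbt manifest.json dict."""
    documented = total = 0
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        # Count the model itself plus each column declared in schema.yml.
        for entity in [node, *node.get("columns", {}).values()]:
            total += 1
            if (entity.get("description") or "").strip():
                documented += 1
    return documented / total if total else 0.0


# Typical use after a dbt run:
#   coverage = doc_coverage(json.load(open("target/manifest.json")))
```

Failing a CI build is then a one-liner comparison against a threshold, which is essentially what dbt-coverage's flag does for you.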
Paradime Catalog: Always-On Documentation (What Changes)
Paradime's Data Catalog fundamentally changes the documentation workflow by removing the manual generate → serve → hope it's current cycle. Instead of static HTML snapshots, you get a live, always-updated catalog that sits inside the development environment.
Automatic Updates and Discoverability
The key shift: documentation is no longer a separate artifact you build and deploy. Paradime Catalog syncs metadata from your dbt™ project and warehouse schema in real time:
Overnight automatic refresh keeps the catalog current without any manual intervention
On-demand refresh via the Paradime Refresh Catalog command in Bolt lets you trigger updates after schema changes or production deployments
Bi-directional YAML sync means edits made in the Catalog UI propagate directly to your schema.yml files (and vice versa), all version-controlled in Git
Figure 2: Traditional dbt™ docs workflow vs. Paradime Catalog's always-on approach.
The catalog covers the full dbt™ asset surface — models, tests, macros, exposures, and sources — with metadata pulled from both your .yml files and the information schema. Each model shows:
Summary: quality score, properties, classification tags, description
Columns: name, data type, description, tests, classification (from meta key-value pairs)
Code: source SQL and compiled SQL
Lineage: upstream and downstream dependencies, expandable and filterable
Quality: test monitoring with status, impacted rows, and result messages
Beyond dbt™ assets, the catalog integrates with Looker, Tableau, Power BI, ThoughtSpot, and Fivetran, giving you cross-platform lineage and discoverability from a single pane.
How Teams Use Catalog Day-to-Day
In practice, teams use Paradime Catalog in three recurring patterns:
Onboarding: New engineers explore model lineage and descriptions to understand the project without relying on tribal knowledge from senior team members.
Impact analysis: Before changing a model, check downstream dependencies in the lineage view — including BI dashboards and pipeline connectors — to understand blast radius.
Stakeholder self-service: Business users get free read-only access, reducing the constant Slack pings of "what does this column mean?"
The Catalog tab is built into the Paradime Code IDE, so developers can view, edit, and save documentation without context-switching. Every save updates the corresponding .yml file in your Git branch.
AI Enrichment: Generate schema.yml at Scale with DinoAI
Paradime's DinoAI is an AI assistant that understands your dbt™ project structure, warehouse metadata, and analytics engineering patterns. For documentation specifically, it eliminates the blank-page problem that keeps schema.yml files empty.
Prompt Patterns for Model and Column Descriptions
DinoAI operates in two modes:
Agent Mode (default): Creates and modifies files directly in your project. Use prompts like "Document all models in the marts/finance folder" or "Update sources.yml to include the new marketing tables".
Ask Mode: Provides guidance and explanations without making file changes.
For documentation, Agent Mode is where the leverage is. DinoAI generates context-specific descriptions by reading your SQL logic, column names, and warehouse metadata.
You can also use the Catalog tab's Autogenerate button: click it, and DinoAI produces both business and technical summaries for the selected model and its columns. Review, edit, and save — the descriptions flow directly into your .yml files.
Here's what DinoAI-generated documentation looks like in practice:
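Something in this spirit (the contents below are invented for illustration; DinoAI writes both a business and a technical summary):

```yaml
# models/marts/finance/schema.yml (illustrative generated output)
version: 2
models:
  - name: fct_revenue
    description: >
      Business: daily recognized revenue, one row per order line, consumed by
      the finance dashboards. Technical: joins stg_orders to stg_payments and
      allocates discounts to line items before aggregation.
    columns:
      - name: order_line_id
        description: "Surrogate key built from order_id and line_number."
      - name: net_amount
        description: "Gross amount minus allocated discounts, in USD."
```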
How to Use @context for Higher-Quality Definitions
DinoAI's quality scales with the context you provide. Context types include:
| Context Type | What It Provides | When to Use |
|---|---|---|
| File Context | Individual files from your project | Targeted tasks on specific models |
| Directory Context | Entire folders of related files | Documenting a full model layer |
| Inline File Context | Specific code selections and line numbers | Explaining complex logic |
| Terminal Context | Terminal output, error messages | Debugging documentation issues |
For best results documenting a model, add the model's SQL file and its upstream dependencies as context. DinoAI can then trace the lineage from raw source through staging to the final mart and produce descriptions that accurately reflect the transformation logic.
Bulk Workflows and Review Steps
For large-scale documentation efforts, combine DinoAI with .dinoprompts — a YAML file that serves as your team's reusable prompt library:
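A sketch of the idea (the exact .dinoprompts schema is Paradime-specific; the structure and prompt names below are assumptions for illustration):

```yaml
# .dinoprompts — a reusable team prompt library (structure illustrative)
prompts:
  - name: document-marts
    prompt: >
      Document every model in models/marts/. For each model, write a business
      summary, state the grain, and describe every column. Follow .dinorules.
  - name: document-staging
    prompt: >
      Document models in models/staging/, referencing the upstream source
      system and any renaming or casting applied.
```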
The review step is critical. After DinoAI generates descriptions:
Preview before applying: Agent Mode shows all changes in preview boxes — click Accept or Reject per file
Commit to a branch: Changes are Git-tracked, so they go through your normal PR review process
Domain expert review: Route PRs to the team member who owns each model for a factual accuracy check
Figure 3: The AI-assisted documentation workflow with human review checkpoints.
Raise Quality: Add Tests and Ownership Metadata
Documentation without tests is just prose. Tests validate that the descriptions match reality. Ownership metadata ensures someone is accountable when they don't.
High-Signal Tests per Model Type
Not every model needs the same test coverage. Here's a practical matrix:
| Model Layer | Essential Tests | Why |
|---|---|---|
| Sources | freshness checks, not_null on load keys | Catch upstream issues before they propagate |
| Staging | unique + not_null on the primary key | Guarantee the grain — every downstream model depends on this |
| Intermediate | relationships tests to parent models | Validate referential integrity across joins |
| Marts | unique, not_null, accepted_values, relationships | These feed dashboards — failures here impact stakeholders directly |
In schema.yml, a well-tested mart model looks like this:
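For example (model and column names are illustrative):

```yaml
version: 2
models:
  - name: fct_revenue
    description: "Daily revenue facts. Grain: one row per order line."
    columns:
      - name: order_line_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: order_status
        description: "Lifecycle status of the parent order."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        description: "Foreign key to dim_customers."
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```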
The dbt-project-evaluator package can enforce these patterns automatically. Its fct_missing_primary_key_tests rule flags every model that lacks either a unique + not_null test on a single column or a dbt_utils.unique_combination_of_columns test.
Owner/SLAs and How to Keep Them Current
The dbt™ meta config accepts any key-value pair and compiles into manifest.json. Use it for governance metadata:
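Set defaults at the directory level in dbt_project.yml (project and folder names are illustrative):

```yaml
# dbt_project.yml — directory-level governance defaults
models:
  my_project:
    marts:
      finance:
        +meta:
          owner: "finance-analytics@example.com"
          domain: "finance"
          sla: "daily by 07:00 UTC"
```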
Override at the individual model level when needed:
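For instance, a single model in the folder can carry its own values (names illustrative):

```yaml
# models/marts/finance/schema.yml — overrides the directory default
version: 2
models:
  - name: fct_revenue
    meta:
      owner: "jane.doe@example.com"
      sensitivity: "confidential"
```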
In Paradime Catalog, these meta values appear as searchable classifications — filterable by owner, domain, sensitivity level, or any custom key you define.
To keep ownership current, add a .dinoprompts entry that audits ownership during PR review:
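A sketch of such an entry (the .dinoprompts structure shown is an assumption, as is the prompt name):

```yaml
# .dinoprompts (structure illustrative)
prompts:
  - name: audit-ownership
    prompt: >
      Review the models changed in this branch. Flag any model whose
      meta.owner is missing or no longer matches the owning team, and
      propose an updated meta block.
```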
Documentation Coverage Playbook (30/60/90 Days)
Going from 30% coverage to near-100% requires a phased approach. Trying to document everything at once leads to burnout and low-quality descriptions. Here's a battle-tested playbook.
Figure 4: A phased 30/60/90-day rollout to achieve near-100% documentation coverage.
Start with Core Marts (Days 1–30)
Goal: 100% documentation and PK tests on all mart models.
Audit: Run dbt-coverage compute doc to get your baseline. Identify which mart models have zero descriptions.
Bulk generate with DinoAI: Use Agent Mode — "Document all models in the marts/ directory" — with directory context added. Review every generated description for accuracy.
Add primary key tests: Every mart model gets unique + not_null on its primary key. No exceptions.
Set ownership: Add meta.owner to every mart model in dbt_project.yml at the directory level.
Expand to Staging and Sources (Days 31–60)
Goal: 100% documentation on staging; source freshness tests configured.
Staging models: Use DinoAI with the upstream source files as context. Staging descriptions should reference the source system and any renaming/casting applied.
Sources: Document every source table and its key columns, and configure freshness blocks so stale loads are caught automatically.
Configure .dinorules: Set persistent standards so all future DinoAI generations follow your team's conventions.
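A source freshness block looks like this (source and table names are illustrative; loaded_at_field must point at a real timestamp column in your warehouse):

```yaml
# models/staging/sources.yml
version: 2
sources:
  - name: payments
    database: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: transactions
        description: "Raw payment transactions from the payments processor."
```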
Add Governance Guardrails in CI (Days 61–90)
Goal: Automated coverage checks that prevent regression.
Add dbt-coverage to your CI pipeline.
The --cov-fail-under 0.9 flag fails the build if documentation coverage drops below 90%.
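A sketch of what this can look like as a GitHub Actions job; the workflow name, adapter package, and credential setup are illustrative and should be adapted to your CI system:

```yaml
# .github/workflows/docs-coverage.yml (illustrative)
name: docs-coverage
on: pull_request
jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-coverage dbt-snowflake  # swap in your adapter
      - run: dbt compile && dbt docs generate        # assumes warehouse creds are configured
      - run: dbt-coverage compute doc --cov-fail-under 0.9
```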
Add the dbt-project-evaluator package to catch missing primary key tests and measure test_coverage_pct project-wide.
Configure Paradime Bolt's Turbo CI to run dbt build --select state:modified+ --target ci on every pull request. This builds and tests only modified models in a temporary schema, giving immediate feedback before merge.
Common Pitfalls and How to Avoid Them
AI Hallucinated Definitions
DinoAI (and any LLM) can generate descriptions that sound plausible but are factually wrong. A column named mrr might get described as "Monthly Recurring Revenue" when in your project it's actually "Marketing Response Rate."
Mitigation strategies:
Always provide context: Add the model's SQL file and upstream dependencies so DinoAI reads the actual transformation logic, not just the column name.
Use .dinorules to define domain-specific terminology: "mrr always means Marketing Response Rate in this project."
Mandate human review: Route every AI-generated PR to the model owner. Never merge auto-generated documentation without a domain expert sign-off.
Regenerate, don't trust the first pass: DinoAI's Autogenerate button lets you regenerate descriptions — iterate until the output matches reality.
Over-Documenting Low-Value Models
Not every model deserves the same documentation investment. Intermediate models that are internal implementation details (e.g., int_orders_pivoted) don't need the same depth as fct_revenue.
A practical heuristic:
| Model Layer | Documentation Depth | Who Reads It |
|---|---|---|
| Marts | Full: grain, business logic, caveats, SLA | Analysts, stakeholders, downstream consumers |
| Staging | Medium: source system, key transformations, grain | Engineers working on the pipeline |
| Intermediate | Light: purpose and relationship to parent mart | Engineers debugging or refactoring |
| Base/Utility | Minimal: one-liner is sufficient | Engineers only |
Spending three hours documenting int_payments_unioned takes time away from documenting fct_revenue — and nobody reads the intermediate docs anyway.
No Review Process
The most dangerous pitfall isn't missing documentation — it's wrong documentation that looks complete. When AI generates descriptions in bulk and they're merged without review, you end up with a catalog full of confident-sounding definitions that mislead stakeholders.
The fix is process, not tooling:
Every documentation PR gets a reviewer who understands the domain (not just the code).
Use .dinoprompts to create a "documentation review" prompt that checks diffs for completeness and flags potential inaccuracies.
Schedule quarterly audits: Pick 10 random model descriptions and verify them against the actual SQL logic. Track accuracy over time.
Figure 5: A continuous improvement loop — AI generates, humans verify, audits catch drift, and .dinorules improve over time.
Bringing It All Together
Achieving near-100% dbt™ documentation coverage isn't about a single tool or a one-time sprint. It's a workflow that combines:
Paradime Catalog for always-on, bi-directional documentation that stays current without manual dbt docs generate cycles
DinoAI for AI-powered bulk generation that understands your warehouse metadata, SQL logic, and project conventions
.dinorules and .dinoprompts for codifying your team's standards so every generated description meets your bar
dbt-coverage and dbt-project-evaluator in CI for automated guardrails that prevent coverage regression
Human review at every step — because documentation that's wrong is worse than documentation that's missing
Start with your mart models. Get those to 100%. Expand outward. Add CI guardrails. And treat documentation not as a chore that follows development, but as a first-class part of the development workflow itself.
Your future self — and the engineer who inherits your project — will thank you.