Automating Data Documentation in dbt™: Beyond the dbt™-Based Catalog (with Paradime Catalog and DinoAI)

Feb 26, 2026

The Complete dbt™ Documentation Generator Guide: From Stale Docs to Near-100% Coverage

Every analytics team has felt the sting. A stakeholder asks what fct_revenue actually measures, and the answer lives in one engineer's head — the same engineer who left last quarter. The schema.yml file says "TODO: add description", the dbt™ docs site hasn't been regenerated since January, and nobody remembers which columns in stg_payments are PII.

Stale docs. Missing context. Tribal knowledge. These aren't edge cases — they're the default state of most dbt™ projects.

This guide makes the pain tangible, then provides a concrete workflow that achieves near-100% documentation coverage using dbt™ best practices, Paradime Catalog, and AI enrichment with DinoAI. Whether you're running dbt Core™ or dbt Cloud™, the patterns here apply.

Why dbt docs generate Often Fails in Practice

The dbt docs generate command is powerful in theory. It compiles your project metadata into manifest.json and catalog.json, and dbt docs serve renders a browsable site. But in practice, most teams hit the same three walls.

It's Manual and Gets Skipped

Documentation in dbt™ depends on developers writing descriptions in schema.yml files.
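A typical entry looks like this (model and column names are illustrative):

```yaml
# schema.yml -- every description is hand-written
version: 2

models:
  - name: stg_payments
    description: "One row per payment, cleaned and renamed from the raw source."
    columns:
      - name: payment_id
        description: "Primary key. Unique identifier for each payment."
      - name: amount_usd
        description: "Payment amount converted to USD."
```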

This works well for five models. At 200+ models with 20+ columns each, it becomes a bottleneck. Under sprint pressure, developers skip descriptions, leave "TODO" placeholders, and the gap compounds. Nobody runs dbt docs generate in the PR review because it's not part of the CI pipeline.

Docs Drift from Reality

Even when a team writes thorough documentation, it decays the moment code changes:

Figure 1: The documentation decay timeline — docs become stale within weeks of initial creation.

The static HTML output of dbt docs serve is a snapshot in time. If your warehouse adds columns, your staging model gains a new CTE, or a business rule changes — the docs don't know. There's no automatic feedback loop between code changes and documentation freshness.

Hard to Enforce Coverage Standards

Out of the box, dbt™ has no built-in mechanism to block a PR when documentation is missing. You can measure coverage with community tools like dbt-coverage.
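For example (assuming dbt docs generate has already produced catalog.json for the project):

```shell
# Install the community package, then compute documentation coverage
pip install dbt-coverage
dbt-coverage compute doc
```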

The output lists each table with its count of documented columns and an overall documentation coverage percentage for the project.

But integrating this into CI, setting thresholds, and actually failing builds requires custom scripting that most teams never implement. The result? Documentation coverage silently erodes from 80% to 40% over a quarter, and nobody notices until an audit or an incident.

Paradime Catalog: Always-On Documentation (What Changes)

Paradime's Data Catalog fundamentally changes the documentation workflow by removing the manual generate → serve → hope it's current cycle. Instead of static HTML snapshots, you get a live, always-updated catalog that sits inside the development environment.

Automatic Updates and Discoverability

The key shift: documentation is no longer a separate artifact you build and deploy. Paradime Catalog syncs metadata from your dbt™ project and warehouse schema in real time:

  • Overnight automatic refresh keeps the catalog current without any manual intervention

  • On-demand refresh via the Paradime Refresh Catalog command in Bolt lets you trigger updates after schema changes or production deployments

  • Bi-directional YAML sync means edits made in the Catalog UI propagate directly to your schema.yml files (and vice versa), all version-controlled in Git

Figure 2: Traditional dbt™ docs workflow vs. Paradime Catalog's always-on approach.

The catalog covers the full dbt™ asset surface — models, tests, macros, exposures, and sources — with metadata pulled from both your .yml files and the information schema. Each model shows:

  • Summary: quality score, properties, classification tags, description

  • Columns: name, data type, description, tests, classification (from meta key-value pairs)

  • Code: source SQL and compiled SQL

  • Lineage: upstream and downstream dependencies, expandable and filterable

  • Quality: test monitoring with status, impacted rows, and result messages

Beyond dbt™ assets, the catalog integrates with Looker, Tableau, Power BI, ThoughtSpot, and Fivetran, giving you cross-platform lineage and discoverability from a single pane.

How Teams Use Catalog Day-to-Day

In practice, teams use Paradime Catalog in three recurring patterns:

  1. Onboarding: New engineers explore model lineage and descriptions to understand the project without relying on tribal knowledge from senior team members.

  2. Impact analysis: Before changing a model, check downstream dependencies in the lineage view — including BI dashboards and pipeline connectors — to understand blast radius.

  3. Stakeholder self-service: Business users get free read-only access, reducing the constant Slack pings of "what does this column mean?"

The Catalog tab is built into the Paradime Code IDE, so developers can view, edit, and save documentation without context-switching. Every save updates the corresponding .yml file in your Git branch.

AI Enrichment: Generate schema.yml at Scale with DinoAI

Paradime's DinoAI is an AI assistant that understands your dbt™ project structure, warehouse metadata, and analytics engineering patterns. For documentation specifically, it eliminates the blank-page problem that keeps schema.yml files empty.

Prompt Patterns for Model and Column Descriptions

DinoAI operates in two modes:

  • Agent Mode (default): Creates and modifies files directly in your project. Use prompts like "Document all models in the marts/finance folder" or "Update sources.yml to include the new marketing tables".

  • Ask Mode: Provides guidance and explanations without making file changes.

For documentation, Agent Mode is where the leverage is. DinoAI generates context-specific descriptions by reading your SQL logic, column names, and warehouse metadata.

You can also use the Catalog tab's Autogenerate button: click it, and DinoAI produces both business and technical summaries for the selected model and its columns. Review, edit, and save — the descriptions flow directly into your .yml files.

The result is a schema.yml entry with both a model-level summary and per-column descriptions, ready for review.
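A hypothetical sketch of that output; the model, columns, and wording here are illustrative, and actual results depend on your project:

```yaml
models:
  - name: fct_revenue
    description: >
      Daily revenue facts at one row per order line. Joins stg_payments to
      stg_orders and applies refund adjustments before aggregation.
    columns:
      - name: order_line_id
        description: "Primary key. Unique identifier for each order line."
      - name: net_revenue_usd
        description: "Gross payment amount minus refunds, converted to USD."
```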

How to Use @context for Higher-Quality Definitions

DinoAI's quality scales with the context you provide. Context types include:

  • File Context: individual files from your project. Use for targeted tasks on specific models.

  • Directory Context: entire folders of related files. Use when documenting a full model layer.

  • Inline File Context: specific code selections and line numbers. Use when explaining complex logic.

  • Terminal Context: terminal output and error messages. Use when debugging documentation issues.

For best results documenting a model, add the model's SQL file and its upstream dependencies as context. DinoAI can then trace the lineage from raw source through staging to the final mart and produce descriptions that accurately reflect the transformation logic.

Bulk Workflows and Review Steps

For large-scale documentation efforts, combine DinoAI with .dinoprompts, a YAML file that serves as your team's reusable prompt library.
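As an illustrative sketch, with the caveat that the exact .dinoprompts schema is Paradime-specific and the field names here are hypothetical:

```yaml
# .dinoprompts -- hypothetical structure; check Paradime's docs for the exact schema
prompts:
  - name: document-marts
    prompt: >
      Document every model in the marts/ directory. State the grain and
      business purpose, and describe each column. Follow the conventions
      defined in .dinorules.
```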

The review step is critical. After DinoAI generates descriptions:

  1. Preview before applying: Agent Mode shows all changes in preview boxes — click Accept or Reject per file

  2. Commit to a branch: Changes are Git-tracked, so they go through your normal PR review process

  3. Domain expert review: Route PRs to the team member who owns each model for a factual accuracy check

Figure 3: The AI-assisted documentation workflow with human review checkpoints.

Raise Quality: Add Tests and Ownership Metadata

Documentation without tests is just prose. Tests validate that the descriptions match reality. Ownership metadata ensures someone is accountable when they don't.

High-Signal Tests per Model Type

Not every model needs the same test coverage. Here's a practical matrix:

  • Sources: not_null on key columns plus freshness checks, to catch upstream issues before they propagate.

  • Staging: unique + not_null on the primary key, to guarantee the grain that every downstream model depends on.

  • Intermediate: not_null on join keys and relationships tests to parents, to validate referential integrity across joins.

  • Marts: unique + not_null on the primary key, accepted_values on status/category columns, and relationships to dimensions. These feed dashboards, so failures here impact stakeholders directly.

In schema.yml, a well-tested mart model pairs every column description with the tests that enforce it.
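A sketch with illustrative model and column names:

```yaml
models:
  - name: fct_revenue
    description: "Daily revenue facts, one row per order line."
    columns:
      - name: order_line_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: order_status
        description: "Lifecycle status of the parent order."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        description: "Foreign key to dim_customers."
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```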

The dbt-project-evaluator package can enforce these patterns automatically. Its fct_missing_primary_key_tests rule flags every model that lacks either a unique + not_null test on a single column or a dbt_utils.unique_combination_of_columns test.

Owner/SLAs and How to Keep Them Current

The dbt™ meta config accepts any key-value pair and compiles into manifest.json. Use it for governance metadata such as owner, domain, and SLA.
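For example, applied at the directory level in dbt_project.yml (the project name, folder path, and values are illustrative):

```yaml
# dbt_project.yml -- directory-level meta applies to every model in the folder
models:
  my_project:
    marts:
      finance:
        +meta:
          owner: "finance-analytics@example.com"
          domain: "finance"
          sla: "daily by 08:00 UTC"
```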

Override at the individual model level when a single model needs a different owner or classification.
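A single model can override the directory default in its properties file (values illustrative; newer dbt versions nest meta under config, while older versions accept a top-level meta key):

```yaml
# schema.yml -- model-level meta overrides the directory default
models:
  - name: fct_revenue
    config:
      meta:
        owner: "revenue-team@example.com"
```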

In Paradime Catalog, these meta values appear as searchable classifications — filterable by owner, domain, sensitivity level, or any custom key you define.

To keep ownership current, add a .dinoprompts entry that audits ownership during PR review, for example a prompt that flags models whose meta.owner is missing or points at someone who has left the team.

Documentation Coverage Playbook (30/60/90 Days)

Going from 30% coverage to near-100% requires a phased approach. Trying to document everything at once leads to burnout and low-quality descriptions. Here's a battle-tested playbook.

Figure 4: A phased 30/60/90-day rollout to achieve near-100% documentation coverage.

Start with Core Marts (Days 1–30)

Goal: 100% documentation and PK tests on all mart models.

  1. Audit: Run dbt-coverage compute doc to get your baseline. Identify which mart models have zero descriptions.

  2. Bulk generate with DinoAI: Use Agent Mode — "Document all models in the marts/ directory" — with directory context added. Review every generated description for accuracy.

  3. Add primary key tests: Every mart model gets unique + not_null on its primary key. No exceptions.

  4. Set ownership: Add meta.owner to every mart model in dbt_project.yml at the directory level.

Expand to Staging and Sources (Days 31–60)

Goal: 100% documentation on staging; source freshness tests configured.

  1. Staging models: Use DinoAI with the upstream source files as context. Staging descriptions should reference the source system and any renaming/casting applied.

  2. Sources: Document every source table and its key columns, and configure freshness blocks so stale loads surface as warnings or errors.
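A standard dbt source freshness block looks like this (source, table, and column names are illustrative):

```yaml
sources:
  - name: stripe
    database: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: payments
        description: "Raw payment events from Stripe."
```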

  3. Configure .dinorules: Set persistent standards so all future DinoAI generations follow your team's conventions.
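As an illustrative sketch (the exact .dinorules format is Paradime-specific, so treat this structure as an assumption), the rules read like plain-language conventions:

```text
# .dinorules -- hypothetical example of persistent team conventions
- Every model description must state the grain in its first sentence.
- Column descriptions use sentence case and end with a period.
- "mrr" always means Marketing Response Rate in this project.
```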

Add Governance Guardrails in CI (Days 61–90)

Goal: Automated coverage checks that prevent regression.

  1. Add dbt-coverage to your CI pipeline as a dedicated build step.
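For example, run after dbt docs generate has produced catalog.json in CI (the 0.9 threshold is a choice, not a default):

```shell
# Fail the CI job when documentation coverage drops below 90%
dbt-coverage compute doc --cov-fail-under 0.9
```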

The --cov-fail-under 0.9 flag fails the build if documentation coverage drops below 90%.

  2. Add the dbt-project-evaluator package to catch missing primary key tests and measure the test_coverage_pct metric project-wide.

  3. Configure Paradime Bolt's Turbo CI to run dbt build --select state:modified+ --target ci on every pull request. This builds and tests only modified models in a temporary schema, giving immediate feedback before merge.

Common Pitfalls and How to Avoid Them

AI Hallucinated Definitions

DinoAI (and any LLM) can generate descriptions that sound plausible but are factually wrong. A column named mrr might get described as "Monthly Recurring Revenue" when in your project it's actually "Marketing Response Rate."

Mitigation strategies:

  • Always provide context: Add the model's SQL file and upstream dependencies so DinoAI reads the actual transformation logic, not just the column name.

  • Use .dinorules to define domain-specific terminology: "mrr always means Marketing Response Rate in this project."

  • Mandate human review: Route every AI-generated PR to the model owner. Never merge auto-generated documentation without a domain expert sign-off.

  • Regenerate, don't trust the first pass: DinoAI's Autogenerate button lets you regenerate descriptions — iterate until the output matches reality.

Over-Documenting Low-Value Models

Not every model deserves the same documentation investment. Intermediate models that are internal implementation details (e.g., int_orders_pivoted) don't need the same depth as fct_revenue.

A practical heuristic:

  • Marts: full depth (grain, business logic, caveats, SLA). Read by analysts, stakeholders, and downstream consumers.

  • Staging: medium depth (source system, key transformations, grain). Read by engineers working on the pipeline.

  • Intermediate: light (purpose and relationship to the parent mart). Read by engineers debugging or refactoring.

  • Base/Utility: minimal (a one-liner is sufficient). Read by engineers only.

Spending three hours documenting int_payments_unioned takes time away from documenting fct_revenue — and nobody reads the intermediate docs anyway.

No Review Process

The most dangerous pitfall isn't missing documentation — it's wrong documentation that looks complete. When AI generates descriptions in bulk and they're merged without review, you end up with a catalog full of confident-sounding definitions that mislead stakeholders.

The fix is process, not tooling:

  1. Every documentation PR gets a reviewer who understands the domain (not just the code).

  2. Use .dinoprompts to create a "documentation review" prompt that checks diffs for completeness and flags potential inaccuracies.

  3. Schedule quarterly audits: Pick 10 random model descriptions and verify them against the actual SQL logic. Track accuracy over time.

Figure 5: A continuous improvement loop — AI generates, humans verify, audits catch drift, and .dinorules improve over time.

Bringing It All Together

Achieving near-100% dbt™ documentation coverage isn't about a single tool or a one-time sprint. It's a workflow that combines:

  1. Paradime Catalog for always-on, bi-directional documentation that stays current without manual dbt docs generate cycles

  2. DinoAI for AI-powered bulk generation that understands your warehouse metadata, SQL logic, and project conventions

  3. .dinorules and .dinoprompts for codifying your team's standards so every generated description meets your bar

  4. dbt-coverage and dbt-project-evaluator in CI for automated guardrails that prevent coverage regression

  5. Human review at every step — because documentation that's wrong is worse than documentation that's missing

Start with your mart models. Get those to 100%. Expand outward. Add CI guardrails. And treat documentation not as a chore that follows development, but as a first-class part of the development workflow itself.

Your future self — and the engineer who inherits your project — will thank you.


Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
