Automating Data Documentation in dbt™: Beyond the dbt™-Based Catalog (with Paradime Catalog and DinoAI)
Feb 26, 2026
The Complete dbt™ Documentation Generator Guide: From Stale Docs to Near-100% Coverage
Every analytics team has felt the sting. A stakeholder asks what fct_revenue actually measures, and the answer lives in one engineer's head — the same engineer who left last quarter. The schema.yml file says "TODO: add description", the dbt™ docs site hasn't been regenerated since January, and nobody remembers which columns in stg_payments are PII.
Stale docs. Missing context. Tribal knowledge. These aren't edge cases — they're the default state of most dbt™ projects.
This guide makes the pain tangible, then provides a concrete workflow that achieves near-100% documentation coverage using dbt™ best practices, Paradime Catalog, and AI enrichment with DinoAI. Whether you're running dbt Core™ or dbt Cloud™, the patterns here apply.
Why dbt docs generate Often Fails in Practice
The dbt docs generate command is powerful in theory. It compiles your project metadata into manifest.json and catalog.json, and dbt docs serve renders a browsable site. But in practice, most teams hit the same three walls.
It's Manual and Gets Skipped
Documentation in dbt™ depends on developers writing descriptions in schema.yml files:
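A minimal example of the hand-written pattern (model and column names are illustrative):

```yaml
# models/marts/finance/schema.yml
version: 2
models:
  - name: fct_revenue
    description: "Daily revenue facts. Grain: one row per order line."
    columns:
      - name: order_line_id
        description: "Primary key for the revenue fact."
      - name: net_amount
        description: "Line amount after discounts, in USD."
```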
This works well for five models. At 200+ models with 20+ columns each, it becomes a bottleneck. Under sprint pressure, developers skip descriptions, leave "TODO" placeholders, and the gap compounds. Nobody runs dbt docs generate in the PR review because it's not part of the CI pipeline.
Docs Drift from Reality
Even when a team writes thorough documentation, it decays the moment code changes:
Figure 1: The documentation decay timeline — docs become stale within weeks of initial creation.
The static HTML output of dbt docs serve is a snapshot in time. If your warehouse adds columns, your staging model gains a new CTE, or a business rule changes — the docs don't know. There's no automatic feedback loop between code changes and documentation freshness.
Hard to Enforce Coverage Standards
Out of the box, dbt™ has no built-in mechanism to block a PR when documentation is missing. You can measure coverage with community tools like dbt-coverage:
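Assuming the package is installed from PyPI, the basic invocation looks like this (dbt-coverage reads the artifacts in target/, so generate them first):

```shell
pip install dbt-coverage
dbt docs generate            # produce manifest.json and catalog.json in target/
dbt-coverage compute doc     # documentation coverage
dbt-coverage compute test    # test coverage
```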
This produces output like:
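Output in this spirit (the numbers and layout here are invented for illustration; the exact format depends on your dbt-coverage version):

```
Coverage report
==========================================
marts.fct_revenue              4/12   33.3%
staging.stg_payments           0/8     0.0%
...
Total                        118/402  29.4%
```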
But integrating this into CI, setting thresholds, and actually failing builds requires custom scripting that most teams never implement. The result? Documentation coverage silently erodes from 80% to 40% over a quarter, and nobody notices until an audit or an incident.
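For intuition, here is a minimal sketch of how a documentation-coverage number can be computed straight from manifest.json. The function name and the accounting are illustrative; dbt-coverage's real tallies differ in detail.

```python
import json


def doc_coverage(manifest: dict) -> float:
    """Share of models (plus their declared columns) carrying a non-empty
    description, computed from a parsed dbt manifest.json dict."""
    documented = total = 0
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        # Count the model itself plus each column declared in schema.yml.
        for entity in [node, *node.get("columns", {}).values()]:
            total += 1
            if (entity.get("description") or "").strip():
                documented += 1
    return documented / total if total else 0.0


# Typical use after a dbt run:
#   coverage = doc_coverage(json.load(open("target/manifest.json")))
```

Failing a CI build is then a one-liner comparison against a threshold, which is essentially what dbt-coverage's flag does for you.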
Paradime Catalog: Always-On Documentation (What Changes)
Paradime's Data Catalog fundamentally changes the documentation workflow by removing the manual generate → serve → hope it's current cycle. Instead of static HTML snapshots, you get a live, always-updated catalog that sits inside the development environment.
Automatic Updates and Discoverability
The key shift: documentation is no longer a separate artifact you build and deploy. Paradime Catalog syncs metadata from your dbt™ project and warehouse schema in real time:
Overnight automatic refresh keeps the catalog current without any manual intervention
On-demand refresh via the Paradime Refresh Catalog command in Bolt lets you trigger updates after schema changes or production deployments
Bi-directional YAML sync means edits made in the Catalog UI propagate directly to your schema.yml files (and vice versa), all version-controlled in Git
Figure 2: Traditional dbt™ docs workflow vs. Paradime Catalog's always-on approach.
The catalog covers the full dbt™ asset surface — models, tests, macros, exposures, and sources — with metadata pulled from both your .yml files and the information schema. Each model shows:
Summary: quality score, properties, classification tags, description
Columns: name, data type, description, tests, classification (from meta key-value pairs)
Code: source SQL and compiled SQL
Lineage: upstream and downstream dependencies, expandable and filterable
Quality: test monitoring with status, impacted rows, and result messages
Beyond dbt™ assets, the catalog integrates with Looker, Tableau, Power BI, ThoughtSpot, and Fivetran, giving you cross-platform lineage and discoverability from a single pane.
How Teams Use Catalog Day-to-Day
In practice, teams use Paradime Catalog in three recurring patterns:
Onboarding: New engineers explore model lineage and descriptions to understand the project without relying on tribal knowledge from senior team members.
Impact analysis: Before changing a model, check downstream dependencies in the lineage view — including BI dashboards and pipeline connectors — to understand blast radius.
Stakeholder self-service: Business users get free read-only access, reducing the constant Slack pings of "what does this column mean?"
The Catalog tab is built into the Paradime Code IDE, so developers can view, edit, and save documentation without context-switching. Every save updates the corresponding .yml file in your Git branch.
AI Enrichment: Generate schema.yml at Scale with DinoAI
Paradime's DinoAI is an AI assistant that understands your dbt™ project structure, warehouse metadata, and analytics engineering patterns. For documentation specifically, it eliminates the blank-page problem that keeps schema.yml files empty.
Prompt Patterns for Model and Column Descriptions
DinoAI operates in two modes:
Agent Mode (default): Creates and modifies files directly in your project. Use prompts like "Document all models in the marts/finance folder" or "Update sources.yml to include the new marketing tables".
Ask Mode: Provides guidance and explanations without making file changes.
For documentation, Agent Mode is where the leverage is. DinoAI generates context-specific descriptions by reading your SQL logic, column names, and warehouse metadata.
You can also use the Catalog tab's Autogenerate button: click it, and DinoAI produces both business and technical summaries for the selected model and its columns. Review, edit, and save — the descriptions flow directly into your .yml files.
Here's what DinoAI-generated documentation looks like in practice:
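Something in this spirit (the contents below are invented for illustration; DinoAI writes both a business and a technical summary):

```yaml
# models/marts/finance/schema.yml (illustrative generated output)
version: 2
models:
  - name: fct_revenue
    description: >
      Business: daily recognized revenue, one row per order line, consumed by
      the finance dashboards. Technical: joins stg_orders to stg_payments and
      allocates discounts to line items before aggregation.
    columns:
      - name: order_line_id
        description: "Surrogate key built from order_id and line_number."
      - name: net_amount
        description: "Gross amount minus allocated discounts, in USD."
```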
How to Use @context for Higher-Quality Definitions
DinoAI's quality scales with the context you provide. Context types include:
| Context Type | What It Provides | When to Use |
|---|---|---|
| File Context | Individual files from your project | Targeted tasks on specific models |
| Directory Context | Entire folders of related files | Documenting a full model layer |
| Inline File Context | Specific code selections and line numbers | Explaining complex logic |
| Terminal Context | Terminal output, error messages | Debugging documentation issues |
For best results documenting a model, add the model's SQL file and its upstream dependencies as context. DinoAI can then trace the lineage from raw source through staging to the final mart and produce descriptions that accurately reflect the transformation logic.
Bulk Workflows and Review Steps
For large-scale documentation efforts, combine DinoAI with .dinoprompts — a YAML file that serves as your team's reusable prompt library:
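A sketch of the idea (the exact .dinoprompts schema is Paradime-specific; the structure and prompt names below are assumptions for illustration):

```yaml
# .dinoprompts — a reusable team prompt library (structure illustrative)
prompts:
  - name: document-marts
    prompt: >
      Document every model in models/marts/. For each model, write a business
      summary, state the grain, and describe every column. Follow .dinorules.
  - name: document-staging
    prompt: >
      Document models in models/staging/, referencing the upstream source
      system and any renaming or casting applied.
```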
The review step is critical. After DinoAI generates descriptions:
Preview before applying: Agent Mode shows all changes in preview boxes — click Accept or Reject per file
Commit to a branch: Changes are Git-tracked, so they go through your normal PR review process
Domain expert review: Route PRs to the team member who owns each model for a factual accuracy check
Figure 3: The AI-assisted documentation workflow with human review checkpoints.
Raise Quality: Add Tests and Ownership Metadata
Documentation without tests is just prose. Tests validate that the descriptions match reality. Ownership metadata ensures someone is accountable when they don't.
High-Signal Tests per Model Type
Not every model needs the same test coverage. Here's a practical matrix:
| Model Layer | Essential Tests | Why |
|---|---|---|
| Sources | freshness checks, not_null on load keys | Catch upstream issues before they propagate |
| Staging | unique + not_null on the primary key | Guarantee the grain — every downstream model depends on this |
| Intermediate | relationships tests to parent models | Validate referential integrity across joins |
| Marts | unique, not_null, accepted_values, relationships | These feed dashboards — failures here impact stakeholders directly |
In schema.yml, a well-tested mart model looks like this:
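For example (model and column names are illustrative):

```yaml
version: 2
models:
  - name: fct_revenue
    description: "Daily revenue facts. Grain: one row per order line."
    columns:
      - name: order_line_id
        description: "Primary key."
        tests:
          - unique
          - not_null
      - name: order_status
        description: "Lifecycle status of the parent order."
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        description: "Foreign key to dim_customers."
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
```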
The dbt-project-evaluator package can enforce these patterns automatically. Its fct_missing_primary_key_tests rule flags every model that lacks either a unique + not_null test on a single column or a dbt_utils.unique_combination_of_columns test.
Owner/SLAs and How to Keep Them Current
The dbt™ meta config accepts any key-value pair and compiles into manifest.json. Use it for governance metadata:
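Set defaults at the directory level in dbt_project.yml (project and folder names are illustrative):

```yaml
# dbt_project.yml — directory-level governance defaults
models:
  my_project:
    marts:
      finance:
        +meta:
          owner: "finance-analytics@example.com"
          domain: "finance"
          sla: "daily by 07:00 UTC"
```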
Override at the individual model level when needed:
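For instance, a single model in the folder can carry its own values (names illustrative):

```yaml
# models/marts/finance/schema.yml — overrides the directory default
version: 2
models:
  - name: fct_revenue
    meta:
      owner: "jane.doe@example.com"
      sensitivity: "confidential"
```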
In Paradime Catalog, these meta values appear as searchable classifications — filterable by owner, domain, sensitivity level, or any custom key you define.
To keep ownership current, add a .dinoprompts entry that audits ownership during PR review:
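A sketch of such an entry (the .dinoprompts structure shown is an assumption, as is the prompt name):

```yaml
# .dinoprompts (structure illustrative)
prompts:
  - name: audit-ownership
    prompt: >
      Review the models changed in this branch. Flag any model whose
      meta.owner is missing or no longer matches the owning team, and
      propose an updated meta block.
```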
Documentation Coverage Playbook (30/60/90 Days)
Going from 30% coverage to near-100% requires a phased approach. Trying to document everything at once leads to burnout and low-quality descriptions. Here's a battle-tested playbook.
Figure 4: A phased 30/60/90-day rollout to achieve near-100% documentation coverage.
Start with Core Marts (Days 1–30)
Goal: 100% documentation and PK tests on all mart models.
Audit: Run dbt-coverage compute doc to get your baseline. Identify which mart models have zero descriptions.
Bulk generate with DinoAI: Use Agent Mode — "Document all models in the marts/ directory" — with directory context added. Review every generated description for accuracy.
Add primary key tests: Every mart model gets unique + not_null on its primary key. No exceptions.
Set ownership: Add meta.owner to every mart model in dbt_project.yml at the directory level.
Expand to Staging and Sources (Days 31–60)
Goal: 100% documentation on staging; source freshness tests configured.
Staging models: Use DinoAI with the upstream source files as context. Staging descriptions should reference the source system and any renaming/casting applied.
Sources: Document every source table and its key columns, and configure freshness blocks so stale loads are caught automatically.
Configure .dinorules: Set persistent standards so all future DinoAI generations follow your team's conventions.
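A source freshness block looks like this (source and table names are illustrative; loaded_at_field must point at a real timestamp column in your warehouse):

```yaml
# models/staging/sources.yml
version: 2
sources:
  - name: payments
    database: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: transactions
        description: "Raw payment transactions from the payments processor."
```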
Add Governance Guardrails in CI (Days 61–90)
Goal: Automated coverage checks that prevent regression.
Add dbt-coverage to your CI pipeline.
The --cov-fail-under 0.9 flag fails the build if documentation coverage drops below 90%.
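A sketch of what this can look like as a GitHub Actions job; the workflow name, adapter package, and credential setup are illustrative and should be adapted to your CI system:

```yaml
# .github/workflows/docs-coverage.yml (illustrative)
name: docs-coverage
on: pull_request
jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-coverage dbt-snowflake  # swap in your adapter
      - run: dbt compile && dbt docs generate        # assumes warehouse creds are configured
      - run: dbt-coverage compute doc --cov-fail-under 0.9
```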
Add the dbt-project-evaluator package to catch missing primary key tests and measure test_coverage_pct project-wide.
Configure Paradime Bolt's Turbo CI to run dbt build --select state:modified+ --target ci on every pull request. This builds and tests only modified models in a temporary schema, giving immediate feedback before merge.
Common Pitfalls and How to Avoid Them
AI Hallucinated Definitions
DinoAI (and any LLM) can generate descriptions that sound plausible but are factually wrong. A column named mrr might get described as "Monthly Recurring Revenue" when in your project it's actually "Marketing Response Rate."
Mitigation strategies:
Always provide context: Add the model's SQL file and upstream dependencies so DinoAI reads the actual transformation logic, not just the column name.
Use .dinorules to define domain-specific terminology: "mrr always means Marketing Response Rate in this project."
Mandate human review: Route every AI-generated PR to the model owner. Never merge auto-generated documentation without a domain expert sign-off.
Regenerate, don't trust the first pass: DinoAI's Autogenerate button lets you regenerate descriptions — iterate until the output matches reality.
Over-Documenting Low-Value Models
Not every model deserves the same documentation investment. Intermediate models that are internal implementation details (e.g., int_orders_pivoted) don't need the same depth as fct_revenue.
A practical heuristic:
| Model Layer | Documentation Depth | Who Reads It |
|---|---|---|
| Marts | Full: grain, business logic, caveats, SLA | Analysts, stakeholders, downstream consumers |
| Staging | Medium: source system, key transformations, grain | Engineers working on the pipeline |
| Intermediate | Light: purpose and relationship to parent mart | Engineers debugging or refactoring |
| Base/Utility | Minimal: one-liner is sufficient | Engineers only |
Spending three hours documenting int_payments_unioned takes time away from documenting fct_revenue — and nobody reads the intermediate docs anyway.
No Review Process
The most dangerous pitfall isn't missing documentation — it's wrong documentation that looks complete. When AI generates descriptions in bulk and they're merged without review, you end up with a catalog full of confident-sounding definitions that mislead stakeholders.
The fix is process, not tooling:
Every documentation PR gets a reviewer who understands the domain (not just the code).
Use .dinoprompts to create a "documentation review" prompt that checks diffs for completeness and flags potential inaccuracies.
Schedule quarterly audits: Pick 10 random model descriptions and verify them against the actual SQL logic. Track accuracy over time.
Figure 5: A continuous improvement loop — AI generates, humans verify, audits catch drift, and .dinorules improve over time.
Bringing It All Together
Achieving near-100% dbt™ documentation coverage isn't about a single tool or a one-time sprint. It's a workflow that combines:
Paradime Catalog for always-on, bi-directional documentation that stays current without manual dbt docs generate cycles
DinoAI for AI-powered bulk generation that understands your warehouse metadata, SQL logic, and project conventions
.dinorules and .dinoprompts for codifying your team's standards so every generated description meets your bar
dbt-coverage and dbt-project-evaluator in CI for automated guardrails that prevent coverage regression
Human review at every step — because documentation that's wrong is worse than documentation that's missing
Start with your mart models. Get those to 100%. Expand outward. Add CI guardrails. And treat documentation not as a chore that follows development, but as a first-class part of the development workflow itself.
Your future self — and the engineer who inherits your project — will thank you.