Get Started with dbt™-llm-evals: Warehouse-Native LLM Evaluation in 15 Minutes
A hands-on tutorial for monitoring AI quality with dbt™ and zero data egress

Fabio Di Leta
Jan 30, 2026 · 10 min read
What is dbt™-llm-evals?
dbt™-llm-evals is an open-source package that brings LLM evaluation directly into your data warehouse. Instead of sending data to external APIs, it uses your warehouse's native AI functions—Snowflake Cortex, BigQuery Vertex AI, or Databricks AI Functions—to evaluate AI outputs where your data already lives.
Why warehouse-native evaluation matters:
Zero data egress: Sensitive data never leaves your environment
No external APIs: One less dependency to manage
Native dbt™ integration: Works with your existing workflows
Automatic baselines: No manual curation required
This tutorial walks through installing the package, configuring your first evaluation, and viewing quality scores—all in about 15 minutes.
Prerequisites
Before starting, you'll need:
A dbt™ project connected to Snowflake, BigQuery, or Databricks
Warehouse AI functions enabled (Cortex, Vertex AI, or AI Functions)
Basic familiarity with dbt™ models and configuration
Step 1: Install the Package
Add dbt™-llm-evals to your packages.yml:
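A minimal entry might look like the following sketch, assuming you install straight from the GitHub repository listed in the Resources section (pin the revision to a tagged release in practice):

```yaml
packages:
  - git: "https://github.com/paradime-io/dbt-llm-evals.git"
    revision: main  # pin to a tagged release in practice; main is just a placeholder
```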
Then install dependencies:
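This is the standard dbt command that pulls declared packages into your project:

```bash
dbt deps
```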
Step 2: Run Setup
The package needs storage tables for captures and baselines. Create them with:
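The Troubleshooting section below references the setup selector, so the command is:

```bash
dbt run --select llm_evals__setup
```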
This creates two tables in your target schema:
raw_captures: Stores AI inputs, outputs, and prompts
raw_baselines: Stores baseline examples for comparison
Step 3: Configure Global Variables
Add configuration to your dbt_project.yml. The key settings are the judge model and evaluation criteria.
For Snowflake:
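A sketch of the vars block, using the variable names explained below; the judge model shown (mistral-large2) is just one model available to Snowflake Cortex and may differ from the package's default:

```yaml
# dbt_project.yml (Snowflake): example values, not the package's defaults
vars:
  llm_evals_judge_model: 'mistral-large2'   # any model supported by SNOWFLAKE.CORTEX.COMPLETE
  llm_evals_criteria: ['accuracy', 'relevance', 'tone', 'completeness']
  llm_evals_sampling_rate: 0.1              # evaluate 10% of outputs
  llm_evals_pass_threshold: 7               # minimum passing score on the 1-10 scale
```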
For BigQuery:
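The same sketch for BigQuery; the Gemini model name is an assumption:

```yaml
# dbt_project.yml (BigQuery): example values, not the package's defaults
vars:
  llm_evals_judge_model: 'gemini-1.5-pro'   # a Vertex AI model your project can call
  llm_evals_criteria: ['accuracy', 'relevance', 'tone', 'completeness']
  llm_evals_sampling_rate: 0.1
  llm_evals_pass_threshold: 7
```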
For Databricks:
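And for Databricks; the serving endpoint name is an assumption:

```yaml
# dbt_project.yml (Databricks): example values, not the package's defaults
vars:
  llm_evals_judge_model: 'databricks-meta-llama-3-3-70b-instruct'  # an endpoint reachable by ai_query
  llm_evals_criteria: ['accuracy', 'relevance', 'tone', 'completeness']
  llm_evals_sampling_rate: 0.1
  llm_evals_pass_threshold: 7
```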
What these settings mean:
llm_evals_judge_model: The AI model that evaluates your outputs
llm_evals_criteria: Quality dimensions to measure (accuracy, relevance, tone, completeness)
llm_evals_sampling_rate: Percentage of outputs to evaluate (0.1 = 10%)
llm_evals_pass_threshold: Minimum score considered "passing" (1-10 scale)
Step 4: Configure Your AI Model
The package uses a post-hook to capture AI outputs automatically. Add the configuration to any model that generates AI content.
Create the YAML configuration:
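A sketch of a schema.yml entry using the keys explained below; the model name, column names, and prompt are hypothetical, and the capture post-hook itself comes from the package (see its docs for the exact macro name):

```yaml
# models/schema.yml: hypothetical model and columns
models:
  - name: customer_support_responses
    meta:
      llm_evals:
        enabled: true                 # turn evaluation on for this model
        baseline_version: v1          # version tag for baseline comparison
        input_columns: ['ticket_subject', 'ticket_body']
        output_column: ai_response
        prompt: 'You are a support agent. Draft a helpful, professional reply to the ticket.'
```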
Key configuration options:
enabled: Turn evaluation on/off for this model
baseline_version: Version tag for baseline comparison
input_columns: Which columns contain the AI's input
output_column: Which column contains the AI's output
prompt: The prompt template used (helps the judge understand context)
Step 5: Create Your AI Model
Here's a complete example model that generates AI responses. Choose the version matching your warehouse.
Snowflake (using Cortex)
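A sketch using SNOWFLAKE.CORTEX.COMPLETE; the model file name, source table, and columns are hypothetical:

```sql
-- models/customer_support_responses.sql (Snowflake)
select
    ticket_id,
    ticket_subject,
    ticket_body,
    snowflake.cortex.complete(
        'mistral-large2',
        'You are a support agent. Draft a helpful, professional reply to: ' || ticket_body
    ) as ai_response
from {{ ref('support_tickets') }}
```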
BigQuery (using Vertex AI)
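A sketch using ML.GENERATE_TEXT; it assumes a remote model over a Vertex AI endpoint already exists in your dataset, and the table and column names are hypothetical:

```sql
-- models/customer_support_responses.sql (BigQuery)
select
    ticket_id,
    ticket_subject,
    ticket_body,
    ml_generate_text_llm_result as ai_response
from ML.GENERATE_TEXT(
    MODEL `my_project.my_dataset.gemini_remote_model`,
    (
        select
            ticket_id,
            ticket_subject,
            ticket_body,
            concat(
                'You are a support agent. Draft a helpful, professional reply to: ',
                ticket_body
            ) as prompt
        from {{ ref('support_tickets') }}
    ),
    STRUCT(true as flatten_json_output)
)
```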
Databricks (using AI Functions)
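A sketch using the ai_query function; the serving endpoint name, table, and columns are hypothetical:

```sql
-- models/customer_support_responses.sql (Databricks)
select
    ticket_id,
    ticket_subject,
    ticket_body,
    ai_query(
        'databricks-meta-llama-3-3-70b-instruct',
        concat('You are a support agent. Draft a helpful, professional reply to: ', ticket_body)
    ) as ai_response
from {{ ref('support_tickets') }}
```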
Step 6: Run Your Model
Execute your AI model. The post-hook automatically captures outputs:
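Using the hypothetical model name from the earlier sketches:

```bash
dbt run --select customer_support_responses
```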
What happens on first run:
Your AI model generates responses
The post-hook detects no baseline exists
It creates a baseline from the current outputs
Future runs compare against this baseline
You should see output like:
Step 7: Run Evaluations
Process the captured data through the evaluation engine:
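One way to do this is dbt's package selector; the package name shown is an assumption, so check the name declared in the package's dbt_project.yml:

```bash
dbt run --select package:dbt_llm_evals
```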
This runs all evaluation models:
Generates judge prompts with context
Calls your warehouse's AI function to score outputs
Stores scores and reasoning
Step 8: View Results
Query the evaluation results to see how your AI is performing.
Performance summary:
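The performance summary model is referenced in the Next Steps section; the schema name here is a placeholder for wherever your dbt target writes models:

```sql
select *
from analytics.llm_evals__performance_summary;
```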
Individual scores with reasoning:
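A sketch; the results model name (llm_evals__evaluation_results) is an assumption, so check the package's models for the actual name:

```sql
select *
from analytics.llm_evals__evaluation_results
limit 50;
```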
Find low-scoring outputs:
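Building on the same assumed results model, filter an assumed score column against the pass threshold configured in Step 3:

```sql
select *
from analytics.llm_evals__evaluation_results
where score < 7
limit 50;
```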
Monitor for drift:
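The drift detection model is also referenced in Next Steps:

```sql
select *
from analytics.llm_evals__drift_detection;
```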
What's Being Evaluated?
The package uses an LLM-as-a-Judge approach. For each captured output, a judge model:
Receives the original input, output, and prompt context
Compares against baseline examples
Scores on each criterion (1-10 scale)
Provides reasoning for the score
Default evaluation criteria:
| Criterion | What It Measures |
|---|---|
| Accuracy | Factual correctness |
| Relevance | Addresses the input directly |
| Tone | Appropriate style and professionalism |
| Completeness | Fully addresses all aspects |
Automatic Baseline Management
The package handles baselines automatically:
First run: Creates baseline from initial outputs (no manual setup needed)
Subsequent runs: Compares new outputs against the baseline
New baseline version: Change baseline_version in your config (see the snippet after this list)
Force refresh: Add force_rebaseline: true to recreate the current version.
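A sketch of the version bump in the model's meta config, using the keys from Step 4:

```yaml
meta:
  llm_evals:
    baseline_version: v2   # bump from v1; the next run builds a fresh baseline for this version
```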
Scheduling in Production
For ongoing monitoring, schedule evaluation runs after your AI models.
With Paradime Bolt: Create a job that runs:
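A sketch of the job's commands, reusing the hypothetical model name and package selector from earlier steps:

```bash
dbt run --select customer_support_responses   # generate fresh AI outputs (captured by the post-hook)
dbt run --select package:dbt_llm_evals        # then score the captures with the judge model
```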
Troubleshooting
Evaluations not running?
Check llm_evals.enabled: true in your model's meta config
Verify the post-hook is configured
Ensure you ran dbt run --select llm_evals__setup first
No captures appearing?
Check your sampling rate isn't set to 0
Verify the input_columns and output_column match your model's columns
Judge returning errors?
Ensure your warehouse AI functions are enabled
Check the judge model name matches your warehouse's available models
Next Steps
You now have warehouse-native LLM evaluation running. Here's what to explore next:
Add more criteria: Customize llm_evals_criteria for your use case
Adjust sampling: Increase sampling_rate for critical models
Set up alerts: Query llm_evals__drift_detection in your monitoring tools
Create dashboards: Build visualizations from llm_evals__performance_summary
Resources
GitHub: paradime-io/dbt-llm-evals
Full documentation: Package Overview
Architecture deep-dive: Architecture Docs
This post is part of a series on LLM evaluation:
Get Started with dbt™-llm-evals (this post)
Star the dbt™-llm-evals repo if this helped! ⭐





