
How We Built AI-Powered Analytics: A Practical Guide

End-to-end AI analytics architecture: ingestion, transformation, LLMs in SQL, semantic layer, orchestration. Practical guide from Paradime's stack.

Fabio Di Leta · Nov 28, 2025 · 8 min read

TL;DR: Built a production AI pipeline serving our customers as a “solo data team”.

Stack: PeerDB + Fivetran + dlt for ingestion, dbt™ for transformation, BigQuery + Vertex AI for native LLM execution, Cube for semantic layer, Loops for activation, Hex for Data Exploration, Paradime Bolt for orchestration. Zero infrastructure overhead.

The Problem: Nobody Knew How They Used Our AI Features

At Paradime, we built DinoAI to power AI-driven data development. Teams were using it to write dbt™ code, debug pipelines, and generate documentation. But when managers asked "what's the ROI on this AI thing?", nobody had a good answer.

Users had a blind spot: they didn't know how they were using DinoAI, what it was actually helping them do, or how much time they were saving. Managers couldn't quantify the productivity impact. And we were guessing at which features mattered for our users.

As the “solo data team”, I needed to build something comprehensive but maintainable.

The goal: an end-to-end pipeline that would power customer-facing analytics, user activation and internal intelligence, all while being flexible enough to evolve with our product.

The Stack: Cloud-Native AI, Zero Infrastructure

The architecture is straightforward: incremental ingestion with PeerDB, Fivetran, and dlt into BigQuery; dbt™ for transformation; BigQuery's native AI functions with Vertex AI for LLM execution; Cube as the semantic layer; Loops for email activation; Hex for exploration; and Paradime Bolt for orchestration.

Why this works: no infrastructure to manage, native LLM execution where data lives, single semantic layer serving both internal and external use cases, and open source where lock-in would hurt us (Cube, dbt™).

Building the Data Foundation: Ingestion Strategy

We pull data from four sources, each serving a specific purpose in understanding AI adoption.

Auth0 gives us the authentication layer - who's logging in, when, from where. This matters because AI usage patterns differ across sessions. Is someone using DinoAI in a focused work session or casually exploring?

Stripe connects the money to the features. Which plans include AI capabilities? Who's on what tier? This helps us understand if AI is driving upgrades or if it's table stakes.

Segment captures every product interaction. The AI-assisted deployments, code generations, feature clicks - this is the raw material for understanding what people actually do with DinoAI.

RDS replication brings in the application database state. User profiles, account settings, project configurations. The context that makes event data meaningful.

For ingestion mechanics, we use Fivetran and dlthub for third-party sources, and PeerDB for database replication into BigQuery.

The key decision here was reliable incremental ingestion - simpler architecture, lower costs, easier to debug than real-time streaming.

Transformation: Where dbt™ Becomes the Single Source of Truth

dbt™ is where raw data becomes insight-ready. We're standardizing event taxonomy across sources, applying business logic to filter out noise and focus on meaningful interactions, creating user and team-level aggregations that power our metrics, and building the foundation for AI summarization with context-rich datasets.

The cool part? Thanks to DinoAI, a lot of this modeling could be automated. Give it the right context, and it generates SQL, adds documentation, writes tests. What would've been weeks of manual data modeling turned into days.
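As a sketch of what lives in this layer (the model, source, and event names are illustrative, not our actual taxonomy), an incremental dbt™ model that standardizes raw Segment events into a single AI-usage taxonomy might look like this:

{{ config(materialized='incremental', unique_key='event_id') }}

-- Standardize raw Segment track events into one AI-usage taxonomy
SELECT
  id        AS event_id,
  user_id,
  timestamp AS event_at,
  CASE
    WHEN event LIKE 'dino_ai_%generate%' THEN 'ai_code_generation'
    WHEN event LIKE 'dino_ai_%docs%'     THEN 'ai_documentation'
    WHEN event LIKE 'dino_ai_%'          THEN 'ai_other'
    ELSE 'non_ai'
  END       AS event_category
FROM {{ source('segment', 'tracks') }}

{% if is_incremental() %}
  -- Only process events that arrived since the last run
  WHERE timestamp > (SELECT MAX(event_at) FROM {{ this }})
{% endif %}

Models like this feed the user and team-level aggregations, which in turn become the context for the LLM step below.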

How to Run LLMs Directly in SQL (This Changes Everything)

This is the game-changer. Instead of managing separate AI infrastructure - training environments, model serving, APIs, the whole mess - we use BigQuery's native AI functions.

You write SQL. The SQL calls an LLM. The LLM runs where your data lives. No data movement, no API calls, no infrastructure.

Here's how we generate personalized insights at scale:

SELECT 
  user_id, 
  organization_name, 
  reporting_month,
  
  -- Generate personalized benchmark insights using AI
  AI.GENERATE(
    CONCAT(
      'Create a personalized 2-3 sentence benchmark summary for ', FORMAT_DATE('%B %Y', reporting_month), '. ',
      'Address the reader directly in a professional yet approachable tone. ',
      
      -- Company context
      'Your organization', CHR(39), 's DinoAI usage: ', CAST(COALESCE(dino_ai_usage_percentage, 0) AS STRING), '%',
      CASE 
        WHEN previous_month_usage > 0 THEN
          CONCAT(' (', 
            CASE 
              WHEN month_over_month_change > 0 THEN CONCAT('+', CAST(month_over_month_change AS STRING))
              ELSE CAST(month_over_month_change AS STRING)
            END,
            '% vs. last month)')
        ELSE ' (first month tracked)'
      END,
      '. ',
      
      -- Benchmark context
      'Benchmarks amongst other companies - Average: ', CAST(benchmark_average AS STRING), '%, ',
      'Median: ', CAST(benchmark_p50 AS STRING), '%, ',
      '75th percentile: ', CAST(benchmark_p75 AS STRING), '%, ',
      '90th percentile: ', CAST(benchmark_p90 AS STRING), '%. ',
      
      -- Performance tier
      'Your performance tier: ', performance_tier, '. ',
      
      -- Instructions for the AI
      'Writing guidelines: ',
      
      '1) Opening - Contextualize performance naturally. ',
      '2) Trend analysis - Address month-over-month changes meaningfully: ',
      '- Only mention trends if they are significant (>5% change or first month) ',
      '- For strong growth (>10% month-over-month): highlight the acceleration and its positioning impact ',
      '- For positive growth (0-10% month-over-month): acknowledge steady progress ',
      '- For decline (<0% month-over-month): frame constructively and suggest focus areas ',
      '- For first month: emphasize baseline establishment and future tracking ',
      '- Avoid mentioning trivial changes (within ±2%) ',
      
      '3) Forward-looking insight - End with actionable perspective: ',
      '- For high performers: suggest maintaining momentum or specific optimization opportunities ',
      '- For average performers: identify specific percentile goals to target ',
      '- For lower performers: provide encouraging next steps without being prescriptive ',
      '- Reference specific benchmark gaps when relevant (e.g., "closing the 8% gap to median") ',
      
      '4) Style requirements: ',
      '- Avoid jargon - say "month-over-month" not "MoM" ',
      '- Total length: 2-3 sentences maximum ',
      '- Avoid generic phrases like "great job" or "keep up the good work" ',
      
      '5) Special considerations: ',
      '- For declining adoption: be empathetic but solution-focused ',
      '- For first-time tracking: set expectations for ongoing monitoring ',
      '- For plateau performance: suggest specific incremental goals ',
      
      'Remember: This is a monthly executive summary. Make it insightful, actionable, and worth reading—not just a data recap.'
    ),
    connection_id => '{{ var("ai_connection_id") }}',
    endpoint => '{{ var("ai_endpoint") }}'
  ).result AS benchmark_summary  -- AI.GENERATE returns a struct; .result holds the generated text

-- Upstream dbt model name is illustrative
FROM {{ ref('monthly_dino_ai_benchmarks') }}

The prompt is embedded in the SQL. We pull context from our dbt™ models, feed it to the LLM, and get back personalized summaries like this:

"Your organization is among the leading DinoAI adopters, with a 68.75% usage rate that places you in the top 10% of all companies. With such a strong adoption baseline established, your next opportunity lies in optimizing specific workflows to maximize the value and efficiency gains from your investment."

This approach lets us:

  • Feed rich context from our dbt™ pipeline directly into prompts

  • Generate personalized summaries at scale without managing separate ML infrastructure

  • Iterate quickly on prompt engineering using familiar SQL tools

The AI summaries help both our customers' teams and us understand feature impact without manually interpreting raw metrics every single time, and they let us produce curated summaries of key metrics for each customer at scale.

Why We Chose Cube for the Semantic Layer

Cube solved a critical problem: metric consistency across internal and external use cases.

Why Cube:

True open source - control over our semantic layer without vendor lock-in. Critical for infrastructure this important.

Multiple API interfaces - GraphQL, REST, and SQL APIs. Dashboard uses GraphQL, Hex queries via SQL - same metric definitions.

Environment management - separate dev and prod, so we can test changes before they hit production when connecting to Paradime.

Observability - when exposing metrics externally, we need visibility into:

  • Which metrics are slow

  • What's cached vs. computed fresh

  • Where we need pre-aggregations

  • Query patterns indicating optimization opportunities

Cube's query history gives us this out of the box.

Security by default - row-level security ensures accounts only access their own data. No complex application logic needed - it's baked into the semantic layer using query rewrite and the security context.

Clean infrastructure - runs alongside our existing stack. Scale up or down without DevOps overhead.
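To make the multiple-interfaces point concrete, here's a sketch of how an internal tool like Hex hits the same definitions through Cube's Postgres-compatible SQL API (the view and member names are illustrative); our customer-facing dashboard requests the same members over GraphQL:

-- Query a Cube view over its SQL API; MEASURE() tells Cube to aggregate
-- using the measure's own definition from the semantic layer
SELECT
  organization_name,
  reporting_month,
  MEASURE(dino_ai_usage_percentage) AS dino_ai_usage_percentage
FROM dino_ai_adoption
WHERE reporting_month >= '2025-01-01'
GROUP BY 1, 2

Whichever interface the query comes through, the metric definition lives once in Cube, which is what keeps internal and external numbers from drifting.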

From dbt™ to Cube with DinoAI

One challenge: translating dbt™ models into Cube's semantic definitions. Different syntax, different mental model.

DinoAI accelerated this. It helped extend dbt™ models into Cube views and metric definitions. What could have been weeks of manual work turned into days from zero to production.

BigQuery + Vertex AI: The GCP Advantage

We chose BigQuery not just as a warehouse, but as an AI execution environment.

The GCP ecosystem advantage:

  • Unified platform with storage and compute

  • Native Vertex AI integrations

  • GCP's Model Garden for flexibility with LLMs

  • No infrastructure management

  • Iterate quickly on best models for use cases

Native AI functions: This is the secret sauce. Using BigQuery's AI.GENERATE and related LLM functions, we run AI transformations directly in SQL. Check my previous post on what BigQuery enables with other AI functions like AI.SCORE, AI.IF, and AI.CLASSIFY.
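The purpose-built functions are handy for scoring and filtering, but plain AI.GENERATE already covers a lot of row-wise work. A minimal sketch reusing the same connection setup as above (table and column names are illustrative):

-- Tag free-text feedback with a single category label, row by row
SELECT
  feedback_id,
  AI.GENERATE(
    CONCAT(
      'Classify this product feedback into exactly one of: ',
      'bug, feature_request, praise, other. Reply with the label only. ',
      'Feedback: ', feedback_text
    ),
    connection_id => '{{ var("ai_connection_id") }}',
    endpoint => '{{ var("ai_endpoint") }}'
  ).result AS feedback_category
FROM {{ ref('stg_segment__feedback_events') }}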

Activating Insights: Getting Data to Users

Having insights is worthless if they don't reach users. We built an activation pipeline using Poetry and Python that:

  1. Queries the metrics and insights we generated

  2. Formats personalized reports for each user and team

  3. Sends emails via Loops with monthly and weekly summaries

This runs on a schedule using Bolt. Users get in-app summaries of their DinoAI usage, monthly reports highlighting trends and achievements, and actionable insights on underutilized features that could boost productivity.

The Python approach gives us maximum flexibility. Want to A/B test messaging? Done. Customize based on user segments? Easy.
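As a sketch of step 1 (table and column names are illustrative), the query the script runs to assemble each user's monthly report is just a read over the models we already built, whether it hits BigQuery directly or goes through Cube's SQL API:

-- Pull the current month's per-user metrics and AI summary for the Loops send
SELECT
  user_id,
  email,
  organization_name,
  reporting_month,
  dino_ai_usage_percentage,
  benchmark_summary
FROM analytics.monthly_dino_ai_benchmarks
WHERE reporting_month = DATE_TRUNC(CURRENT_DATE(), MONTH)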

Orchestration: Keeping Everything Running

A pipeline is only as good as its orchestration. We use Bolt, Paradime's CI/CD and scheduling engine, to manage two jobs:

Job 1: Daily Data Sync and Transformation

  1. Triggers Fivetran sync - pull fresh data from Auth0, Stripe, other sources

  2. Runs dbt™ transformations - snapshots for SCD tables, usage metrics, prep data for LLM context

  3. Executes BigQuery AI functions - generate personalized summaries per customer

Bolt's command settings orchestrate the entire pipeline end-to-end. Metadata monitoring, alerting to Slack/Teams, email built in.

Job 2: Data Quality and Email Activation

  1. Tests data for Loops campaigns using dbt™ tests

  2. Validates AI-generated summaries - ensure quality standards

  3. Triggers Loops API - send personalized monthly reports

Command settings in Bolt chain steps with error handling. Data quality tests fail? No emails sent. Team gets Slack alert. Prevents sending incomplete or incorrect insights to customers.
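A sketch of what one of those quality gates can look like as a singular dbt™ test (names are illustrative): it returns offending rows, so any hit fails the job and blocks the Loops step.

-- Fail if any AI-generated summary is missing, empty, or suspiciously long
SELECT
  user_id,
  reporting_month
FROM {{ ref('monthly_dino_ai_benchmarks') }}
WHERE benchmark_summary IS NULL
   OR LENGTH(TRIM(benchmark_summary)) = 0
   OR LENGTH(benchmark_summary) > 1000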

Why Bolt for orchestration?

  • No infrastructure management

  • Integrates with Fivetran, Airbyte, dlthub for ingestion

  • dbt™ commands for transformation

  • Python support for everything else

  • Triggers Power BI, Tableau, Hex for automated refresh

  • Works with Hightouch and Census for reverse ETL

  • Logs, monitoring, alerts without external tools

  • Bolt's DinoAI agent helps decipher run logs and reduce MTTR

Internal Intelligence: Hex for Analytics

While customers see polished dashboards, we needed exploratory analytics internally. Hex was perfect:

Notebook experience - experiment, pull data, explore fast.

Cube integration - connects directly to Cube. Internal reports use the exact same metric definitions we serve customers. Single source of truth, no drift.

Speed of iteration - answer "which AI features drive adoption?" in minutes, not days.

SQL, Python, visualizations in one environment. Invaluable for understanding feature performance as we release.

Here's What Works

1. Start batch, go real-time only if needed

  • Most analytics don't need sub-second latency

  • Batch is simpler, cheaper, easier to debug

  • We run daily syncs - works perfectly

2. Use cloud-native AI when possible

  • BigQuery ML, Snowflake Cortex, Databricks AI

  • Avoid building separate ML infrastructure

  • Run models where data lives

3. Semantic layer is non-negotiable

  • Single source of truth for metrics

  • Prevents drift between internal and external reporting

  • Makes exposing metrics externally safe

4. Observability matters for customer-facing metrics

  • Monitor query performance

  • Track cache hit rates

  • Identify slow queries before customers complain

  • Cube's query history handles this

5. AI agents accelerate infrastructure work

  • Used DinoAI to extend dbt™ models into Cube

  • Wrote Python scripts for Loops workflow

  • Infrastructure-as-code is infrastructure-with-AI-assistance

The Impact: What This Pipeline Powers

Customer dashboards - teams see how AI transforms workflows, with personalized summaries making data actionable.

Product intelligence - informs every feature decision. We know what works, what doesn't, where to invest.

Activation campaigns - drive engagement by showing users value they're getting (and missing).

Internal reporting - keeps leadership aligned on AI adoption trends and business impact.

Key Learnings from Building This Solo

Building this pipeline as a solo data team taught me that modern data tools hit an inflection point. What would've required five engineers five years ago is now doable by one person with the right stack and AI assistance.

Start with questions, not tools. We began by defining what insights mattered and how we wanted to use them, then chose tools to answer those questions. Not the other way around.

Semantic layers prevent chaos. As soon as you serve metrics both internally and externally, invest in a proper semantic layer. We chose Cube because it's open source and feature-rich.

Batch is effective. Real-time is expensive and complex. Most analytics don't need it. Daily (or higher frequency) batch runs work perfectly for this use case.

Observability isn't optional. Serving external users? You need deep visibility into query performance. Cube's monitoring has been essential.

At Paradime, this architecture transformed how we understand AI adoption - both for customers and internally. And we did it without a data engineering team.

The tools are there. The infrastructure is simpler. AI helps with the heavy lifting. What you need is a clear goal and the right stack to get there.

FAQs

What's the best way to run LLMs on data warehouse data?

Use native AI functions in your warehouse. BigQuery's AI.GENERATE, Snowflake's Cortex, Databricks' AI Functions. No data movement, no API calls, no infrastructure overhead.

How do you build a semantic layer for customer-facing analytics?

Use Cube. It gives you:

  • Multiple API interfaces (GraphQL, REST, SQL)

  • Row-level security out of the box

  • Query monitoring and caching

  • Environment separation for dev/prod

Critical for serving metrics externally without chaos.

What tools do you need to build AI analytics pipelines?

Minimum stack:

  • Data warehouse (BigQuery, Snowflake, Databricks)

  • Transformation layer (dbt™)

  • Orchestration (Paradime Bolt, Airflow, Dagster)

  • Optional but recommended: Semantic layer (Cube)

Conclusion

Modern data tools hit an inflection point. What required five engineers five years ago? One person, right stack, AI assistance.

Choose tools that integrate naturally, provide leverage through AI and automation, scale without operational burden, and keep you close to data without drowning in complexity.

At Paradime, this architecture transformed how we understand AI adoption - for customers and internally. No data engineering team required.

Want to see how Paradime helps data teams ship faster with AI? Try paradime.io


Copyright © 2025 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
