BigQuery Autonomous Embeddings: Google Automated the Annoying Part
BigQuery autonomous embeddings automatically maintain vector embeddings as your data changes. No sync jobs, no stale data—just fresh embeddings 24/7.

Fabio Di Leta
Dec 8, 2025 · 5 min read
Embedding maintenance sucks. You generate them, data changes, and suddenly you're running batch jobs at 3am. BigQuery's autonomous embedding generation (Preview) fixes this.
Why You Care
Your support team answers 10,000 tickets a month. You built semantic search over your knowledge base. Works great until someone updates an article and the embeddings are stale for 6 hours. Customer gets wrong answer, escalates, costs you money.
Or you're running e-commerce. 50,000 products, descriptions changing constantly. Your recommendation engine relies on embeddings. Manual sync job runs nightly. Products added at 9am don't show up in recommendations until tomorrow.
Or you're doing RAG over internal docs. Engineers update specs, embeddings lag, AI generates responses based on outdated information. Someone makes a decision based on old data.
BigQuery now handles this automatically.
What It Actually Does
Point BigQuery at a column. Tell it which Vertex AI model to use. It maintains embeddings automatically. Update data? Embeddings update. Insert rows? Embeddings generate. No scripts, no orchestration, no sync jobs.
Done.
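Conceptually, it's a generated column declared at table creation. A minimal sketch; the feature is in Preview, so the exact clause and option names below are assumptions, not copy-paste syntax (check the current docs):

```sql
-- Sketch: a table whose `embedding` column BigQuery maintains itself.
-- Clause, connection, and model names are illustrative.
CREATE TABLE mydataset.kb_articles (
  id INT64,
  body STRING,
  embedding STRUCT<result ARRAY<FLOAT64>, status STRING>
    GENERATED AS EMBEDDING (            -- hypothetical clause name
      SOURCE body,                      -- the STRING column to embed
      CONNECTION `us.vertex_conn`,      -- Cloud resource connection
      ENDPOINT 'text-embedding-005',    -- Vertex AI embedding model
      asynchronous = TRUE               -- don't block inserts
    )
);
```

From then on, inserts and updates to `body` trigger regeneration of `embedding` with no external orchestration.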
Why This Matters for RAG
Your documents live in BigQuery. They change constantly. Product descriptions, support articles, knowledge bases—all updating.
Before: custom sync logic, scheduled jobs, checking timestamps, handling failures, monitoring staleness.
Now: BigQuery handles it. Your embeddings stay current without you thinking about it.
The Smart Bits
Background Processing
asynchronous = TRUE means inserts don't block. BigQuery queues the work, processes it on background slots, updates embeddings when done.
Error Handling
The embedding column is a struct: {result, status}. If generation fails, you get a NULL result plus an error message in status. No silent failures.
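That makes failures queryable like any other data. A sketch, assuming a table `kb_articles` with the generated column named `embedding`:

```sql
-- Find rows where embedding generation failed:
-- a NULL vector paired with an error message.
SELECT id, embedding.status AS error_message
FROM mydataset.kb_articles
WHERE embedding.result IS NULL
  AND embedding.status IS NOT NULL;
```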
Native Vector Search
Once embeddings exist, semantic search is SQL:
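Roughly like this, reusing the table and an embedding model from above (names are placeholders; in particular, pointing VECTOR_SEARCH at the struct's `result` field is an assumption):

```sql
-- Embed the query text, then find the 5 nearest articles by cosine distance.
SELECT base.id, base.body, distance
FROM VECTOR_SEARCH(
  TABLE mydataset.kb_articles, 'embedding.result',
  (
    SELECT ml_generate_embedding_result AS qvec
    FROM ML.GENERATE_EMBEDDING(
      MODEL `mydataset.embedding_model`,
      (SELECT 'how do I reset my password' AS content)
    )
  ),
  query_column_to_search => 'qvec',
  top_k => 5,
  distance_type => 'COSINE'
);
```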
No external vector database. No data movement. Query BigQuery directly using vector search.
The Limits
One column per table - Need embeddings on multiple columns? Multiple tables or manual generation.
CREATE TABLE only - Can't ALTER TABLE to add this. Build it in from the start or recreate the table.
Source must be STRING - Binary data, structured fields—preprocess them first.
Index training waits for 80% - Vector index creation blocks until 80% of rows have embeddings. Large tables = wait time.
Check progress:
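The INFORMATION_SCHEMA.VECTOR_INDEXES view reports coverage; project, region, and table names below are placeholders:

```sql
SELECT table_name, index_name, index_status, coverage_percentage
FROM `myproject.region-us`.INFORMATION_SCHEMA.VECTOR_INDEXES
WHERE table_name = 'kb_articles';
```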
For dbt™ Users
Create these in your dbt™ models directly. BigQuery manages the embedding lifecycle. Just remember: you can't modify the generated column setup later, so get your model definition right the first time.
Setup
Three steps:
Create a Cloud resource connection
Grant the connection's service account roles/aiplatform.user
Create the table with the generated embedding column
BigQuery starts processing immediately.
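Sketched as CLI commands; project, location, and connection names are placeholders:

```shell
# 1. Create a Cloud resource connection for Vertex AI access.
bq mk --connection --location=us --project_id=my-project \
  --connection_type=CLOUD_RESOURCE vertex_conn

# 2. Grant the connection's service account access to Vertex AI.
#    (Find the SA with: bq show --connection my-project.us.vertex_conn)
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:CONNECTION_SA_EMAIL" \
  --role="roles/aiplatform.user"

# 3. Run the CREATE TABLE with the generated embedding column.
```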
Costs
Every row = Vertex AI API call. Track in Cloud Billing under Vertex AI service, filter by label bigquery_ml_job and job IDs starting with gc_.
Worth it? Depends on update frequency. Constantly changing data—yes. Static data embedded once—just generate manually.
When to Use This
Use for:
RAG systems where source docs update regularly
E-commerce with changing product catalogs
Support systems with evolving knowledge bases
Any scenario where embeddings need to stay fresh automatically
Skip for:
Static datasets embedded once
Need embeddings on multiple columns
Need to modify generated columns post-creation
Data that doesn't fit STRING column constraint
Error Tracking
Embedding generation stalls? Query background jobs:
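Something like this, against the region-qualified jobs view (region is a placeholder):

```sql
SELECT job_id, state, creation_time, error_result
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_id LIKE 'gc_%'
ORDER BY creation_time DESC
LIMIT 20;
```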
Background job IDs are prefixed with gc_.
The Real Win
Not the embedding generation—Vertex AI already does that. It's the maintenance. Embeddings stay current without orchestration, monitoring, or custom sync logic.
That's the feature.