AI.GENERATE_EMBEDDING

Feb 23, 2026 · 5 min read

Category: Embedding & Vector Function

Description

Generates embeddings for text, images, or multimodal content using a remote model. It is an alternative to AI.EMBED for remote models created with CREATE MODEL ... REMOTE WITH CONNECTION. Returns an ARRAY<FLOAT64> vector that represents the semantic meaning of the input data.

Use Cases

Semantic Search: Find documents similar to a query

Content Recommendation: Recommend similar articles or products

Document Clustering: Group similar documents automatically

Similarity Detection: Find duplicate or near-duplicate content

RAG Systems: Retrieve relevant context for generative AI

Syntax

AI.GENERATE_EMBEDDING(
  MODEL remote_model_name,
  { TABLE table_name | (query_statement) }
  [, STRUCT(value AS param, ...) ]
)

Parameters

MODEL: Remote model created with CREATE MODEL ... REMOTE WITH CONNECTION

content: Table or query that supplies the data to embed in a column named content (text, image ObjectRefRuntime, or multimodal data)

STRUCT (optional): Model parameters such as task_type and output_dimensionality

Code Examples

Example 1: Create Embedding Model and Generate Embeddings

-- Step 1: Create remote embedding model
CREATE OR REPLACE MODEL my_dataset.text_embedding_model
REMOTE WITH CONNECTION `us.my_vertex_connection`
OPTIONS (
  endpoint = 'text-embedding-004'
);

-- Step 2: Generate embeddings for products
-- The input query exposes the text to embed in a column named content
SELECT 
  product_id,
  product_name,
  ml_generate_embedding_result AS product_embedding
FROM 
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    (
      SELECT
        product_id,
        product_name,
        CONCAT(product_name, ' - ', description) AS content
      FROM products
    ),
    STRUCT(
      'RETRIEVAL_DOCUMENT' AS task_type,
      768 AS output_dimensionality
    )
  );

Example 2: Build Semantic Search System

-- Store embeddings in a table
CREATE OR REPLACE TABLE product_embeddings AS
SELECT 
  product_id,
  product_name,
  description,
  ml_generate_embedding_result AS embedding
FROM 
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    (
      SELECT
        product_id,
        product_name,
        description,
        CONCAT(product_name, ' ', description) AS content
      FROM products
    ),
    STRUCT('RETRIEVAL_DOCUMENT' AS task_type)
  );

-- Create vector index
CREATE VECTOR INDEX product_idx
ON product_embeddings(embedding)
OPTIONS(
  distance_type = 'COSINE',
  index_type = 'IVF'
);

-- Search with query embedding
-- VECTOR_SEARCH is a table function: it returns each matched row as base
-- plus a distance column, where a smaller distance means a closer match
SELECT 
  base.product_name,
  base.description,
  distance
FROM VECTOR_SEARCH(
  TABLE product_embeddings,
  'embedding',
  (
    SELECT ml_generate_embedding_result AS embedding
    FROM AI.GENERATE_EMBEDDING(
      MODEL my_dataset.text_embedding_model,
      (SELECT 'laptop with long battery life' AS content),
      STRUCT('RETRIEVAL_QUERY' AS task_type)
    )
  ),
  top_k => 10,
  distance_type => 'COSINE'
)
ORDER BY distance;
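
Conceptually, a vector search scores every stored embedding against the query embedding and keeps the closest matches. The following is a minimal Python sketch of that exact (brute-force) ranking, outside BigQuery; the function names, the product rows, and the 3-dimensional vectors are all hypothetical and chosen only to keep the arithmetic readable. A vector index approximates this result without scanning every row.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; a smaller value means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """Score every (name, embedding) row and keep the k closest to the query."""
    scored = ((cosine_distance(query, emb), name) for name, emb in rows)
    return [(name, dist) for dist, name in heapq.nsmallest(k, scored)]


# Hypothetical embeddings; real ones have hundreds of dimensions.
rows = [
    ("Laptop Pro 15", [0.9, 0.1, 0.0]),
    ("Wireless Mouse", [0.0, 0.2, 0.9]),
    ("Business Notebook", [0.7, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]

results = top_k(query, rows, k=2)
for name, dist in results:
    print(f"{name}: distance={dist:.4f}")
```

The results come back in ascending distance order, so the first entry is the best match.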

Example 3: Multimodal Embeddings

-- Create multimodal embedding model
CREATE OR REPLACE MODEL my_dataset.multimodal_embedding
REMOTE WITH CONNECTION `us.my_vertex_connection`
OPTIONS (
  endpoint = 'multimodal-embedding-002'
);

-- Generate embeddings for images
-- product_images is a placeholder name; this example does not name its source table
SELECT 
  image_id,
  ml_generate_embedding_result AS image_embedding
FROM 
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.multimodal_embedding,
    (SELECT image_id, image_data AS content FROM product_images)
  );

Example 4: Batch Processing with Error Handling

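-- Keep only rows that embedded successfully
-- An empty status string indicates success; failed rows carry an error message
SELECT 
  document_id,
  ml_generate_embedding_result AS embedding,
  ml_generate_embedding_status AS status
FROM 
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    (SELECT document_id, document_text AS content FROM documents)
  )
WHERE ml_generate_embedding_status = '';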

Example 5: Finding Similar Documents

-- Get embedding for source document
DECLARE source_embedding ARRAY<float64>;

SET source_embedding = (
  SELECT ml_generate_embedding_result
  FROM AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    (
      SELECT document_text AS content
      FROM documents
      WHERE document_id = 'DOC-123'
    )
  )
);

-- Find similar documents
-- COSINE_DISTANCE is a distance, so the smallest values are the most similar
SELECT 
  d.document_id,
  d.title,
  COSINE_DISTANCE(source_embedding, e.embedding) AS distance
FROM document_embeddings e
JOIN documents d USING(document_id)
WHERE d.document_id != 'DOC-123'
ORDER BY distance ASC
LIMIT 20;

Data Output Examples

Embedding Generation

product_name     | embedding_dimensions | embedding_preview
"Laptop Pro 15"  | 768                  | [0.123, -0.456, 0.789, ...]
"Wireless Mouse" | 768                  | [-0.234, 0.567, -0.123, ...]

Similarity Search Results

product_name            | similarity_score
"UltraBook Pro 15-inch" | 0.92
"Premium Laptop 14"     | 0.87
"Business Notebook"     | 0.81

Task Types (STRUCT Parameter)

RETRIEVAL_QUERY: For search queries (optimizes for question-like inputs)

RETRIEVAL_DOCUMENT: For documents to be searched (optimizes for content)

SEMANTIC_SIMILARITY: General purpose similarity comparison

CLASSIFICATION: For classification/labeling tasks

CLUSTERING: For grouping similar items

Best Practices

Match task types: Use RETRIEVAL_QUERY for queries, RETRIEVAL_DOCUMENT for content

Consistent dimensions: Use same dimensionality for queries and documents

Persist embeddings: Store generated embeddings to avoid regeneration

Use vector indexes: Essential for large-scale similarity search

Batch generation: Generate embeddings in batches for efficiency

Handle errors: Check ml_generate_embedding_status for failures
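
The "Handle errors" and "Consistent dimensions" practices can be illustrated with a small Python sketch that post-processes result rows after they leave BigQuery. The row layout (id, embedding, status keys), the helper names, and the error text are hypothetical. The default success test assumes an empty status string means success; verify that convention against the function's documentation and pass a different is_success callable if it does not hold.

```python
def split_by_status(rows, is_success=lambda status: status == ""):
    """Separate rows that embedded successfully from rows that need a retry.

    Assumption: an empty status string means success. Supply another
    is_success callable if the status convention differs.
    """
    ok, failed = [], []
    for row in rows:
        (ok if is_success(row["status"]) else failed).append(row)
    return ok, failed


def check_dimensions(rows, expected_dim):
    """Return the ids of rows whose embedding length differs from expected_dim."""
    return [row["id"] for row in rows if len(row["embedding"]) != expected_dim]


# Hypothetical result rows with tiny vectors for readability.
rows = [
    {"id": "DOC-1", "embedding": [0.1, 0.2, 0.3], "status": ""},
    {"id": "DOC-2", "embedding": [], "status": "example error message"},
    {"id": "DOC-3", "embedding": [0.4, 0.5], "status": ""},
]

ok, failed = split_by_status(rows)
bad_dims = check_dimensions(ok, expected_dim=3)

print("retry:", [row["id"] for row in failed])
print("wrong dimensionality:", bad_dims)
```

Rows in failed can be re-submitted in a later batch, and any id in bad_dims points to an embedding produced with a different output_dimensionality than the rest of the table.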

When to Use

✅ Use when working with remote models

✅ Use for batch embedding generation

✅ Use when you need embedding metadata (status, errors)

✅ Use for production RAG systems

Alternatives

AI.EMBED: Direct embedding function (simpler syntax)

Pre-trained models: Import custom TensorFlow/ONNX embedding models

External APIs: Generate embeddings outside BigQuery

Supported Models

Text Embedding Models:

  • text-embedding-004 (768 dimensions, latest)

  • text-embedding-003

  • text-multilingual-embedding-002

  • textembedding-gecko (legacy)

Multimodal Models:

  • multimodal-embedding-002 (supports text, images, video)

Platform Support

Regions: Model-dependent, check Vertex AI availability

Release Status: Generally Available (GA)

Cost: Charged per Vertex AI embedding API call

Performance: Optimized for batch processing

Output Schema

The function returns a table with:

ml_generate_embedding_result: ARRAY<FLOAT64> embedding vector

ml_generate_embedding_status: Status string; empty on success, otherwise an error message

ml_generate_embedding_statistics: Token counts and metadata

Distance Metrics

For similarity comparison, use:

COSINE_DISTANCE: Most common; based on the angle between vectors, with smaller values meaning more similar

EUCLIDEAN_DISTANCE: Measures straight-line distance

DOT_PRODUCT: Fast but requires normalized vectors

AI.SIMILARITY: Convenience function for cosine similarity
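
To make the difference between these metrics concrete, here is a small Python sketch of the underlying arithmetic. It is not how BigQuery implements the functions; the helper names and the 2-dimensional vectors are hypothetical and picked so the numbers are easy to check by hand.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot_product(a, a))


def cosine_distance(a, b):
    """1 - cosine similarity: depends only on direction, not on length."""
    return 1.0 - dot_product(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the length
c = [4.0, -3.0]  # perpendicular to a

print(cosine_distance(a, b))     # direction matches, so the distance is about 0
print(euclidean_distance(a, b))  # lengths differ, so the distance is not 0
print(cosine_distance(a, c))     # perpendicular vectors give a distance of about 1

# On unit-length vectors the dot product equals the cosine similarity,
# which is why DOT_PRODUCT is only comparable across normalized vectors.
print(dot_product(normalize(a), normalize(b)))
```

Vectors a and b point the same way, so cosine distance treats them as identical while Euclidean distance does not. That is the practical reason cosine distance is the usual default for text embeddings.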

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
