AI.EMBED

Feb 23, 2026

Category: Embedding & Vector Function (Preview)

Description

Generates embeddings (vector representations) for text, images, or multimodal data. Returns an ARRAY<FLOAT64> that captures the semantic meaning of the input. Embeddings are used for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).

Use Cases

Semantic Search: Find similar documents or products

Recommendation Systems: Recommend similar items

Clustering: Group similar content together

RAG (Retrieval-Augmented Generation): Retrieve relevant context for LLM queries

Deduplication: Identify duplicate or near-duplicate records

Multimodal Search: Search images with text queries or vice versa
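As a rough illustration of the deduplication use case, stored embeddings can be compared pairwise and filtered by a distance threshold. This is a sketch under assumptions: it reuses the products_embedded table built in Example 5, the 0.05 threshold is an arbitrary illustrative value that would need tuning on real data, and it scores pairs with ML.DISTANCE (a BigQuery ML function) rather than anything specific to AI.EMBED.

```sql
-- Candidate near-duplicates: pairs whose embeddings are almost identical.
-- Assumes products_embedded(product_id, product_name, embedding) from Example 5.
SELECT
  a.product_id AS product_id_a,
  b.product_id AS product_id_b,
  -- ML.DISTANCE with 'COSINE' returns cosine distance; similarity = 1 - distance
  1 - ML.DISTANCE(a.embedding, b.embedding, 'COSINE') AS similarity
FROM products_embedded AS a
JOIN products_embedded AS b
  ON a.product_id < b.product_id  -- each unordered pair once, no self-pairs
WHERE ML.DISTANCE(a.embedding, b.embedding, 'COSINE') < 0.05  -- illustrative threshold
ORDER BY similarity DESC;
```

The self-join compares every pair of rows, so its cost grows quadratically with table size; it is only practical on small tables or after narrowing candidates first (for example, by category).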

Syntax

AI.EMBED(
  model => 'MODEL_ENDPOINT',
  content => INPUT_DATA
  [, task_type => 'TASK_TYPE' ]
  [, output_dimensionality => DIMENSIONS ]
  [, connection_id => 'CONNECTION' ]
)

Parameters

model: Embedding model endpoint (e.g., 'text-embedding-004', 'multimodal-embedding-002')

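content: The input to embed: a text string, an image ObjectRefRuntime value, or multimodal data (for example, a STRUCT combining text and an image, as in Example 4)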

task_type (optional): 'RETRIEVAL_QUERY', 'RETRIEVAL_DOCUMENT', 'SEMANTIC_SIMILARITY', 'CLASSIFICATION', 'CLUSTERING'

output_dimensionality (optional): Vector dimensions (e.g., 256, 768, 1024)

connection_id (optional): Vertex AI connection (e.g., 'us.my_vertex_connection')

Code Examples

Example 1: Generate Text Embeddings

SELECT 
  product_id,
  product_name,
  AI.EMBED(
    model => 'text-embedding-004',
    content => CONCAT(product_name, ' ', description),
    task_type => 'RETRIEVAL_DOCUMENT',
    connection_id => 'us.my_vertex_connection'
  ) AS product_embedding
FROM

Example 2: Semantic Search with Query Embedding

-- Step 1: Generate query embedding
DECLARE query_embedding ARRAY<float64>;

SET query_embedding = (
  SELECT AI.EMBED(
    model => 'text-embedding-004',
    content => 'wireless bluetooth headphones with noise cancellation',
    task_type => 'RETRIEVAL_QUERY',
    connection_id => 'us.my_vertex_connection'
  )
);

-- Step 2: Find similar products using cosine similarity
SELECT 
  product_id,
  product_name,
  AI.SIMILARITY(
    query_embedding,
    product_embedding
  ) AS similarity_score
FROM products_with_embeddings
ORDER BY similarity_score DESC
LIMIT 10
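To make the similarity score in Step 2 more concrete, the following sketch computes cosine similarity by hand on two toy vectors and cross-checks it against ML.DISTANCE. The two-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), and ML.DISTANCE is a BigQuery ML function used here only as a reference point; it is not necessarily how AI.SIMILARITY is implemented.

```sql
-- Toy vectors, invented for this example.
WITH pair AS (
  SELECT [1.0, 0.0] AS a, [0.6, 0.8] AS b
)
SELECT
  -- dot(a, b) / (|a| * |b|), computed element by element
  (SELECT SUM(x * b[OFFSET(i)]) FROM UNNEST(a) AS x WITH OFFSET AS i)
    / (SQRT((SELECT SUM(x * x) FROM UNNEST(a) AS x))
       * SQRT((SELECT SUM(y * y) FROM UNNEST(b) AS y))) AS manual_cosine_similarity,
  -- ML.DISTANCE returns cosine distance, so similarity is 1 minus that value
  1 - ML.DISTANCE(a, b, 'COSINE') AS similarity_from_distance
FROM pair;
```

Both columns should come out at roughly 0.6 for these inputs. Scores closer to 1 mean the vectors point in nearly the same direction, which is why the search query orders by similarity_score in descending order.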

Example 3: Image Embeddings

SELECT 
  image_id,
  AI.EMBED(
    model => 'multimodal-embedding-002',
    content => OBJ.GET_ACCESS_URL(image_ref, 'r'),
    connection_id => 'us.my_vertex_connection'
  ) AS image_embedding
FROM

Example 4: Multimodal Embeddings (Text + Image)

SELECT 
  listing_id,
  AI.EMBED(
    model => 'multimodal-embedding-002',
    content => STRUCT(
      listing_description AS text,
      OBJ.GET_ACCESS_URL(listing_image, 'r') AS image
    ),
    connection_id => 'us.my_vertex_connection'
  ) AS multimodal_embedding
FROM

Example 5: Create Vector Index for Fast Search

-- Generate embeddings
CREATE OR REPLACE TABLE products_embedded AS
SELECT 
  product_id,
  product_name,
  description,
  AI.EMBED(
    model => 'text-embedding-004',
    content => CONCAT(product_name, ' ', description),
    task_type => 'RETRIEVAL_DOCUMENT',
    output_dimensionality => 768,
    connection_id => 'us.my_vertex_connection'
  ) AS embedding
FROM products;

-- Create vector index for fast similarity search
CREATE VECTOR INDEX product_embedding_index
ON products_embedded(embedding)
OPTIONS(
  index_type = 'IVF',
  distance_type = 'COSINE',
  ivf_options = '{"num_lists": 100}'
)
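Once the index exists, it can be queried with BigQuery's VECTOR_SEARCH table function. The sketch below assumes a hypothetical table query_embeddings(query_text STRING, embedding ARRAY<FLOAT64>) holding query vectors generated with the same model and output_dimensionality as products_embedded; that table and its column names are not part of the original example.

```sql
-- Nearest-neighbor lookup against the indexed embedding column.
-- query_embeddings is a hypothetical table of pre-computed query vectors.
SELECT
  query.query_text,
  base.product_id,
  base.product_name,
  distance  -- cosine distance: smaller means more similar
FROM VECTOR_SEARCH(
  TABLE products_embedded,
  'embedding',
  TABLE query_embeddings,
  query_column_to_search => 'embedding',
  top_k => 10,
  distance_type => 'COSINE'
)
ORDER BY query.query_text, distance;
```

VECTOR_SEARCH returns the matched rows as query and base structs plus a distance column. Keeping distance_type the same as the one the index was built with ('COSINE' here) lets the index be used for the lookup.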

Data Output Examples

Text Embeddings

product_name	embedding_preview
"Wireless Headphones"	[0.023, -0.145, 0.089, ..., 0.234] (768 dimensions)
"Bluetooth Speaker"	[0.051, -0.112, 0.076, ..., 0.198] (768 dimensions)

Similarity Search Results

product_name	similarity_score
"Noise-Cancelling Wireless Headphones"	0.94
"Bluetooth Over-Ear Headphones"	0.89
"Premium Wireless Earbuds"	0.85

Task Types

RETRIEVAL_QUERY: Optimize for search queries

RETRIEVAL_DOCUMENT: Optimize for documents to be searched

SEMANTIC_SIMILARITY: General similarity comparison

CLASSIFICATION: Optimize for classification tasks

CLUSTERING: Optimize for grouping similar items

Best Practices

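Use the appropriate task_type: Match the task type to your use case, such as RETRIEVAL_QUERY for search queries and RETRIEVAL_DOCUMENT for the documents being searched

Consistent dimensionality: Use the same model and output_dimensionality for queries and documents; vectors of different lengths cannot be compared

Create vector indexes: For large-scale similarity search

Batch generation: Generate embeddings in batch for efficiency

Store embeddings: Persist embeddings in a table to avoid regenerating them on every query

Choose the right model: text-embedding-004 for text, multimodal-embedding-002 for images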
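One way to check the consistent-dimensionality practice on stored data is to group rows by vector length. This sketch assumes the products_embedded table from Example 5; any table with an ARRAY<FLOAT64> embedding column would work the same way.

```sql
-- Expect a single row in the result (for example, dims = 768).
-- More than one row means embeddings of different lengths were mixed.
SELECT
  ARRAY_LENGTH(embedding) AS dims,
  COUNT(*) AS row_count
FROM products_embedded
GROUP BY dims
ORDER BY row_count DESC;
```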

When to Use

✅ Use for semantic search and similarity

✅ Use for RAG applications

✅ Use for clustering and classification

✅ Use for multimodal search (text ↔ image)

Alternatives

AI.GENERATE_EMBEDDING: Alternative embedding function

Pre-computed embeddings: Import embeddings from external systems

TensorFlow models: Import custom embedding models

Supported Models

Text Models:

text-embedding-004 (latest, 768 dimensions)
text-embedding-003
text-multilingual-embedding-002

Multimodal Models:

multimodal-embedding-002 (text, image, video)

Legacy Models:

textembedding-gecko

Platform Support

Regions: All Gemini-supported regions + US/EU multi-regions

Preview Status: Currently in Preview (Pre-GA)

Cost: Charged per Vertex AI API call

Vector Search: Requires vector index for large-scale search

Returns

ARRAY<FLOAT64> representing the embedding vector. Dimensions vary by model and by output_dimensionality (typically 256, 768, or 1024). Returns NULL if the Vertex AI call fails.
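Because a failed call yields no vector, it can be worth checking a persisted table for rows that need to be retried. The sketch below assumes the products_embedded table from Example 5. It treats both NULL and zero-length arrays as missing, on the assumption that a NULL array may surface as an empty array once it has been written to a table.

```sql
-- Rows whose embedding is missing and should be regenerated.
SELECT
  product_id,
  product_name
FROM products_embedded
WHERE IFNULL(ARRAY_LENGTH(embedding), 0) = 0;
```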

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
