Generates embeddings with remote models for text, images, or multimodal content. It is an alternative to AI.EMBED that works with remote models created with CREATE MODEL ... REMOTE WITH CONNECTION. It returns an ARRAY<FLOAT64> vector that represents the semantic meaning of the input data.
Use Cases
Semantic Search: Find documents similar to a query
Content Recommendation: Recommend similar articles or products
Document Clustering: Group similar documents automatically
Similarity Detection: Find duplicate or near-duplicate content
RAG Systems: Retrieve relevant context for generative AI
Syntax
AI.GENERATE_EMBEDDING(
  MODEL remote_model_name,
  content_column_or_literal
  [, STRUCT(value AS param, ...)])
Parameters
MODEL: A remote model created with CREATE MODEL ... REMOTE WITH CONNECTION
content: Text, image ObjectRefRuntime, or multimodal data to embed
STRUCT (optional): Model parameters like task_type, output_dimensionality
Code Examples
Example 1: Create Embedding Model and Generate Embeddings
-- Step 1: Create a remote embedding model
CREATE OR REPLACE MODEL my_dataset.text_embedding_model
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (endpoint = 'text-embedding-004');

-- Step 2: Generate embeddings for products
SELECT
  product_id,
  product_name,
  ml_generate_embedding_result AS product_embedding
FROM
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    CONCAT(product_name, ' - ', description),
    STRUCT('RETRIEVAL_DOCUMENT' AS task_type, 768 AS output_dimensionality)),
  products;
Example 2: Build Semantic Search System
-- Store embeddings in a table
CREATE OR REPLACE TABLE product_embeddings AS
SELECT
  product_id,
  product_name,
  description,
  ml_generate_embedding_result AS embedding
FROM
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    CONCAT(product_name, ' ', description),
    STRUCT('RETRIEVAL_DOCUMENT' AS task_type)),
  products;

-- Create a vector index
CREATE VECTOR INDEX product_idx
ON product_embeddings(embedding)
OPTIONS (
  distance_type = 'COSINE',
  index_type = 'IVF');

-- Search with a query embedding. VECTOR_SEARCH is a table function: it returns
-- each match as a `base` STRUCT plus a `distance` column (smaller = closer),
-- so 1 - distance gives a cosine similarity where higher = closer.
SELECT
  base.product_name,
  base.description,
  1 - distance AS similarity
FROM
  VECTOR_SEARCH(
    TABLE product_embeddings,
    'embedding',
    (
      SELECT ml_generate_embedding_result AS embedding
      FROM AI.GENERATE_EMBEDDING(
        MODEL my_dataset.text_embedding_model,
        'laptop with long battery life',
        STRUCT('RETRIEVAL_QUERY' AS task_type))
    ),
    top_k => 10,
    distance_type => 'COSINE')
ORDER BY similarity DESC;
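The IVF (inverted file) index type used by CREATE VECTOR INDEX can be pictured with the toy Python sketch below. Vectors are bucketed by their nearest centroid, and a query scans only the closest buckets instead of every row, which is why indexed search is approximate. This is a simplified illustration run outside BigQuery, not BigQuery's implementation: the centroids are hand-picked rather than learned (a real IVF index typically learns them with k-means clustering), and build_ivf, ivf_search, and num_probes are hypothetical names.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def build_ivf(vectors: dict[str, list[float]], centroids: list[list[float]]) -> list[list[str]]:
    """Put each vector's key in the inverted list of its nearest centroid."""
    lists: list[list[str]] = [[] for _ in centroids]
    for key, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: cosine_distance(vec, centroids[i]))
        lists[nearest].append(key)
    return lists


def ivf_search(
    query: list[float],
    vectors: dict[str, list[float]],
    centroids: list[list[float]],
    lists: list[list[str]],
    k: int = 10,
    num_probes: int = 1,
) -> list[tuple[str, float]]:
    """Approximate top-k: score only the vectors in the num_probes closest lists."""
    probe_order = sorted(range(len(centroids)), key=lambda i: cosine_distance(query, centroids[i]))
    candidates = [key for i in probe_order[:num_probes] for key in lists[i]]
    scored = [(key, cosine_distance(query, vectors[key])) for key in candidates]
    scored.sort(key=lambda pair: pair[1])  # smallest distance (closest) first
    return scored[:k]


# Toy 2-D "embeddings" and hand-picked centroids (real embeddings have hundreds of dimensions).
vectors = {
    "laptop": [0.9, 0.1],
    "notebook": [0.8, 0.3],
    "mouse": [0.1, 0.9],
    "keyboard": [0.2, 0.8],
}
centroids = [[1.0, 0.0], [0.0, 1.0]]
lists = build_ivf(vectors, centroids)
results = ivf_search([1.0, 0.1], vectors, centroids, lists, k=2)
```

Raising num_probes scans more lists, trading speed for recall; with num_probes equal to the number of lists the search becomes exact.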
Example 3: Multimodal Embeddings
-- Create a multimodal embedding model
CREATE OR REPLACE MODEL my_dataset.multimodal_embedding
  REMOTE WITH CONNECTION `us.my_vertex_connection`
  OPTIONS (endpoint = 'multimodalembedding@001');

-- Generate embeddings for images
SELECT
  image_id,
  ml_generate_embedding_result AS image_embedding
FROM
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.multimodal_embedding,
    image_data),
  product_images;  -- hypothetical table with image_id and image_data columns
Example 4: Batch Processing with Error Handling
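SELECT
  document_id,
  ml_generate_embedding_result AS embedding,
  ml_generate_embedding_status AS status
FROM
  AI.GENERATE_EMBEDDING(
    MODEL my_dataset.text_embedding_model,
    document_text),
  documents
-- Keep only rows that were embedded successfully. Assumption: as with
-- ML.GENERATE_EMBEDDING, the status is an empty string on success and
-- contains an error message otherwise.
WHERE ml_generate_embedding_status = '';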
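Filtering on the status column keeps only the good rows, but a batch job usually also needs the failures so it can retry them. The Python sketch below shows that split over rows already fetched from the query above. It assumes, as BigQuery documents for ML.GENERATE_EMBEDDING, that an empty status string means success; split_by_status and the toy rows (including the error text) are hypothetical.

```python
def split_by_status(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split query results into successful and failed rows.

    Assumption: an empty status string means the row was embedded; any other
    value is the error message returned for that row.
    """
    succeeded = [row for row in rows if row["status"] == ""]
    failed = [row for row in rows if row["status"] != ""]
    return succeeded, failed


# Toy rows shaped like the SELECT above (document_id, embedding, status).
rows = [
    {"document_id": "DOC-1", "embedding": [0.1, 0.2], "status": ""},
    {"document_id": "DOC-2", "embedding": [], "status": "hypothetical error message"},
]
succeeded, failed = split_by_status(rows)
retry_ids = [row["document_id"] for row in failed]  # candidates for a second pass
```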
Example 5: Finding Similar Documents
-- Get the embedding for the source document
DECLARE source_embedding ARRAY<FLOAT64>;
SET source_embedding = (
  SELECT ml_generate_embedding_result
  FROM
    AI.GENERATE_EMBEDDING(
      MODEL my_dataset.text_embedding_model,
      document_text),
    documents
  WHERE document_id = 'DOC-123');

-- Find similar documents. COSINE_DISTANCE is 0 for vectors pointing the same
-- way, so 1 - COSINE_DISTANCE is a similarity where higher means closer.
SELECT
  d.document_id,
  d.title,
  1 - COSINE_DISTANCE(source_embedding, e.embedding) AS similarity
FROM document_embeddings e
JOIN documents d USING (document_id)
WHERE d.document_id != 'DOC-123'
ORDER BY similarity DESC
LIMIT 20;
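The sign convention in a "more like this" query is easy to get backwards, so here is the same ranking reproduced on toy data in Python. Cosine distance is 0 for vectors that point the same way, so the closest documents have the highest similarity (1 minus the distance) and belong at the top of a descending sort. This is an illustrative sketch outside BigQuery; most_similar and the two-dimensional vectors are hypothetical. The length check mirrors the fact that embeddings of different dimensionality cannot be compared.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity in [-1, 1]; cosine distance is 1 minus this value."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def most_similar(source_id: str, embeddings: dict[str, list[float]], limit: int = 20) -> list[tuple[str, float]]:
    """Rank every other document by similarity to source_id, highest first."""
    source = embeddings[source_id]
    scored = [
        (doc_id, cosine_similarity(source, emb))
        for doc_id, emb in embeddings.items()
        if doc_id != source_id  # exclude the source document itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


docs = {
    "DOC-123": [1.0, 0.0],
    "DOC-7": [0.9, 0.2],
    "DOC-9": [0.0, 1.0],
}
ranked = most_similar("DOC-123", docs)
```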
Data Output Examples
Embedding Generation
product_name     | embedding_dimensions | embedding_preview
"Laptop Pro 15"  | 768                  | [0.123, -0.456, 0.789, ...]
"Wireless Mouse" | 768                  | [-0.234, 0.567, -0.123, ...]
Similarity Search Results
product_name            | similarity_score
"UltraBook Pro 15-inch" | 0.92
"Premium Laptop 14"     | 0.87
"Business Notebook"     | 0.81
Task Types (STRUCT Parameter)
RETRIEVAL_QUERY: For search queries (optimizes for question-like inputs)
RETRIEVAL_DOCUMENT: For documents to be searched (optimizes for content)
SEMANTIC_SIMILARITY: General purpose similarity comparison
CLASSIFICATION: For classification/labeling tasks
CLUSTERING: For grouping similar items
Best Practices
Match task types: Use RETRIEVAL_QUERY for queries, RETRIEVAL_DOCUMENT for content
Consistent dimensions: Use the same model and output_dimensionality for queries and documents; distance functions cannot compare vectors of different lengths
Persist embeddings: Store generated embeddings to avoid regeneration
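Use vector indexes: Create a vector index on large embedding tables so VECTOR_SEARCH can run an approximate search instead of comparing the query against every row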
Batch generation: Generate embeddings in batches for efficiency
Handle errors: Check ml_generate_embedding_status for failures
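The "Persist embeddings" advice comes down to looking up an embedding by what produced it before paying for a new model call. A minimal Python sketch of that lookup follows, assuming a hypothetical embed_fn that stands in for the real model call; in BigQuery the store would be an embeddings table rather than an in-memory dict. The key includes the model and task type because a vector computed for one model or task type should not be reused for another.

```python
from typing import Callable


class EmbeddingCache:
    """Reuse stored embeddings; call embed_fn only for inputs not seen before."""

    def __init__(self, embed_fn: Callable[[str, str, str], list[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: dict[tuple[str, str, str], list[float]] = {}

    def get(self, model: str, task_type: str, text: str) -> list[float]:
        key = (model, task_type, text)  # model and task type are part of the identity
        if key not in self._store:
            self._store[key] = self._embed_fn(model, task_type, text)
        return self._store[key]


calls: list[str] = []


def fake_embed(model: str, task_type: str, text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding call; records each invocation."""
    calls.append(text)
    return [float(len(text)), float(len(task_type))]


cache = EmbeddingCache(fake_embed)
first = cache.get("text_embedding_model", "RETRIEVAL_DOCUMENT", "Laptop Pro 15")
again = cache.get("text_embedding_model", "RETRIEVAL_DOCUMENT", "Laptop Pro 15")  # served from the store
query = cache.get("text_embedding_model", "RETRIEVAL_QUERY", "Laptop Pro 15")  # new task type: new call
```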
When to Use
✅ Use when working with remote models
✅ Use for batch embedding generation
✅ Use when you need embedding metadata (status, errors)
✅ Use for production RAG systems
Alternatives
AI.EMBED: Direct embedding function (simpler syntax)
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.