Feb 23, 2026

AI_EMBED

Overview

Generates embedding vectors for text or images, which can be used for similarity search, clustering, classification, and semantic search tasks.

Syntax

AI_EMBED(
  model_name,
  input
)

Parameters

  • model_name (VARCHAR): Embedding model to use (e.g., 'snowflake-arctic-embed-l-v2.0', 'multilingual-e5-large')

  • input (VARCHAR or FILE): Text string or image file to embed

Use Cases

  • Semantic search and similarity matching

  • Document clustering and grouping

  • Recommendation systems

  • Duplicate detection

  • Content-based filtering

  • Multi-modal search (text + images)

Available Models

Text Embedding Models (768 dimensions)

  • e5-base-v2 - Context window: 512 tokens

  • snowflake-arctic-embed-m - Context window: 512 tokens

  • snowflake-arctic-embed-m-v1.5 - Context window: 512 tokens

Text Embedding Models (1024 dimensions)

  • snowflake-arctic-embed-l-v2.0 - Context window: 512 tokens

  • snowflake-arctic-embed-l-v2.0-8k - Context window: 8K tokens

  • nv-embed-qa-4 - Context window: 512 tokens

  • multilingual-e5-large - Context window: 512 tokens

  • voyage-multilingual-2 - Context window: 32K tokens

Code Examples

Example 1: Generate Text Embeddings

SELECT 
    product_id,
    product_description,
    AI_EMBED(
        'snowflake-arctic-embed-l-v2.0',
        product_description
    ) AS product_embedding
FROM

Example 2: Find Similar Documents

WITH document_embeddings AS (
    SELECT 
        doc_id,
        doc_text,
        AI_EMBED('multilingual-e5-large', doc_text) AS embedding
    FROM documents
),
query_embedding AS (
    SELECT AI_EMBED(
        'multilingual-e5-large',
        'machine learning best practices'
    ) AS query_vec
)
SELECT 
    d.doc_id,
    d.doc_text,
    VECTOR_COSINE_SIMILARITY(d.embedding, q.query_vec) AS similarity_score
FROM document_embeddings d, query_embedding q
ORDER BY similarity_score DESC
LIMIT 10
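
To make the ranking step in the query above concrete, here is a minimal Python sketch of what a cosine similarity score measures and how ordering by it selects the top matches. The helper names and the 3-dimensional toy vectors are hypothetical and exist only for illustration; real embeddings have 768 or 1024 dimensions, and in Snowflake the computation is done by VECTOR_COSINE_SIMILARITY.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_matches(query_vec, doc_vecs, k=10):
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors (hypothetical values, not real model output).
query = [1.0, 0.0, 0.0]
docs = {
    "doc_a": [2.0, 0.0, 0.0],  # same direction as the query
    "doc_b": [0.0, 3.0, 0.0],  # orthogonal to the query
    "doc_c": [1.0, 1.0, 0.0],  # 45 degrees away from the query
}

ranked = top_matches(query, docs, k=2)
print(ranked)  # doc_a first (score 1.0), then doc_c (about 0.707)
```

Because the score depends only on direction, doc_a ranks first even though its magnitude differs from the query's.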

Example 3: Create Embedding Table for Vector Search

CREATE TABLE product_embeddings AS
SELECT 
    product_id,
    product_name,
    AI_EMBED(
        'snowflake-arctic-embed-l-v2.0',
        product_name || ' ' || description
    ) AS embedding_vector
FROM

Example 4: Multilingual Embeddings

SELECT 
    article_id,
    language,
    AI_EMBED(
        'multilingual-e5-large',
        article_content
    ) AS content_embedding
FROM international_articles
WHERE language IN ('en', 'es', 'fr', 'de')

Example 5: Image Embeddings

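SELECT 
    image_id,
    AI_EMBED(
        'voyage-multimodal-3',  -- image input needs a multimodal model; the text models listed above take VARCHAR only
        TO_FILE('@images', image_filename)  -- stage name and relative path passed as separate arguments
    ) AS image_embedding
FROM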

Data Output Examples

Text Embedding Output

Input: "Snowflake is a cloud data platform"
Model: snowflake-arctic-embed-l-v2.0

Output: [0.023, -0.156, 0.089, ..., 0.234]  // 1024-dimensional vector

Similarity Search Result

Query: "How to optimize SQL queries?"

Top Matches:
1. "SQL Query Performance Tuning" - similarity: 0.89
2. "Database Optimization Techniques" - similarity: 0.85
3. "Best Practices for Query Efficiency" - similarity: 0.82

Limitations & Considerations

Token Limits

  • Each model has specific context window limits

  • Exceeding limits causes truncation or errors

  • Use AI_COUNT_TOKENS to verify input size
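
One common way to stay within a context window is to split long text into overlapping chunks and embed each chunk separately. The sketch below is a rough illustration in Python: `chunk_words` is a hypothetical helper, and it counts whitespace-separated words, which is only a loose proxy for model tokens. The actual token count should still be checked with AI_COUNT_TOKENS.

```python
def chunk_words(text, max_words, overlap=0):
    """Split text into chunks of at most max_words whitespace-separated words.

    Consecutive chunks share `overlap` words so that content cut at a
    boundary still appears intact in one of the chunks.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Ten placeholder words, chunked four at a time with one word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be passed to AI_EMBED as its own row, keeping a reference to the source document so that matches can be traced back.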

Dimension Sizes

  • 768-dimensional models: Smaller vectors, so less storage and faster similarity computations

  • 1024-dimensional models: Larger vectors that are often more accurate, at the cost of more storage

  • Choose based on accuracy vs. performance needs
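
The storage side of this trade-off is simple arithmetic. The Python sketch below estimates the raw, uncompressed size of the vectors, assuming each element is stored as a 32-bit float and using a hypothetical table of one million rows. Actual on-disk size in Snowflake will differ because of compression and metadata.

```python
BYTES_PER_FLOAT32 = 4  # assumption: each vector element is a 32-bit float


def raw_vector_bytes(num_rows, dimensions, bytes_per_element=BYTES_PER_FLOAT32):
    """Uncompressed size in bytes of num_rows vectors of the given dimension."""
    return num_rows * dimensions * bytes_per_element


rows = 1_000_000  # hypothetical table size
small = raw_vector_bytes(rows, 768)
large = raw_vector_bytes(rows, 1024)

print(f"768-dim:  {small / 1e9:.3f} GB")   # 3.072 GB
print(f"1024-dim: {large / 1e9:.3f} GB")   # 4.096 GB
print(f"ratio:    {large / small:.3f}")    # 1.333
```

Under these assumptions the 1024-dimensional vectors take about a third more space, and each similarity computation touches proportionally more numbers.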

Cost

  • Billing based on input tokens only

  • No output token charges

  • Embedding generation is relatively inexpensive

Regional Availability

  • AWS US West/East: ✓

  • Azure East US: ✓

  • EU regions: ✓

  • Cross-region inference: ✓

Best Practices

1. Choose the Right Model

-- For long documents
AI_EMBED('snowflake-arctic-embed-l-v2.0-8k', long_text)  -- 8K context

-- For short texts (queries, titles)
AI_EMBED('snowflake-arctic-embed-m', short_text)  -- 512 tokens, faster

-- For multilingual content
AI_EMBED('multilingual-e5-large', text)  -- Supports 100+ languages

2. Store Embeddings for Reuse

-- Create materialized embedding table
CREATE TABLE cached_embeddings AS
SELECT 
    id,
    content,
    AI_EMBED('snowflake-arctic-embed-l-v2.0', content) AS embedding,
    CURRENT_TIMESTAMP() AS embedding_created_at
FROM source_data;

-- Standard Snowflake tables do not support CREATE INDEX, so no index is
-- created here. Query the stored vectors with VECTOR_COSINE_SIMILARITY,
-- or use Cortex Search for managed retrieval.

3. Normalize Input Text

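-- Clean and prepare text before embedding
-- The backslash is doubled because '\s' inside a single-quoted Snowflake
-- string would otherwise lose its backslash before reaching the regex engine.
SELECT AI_EMBED(
    'snowflake-arctic-embed-l-v2.0',
    LOWER(TRIM(REGEXP_REPLACE(text, '[^a-zA-Z0-9\\s]', '')))
) AS embedding
FROM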
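
To see what the character class in the pattern above keeps and drops, here is a Python sketch of the same three steps (regex replace, trim, lowercase). The `normalize` helper and the sample strings are hypothetical. Note that the pattern removes every character outside ASCII letters, digits, and whitespace, so accented letters disappear; this kind of cleaning is probably unsuitable for multilingual content.

```python
import re


def normalize(text):
    """Drop everything except ASCII letters, digits, and whitespace,
    then trim the ends and lowercase the result."""
    return re.sub(r"[^a-zA-Z0-9\s]", "", text).strip().lower()


cleaned_ascii = normalize("  Hello, World! 100% ")
cleaned_accented = normalize("Caf\u00e9 Z\u00fcrich")  # "Café Zürich"

print(cleaned_ascii)     # hello world 100
print(cleaned_accented)  # caf zrich
```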

4. Batch Processing

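-- Process embeddings in batches for large datasets.
-- INSERT appends each batch; CREATE OR REPLACE would discard earlier batches.
-- Assumes product_embeddings(product_id, embedding) already exists.
INSERT INTO product_embeddings (product_id, embedding)
SELECT 
    product_id,
    AI_EMBED('snowflake-arctic-embed-l-v2.0', description) AS embedding
FROM products
WHERE embedding_version IS NULL  -- rows not yet embedded; set this column after each batch
LIMIT 10000;  -- Process in chunks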

Use with Cortex Search

-- Create a Cortex Search Service.
-- The ON column must be a text column; the service generates the
-- embeddings itself with the model named in EMBEDDING_MODEL.
CREATE CORTEX SEARCH SERVICE product_search
ON description
ATTRIBUTES product_name, category
WAREHOUSE = compute_wh
TARGET_LAG = '1 hour'
EMBEDDING_MODEL = 'snowflake-arctic-embed-l-v2.0'
AS (
    SELECT 
        product_id,
        product_name,
        category,
        description
    FROM products
);

Related Functions

  • AI_SIMILARITY - Compute the embedding similarity of two inputs (text or images) directly

  • VECTOR_COSINE_SIMILARITY - Compute cosine similarity

  • AI_COUNT_TOKENS - Check input token count

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
