AI_EMBED
Feb 23, 2026
Overview
Generates embedding vectors for text or images, which can be used for similarity search, clustering, classification, and semantic search tasks.
Syntax
AI_EMBED( <model_name>, <input> )
Parameters
model_name (VARCHAR): Embedding model to use (e.g., 'snowflake-arctic-embed-l-v2.0', 'multilingual-e5-large')
input (VARCHAR or FILE): Text string or image file to embed
Use Cases
Semantic search and similarity matching
Document clustering and grouping
Recommendation systems
Duplicate detection
Content-based filtering
Multi-modal search (text + images)
Available Models
Text Embedding Models (768 dimensions)
e5-base-v2 - Context window: 512 tokens
snowflake-arctic-embed-m - Context window: 512 tokens
snowflake-arctic-embed-m-v1.5 - Context window: 512 tokens
Text Embedding Models (1024 dimensions)
snowflake-arctic-embed-l-v2.0 - Context window: 128K tokens
snowflake-arctic-embed-l-v2.0-8k - Context window: 8K tokens
nv-embed-qa-4 - Context window: 512 tokens
multilingual-e5-large - Context window: 512 tokens
voyage-multilingual-2 - Context window: 32K tokens
Code Examples
Example 1: Generate Text Embeddings
Example 2: Find Similar Documents
Example 3: Create Embedding Table for Vector Search
Example 4: Multilingual Embeddings
Example 5: Image Embeddings
Data Output Examples
Text Embedding Output
Similarity Search Result
Limitations & Considerations
Token Limits
Each model has specific context window limits
Exceeding limits causes truncation or errors
Use AI_COUNT_TOKENS to verify input size
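The token limits above suggest splitting long inputs before embedding them. The following is a minimal client-side sketch in Python, not part of AI_EMBED itself. The whitespace-based count is only a rough stand-in for a real tokenizer (AI_COUNT_TOKENS gives the authoritative number), and `split_to_fit`, `rough_token_count`, and the `safety` factor are hypothetical names introduced for illustration.

```python
# Context windows (in tokens) for a few models, as listed under "Available Models".
# "8K" and "32K" are read here as 8,000 and 32,000, a conservative assumption.
CONTEXT_WINDOWS = {
    "e5-base-v2": 512,
    "snowflake-arctic-embed-m": 512,
    "multilingual-e5-large": 512,
    "snowflake-arctic-embed-l-v2.0-8k": 8_000,
    "voyage-multilingual-2": 32_000,
}


def rough_token_count(text: str) -> int:
    """Crude estimate: number of whitespace-separated words.

    Real tokenizers usually produce more tokens than words, so treat this
    as a lower bound and confirm with AI_COUNT_TOKENS.
    """
    return len(text.split())


def split_to_fit(text: str, model: str, safety: float = 0.5) -> list[str]:
    """Split text into chunks of at most `safety * window` words.

    The safety factor leaves headroom because words undercount tokens.
    """
    budget = max(1, int(CONTEXT_WINDOWS[model] * safety))
    words = text.split()
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]


if __name__ == "__main__":
    chunks = split_to_fit("lorem " * 600, "e5-base-v2")
    print(len(chunks), [rough_token_count(c) for c in chunks])
```

Each chunk can then be embedded separately. How the resulting vectors are combined (for example averaged, or stored one row per chunk) is a design choice this sketch leaves open.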
Dimension Sizes
768-dimensional models: Faster, less storage
1024-dimensional models: Generally more accurate, more storage
Choose based on accuracy vs. performance needs
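To make the storage side of this trade-off concrete, here is a back-of-the-envelope sketch. It assumes each dimension is stored as a 32-bit float and ignores compression and metadata, so the numbers are an upper-level estimate rather than actual table sizes. `raw_vector_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension


def raw_vector_bytes(num_rows: int, dimensions: int) -> int:
    """Uncompressed size of num_rows embedding vectors, in bytes."""
    return num_rows * dimensions * BYTES_PER_FLOAT32


if __name__ == "__main__":
    for dims in (768, 1024):
        mib = raw_vector_bytes(1_000_000, dims) / (1024 ** 2)
        print(f"{dims} dimensions, 1M rows: {mib:,.0f} MiB uncompressed")
```

Under these assumptions a 1024-dimensional vector takes one third more space than a 768-dimensional one (1024 / 768 = 4/3), and similarity computations scale with dimension in the same proportion.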
Cost
Billing based on input tokens only
No output token charges
Embedding generation is relatively inexpensive
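Because only input tokens are billed, a cost estimate reduces to a single multiplication. The sketch below assumes the rate is quoted per million input tokens. The rate is deliberately a required argument with no default, since the real value depends on the model and must come from the current pricing table. `estimate_credits` is a hypothetical helper.

```python
def estimate_credits(token_counts, credits_per_million_tokens: float) -> float:
    """Estimate the charge for embedding inputs with the given token counts.

    token_counts: iterable of per-input token counts (e.g. from AI_COUNT_TOKENS).
    credits_per_million_tokens: rate for the chosen model; look it up, do not guess.
    """
    total_tokens = sum(token_counts)
    return total_tokens / 1_000_000 * credits_per_million_tokens


if __name__ == "__main__":
    # 0.05 is a placeholder rate for illustration only, not a published price.
    print(estimate_credits([250_000, 750_000], credits_per_million_tokens=0.05))
```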
Regional Availability
AWS US West/East: ✓
Azure East US: ✓
EU regions: ✓
Cross-region inference: ✓
Best Practices
1. Choose the Right Model
2. Store Embeddings for Reuse
3. Normalize Input Text
4. Batch Processing
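Practices 2 to 4 can be combined in a small client-side pipeline. This is a sketch under stated assumptions, not the documented workflow: `embed_fn` is a hypothetical stand-in for whatever actually calls AI_EMBED, `store` is any dict-like cache, and the normalization shown (Unicode NFKC plus whitespace collapsing) is one reasonable choice among several.

```python
import hashlib
import re
import unicodedata
from typing import Callable, Iterator


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def cache_key(model: str, text: str) -> str:
    """Stable key so the same (model, text) pair is embedded only once."""
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(texts: list[str], model: str,
              embed_fn: Callable[[str, list[str]], list[list[float]]],
              store: dict, batch_size: int = 100) -> list[list[float]]:
    """Normalize inputs, embed only uncached texts in batches, return all vectors.

    embed_fn(model, batch) is hypothetical: it must return one vector per text.
    """
    cleaned = [normalize_text(t) for t in texts]
    # dict.fromkeys removes duplicates while preserving order.
    pending = [t for t in dict.fromkeys(cleaned)
               if cache_key(model, t) not in store]
    for batch in batches(pending, batch_size):
        for text, vector in zip(batch, embed_fn(model, batch)):
            store[cache_key(model, text)] = vector
    return [store[cache_key(model, t)] for t in cleaned]
```

Keying the cache on both model and text matters because vectors from different models are not comparable, so switching models should never reuse stored embeddings.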
Use with Cortex Search
Related Functions
AI_SIMILARITY - Calculate similarity between embeddings
VECTOR_COSINE_SIMILARITY - Compute cosine similarity
AI_COUNT_TOKENS - Check input token count
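For readers who want to see what a cosine-similarity ranking computes, here is a small pure-Python sketch of the underlying math. It illustrates the quantity that VECTOR_COSINE_SIMILARITY is named for, and is not a description of how that function is implemented. `cosine_similarity` and `top_k` are hypothetical helper names, and the tiny 2-dimensional vectors are made up for readability.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query, documents: dict, k: int = 3):
    """Return up to k (score, name) pairs, most similar first."""
    scored = [(cosine_similarity(query, vec), name)
              for name, vec in documents.items()]
    return sorted(scored, reverse=True)[:k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for score, name in top_k([1.0, 0.1], docs, k=2):
        print(name, round(score, 3))
```

Vectors are only comparable when they come from the same embedding model, which is why the dimension check raises instead of truncating.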





