SQL Keywords
AI_SIMILARITY
Feb 23, 2026 · 5 min read
Overview
AI_SIMILARITY computes the cosine similarity between the embedding vectors of two text or image inputs, returning a score that reflects how semantically close the inputs are.
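Under the hood, the function embeds both inputs and compares the resulting vectors with cosine similarity. A minimal Python sketch of the metric itself (the short vectors here are toy stand-ins for real embedding vectors):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because the metric depends only on vector direction, two texts phrased differently but about the same topic can still score close to 1.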
Syntax
AI_SIMILARITY( input1, input2 )
Parameters
input1 (VARCHAR or FILE): First text string or file
input2 (VARCHAR or FILE): Second text string or file
Use Cases
Find duplicate or near-duplicate content
Content recommendation systems
Plagiarism detection
Similar document matching
Query matching
Semantic search
Code Examples
Example 1: Calculate Text Similarity
SELECT AI_SIMILARITY(
    'Snowflake is a cloud data platform',
    'Snowflake is a cloud-based data warehouse'
) AS similarity_score;
Output:
similarity_score
----------------
0.92
Example 2: Find Similar Products
WITH target_product AS (
    SELECT description
    FROM products
    WHERE product_id = 101
)
SELECT
    p.product_id,
    p.product_name,
    AI_SIMILARITY(p.description, t.description) AS similarity
FROM products p, target_product t
WHERE p.product_id != 101
ORDER BY similarity DESC
LIMIT 10;
Example 3: Detect Duplicate Support Tickets
SELECT
    t1.ticket_id AS ticket_1,
    t2.ticket_id AS ticket_2,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2
    ON t1.ticket_id < t2.ticket_id
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.85
    AND t1.created_date >= CURRENT_DATE - 7;
Example 4: Content Recommendation
WITH user_preferences AS (
    SELECT 'I enjoy reading about machine learning and AI' AS preference
)
SELECT
    a.article_id,
    a.title,
    AI_SIMILARITY(a.content_summary, u.preference) AS relevance_score
FROM articles a, user_preferences u
WHERE a.published_date >= CURRENT_DATE - 30
ORDER BY relevance_score DESC
LIMIT 20;
Example 5: Image Similarity
SELECT
    img1.image_id,
    img2.image_id,
    AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) AS image_similarity
FROM image_catalog img1
CROSS JOIN image_catalog img2
WHERE img1.image_id < img2.image_id
    AND AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) > 0.9;
Data Output Examples
Similarity Scores
Text 1: "How to optimize SQL queries"
Text 2: "SQL query optimization techniques"
Similarity: 0.94 (very similar)

Text 1: "Machine learning algorithms"
Text 2: "Database management systems"
Similarity: 0.32 (somewhat related)

Text 1: "Python programming tutorial"
Text 2: "Chocolate cake recipe"
Similarity: 0.05 (not related)
Duplicate Detection
Original Ticket: "Unable to login to application"
Potential Duplicates:
- "Cannot log into the app" - 0.91 similarity
- "Login page not working" - 0.87 similarity
- "Authentication failure" - 0.73 similarity
Model Information
Similarity Metric: Cosine similarity
Output Range: -1 to 1 (cosine similarity; 1 means identical meaning, values near 0 mean unrelated). Scores for natural-language inputs typically fall between 0 and 1.
Supported Inputs: Text and images
Limitations & Considerations
Input Size
Subject to embedding model's context window
Typically 512 to 128K tokens depending on model
Use AI_COUNT_TOKENS to verify
Cost
Generates embeddings for both inputs
Billing based on input tokens for both
Consider caching embeddings for repeated comparisons
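The caching advice can be sketched outside SQL too. In this toy Python example, a hypothetical embed() call is memoized so each distinct input is embedded exactly once, no matter how many comparisons reuse it (the character-frequency "embedding" is a stand-in, not a real model):

```python
from functools import lru_cache

EMBED_CALLS = 0

@lru_cache(maxsize=None)
def embed(text: str) -> tuple:
    """Stand-in for a real embedding call; counts invocations to show caching."""
    global EMBED_CALLS
    EMBED_CALLS += 1
    # Toy embedding: 26-dim character-frequency vector (not a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            vec[ord(ch) - ord('a')] += 1.0
    return tuple(vec)

def similarity(t1: str, t2: str) -> float:
    """Cosine similarity of the cached embeddings of two texts."""
    a, b = embed(t1), embed(t2)
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Comparing one query against three documents embeds 4 distinct inputs,
# not 6: the query text is embedded once and reused from the cache.
docs = ["sql query tuning", "cake recipe", "query optimization"]
scores = [similarity("optimize sql queries", d) for d in docs]
print(EMBED_CALLS)  # 4
```

The SQL analogue is the AI_EMBED + VECTOR_COSINE_SIMILARITY pattern shown under Best Practices: pay the embedding cost once per row, then reuse the stored vectors.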
Performance
Suitable for batch processing
Use MEDIUM warehouse or smaller
For large-scale similarity, pre-compute embeddings
Regional Availability
AWS US West/East: ✓
Azure East US: ✓
EU regions: ✓
Cross-region inference: ✓
Best Practices
1. Set Appropriate Thresholds
-- Define similarity categories
-- (text_pairs is a hypothetical table with text1/text2 columns)
SELECT
    CASE
        WHEN AI_SIMILARITY(text1, text2) > 0.9 THEN 'Near Duplicate'
        WHEN AI_SIMILARITY(text1, text2) > 0.7 THEN 'Highly Similar'
        WHEN AI_SIMILARITY(text1, text2) > 0.5 THEN 'Moderately Similar'
        ELSE 'Different'
    END AS similarity_category
FROM text_pairs;
2. Cache Embeddings for Efficiency
-- Instead of computing similarity repeatedly,
-- pre-compute and store embeddings
CREATE TABLE article_embeddings AS
SELECT
    article_id,
    AI_EMBED('snowflake-arctic-embed-l-v2.0', content) AS embedding
FROM articles;

-- Then use VECTOR_COSINE_SIMILARITY for faster comparisons
SELECT
    a1.article_id,
    a2.article_id,
    VECTOR_COSINE_SIMILARITY(a1.embedding, a2.embedding) AS similarity
FROM article_embeddings a1, article_embeddings a2
WHERE a1.article_id < a2.article_id;  -- evaluate each pair once
3. Optimize Comparison Queries
-- Limit comparisons to relevant subsets
SELECT
    t1.ticket_id,
    t2.ticket_id,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2
    ON t1.category = t2.category  -- Same category only
    AND t1.ticket_id < t2.ticket_id
    AND ABS(DATEDIFF('day', t1.created_date, t2.created_date)) <= 7  -- Within a week
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.8;
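This pre-filtering idea generalizes to any pairwise workload: blocking on a cheap key (here, category) shrinks the number of expensive similarity calls from all pairs to pairs within each block. A small sketch with made-up ticket data:

```python
from collections import defaultdict
from itertools import combinations

# (ticket_id, category) -- made-up data for illustration.
tickets = [
    (1, "billing"), (2, "billing"), (3, "login"),
    (4, "login"), (5, "login"), (6, "billing"),
]

# Naive all-pairs: n * (n - 1) / 2 comparisons.
all_pairs = list(combinations([t[0] for t in tickets], 2))

# Blocked: compare only tickets that share a category.
by_category = defaultdict(list)
for ticket_id, category in tickets:
    by_category[category].append(ticket_id)
blocked_pairs = [
    pair for ids in by_category.values() for pair in combinations(ids, 2)
]

print(len(all_pairs), len(blocked_pairs))  # 15 6
```

With two blocks of three tickets each, 15 candidate pairs collapse to 6; on real tables the savings grow quadratically with table size.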
4. Combine with Filters
WITH similar_products AS (
    SELECT
        p1.product_id AS product_a,
        p2.product_id AS product_b,
        AI_SIMILARITY(p1.description, p2.description) AS similarity
    FROM products p1, products p2
    WHERE p1.product_id < p2.product_id
        AND p1.category = p2.category  -- Pre-filter by category
)
SELECT *
FROM similar_products
WHERE similarity > 0.75;
Common Use Cases
Deduplication
-- Find and flag duplicate customer inquiries
SELECT
    newer.inquiry_id,
    older.inquiry_id AS potential_duplicate,
    AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) AS similarity
FROM inquiries newer
JOIN inquiries older
    ON newer.created_at > older.created_at
    AND older.created_at >= newer.created_at - INTERVAL '7 days'
WHERE AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) > 0.9;
Recommendation Engine
WITH user_history AS (
    SELECT ARRAY_AGG(article_content) AS read_articles
    FROM reading_history
    WHERE user_id = 123
)
SELECT
    a.article_id,
    a.title,
    MAX(AI_SIMILARITY(a.content, h.value::VARCHAR)) AS max_similarity
FROM articles a,
    user_history u,
    LATERAL FLATTEN(input => u.read_articles) h
WHERE a.article_id NOT IN (
    SELECT article_id FROM reading_history WHERE user_id = 123
)
GROUP BY a.article_id, a.title
ORDER BY max_similarity DESC
LIMIT 10;
Related Functions
AI_EMBED - Generate embeddings for storage
VECTOR_COSINE_SIMILARITY - Direct vector similarity
AI_CLASSIFY - For categorical similarity
Copyright © 2026 Paradime Labs, Inc.
Made with ❤️ in San Francisco ・ London
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.