SQL Keywords

SQL Keywords

AI_SIMILARITY

Feb 23, 2026

·

5

min read

AI_SIMILARITY

Overview

Calculates the embedding similarity between two text or image inputs using cosine similarity.

Syntax

AI_SIMILARITY(
  input1,
  input2
)
AI_SIMILARITY(
  input1,
  input2
)
AI_SIMILARITY(
  input1,
  input2
)

Parameters

  • input1 (VARCHAR or FILE): First text string or file

  • input2 (VARCHAR or FILE): Second text string or file

Use Cases

  • Find duplicate or near-duplicate content

  • Content recommendation systems

  • Plagiarism detection

  • Similar document matching

  • Query matching

  • Semantic search

Code Examples

Example 1: Calculate Text Similarity

SELECT AI_SIMILARITY(
    'Snowflake is a cloud data platform',
    'Snowflake is a cloud-based data warehouse'
) AS

SELECT AI_SIMILARITY(
    'Snowflake is a cloud data platform',
    'Snowflake is a cloud-based data warehouse'
) AS

SELECT AI_SIMILARITY(
    'Snowflake is a cloud data platform',
    'Snowflake is a cloud-based data warehouse'
) AS

Output:

similarity_score
----------------
0.92
similarity_score
----------------
0.92
similarity_score
----------------
0.92

Example 2: Find Similar Products

WITH target_product AS (
    SELECT description
    FROM products
    WHERE product_id = 101
)
SELECT 
    p.product_id,
    p.product_name,
    AI_SIMILARITY(p.description, t.description) AS similarity
FROM products p, target_product t
WHERE p.product_id != 101
ORDER BY similarity DESC
LIMIT 10

WITH target_product AS (
    SELECT description
    FROM products
    WHERE product_id = 101
)
SELECT 
    p.product_id,
    p.product_name,
    AI_SIMILARITY(p.description, t.description) AS similarity
FROM products p, target_product t
WHERE p.product_id != 101
ORDER BY similarity DESC
LIMIT 10

WITH target_product AS (
    SELECT description
    FROM products
    WHERE product_id = 101
)
SELECT 
    p.product_id,
    p.product_name,
    AI_SIMILARITY(p.description, t.description) AS similarity
FROM products p, target_product t
WHERE p.product_id != 101
ORDER BY similarity DESC
LIMIT 10

Example 3: Detect Duplicate Support Tickets

SELECT 
    t1.ticket_id AS ticket_1,
    t2.ticket_id AS ticket_2,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2 
    ON t1.ticket_id < t2.ticket_id
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.85
  AND t1.created_date >= CURRENT_DATE - 7

SELECT 
    t1.ticket_id AS ticket_1,
    t2.ticket_id AS ticket_2,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2 
    ON t1.ticket_id < t2.ticket_id
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.85
  AND t1.created_date >= CURRENT_DATE - 7

SELECT 
    t1.ticket_id AS ticket_1,
    t2.ticket_id AS ticket_2,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2 
    ON t1.ticket_id < t2.ticket_id
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.85
  AND t1.created_date >= CURRENT_DATE - 7

Example 4: Content Recommendation

WITH user_preferences AS (
    SELECT 'I enjoy reading about machine learning and AI' AS preference
)
SELECT 
    article_id,
    title,
    AI_SIMILARITY(content_summary, u.preference) AS relevance_score
FROM articles a, user_preferences u
WHERE published_date >= CURRENT_DATE - 30
ORDER BY relevance_score DESC
LIMIT 20

WITH user_preferences AS (
    SELECT 'I enjoy reading about machine learning and AI' AS preference
)
SELECT 
    article_id,
    title,
    AI_SIMILARITY(content_summary, u.preference) AS relevance_score
FROM articles a, user_preferences u
WHERE published_date >= CURRENT_DATE - 30
ORDER BY relevance_score DESC
LIMIT 20

WITH user_preferences AS (
    SELECT 'I enjoy reading about machine learning and AI' AS preference
)
SELECT 
    article_id,
    title,
    AI_SIMILARITY(content_summary, u.preference) AS relevance_score
FROM articles a, user_preferences u
WHERE published_date >= CURRENT_DATE - 30
ORDER BY relevance_score DESC
LIMIT 20

Example 5: Image Similarity

SELECT 
    img1.image_id,
    img2.image_id,
    AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) AS image_similarity
FROM image_catalog img1
CROSS JOIN image_catalog img2
WHERE img1.image_id < img2.image_id
  AND AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) > 0.9

SELECT 
    img1.image_id,
    img2.image_id,
    AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) AS image_similarity
FROM image_catalog img1
CROSS JOIN image_catalog img2
WHERE img1.image_id < img2.image_id
  AND AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) > 0.9

SELECT 
    img1.image_id,
    img2.image_id,
    AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) AS image_similarity
FROM image_catalog img1
CROSS JOIN image_catalog img2
WHERE img1.image_id < img2.image_id
  AND AI_SIMILARITY(
        TO_FILE('@images/' || img1.filename),
        TO_FILE('@images/' || img2.filename)
    ) > 0.9

Data Output Examples

Similarity Scores

Text 1: "How to optimize SQL queries"
Text 2: "SQL query optimization techniques"
Similarity: 0.94 (very similar)

Text 1: "Machine learning algorithms"
Text 2: "Database management systems"
Similarity: 0.32 (somewhat related)

Text 1: "Python programming tutorial"
Text 2: "Chocolate cake recipe"
Similarity: 0.05 (not related)
Text 1: "How to optimize SQL queries"
Text 2: "SQL query optimization techniques"
Similarity: 0.94 (very similar)

Text 1: "Machine learning algorithms"
Text 2: "Database management systems"
Similarity: 0.32 (somewhat related)

Text 1: "Python programming tutorial"
Text 2: "Chocolate cake recipe"
Similarity: 0.05 (not related)
Text 1: "How to optimize SQL queries"
Text 2: "SQL query optimization techniques"
Similarity: 0.94 (very similar)

Text 1: "Machine learning algorithms"
Text 2: "Database management systems"
Similarity: 0.32 (somewhat related)

Text 1: "Python programming tutorial"
Text 2: "Chocolate cake recipe"
Similarity: 0.05 (not related)

Duplicate Detection

Original Ticket: "Unable to login to application"
Potential Duplicates:
- "Cannot log into the app" - 0.91 similarity
- "Login page not working" - 0.87 similarity
- "Authentication failure" - 0.73 similarity
Original Ticket: "Unable to login to application"
Potential Duplicates:
- "Cannot log into the app" - 0.91 similarity
- "Login page not working" - 0.87 similarity
- "Authentication failure" - 0.73 similarity
Original Ticket: "Unable to login to application"
Potential Duplicates:
- "Cannot log into the app" - 0.91 similarity
- "Login page not working" - 0.87 similarity
- "Authentication failure" - 0.73 similarity

Model Information

  • Similarity Metric: Cosine similarity

  • Output Range: 0.0 (no similarity) to 1.0 (identical)

  • Supported Inputs: Text and images

Limitations & Considerations

Input Size

  • Subject to embedding model's context window

  • Typically 512 to 128K tokens depending on model

  • Use AI_COUNT_TOKENS to verify

Cost

  • Generates embeddings for both inputs

  • Billing based on input tokens for both

  • Consider caching embeddings for repeated comparisons

Performance

  • Suitable for batch processing

  • Use MEDIUM warehouse or smaller

  • For large-scale similarity, pre-compute embeddings

Regional Availability

  • AWS US West/East: ✓

  • Azure East US: ✓

  • EU regions: ✓

  • Cross-region inference: ✓

Best Practices

1. Set Appropriate Thresholds

-- Define similarity categories
SELECT 
    CASE 
        WHEN AI_SIMILARITY(text1, text2) > 0.9 THEN 'Near Duplicate'
        WHEN AI_SIMILARITY(text1, text2) > 0.7 THEN 'Highly Similar'
        WHEN AI_SIMILARITY(text1, text2) > 0.5 THEN 'Moderately Similar'
        ELSE 'Different'
    END AS similarity_category
FROM

-- Define similarity categories
SELECT 
    CASE 
        WHEN AI_SIMILARITY(text1, text2) > 0.9 THEN 'Near Duplicate'
        WHEN AI_SIMILARITY(text1, text2) > 0.7 THEN 'Highly Similar'
        WHEN AI_SIMILARITY(text1, text2) > 0.5 THEN 'Moderately Similar'
        ELSE 'Different'
    END AS similarity_category
FROM

-- Define similarity categories
SELECT 
    CASE 
        WHEN AI_SIMILARITY(text1, text2) > 0.9 THEN 'Near Duplicate'
        WHEN AI_SIMILARITY(text1, text2) > 0.7 THEN 'Highly Similar'
        WHEN AI_SIMILARITY(text1, text2) > 0.5 THEN 'Moderately Similar'
        ELSE 'Different'
    END AS similarity_category
FROM

2. Cache Embeddings for Efficiency

-- Instead of computing similarity repeatedly
-- Pre-compute and store embeddings
CREATE TABLE article_embeddings AS
SELECT 
    article_id,
    AI_EMBED('snowflake-arctic-embed-l-v2.0', content) AS embedding
FROM articles;

-- Then use VECTOR_COSINE_SIMILARITY for faster comparisons
SELECT 
    a1.article_id,
    a2.article_id,
    VECTOR_COSINE_SIMILARITY(a1.embedding, a2.embedding) AS similarity
FROM article_embeddings a1, article_embeddings a2
WHERE

-- Instead of computing similarity repeatedly
-- Pre-compute and store embeddings
CREATE TABLE article_embeddings AS
SELECT 
    article_id,
    AI_EMBED('snowflake-arctic-embed-l-v2.0', content) AS embedding
FROM articles;

-- Then use VECTOR_COSINE_SIMILARITY for faster comparisons
SELECT 
    a1.article_id,
    a2.article_id,
    VECTOR_COSINE_SIMILARITY(a1.embedding, a2.embedding) AS similarity
FROM article_embeddings a1, article_embeddings a2
WHERE

-- Instead of computing similarity repeatedly
-- Pre-compute and store embeddings
CREATE TABLE article_embeddings AS
SELECT 
    article_id,
    AI_EMBED('snowflake-arctic-embed-l-v2.0', content) AS embedding
FROM articles;

-- Then use VECTOR_COSINE_SIMILARITY for faster comparisons
SELECT 
    a1.article_id,
    a2.article_id,
    VECTOR_COSINE_SIMILARITY(a1.embedding, a2.embedding) AS similarity
FROM article_embeddings a1, article_embeddings a2
WHERE

3. Optimize Comparison Queries

-- Limit comparisons to relevant subsets
SELECT 
    t1.ticket_id,
    t2.ticket_id,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2 
    ON t1.category = t2.category  -- Same category only
    AND t1.ticket_id < t2.ticket_id
    AND ABS(DATEDIFF('day', t1.created_date, t2.created_date)) <= 7  -- Within week
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.8

-- Limit comparisons to relevant subsets
SELECT 
    t1.ticket_id,
    t2.ticket_id,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2 
    ON t1.category = t2.category  -- Same category only
    AND t1.ticket_id < t2.ticket_id
    AND ABS(DATEDIFF('day', t1.created_date, t2.created_date)) <= 7  -- Within week
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.8

-- Limit comparisons to relevant subsets
SELECT 
    t1.ticket_id,
    t2.ticket_id,
    AI_SIMILARITY(t1.description, t2.description) AS similarity
FROM support_tickets t1
JOIN support_tickets t2 
    ON t1.category = t2.category  -- Same category only
    AND t1.ticket_id < t2.ticket_id
    AND ABS(DATEDIFF('day', t1.created_date, t2.created_date)) <= 7  -- Within week
WHERE AI_SIMILARITY(t1.description, t2.description) > 0.8

4. Combine with Filters

WITH similar_products AS (
    SELECT 
        p1.product_id AS product_a,
        p2.product_id AS product_b,
        AI_SIMILARITY(p1.description, p2.description) AS similarity
    FROM products p1, products p2
    WHERE p1.product_id < p2.product_id
      AND p1.category = p2.category  -- Pre-filter by category
)
SELECT *
FROM similar_products
WHERE similarity > 0.75

WITH similar_products AS (
    SELECT 
        p1.product_id AS product_a,
        p2.product_id AS product_b,
        AI_SIMILARITY(p1.description, p2.description) AS similarity
    FROM products p1, products p2
    WHERE p1.product_id < p2.product_id
      AND p1.category = p2.category  -- Pre-filter by category
)
SELECT *
FROM similar_products
WHERE similarity > 0.75

WITH similar_products AS (
    SELECT 
        p1.product_id AS product_a,
        p2.product_id AS product_b,
        AI_SIMILARITY(p1.description, p2.description) AS similarity
    FROM products p1, products p2
    WHERE p1.product_id < p2.product_id
      AND p1.category = p2.category  -- Pre-filter by category
)
SELECT *
FROM similar_products
WHERE similarity > 0.75

Common Use Cases

Deduplication

-- Find and flag duplicate customer inquiries
SELECT 
    newer.inquiry_id,
    older.inquiry_id AS potential_duplicate,
    AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) AS similarity
FROM inquiries newer
JOIN inquiries older
    ON newer.created_at > older.created_at
    AND older.created_at >= newer.created_at - INTERVAL '7 days'
WHERE AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) > 0.9

-- Find and flag duplicate customer inquiries
SELECT 
    newer.inquiry_id,
    older.inquiry_id AS potential_duplicate,
    AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) AS similarity
FROM inquiries newer
JOIN inquiries older
    ON newer.created_at > older.created_at
    AND older.created_at >= newer.created_at - INTERVAL '7 days'
WHERE AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) > 0.9

-- Find and flag duplicate customer inquiries
SELECT 
    newer.inquiry_id,
    older.inquiry_id AS potential_duplicate,
    AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) AS similarity
FROM inquiries newer
JOIN inquiries older
    ON newer.created_at > older.created_at
    AND older.created_at >= newer.created_at - INTERVAL '7 days'
WHERE AI_SIMILARITY(newer.inquiry_text, older.inquiry_text) > 0.9

Recommendation Engine

WITH user_history AS (
    SELECT array_agg(article_content) AS read_articles
    FROM reading_history
    WHERE user_id = 123
)
SELECT 
    a.article_id,
    a.title,
    MAX(AI_SIMILARITY(a.content, h.read_article)) AS max_similarity
FROM articles a, user_history, LATERAL FLATTEN(read_articles) h
WHERE a.article_id NOT IN (SELECT article_id FROM reading_history WHERE user_id = 123)
GROUP BY a.article_id, a.title
ORDER BY max_similarity DESC
LIMIT 10

WITH user_history AS (
    SELECT array_agg(article_content) AS read_articles
    FROM reading_history
    WHERE user_id = 123
)
SELECT 
    a.article_id,
    a.title,
    MAX(AI_SIMILARITY(a.content, h.read_article)) AS max_similarity
FROM articles a, user_history, LATERAL FLATTEN(read_articles) h
WHERE a.article_id NOT IN (SELECT article_id FROM reading_history WHERE user_id = 123)
GROUP BY a.article_id, a.title
ORDER BY max_similarity DESC
LIMIT 10

WITH user_history AS (
    SELECT array_agg(article_content) AS read_articles
    FROM reading_history
    WHERE user_id = 123
)
SELECT 
    a.article_id,
    a.title,
    MAX(AI_SIMILARITY(a.content, h.read_article)) AS max_similarity
FROM articles a, user_history, LATERAL FLATTEN(read_articles) h
WHERE a.article_id NOT IN (SELECT article_id FROM reading_history WHERE user_id = 123)
GROUP BY a.article_id, a.title
ORDER BY max_similarity DESC
LIMIT 10

Related Functions

  • AI_EMBED - Generate embeddings for storage

  • VECTOR_COSINE_SIMILARITY - Direct vector similarity

  • AI_CLASSIFY - For categorical similarity

Interested to Learn More?
Try Out the Free 14-Days Trial

More Articles

decorative icon

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

decorative icon

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

decorative icon

Experience Analytics for the AI-Era

Start your 14-day trial today - it's free and no credit card needed

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.