AI_EXTRACT

Feb 23, 2026

Overview

Extracts structured information from text, images, or documents based on natural language instructions. Supports multiple languages and file formats.

Syntax

AI_EXTRACT(
  input,
  instruction,
  [response_format]
)

Parameters

input (VARCHAR or FILE): Text string or file reference from a stage
instruction (VARCHAR): Natural language description of what to extract
response_format (OBJECT): Optional JSON schema defining the expected output structure

Use Cases

Extract entities from documents (names, dates, amounts)
Parse invoices and receipts
Extract key information from customer feedback
Structure unstructured data
Form filling and data entry automation
Contract analysis

Code Examples

Example 1: Extract Information from Text

SELECT AI_EXTRACT(
  'John Smith, age 45, lives in Seattle and works as a software engineer earning $120,000 annually.',
  'Extract the person''s name, age, city, occupation, and salary'
);

Output:

{
  "name": "John Smith",
  "age": 45,
  "city": "Seattle",
  "occupation": "software engineer",
  "salary": "$120,000"
}

Example 2: Extract from Multiple Records

SELECT 
    review_id,
    review_text,
    AI_EXTRACT(
        review_text,
        'Extract the product mentioned, rating (1-5), and main complaint if any'
    ) AS extracted_info
FROM customer_reviews
LIMIT 100

Output:

review_id | review_text | extracted_info
----------|-------------|---------------
1 | "The XYZ laptop is great but expensive" | {"product": "XYZ laptop", "rating": 4, "complaint": "expensive"}
2 | "Terrible service, phone broke after 1 week" | {"product": "phone", "rating": 1, "complaint": "broke after 1 week"}
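
Once rows like these reach a client application, the extracted_info values can be handled as JSON. The Python sketch below assumes each value arrives as a JSON string (an assumption; the actual type depends on the connector you use) and flags low-rated reviews. The rows are copied from the sample output above; the function name `low_rated` and the default threshold are illustrative, not part of AI_EXTRACT.

```python
import json

# Sample rows mirroring the output above: (review_id, extracted_info as JSON text).
rows = [
    (1, '{"product": "XYZ laptop", "rating": 4, "complaint": "expensive"}'),
    (2, '{"product": "phone", "rating": 1, "complaint": "broke after 1 week"}'),
]


def low_rated(rows, threshold=2):
    """Return (review_id, product, complaint) for reviews rated at or below threshold."""
    flagged = []
    for review_id, raw in rows:
        info = json.loads(raw)
        rating = info.get("rating")
        # Skip rows where the model returned no numeric rating.
        if isinstance(rating, (int, float)) and rating <= threshold:
            flagged.append((review_id, info.get("product"), info.get("complaint")))
    return flagged


print(low_rated(rows))
```

Because the output is model-generated, the sketch tolerates a missing or non-numeric rating instead of assuming every field is present.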

Example 3: Extract from PDF Documents

-- First, create a stage and upload files
CREATE STAGE invoices_stage 
  DIRECTORY = (ENABLE = true)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Extract from invoices
SELECT 
    relative_path AS filename,
    AI_EXTRACT(
        TO_FILE('@invoices_stage/' || relative_path),
        'Extract invoice number, date, total amount, and vendor name'
    ) AS invoice_data
FROM DIRECTORY('@invoices_stage')
WHERE relative_path LIKE '%.pdf'

Output:

{
  "invoice_number": "INV-2024-001",
  "date": "2024-01-15",
  "total_amount": "$1,250.00",
  "vendor_name": "Acme Corporation"
}
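
In the sample output, total_amount comes back as a formatted string ("$1,250.00"), not a number. If downstream code needs to sum or compare amounts, one option is to normalize the value on the client side. The Python sketch below assumes US-style formatting with a dollar sign and comma thousands separators; `parse_amount` is a hypothetical helper, not part of any Snowflake API.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Convert a string such as '$1,250.00' to a Decimal, or return None if it is not numeric."""
    # Assumes '$' as the currency symbol and ',' as the thousands separator.
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_amount("$1,250.00"))
```

Decimal is used instead of float so that currency values are not subject to binary rounding error.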

Example 4: Structured Output with Schema

SELECT AI_EXTRACT(
  'Meeting scheduled for January 15th at 2 PM with Sarah Johnson to discuss Q1 budget',
  'Extract meeting details',
  {
    'type': 'object',
    'properties': {
      'date': {'type': 'string', 'format': 'date'},
      'time': {'type': 'string'},
      'attendee': {'type': 'string'},
      'topic': {'type': 'string'}
    }
  }
);

Output:

{
  "date": "2024-01-15",
  "time": "2:00 PM",
  "attendee": "Sarah Johnson",
  "topic": "Q1 budget"
}
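
A schema is only useful if the result is checked against it. The Python sketch below is a minimal client-side check, assuming the result has already been parsed into a dictionary. It compares the output above with the properties declared in the schema from this example. It covers only field presence and basic JSON types, not formats such as 'date'; `check_result` and `JSON_TYPES` are illustrative names.

```python
# Schema and result copied from Example 4.
schema = {
    "type": "object",
    "properties": {
        "date": {"type": "string", "format": "date"},
        "time": {"type": "string"},
        "attendee": {"type": "string"},
        "topic": {"type": "string"},
    },
}
result = {
    "date": "2024-01-15",
    "time": "2:00 PM",
    "attendee": "Sarah Johnson",
    "topic": "Q1 budget",
}

# Mapping from JSON schema type names to Python types (basic types only).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_result(schema, result):
    """Return a list of problems found when comparing result to schema['properties']."""
    problems = []
    for name, spec in schema["properties"].items():
        if name not in result:
            problems.append(f"missing field: {name}")
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        if expected and not isinstance(result[name], expected):
            problems.append(f"wrong type for {name}")
    for name in result:
        if name not in schema["properties"]:
            problems.append(f"unexpected field: {name}")
    return problems


print(check_result(schema, result))
```

An empty list means every declared field is present with the expected basic type and no undeclared fields were returned.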

Example 5: Batch Processing Emails

SELECT 
    email_id,
    sender,
    AI_EXTRACT(
        email_body,
        'Extract: sender intent (question/complaint/request), urgency (high/medium/low), and requested action'
    ) AS email_analysis
FROM support_emails
WHERE received_date >= CURRENT_DATE - 7

Data Output Examples

Simple Extraction

Input: "Order #12345 shipped to 123 Main St, Boston, MA on 2024-02-01"
Instruction: "Extract order number, address, and ship date"

Output:
{
  "order_number": "12345",
  "address": "123 Main St, Boston, MA",
  "ship_date": "2024-02-01"
}

Complex Document Parsing

Input: PDF contract document
Instruction: "Extract all parties, contract start date, end date, and payment terms"

Output:
{
  "parties": ["ABC Corp", "XYZ Ltd"],
  "start_date": "2024-01-01",
  "end_date": "2025-12-31",
  "payment_terms": "Net 30 days from invoice date"
}

Model Information

Model Used: arctic-extract
Context Window: 128,000 tokens
Max Output: 51,200 tokens
Supported Languages: Multiple (English, Spanish, French, German, etc.)

File Format Support

Text files (.txt, .md)
Documents (.pdf, .docx)
Images (.jpg, .png) - requires OCR
Structured files (.json, .xml, .csv)

Limitations & Considerations

Input Size

Maximum 128,000 tokens per input
For paginated documents, each page counts as 970 tokens
Use AI_COUNT_TOKENS to check input size
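
The two figures above (128,000 tokens per input, 970 tokens per page) give a rough page budget. The Python sketch below works through that arithmetic; it is an estimate under the stated numbers only, and the service may enforce separate page-count or file-size limits that are not listed here. The `overhead_tokens` parameter is a hypothetical allowance for the instruction and response format.

```python
CONTEXT_WINDOW_TOKENS = 128_000  # maximum tokens per input, as stated above
TOKENS_PER_PAGE = 970            # fixed token count per document page, as stated above


def document_tokens(pages):
    """Tokens attributed to a paginated document."""
    return pages * TOKENS_PER_PAGE


def fits_in_context(pages, overhead_tokens=0):
    """True if the document plus any instruction/schema overhead stays within the limit."""
    return document_tokens(pages) + overhead_tokens <= CONTEXT_WINDOW_TOKENS


def max_pages(overhead_tokens=0):
    """Largest page count that still fits after reserving overhead_tokens."""
    return max(0, (CONTEXT_WINDOW_TOKENS - overhead_tokens) // TOKENS_PER_PAGE)


print(document_tokens(10))  # 9700
print(max_pages())          # 131
print(max_pages(2_000))     # 129
```

With no overhead, 131 pages account for 127,070 tokens, while 132 pages would need 128,040 and exceed the limit.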

Cost

Billing is based on both input and output tokens
Response format schema counts as input tokens
Document pages are billed at 970 tokens per page
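
To see how these rules combine, the Python sketch below totals the billed tokens for a single call. It assumes the instruction is counted as input alongside the schema and document pages (the list above confirms only the schema and the pages), and it leaves out pricing because no rate is given here. `billed_tokens` and its parameters are illustrative names.

```python
TOKENS_PER_PAGE = 970  # per-page billing figure stated above


def billed_tokens(pages, instruction_tokens, schema_tokens, output_tokens):
    """Estimate total billed tokens for one call: all input tokens plus output tokens."""
    # Assumption: the instruction text is billed as input, like the schema.
    input_tokens = pages * TOKENS_PER_PAGE + instruction_tokens + schema_tokens
    return input_tokens + output_tokens


# A 3-page document with a 20-token instruction, 150-token schema, and 80-token result.
print(billed_tokens(3, 20, 150, 80))  # 3160
```

For short documents the fixed per-page charge dominates, so trimming the schema or instruction changes the total only slightly.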

Accuracy

Works best with clear, specific instructions
Complex extractions may require schema definition
Results may vary with poor-quality scans or images

Performance

Optimized for batch processing
Use a MEDIUM or smaller warehouse
Processing time increases with document complexity

Regional Availability

AWS US West 2 (Oregon): ✓
AWS US East 1 (N. Virginia): ✓
Azure East US 2: ✓
Europe regions: ✓
Cross-region inference: ✓

Best Practices

1. Be Specific in Instructions

-- Good
'Extract customer name, email, and phone number in E.164 format'

-- Less effective
'Get contact info'
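
Asking for a specific format also makes the result easy to verify. As a sketch, the Python check below tests whether an extracted phone number is in E.164 form: a plus sign followed by up to 15 digits, the first of which is not zero. `is_e164` is a hypothetical helper for client-side validation; it checks the shape of the string only, not whether the number actually exists.

```python
import re

# '+' followed by 2 to 15 ASCII digits, the first of which is 1-9.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(phone):
    """Return True if phone matches the E.164 shape."""
    return E164_PATTERN.fullmatch(phone) is not None


print(is_e164("+14155552671"))  # True
print(is_e164("415-555-2671"))  # False
```

Values that fail the check can be routed for review or re-extracted with a more explicit instruction.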

2. Use Response Format for Consistency

Define JSON schema when you need structured, predictable output
Helps with downstream processing

3. Handle Large Documents

-- Check token count first
SELECT AI_COUNT_TOKENS('arctic-extract', file_content) AS token_count
FROM

4. Error Handling

SELECT 
    TRY_CAST(AI_EXTRACT(text, instruction) AS VARIANT) AS extracted,
    CASE 
        WHEN extracted IS NULL THEN 'Extraction failed'
        ELSE 'Success'
    END AS status
FROM

Related Functions

AI_PARSE_DOCUMENT - For OCR and layout extraction
AI_COMPLETE - For more complex text generation
TO_FILE - For referencing staged files
AI_COUNT_TOKENS - Estimate token usage
