AI_EXTRACT

Feb 23, 2026 · 5 min read

Overview

Extracts structured information from text, images, or documents based on natural language instructions. Supports multiple languages and file formats.

Syntax

AI_EXTRACT(
  input,
  instruction,
  [response_format]
)

Parameters

  • input (VARCHAR or FILE): Text string or file reference from a stage

  • instruction (VARCHAR): Natural language description of what to extract

  • response_format (OBJECT): Optional JSON schema defining the expected output structure
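The response_format schema used in Example 4 of this article has a regular shape (an object with typed properties), so it can be generated instead of written by hand. The following is a minimal Python sketch under that assumption; make_response_format is a hypothetical helper name, not part of any Snowflake client library.

```python
import json

def make_response_format(fields):
    """Build a JSON-schema-style object from {field_name: json_type} pairs."""
    return {
        "type": "object",
        "properties": {name: {"type": json_type} for name, json_type in fields.items()},
    }

# Field names mirror the meeting example in this article.
schema = make_response_format({"time": "string", "attendee": "string", "topic": "string"})
print(json.dumps(schema, indent=2))
```

The printed JSON can then be pasted into the third argument of the call, adjusting quoting to match the SQL dialect.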

Use Cases

  • Extract entities from documents (names, dates, amounts)

  • Parse invoices and receipts

  • Extract key information from customer feedback

  • Structure unstructured data

  • Form filling and data entry automation

  • Contract analysis

Code Examples

Example 1: Extract Information from Text

SELECT AI_EXTRACT(
  'John Smith, age 45, lives in Seattle and works as a software engineer earning $120,000 annually.',
  'Extract the person''s name, age, city, occupation, and salary'
);

Output:

{
  "name": "John Smith",
  "age": 45,
  "city": "Seattle",
  "occupation": "software engineer",
  "salary": "$120,000"
}
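Note that the salary comes back as the formatted string "$120,000" rather than a number. If downstream code needs a numeric value, one option is to normalize it after the query returns. This is a minimal Python sketch that assumes US-style currency formatting and that the result arrives as a JSON string; parse_usd is a hypothetical helper.

```python
import json
from decimal import Decimal

# The JSON shown above, as it might arrive in client code (assumed to be a string).
raw = (
    '{"name": "John Smith", "age": 45, "city": "Seattle", '
    '"occupation": "software engineer", "salary": "$120,000"}'
)

def parse_usd(amount):
    """Convert a string such as '$120,000' to a Decimal; assumes US-style formatting."""
    return Decimal(amount.replace("$", "").replace(",", ""))

record = json.loads(raw)
record["salary"] = parse_usd(record["salary"])
print(record["name"], record["salary"])
```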

Example 2: Extract from Multiple Records

SELECT 
    review_id,
    review_text,
    AI_EXTRACT(
        review_text,
        'Extract the product mentioned, rating (1-5), and main complaint if any'
    ) AS extracted_info
FROM customer_reviews
LIMIT 100

Output:

review_id | review_text | extracted_info
----------|-------------|---------------
1 | "The XYZ laptop is great but expensive" | {"product": "XYZ laptop", "rating": 4, "complaint": "expensive"}
2 | "Terrible service, phone broke after 1 week" | {"product": "phone", "rating": 1, "complaint": "broke after 1 week"}

Example 3: Extract from PDF Documents

-- First, create a stage and upload files
CREATE STAGE invoices_stage 
  DIRECTORY = (ENABLE = true)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

-- Extract from invoices
SELECT 
    relative_path AS filename,
    AI_EXTRACT(
        TO_FILE('@invoices_stage/' || relative_path),
        'Extract invoice number, date, total amount, and vendor name'
    ) AS invoice_data
FROM DIRECTORY('@invoices_stage')
WHERE relative_path LIKE '%.pdf'

Output:

{
  "invoice_number": "INV-2024-001",
  "date": "2024-01-15",
  "total_amount": "$1,250.00",
  "vendor_name": "Acme Corporation"
}

Example 4: Structured Output with Schema

SELECT AI_EXTRACT(
  'Meeting scheduled for January 15th at 2 PM with Sarah Johnson to discuss Q1 budget',
  'Extract meeting details',
  {
    'type': 'object',
    'properties': {
      'date': {'type': 'string', 'format': 'date'},
      'time': {'type': 'string'},
      'attendee': {'type': 'string'},
      'topic': {'type': 'string'}
    }
  }
);

Output:

{
  "date": "2024-01-15",
  "time": "2:00 PM",
  "attendee": "Sarah Johnson",
  "topic": "Q1 budget"
}

Example 5: Batch Processing Emails

SELECT 
    email_id,
    sender,
    AI_EXTRACT(
        email_body,
        'Extract: sender intent (question/complaint/request), urgency (high/medium/low), and requested action'
    ) AS email_analysis
FROM support_emails
WHERE received_date >= CURRENT_DATE - 7

Data Output Examples

Simple Extraction

Input: "Order #12345 shipped to 123 Main St, Boston, MA on 2024-02-01"
Instruction: "Extract order number, address, and ship date"

Output:
{
  "order_number": "12345",
  "address": "123 Main St, Boston, MA",
  "ship_date": "2024-02-01"
}

Complex Document Parsing

Input: PDF contract document
Instruction: "Extract all parties, contract start date, end date, and payment terms"

Output:
{
  "parties": ["ABC Corp", "XYZ Ltd"],
  "start_date": "2024-01-01",
  "end_date": "2025-12-31",
  "payment_terms": "Net 30 days from invoice date"
}

Model Information

  • Model Used: arctic-extract

  • Context Window: 128,000 tokens

  • Max Output: 51,200 tokens

  • Supported Languages: Multiple (English, Spanish, French, German, etc.)

File Format Support

  • Text files (.txt, .md)

  • Documents (.pdf, .docx)

  • Images (.jpg, .png) - requires OCR

  • Structured files (.json, .xml, .csv)

Limitations & Considerations

Input Size

  • Maximum 128,000 tokens per input

  • For paginated documents, each page counts as 970 tokens

  • Use AI_COUNT_TOKENS to check input size
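Taken together, the two figures above imply a rough page budget per call. The sketch below works through that arithmetic in Python; reserved_tokens is an assumed allowance for the instruction and schema, not a documented parameter.

```python
CONTEXT_WINDOW = 128_000  # maximum tokens per input, as stated above
TOKENS_PER_PAGE = 970     # tokens counted per document page, as stated above

def max_pages(reserved_tokens=0):
    """Whole pages that fit after setting aside reserved_tokens for the instruction and schema."""
    return (CONTEXT_WINDOW - reserved_tokens) // TOKENS_PER_PAGE

print(max_pages())       # 131
print(max_pages(1_000))  # 130
```

By this estimate, a document longer than about 130 pages would need to be split before extraction.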

Cost

  • Billing is based on both input and output tokens

  • Response format schema counts as input tokens

  • Document pages are billed at 970 tokens per page
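The billed token count for a document call can be approximated by adding the three components listed above. This is a back-of-the-envelope Python sketch; the workload numbers are hypothetical, and any tokens consumed by the instruction text would come on top of the total.

```python
TOKENS_PER_PAGE = 970  # per-page rate stated above

def estimate_billed_tokens(pages, schema_tokens, output_tokens):
    """Sum the three billed components: page input, schema input, and output."""
    return pages * TOKENS_PER_PAGE + schema_tokens + output_tokens

# Hypothetical workload: a 10-page invoice, a 150-token schema, a 200-token response.
print(estimate_billed_tokens(pages=10, schema_tokens=150, output_tokens=200))  # 10050
```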

Accuracy

  • Works best with clear, specific instructions

  • Complex extractions may require schema definition

  • Results may vary with poor-quality scans or images

Performance

  • Optimized for batch processing

  • Use a MEDIUM or smaller warehouse; a larger warehouse is not expected to speed up the function

  • Processing time increases with document complexity

Regional Availability

  • AWS US West 2 (Oregon): ✓

  • AWS US East 1 (N. Virginia): ✓

  • Azure East US 2: ✓

  • Europe regions: ✓

  • Cross-region inference: ✓

Best Practices

1. Be Specific in Instructions

-- Good
'Extract customer name, email, and phone number in E.164 format'

-- Less effective
'Get contact info'

2. Use Response Format for Consistency

  • Define a JSON schema when you need structured, predictable output

  • Helps with downstream processing
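A schema also gives downstream code something to check results against. As a minimal Python sketch, assuming the extraction result has already been loaded into a dict, the hypothetical helper missing_fields reports which schema properties did not come back:

```python
def missing_fields(result, schema):
    """Return schema property names that are absent from the result or set to None."""
    expected = schema.get("properties", {})
    return [name for name in expected if result.get(name) is None]

# Schema shape taken from the meeting example in this article.
schema = {
    "type": "object",
    "properties": {
        "date": {"type": "string", "format": "date"},
        "time": {"type": "string"},
        "attendee": {"type": "string"},
        "topic": {"type": "string"},
    },
}

complete = {"date": "2024-01-15", "time": "2:00 PM", "attendee": "Sarah Johnson", "topic": "Q1 budget"}
partial = {"date": "2024-01-15", "attendee": "Sarah Johnson"}

print(missing_fields(complete, schema))  # []
print(missing_fields(partial, schema))   # ['time', 'topic']
```

Rows with a non-empty list can be routed to a retry or manual review step instead of silently passing through.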

3. Handle Large Documents

-- Check token count first
SELECT AI_COUNT_TOKENS('arctic-extract', file_content) AS token_count
FROM

4. Error Handling

SELECT 
    TRY_CAST(AI_EXTRACT(text, instruction) AS VARIANT) AS extracted,
    CASE 
        WHEN extracted IS NULL THEN 'Extraction failed'
        ELSE 'Success'
    END AS status
FROM

Related Functions

  • AI_PARSE_DOCUMENT - For OCR and layout extraction

  • AI_COMPLETE - For more complex text generation

  • TO_FILE - For referencing staged files

  • AI_COUNT_TOKENS - For estimating token usage

Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
