SQL Keywords

COPY INTO

Feb 23, 2026 · 5 min read

Category: Data Manipulation Language (DML)

Platform Support:

✅ Snowflake | ✅ BigQuery (as LOAD DATA) | ✅ Databricks

Description

COPY INTO efficiently loads data from cloud storage (S3, GCS, Azure Blob Storage) into tables. It's optimized for bulk loading with built-in error handling, file tracking, and transformation capabilities. This is the recommended method for loading large volumes of data.

Syntax by Platform

Snowflake:

COPY INTO [database.]schema.table_name
FROM { @stage_name[/path] | 's3://bucket/path' | ... }
FILE_FORMAT = (TYPE = format_type [options])
[PATTERN = 'regex_pattern']
[ON_ERROR = { CONTINUE | SKIP_FILE | ABORT_STATEMENT }]
[VALIDATION_MODE = RETURN_ERRORS]

BigQuery:

LOAD DATA INTO [project.]dataset.table_name
FROM FILES (
  format = 'format_type',
  uris = ['gs://bucket/path/*.ext'],
  [options]
)

Databricks:

COPY INTO [catalog.]schema.table_name
FROM 'storage_path'
FILEFORMAT = format_type
FORMAT_OPTIONS ('key' = 'value', ...)
COPY_OPTIONS ('key' = 'value', ...)

Platform-Specific Notes

Snowflake:

  • Supports internal and external stages

  • Tracks loaded files automatically (idempotent)

  • Can transform data during load with SELECT

  • Best for incremental loads

  • Supports VALIDATION_MODE to test before loading

BigQuery:

  • Uses LOAD DATA statement (not COPY INTO)

  • Automatic schema detection available

  • Can append or overwrite tables

  • Also accessible via bq load CLI command

  • Supports CSV, JSON, Avro, Parquet, ORC

Databricks:

  • Requires Delta Lake format for best features

  • Automatic file tracking (idempotent by default)

  • Schema evolution support with mergeSchema

  • Compatible with all cloud storage providers

  • Supports COPY_OPTIONS like force to reload files
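
The force option mentioned above overrides Databricks' built-in file tracking and re-ingests files that have already been loaded. A minimal sketch (the bucket path is hypothetical):

```sql
-- Reload all matching files, even those already recorded as loaded
COPY INTO customers
FROM 's3://my-bucket/data/customers/'
FILEFORMAT = CSV
COPY_OPTIONS ('force' = 'true')
```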

Example 1: Load CSV from Cloud Storage

Snowflake:

-- Create file format (reusable)
CREATE OR REPLACE FILE FORMAT csv_format
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    NULL_IF = ('NULL', 'null', '')
    EMPTY_FIELD_AS_NULL = TRUE
    COMPRESSION = 'AUTO';

-- Load data
COPY INTO customers
FROM @my_s3_stage/data/customers/
FILE_FORMAT = csv_format
PATTERN = '.*customer_.*[.]csv'
ON_ERROR = 'SKIP_FILE'
RETURN_FAILED_ONLY = TRUE

BigQuery:

LOAD DATA INTO my_dataset.customers
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/data/customers/*.csv'],
  skip_leading_rows = 1,
  field_delimiter = ',',
  null_marker = 'NULL',
  allow_jagged_rows = false,
  allow_quoted_newlines = true
)

Databricks:

COPY INTO customers
FROM 's3://my-bucket/data/customers/'
FILEFORMAT = CSV
FORMAT_OPTIONS (
    'header' = 'true',
    'delimiter' = ',',
    'nullValue' = 'NULL',
    'inferSchema' = 'false'
)
COPY_OPTIONS (
    'mergeSchema' = 'false',
    'force' = 'false'  -- skip already loaded files
)

Sample CSV File (customer_001.csv):

customer_id,name,email,country,signup_date
1,John Smith,john@email.com,USA,2024-01-15
2,Maria Garcia,maria@email.com,Spain,2024-01-16
3,NULL,sarah@email.com,USA,2024-01-17
4,Ahmed Hassan,ahmed@email.com,Egypt,2024-01-18

Result in customers table:

| customer_id | name         | email           | country | signup_date |
|-------------|--------------|-----------------|---------|-------------|
| 1           | John Smith   | john@email.com  | USA     | 2024-01-15  |
| 2           | Maria Garcia | maria@email.com | Spain   | 2024-01-16  |
| 3           | NULL         | sarah@email.com | USA     | 2024-01-17  |
| 4           | Ahmed Hassan | ahmed@email.com | Egypt   | 2024-01-18  |

Example 2: Load JSON with Transformation

Snowflake:

COPY INTO events
FROM (
    SELECT 
        $1:event_id::VARCHAR as event_id,
        $1:user_id::INTEGER as user_id,
        $1:event_type::VARCHAR as event_type,
        $1:timestamp::TIMESTAMP as event_timestamp,
        $1:properties::VARIANT as properties
    FROM @json_stage/events/
)
FILE_FORMAT = (TYPE = 'JSON')
ON_ERROR = 'CONTINUE'

BigQuery:

LOAD DATA INTO analytics.events
FROM FILES (
  format = 'JSON',
  uris = ['gs://my-bucket/events/*.json']
)

Databricks:

COPY INTO events
FROM 's3://my-bucket/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('multiLine' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true')

Example 3: Load Parquet Files

Snowflake:

COPY INTO sales_data
FROM @s3_stage/sales/year=2024/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE'

BigQuery:

LOAD DATA INTO warehouse.sales_data
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://data-lake/sales/year=2024/*.parquet'],
  hive_partition_uri_prefix = 'gs://data-lake/sales'
)

Databricks:

COPY INTO sales_data
FROM 's3://data-lake/sales/year=2024/'
FILEFORMAT = PARQUET

Common File Formats Supported

| Format  | Snowflake | BigQuery | Databricks |
|---------|-----------|----------|------------|
| CSV     | ✅        | ✅       | ✅         |
| JSON    | ✅        | ✅       | ✅         |
| Avro    | ✅        | ✅       | ✅         |
| Parquet | ✅        | ✅       | ✅         |
| ORC     | ✅        | ✅       | ✅         |
| XML     | ✅        | ❌       | ❌         |

Error Handling Options

Snowflake:

  • ON_ERROR = 'CONTINUE' - Skip rows with errors

  • ON_ERROR = 'SKIP_FILE' - Skip entire file if any error

  • ON_ERROR = 'ABORT_STATEMENT' - Stop and rollback (default)

  • Use VALIDATION_MODE = 'RETURN_ERRORS' to preview errors
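
A dry run makes this concrete: with VALIDATION_MODE, Snowflake parses the staged files and reports any errors without loading a single row. A sketch reusing the stage and file format names from Example 1:

```sql
-- Report parsing errors without loading any rows
COPY INTO customers
FROM @my_s3_stage/data/customers/
FILE_FORMAT = csv_format
VALIDATION_MODE = RETURN_ERRORS
```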

BigQuery:

  • Set max_bad_records to allow some errors

  • Use ignore_unknown_values to tolerate values that don't match the table schema

Databricks:

  • Can quarantine malformed records to a location set via the badRecordsPath option

  • Use mode option: PERMISSIVE, DROPMALFORMED, FAILFAST
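
As a sketch, the mode option is passed through FORMAT_OPTIONS to the underlying Spark reader, so dropping malformed JSON records (rather than failing the whole load) looks roughly like this, with a hypothetical bucket path:

```sql
-- Drop malformed records instead of aborting the load
COPY INTO events
FROM 's3://my-bucket/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('mode' = 'DROPMALFORMED')
```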

Best Practices

  1. Use stages/external locations - Set up named stages for reusability

  2. Create file formats - Define reusable file format objects

  3. Partition data - Organize files by date/category for incremental loads

  4. Monitor load history - Check COPY_HISTORY views for tracking

  5. Validate first - Use validation modes to test before production loads

  6. Handle errors gracefully - Use appropriate error handling for your use case

  7. Compression - Use compressed files (gzip, snappy) to reduce transfer time
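
For point 4, Snowflake exposes per-file load results through the COPY_HISTORY table function; a query along these lines (the table name is hypothetical) lists what landed in the last day:

```sql
-- Snowflake: per-file load results for the last 24 hours
SELECT file_name, row_count, error_count, status
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'customers',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
))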

Performance Tips

  • Load multiple files in parallel (all platforms auto-parallelize)

  • Use columnar formats (Parquet, ORC) for better performance

  • In Snowflake, larger files (100-250MB) perform better than many small files

  • BigQuery works well with files 10MB-1GB

  • Databricks benefits from files optimized for Spark (128MB-1GB)

Interested to Learn More?
Try Out the Free 14-Day Trial


Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
