SQL Keywords

COPY INTO

Feb 23, 2026 · 5 min read

Category: Data Manipulation Language (DML)

Platform Support:

✅ Snowflake | ✅ BigQuery (as LOAD DATA) | ✅ Databricks

Description

COPY INTO efficiently loads data from cloud storage (S3, GCS, Azure Blob Storage) into tables. It's optimized for bulk loading with built-in error handling, file tracking, and transformation capabilities. This is the recommended method for loading large volumes of data.

Syntax by Platform

Snowflake:

COPY INTO [database.]schema.table_name
FROM { @stage_name[/path] | 's3://bucket/path' | ... }
FILE_FORMAT = (TYPE = format_type [options])
[PATTERN = 'regex_pattern']
[ON_ERROR = { CONTINUE | SKIP_FILE | ABORT_STATEMENT }]
[VALIDATION_MODE = RETURN_ERRORS]

BigQuery:

LOAD DATA INTO [project.]dataset.table_name
FROM FILES (
  format = 'format_type',
  uris = ['gs://bucket/path/*.ext'],
  [options]
)

Databricks:

COPY INTO [catalog.]schema.table_name
FROM 'storage_path'
FILEFORMAT = format_type
FORMAT_OPTIONS ('key' = 'value', ...)
COPY_OPTIONS ('key' = 'value', ...)

Platform-Specific Notes

Snowflake:

  • Supports internal and external stages

  • Tracks loaded files automatically (idempotent)

  • Can transform data during load with SELECT

  • Best for incremental loads

  • Supports VALIDATION_MODE to test before loading

BigQuery:

  • Uses LOAD DATA statement (not COPY INTO)

  • Automatic schema detection available

  • Can append or overwrite tables

  • Also accessible via bq load CLI command

  • Supports CSV, JSON, Avro, Parquet, ORC

Databricks:

  • Requires Delta Lake format for best features

  • Automatic file tracking (idempotent by default)

  • Schema evolution support with mergeSchema

  • Compatible with all cloud storage providers

  • Supports COPY_OPTIONS like force to reload files
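
The force option mentioned above overrides Databricks' built-in file tracking and re-ingests files that have already been loaded. A minimal sketch (the bucket path is hypothetical):

```sql
-- Reload all matching files, even those already recorded as loaded
COPY INTO customers
FROM 's3://my-bucket/data/customers/'
FILEFORMAT = CSV
COPY_OPTIONS ('force' = 'true')
```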

Example 1: Load CSV from Cloud Storage

Snowflake:

-- Create file format (reusable)
CREATE OR REPLACE FILE FORMAT csv_format
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    NULL_IF = ('NULL', 'null', '')
    EMPTY_FIELD_AS_NULL = TRUE
    COMPRESSION = 'AUTO';

-- Load data
COPY INTO customers
FROM @my_s3_stage/data/customers/
FILE_FORMAT = csv_format
PATTERN = '.*customer_.*[.]csv'
ON_ERROR = 'SKIP_FILE'
RETURN_FAILED_ONLY = TRUE

BigQuery:

LOAD DATA INTO my_dataset.customers
FROM FILES (
  format = 'CSV',
  uris = ['gs://my-bucket/data/customers/*.csv'],
  skip_leading_rows = 1,
  field_delimiter = ',',
  null_marker = 'NULL',
  allow_jagged_rows = false,
  allow_quoted_newlines = true
)

Databricks:

COPY INTO customers
FROM 's3://my-bucket/data/customers/'
FILEFORMAT = CSV
FORMAT_OPTIONS (
    'header' = 'true',
    'delimiter' = ',',
    'nullValue' = 'NULL',
    'inferSchema' = 'false'
)
COPY_OPTIONS (
    'mergeSchema' = 'false',
    'force' = 'false'  -- skip already loaded files
)

Sample CSV File (customer_001.csv):

customer_id,name,email,country,signup_date
1,John Smith,john@email.com,USA,2024-01-15
2,Maria Garcia,maria@email.com,Spain,2024-01-16
3,NULL,sarah@email.com,USA,2024-01-17
4,Ahmed Hassan,ahmed@email.com,Egypt,2024-01-18

Result in customers table:

| customer_id | name         | email           | country | signup_date |
|-------------|--------------|-----------------|---------|-------------|
| 1           | John Smith   | john@email.com  | USA     | 2024-01-15  |
| 2           | Maria Garcia | maria@email.com | Spain   | 2024-01-16  |
| 3           | NULL         | sarah@email.com | USA     | 2024-01-17  |
| 4           | Ahmed Hassan | ahmed@email.com | Egypt   | 2024-01-18  |

Example 2: Load JSON with Transformation

Snowflake:

COPY INTO events
FROM (
    SELECT 
        $1:event_id::VARCHAR as event_id,
        $1:user_id::INTEGER as user_id,
        $1:event_type::VARCHAR as event_type,
        $1:timestamp::TIMESTAMP as event_timestamp,
        $1:properties::VARIANT as properties
    FROM @json_stage/events/
)
FILE_FORMAT = (TYPE = 'JSON')
ON_ERROR = 'CONTINUE'

BigQuery:

LOAD DATA INTO analytics.events
FROM FILES (
  format = 'JSON',
  uris = ['gs://my-bucket/events/*.json']
)

Databricks:

COPY INTO events
FROM 's3://my-bucket/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('multiLine' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true')

Example 3: Load Parquet Files

Snowflake:

COPY INTO sales_data
FROM @s3_stage/sales/year=2024/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = 'CASE_INSENSITIVE'

BigQuery:

LOAD DATA INTO warehouse.sales_data
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://data-lake/sales/year=2024/*.parquet'],
  hive_partition_uri_prefix = 'gs://data-lake/sales'
)

Databricks:

COPY INTO sales_data
FROM 's3://data-lake/sales/year=2024/'
FILEFORMAT = PARQUET

Common File Formats Supported

| Format  | Snowflake | BigQuery | Databricks |
|---------|-----------|----------|------------|
| CSV     | ✅        | ✅       | ✅         |
| JSON    | ✅        | ✅       | ✅         |
| Avro    | ✅        | ✅       | ✅         |
| Parquet | ✅        | ✅       | ✅         |
| ORC     | ✅        | ✅       | ✅         |
| XML     | ✅        | ❌       | ❌         |

Error Handling Options

Snowflake:

  • ON_ERROR = 'CONTINUE' - Skip rows with errors

  • ON_ERROR = 'SKIP_FILE' - Skip entire file if any error

  • ON_ERROR = 'ABORT_STATEMENT' - Stop and rollback (default)

  • Use VALIDATION_MODE = 'RETURN_ERRORS' to preview errors
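
A dry run makes this concrete: with VALIDATION_MODE, Snowflake parses the staged files and reports any errors without loading a single row. A sketch reusing the stage and file format names from Example 1:

```sql
-- Report parsing errors without loading any rows
COPY INTO customers
FROM @my_s3_stage/data/customers/
FILE_FORMAT = csv_format
VALIDATION_MODE = RETURN_ERRORS
```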

BigQuery:

  • Set max_bad_records to allow some errors

  • Use ignore_unknown_values to tolerate values that don't match the table schema

Databricks:

  • Can quarantine malformed records to a location set via the badRecordsPath option

  • Use mode option: PERMISSIVE, DROPMALFORMED, FAILFAST
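
As a sketch, the mode option is passed through FORMAT_OPTIONS to the underlying Spark reader, so dropping malformed JSON records (rather than failing the whole load) looks roughly like this, with a hypothetical bucket path:

```sql
-- Drop malformed records instead of aborting the load
COPY INTO events
FROM 's3://my-bucket/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('mode' = 'DROPMALFORMED')
```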

Best Practices

  1. Use stages/external locations - Set up named stages for reusability

  2. Create file formats - Define reusable file format objects

  3. Partition data - Organize files by date/category for incremental loads

  4. Monitor load history - Check COPY_HISTORY views for tracking

  5. Validate first - Use validation modes to test before production loads

  6. Handle errors gracefully - Use appropriate error handling for your use case

  7. Compression - Use compressed files (gzip, snappy) to reduce transfer time
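
For point 4, Snowflake exposes per-file load results through the COPY_HISTORY table function; a query along these lines (the table name is hypothetical) lists what landed in the last day:

```sql
-- Snowflake: per-file load results for the last 24 hours
SELECT file_name, row_count, error_count, status
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'customers',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())
))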

Performance Tips

  • Load multiple files in parallel (all platforms auto-parallelize)

  • Use columnar formats (Parquet, ORC) for better performance

  • In Snowflake, larger files (100-250MB) perform better than many small files

  • BigQuery works well with files 10MB-1GB

  • Databricks benefits from files optimized for Spark (128MB-1GB)

Interested to Learn More?
Try Out the Free 14-Day Trial


Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
