COPY INTO
Feb 23, 2026 · 5 min read
Category: Data Manipulation Language (DML)
Platform Support:
✅ Snowflake | ✅ BigQuery (as LOAD DATA) | ✅ Databricks
Description
COPY INTO efficiently loads data from cloud storage (S3, GCS, Azure Blob Storage) into tables. It's optimized for bulk loading with built-in error handling, file tracking, and transformation capabilities. This is the recommended method for loading large volumes of data.
Syntax by Platform
Snowflake:
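A representative Snowflake form (stage name, path, and table are placeholders):

```sql
COPY INTO <table_name>
FROM @<stage_name>/<path>
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
ON_ERROR = 'ABORT_STATEMENT';
```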
BigQuery:
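The BigQuery equivalent uses the `LOAD DATA` statement (dataset, table, and bucket are placeholders):

```sql
LOAD DATA INTO <dataset>.<table_name>
FROM FILES (
  format = 'CSV',
  uris = ['gs://<bucket>/<path>/*.csv']
);
```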
Databricks:
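A representative Databricks form (table and storage path are placeholders):

```sql
COPY INTO <catalog>.<schema>.<table_name>
FROM 's3://<bucket>/<path>'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```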
Platform-Specific Notes
Snowflake:
Supports internal and external stages
Tracks loaded files automatically (idempotent)
Can transform data during load with SELECT
Best for incremental loads
Supports VALIDATION_MODE to test before loading
BigQuery:
Uses the `LOAD DATA` statement (not COPY INTO)
Automatic schema detection available
Can append or overwrite tables
Also accessible via the `bq load` CLI command
Supports CSV, JSON, Avro, Parquet, ORC
Databricks:
Requires Delta Lake format for best features
Automatic file tracking (idempotent by default)
Schema evolution support with `mergeSchema`
Compatible with all cloud storage providers
Supports COPY_OPTIONS such as `force` to reload files
Example 1: Load CSV from Cloud Storage
Snowflake:
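A minimal Snowflake sketch, assuming a stage named `my_stage` and an existing `customers` table:

```sql
COPY INTO customers
FROM @my_stage/customer_001.csv
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 NULL_IF = ('NULL', ''));
```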
BigQuery:
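The same load in BigQuery (dataset and bucket names are assumptions):

```sql
LOAD DATA INTO my_dataset.customers
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-bucket/customer_001.csv']
);
```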
Databricks:
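The same load in Databricks (bucket path is an assumption):

```sql
COPY INTO customers
FROM 's3://my-bucket/customer_001.csv'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true');
```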
Sample CSV File (customer_001.csv):
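A file consistent with the result table below would look like:

```csv
customer_id,name,country,signup_date
1,John Smith,USA,2024-01-15
2,Maria Garcia,Spain,2024-01-16
3,NULL,USA,2024-01-17
4,Ahmed Hassan,Egypt,2024-01-18
```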
Result in customers table:
| customer_id | name | country | signup_date |
|---|---|---|---|
| 1 | John Smith | USA | 2024-01-15 |
| 2 | Maria Garcia | Spain | 2024-01-16 |
| 3 | NULL | USA | 2024-01-17 |
| 4 | Ahmed Hassan | Egypt | 2024-01-18 |
Example 2: Load JSON with Transformation
Snowflake:
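Snowflake can transform during the load with a SELECT over the staged files. A sketch, assuming an `events` table and `my_stage` (`$1` refers to the parsed JSON document per row):

```sql
COPY INTO events (event_id, user_id, event_time)
FROM (
  SELECT $1:id::STRING,
         $1:user_id::NUMBER,
         $1:event_time::TIMESTAMP
  FROM @my_stage/events/
)
FILE_FORMAT = (TYPE = 'JSON');
```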
BigQuery:
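`LOAD DATA` does not reshape rows in flight, so a common two-step pattern (table names assumed) loads the raw JSON first and then transforms with standard SQL:

```sql
LOAD DATA INTO my_dataset.events_raw
FROM FILES (
  format = 'JSON',
  uris = ['gs://my-bucket/events/*.json']
);

INSERT INTO my_dataset.events (event_id, user_id, event_time)
SELECT id, user_id, TIMESTAMP(event_time)
FROM my_dataset.events_raw;
```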
Databricks:
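Databricks COPY INTO also accepts a SELECT over the source path. A sketch with assumed column names:

```sql
COPY INTO events
FROM (
  SELECT id::STRING   AS event_id,
         user_id::INT AS user_id,
         ts::TIMESTAMP AS event_time
  FROM 's3://my-bucket/events/'
)
FILEFORMAT = JSON;
```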
Example 3: Load Parquet Files
Snowflake:
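For Parquet, Snowflake can map file columns onto table columns by name. A sketch assuming a `sales` table:

```sql
COPY INTO sales
FROM @my_stage/sales/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
```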
BigQuery:
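The BigQuery version; Parquet carries its own schema, so no column options are needed (names are assumptions):

```sql
LOAD DATA INTO my_dataset.sales
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://my-bucket/sales/*.parquet']
);
```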
Databricks:
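The Databricks version, with `mergeSchema` to pick up new columns as described above (path is an assumption):

```sql
COPY INTO sales
FROM 's3://my-bucket/sales/'
FILEFORMAT = PARQUET
COPY_OPTIONS ('mergeSchema' = 'true');
```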
Common File Formats Supported
| Format | Snowflake | BigQuery | Databricks |
|---|---|---|---|
| CSV | ✅ | ✅ | ✅ |
| JSON | ✅ | ✅ | ✅ |
| Avro | ✅ | ✅ | ✅ |
| Parquet | ✅ | ✅ | ✅ |
| ORC | ✅ | ✅ | ✅ |
| XML | ✅ | ❌ | ✅ |
Error Handling Options
Snowflake:
`ON_ERROR = 'CONTINUE'` - skip rows with errors
`ON_ERROR = 'SKIP_FILE'` - skip the entire file if any error occurs
`ON_ERROR = 'ABORT_STATEMENT'` - stop and roll back (default)
Use `VALIDATION_MODE = RETURN_ERRORS` to preview errors
BigQuery:
Set `max_bad_records` to allow some errors
Use the `reject_errors` option to continue on errors
Databricks:
Automatically handles malformed records with `badRecordsPath`
Use the `mode` option: `PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`
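Putting the Snowflake options together, one workable sequence (stage and table names assumed) is to validate first, then load with lenient error handling:

```sql
-- Dry run: report problems without loading any rows
COPY INTO customers
FROM @my_stage/customer_001.csv
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_ERRORS;

-- Actual load: skip bad rows instead of aborting the statement
COPY INTO customers
FROM @my_stage/customer_001.csv
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';
```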
Best Practices
Use stages/external locations - Set up named stages for reusability
Create file formats - Define reusable file format objects
Partition data - Organize files by date/category for incremental loads
Monitor load history - Check COPY_HISTORY views for tracking
Validate first - Use validation modes to test before production loads
Handle errors gracefully - Use appropriate error handling for your use case
Use compression - Compressed files (gzip, snappy) reduce transfer time
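The first two practices can be sketched in Snowflake as reusable objects (all names are hypothetical; credentials/storage integration omitted for brevity):

```sql
-- Reusable file format definition
CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV' SKIP_HEADER = 1 NULL_IF = ('NULL', '');

-- Named external stage pointing at the bucket; a STORAGE_INTEGRATION
-- or credentials clause would be required in practice
CREATE STAGE my_stage
  URL = 's3://my-bucket/data/'
  FILE_FORMAT = my_csv_format;

-- Subsequent loads stay short and consistent
COPY INTO customers FROM @my_stage;
```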
Performance Tips
Load multiple files in parallel (all platforms auto-parallelize)
Use columnar formats (Parquet, ORC) for better performance
In Snowflake, larger files (100-250MB) perform better than many small files
BigQuery works well with files 10MB-1GB
Databricks benefits from files optimized for Spark (128MB-1GB)