ARRAY_AGG / ARRAY_CONSTRUCT
Feb 23, 2026
·
5
min read
Category: Array Functions
Platform Support:
✅ Snowflake (ARRAY_AGG, ARRAY_CONSTRUCT) | ✅ BigQuery (ARRAY_AGG) | ✅ Databricks (COLLECT_LIST, ARRAY)
Description
Aggregates values into arrays. ARRAY_AGG collects values from multiple rows into a single array (aggregate function), while ARRAY_CONSTRUCT creates an array from multiple columns or values (scalar function). Essential for working with nested data structures and creating complex data types.
Syntax by Platform
Snowflake:
BigQuery:
Databricks:
Platform-Specific Notes
Snowflake:
ARRAY_AGG preserves order with WITHIN GROUP
ARRAY_CONSTRUCT for creating arrays from columns
Supports nested arrays (ARRAY of ARRAY)
ARRAY_UNIQUE_AGG for distinct values
BigQuery:
ARRAY_AGG can include ORDER BY and LIMIT
ARRAY_AGG(DISTINCT ...) for unique values
Literal syntax with square brackets
ARRAY_CONCAT_AGG for flattening arrays
Databricks:
COLLECT_LIST keeps duplicates
COLLECT_SET removes duplicates
ARRAY function for literals
Compatible with Spark SQL
Example 1: ARRAY_AGG - Group Related Items
Snowflake:
BigQuery:
Databricks:
Sample Data (orders table):
customer_id | customer_name | order_id | order_date | order_total |
|---|---|---|---|---|
101 | John Smith | 1001 | 2024-01-15 | 250.00 |
101 | John Smith | 1003 | 2024-01-20 | 180.00 |
101 | John Smith | 1005 | 2024-02-01 | 320.00 |
102 | Maria Garcia | 1002 | 2024-01-18 | 450.00 |
102 | Maria Garcia | 1004 | 2024-01-25 | 290.00 |
Result:
customer_id | customer_name | order_ids | order_totals | order_count | total_spent |
|---|---|---|---|---|---|
101 | John Smith | [1001, 1003, 1005] | [250.00, 180.00, 320.00] | 3 | 750.00 |
102 | Maria Garcia | [1002, 1004] | [450.00, 290.00] | 2 | 740.00 |
Example 2: ARRAY_CONSTRUCT - Create Arrays from Columns
Snowflake:
BigQuery:
Databricks:
Sample Data (products table):
product_id | product_name | category | subcategory | brand | price | cost | msrp |
|---|---|---|---|---|---|---|---|
1 | Laptop | Electronics | Computers | Dell | 899 | 600 | 999 |
2 | Mouse | Electronics | Accessories | Logitech | 25 | 10 | 30 |
Result:
product_id | product_name | price_array | taxonomy_array | price_objects |
|---|---|---|---|---|
1 | Laptop | [899, 600, 999] | ["Electronics", "Computers", "Dell"] | [{type:price,value:899},...] |
2 | Mouse | [25, 10, 30] | ["Electronics", "Accessories", "Logitech"] | [{type:price,value:25},...] |
Example 3: Aggregate with DISTINCT
Snowflake:
BigQuery:
Databricks:
Sample Data (sales table):
sale_id | product_id | product_name | customer_id | region |
|---|---|---|---|---|
1 | 101 | Laptop | 1 | East |
2 | 101 | Laptop | 2 | West |
3 | 101 | Laptop | 1 | East |
4 | 101 | Laptop | 3 | East |
5 | 101 | Laptop | 2 | West |
Result:
product_id | product_name | unique_customers | regions_sold | customer_count |
|---|---|---|---|---|
101 | Laptop | [1, 2, 3] | ["East", "West"] | 3 |
Example 4: Nested Arrays for Hierarchical Data
All Platforms:
Result Structure:
Example 5: Array Operations
Snowflake:
BigQuery:
Databricks:
Example 6: STRING_AGG Alternative
Snowflake:
BigQuery:
Databricks:
Common Array Functions by Platform
Operation | Snowflake | BigQuery | Databricks |
|---|---|---|---|
Create array | ARRAY_CONSTRUCT | [...] literal | ARRAY(...) |
Aggregate to array | ARRAY_AGG | ARRAY_AGG | COLLECT_LIST |
Distinct aggregate | ARRAY_UNIQUE_AGG | ARRAY_AGG(DISTINCT) | COLLECT_SET |
Array length | ARRAY_SIZE | ARRAY_LENGTH | SIZE |
Array contains | ARRAY_CONTAINS | IN UNNEST(...) | ARRAY_CONTAINS |
Array slice | ARRAY_SLICE | ARRAY(SELECT LIMIT) | SLICE |
Flatten array | FLATTEN | UNNEST | EXPLODE |
Concat arrays | ARRAY_CAT | ARRAY_CONCAT | CONCAT |
Common Use Cases
Denormalization: Aggregate child records into parent
JSON generation: Create complex nested structures
Tag management: Store multiple tags per entity
Event tracking: Collect event sequences per user
Product catalogs: Group variants and options
Data export: Prepare hierarchical data for APIs
Performance Considerations
Large arrays can impact query performance
Consider array size limits (varies by platform)
Flattening arrays for analysis can be expensive
Index considerations for array columns
Memory usage grows with array size
Best Practices
Limit array size: Keep arrays reasonably small
Order matters: Use ORDER BY in ARRAY_AGG for consistency
Handle NULLs: Arrays can contain NULL values
Document structure: Explain nested array schemas
Consider alternatives: Sometimes normalized tables are better
Test performance: Benchmark array operations on your data





