DISTINCT removes duplicate rows from results, LIMIT restricts the number of rows returned, and OFFSET skips a specified number of rows. Essential for pagination, removing duplicates, and controlling result set size.
Syntax
DISTINCT:
SELECTDISTINCT column1, column2, ...
FROM
SELECTDISTINCT column1, column2, ...
FROM
SELECTDISTINCT column1, column2, ...
FROM
LIMIT:
-- Snowflake, DatabricksSELECT * FROM table_name LIMIT n;
-- BigQuerySELECT * FROM table_name LIMIT
-- Snowflake, DatabricksSELECT * FROM table_name LIMIT n;
-- BigQuerySELECT * FROM table_name LIMIT
-- Snowflake, DatabricksSELECT * FROM table_name LIMIT n;
-- BigQuerySELECT * FROM table_name LIMIT
OFFSET:
SELECT * FROM table_name
LIMIT n OFFSET m; -- Skip m rows, return n rows
SELECT * FROM table_name
LIMIT n OFFSET m; -- Skip m rows, return n rows
SELECT * FROM table_name
LIMIT n OFFSET m; -- Skip m rows, return n rows
Platform-Specific Notes
Snowflake:
Supports both LIMIT and TOP
OFFSET must be used with LIMIT
ORDER BY recommended with LIMIT for deterministic results
BigQuery:
LIMIT only (no TOP)
OFFSET requires LIMIT
Can use LIMIT in subqueries
Databricks:
Supports LIMIT and TOP
OFFSET requires LIMIT
Standard Spark SQL syntax
Example 1: DISTINCT - Remove Duplicates
All Platforms:
-- Get unique countriesSELECTDISTINCT country
FROM customers
ORDERBY country;
-- Get unique combinationsSELECTDISTINCT country, city
FROM customers
ORDERBY country, city;
-- Count distinct valuesSELECTCOUNT(*)as total_orders,COUNT(DISTINCT customer_id)as unique_customers,COUNT(DISTINCT product_id)as unique_products
FROM
-- Get unique countriesSELECTDISTINCT country
FROM customers
ORDERBY country;
-- Get unique combinationsSELECTDISTINCT country, city
FROM customers
ORDERBY country, city;
-- Count distinct valuesSELECTCOUNT(*)as total_orders,COUNT(DISTINCT customer_id)as unique_customers,COUNT(DISTINCT product_id)as unique_products
FROM
-- Get unique countriesSELECTDISTINCT country
FROM customers
ORDERBY country;
-- Get unique combinationsSELECTDISTINCT country, city
FROM customers
ORDERBY country, city;
-- Count distinct valuesSELECTCOUNT(*)as total_orders,COUNT(DISTINCT customer_id)as unique_customers,COUNT(DISTINCT product_id)as unique_products
FROM
Sample Data (customers table):
customer_id
name
country
city
1
John
USA
New York
2
Maria
Spain
Madrid
3
Sarah
USA
New York
4
Bob
USA
Chicago
5
Carlos
Spain
Barcelona
Result (DISTINCT country):
country
Spain
USA
Result (DISTINCT country, city):
country
city
Spain
Barcelona
Spain
Madrid
USA
Chicago
USA
New York
Example 2: LIMIT - Return Top N Rows
All Platforms:
-- Top 5 most expensive productsSELECT product_name, price
FROM products
ORDERBY price DESCLIMIT5;
-- Top 10 recent ordersSELECT order_id, order_date, total
FROM orders
ORDERBY order_date DESCLIMIT10;
-- Sample data for testingSELECT *
FROM large_table
LIMIT100; -- Preview first 100 rows
-- Top 5 most expensive productsSELECT product_name, price
FROM products
ORDERBY price DESCLIMIT5;
-- Top 10 recent ordersSELECT order_id, order_date, total
FROM orders
ORDERBY order_date DESCLIMIT10;
-- Sample data for testingSELECT *
FROM large_table
LIMIT100; -- Preview first 100 rows
-- Top 5 most expensive productsSELECT product_name, price
FROM products
ORDERBY price DESCLIMIT5;
-- Top 10 recent ordersSELECT order_id, order_date, total
FROM orders
ORDERBY order_date DESCLIMIT10;
-- Sample data for testingSELECT *
FROM large_table
LIMIT100; -- Preview first 100 rows
-- Get first order for each customer (using QUALIFY instead)SELECT
customer_id,
order_id,
order_date,
total
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDERBY order_date) = 1
-- Get first order for each customer (using QUALIFY instead)SELECT
customer_id,
order_id,
order_date,
total
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDERBY order_date) = 1
-- Get first order for each customer (using QUALIFY instead)SELECT
customer_id,
order_id,
order_date,
total
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDERBY order_date) = 1
Alternative for all platforms:
WITH ranked_orders AS(SELECT
customer_id,
order_id,
order_date,
total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDERBY order_date)as rn
FROM orders
)SELECT customer_id, order_id, order_date, total
FROM ranked_orders
WHERE rn = 1
WITH ranked_orders AS(SELECT
customer_id,
order_id,
order_date,
total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDERBY order_date)as rn
FROM orders
)SELECT customer_id, order_id, order_date, total
FROM ranked_orders
WHERE rn = 1
WITH ranked_orders AS(SELECT
customer_id,
order_id,
order_date,
total,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDERBY order_date)as rn
FROM orders
)SELECT customer_id, order_id, order_date, total
FROM ranked_orders
WHERE rn = 1
Example 6: Combine DISTINCT, LIMIT, OFFSET
All Platforms:
-- Get unique product categories, paginatedSELECTDISTINCT category
FROM products
ORDERBY category
LIMIT10 OFFSET 0;
-- Top 5 countries by customer countSELECT
country,COUNT(DISTINCT customer_id)as customer_count
FROM customers
GROUPBY country
ORDERBY customer_count DESCLIMIT5
-- Get unique product categories, paginatedSELECTDISTINCT category
FROM products
ORDERBY category
LIMIT10 OFFSET 0;
-- Top 5 countries by customer countSELECT
country,COUNT(DISTINCT customer_id)as customer_count
FROM customers
GROUPBY country
ORDERBY customer_count DESCLIMIT5
-- Get unique product categories, paginatedSELECTDISTINCT category
FROM products
ORDERBY category
LIMIT10 OFFSET 0;
-- Top 5 countries by customer countSELECT
country,COUNT(DISTINCT customer_id)as customer_count
FROM customers
GROUPBY country
ORDERBY customer_count DESCLIMIT5
DISTINCT vs GROUP BY
Aspect
DISTINCT
GROUP BY
Purpose
Remove duplicates
Aggregate data
Aggregation
No
Yes
Performance
Similar
Similar
Flexibility
Limited
High
Use case
Simple deduplication
Analysis with aggregates
Equivalent queries:
-- These produce the same resultSELECTDISTINCT country FROM customers;
SELECT country FROM customers GROUPBY
-- These produce the same resultSELECTDISTINCT country FROM customers;
SELECT country FROM customers GROUPBY
-- These produce the same resultSELECTDISTINCT country FROM customers;
SELECT country FROM customers GROUPBY
Pagination Best Practices
Method 1: LIMIT/OFFSET (Simple but slow for large offsets)
SELECT * FROM products
ORDERBY product_id
LIMIT25 OFFSET 1000; -- Slow: scans 1000 rows to skip them
SELECT * FROM products
ORDERBY product_id
LIMIT25 OFFSET 1000; -- Slow: scans 1000 rows to skip them
SELECT * FROM products
ORDERBY product_id
LIMIT25 OFFSET 1000; -- Slow: scans 1000 rows to skip them
Method 2: Keyset Pagination (Faster for large datasets)
-- Page 1SELECT * FROM products
ORDERBY product_id
LIMIT25;
-- Page 2 (where last_id from previous page)SELECT * FROM products
WHERE product_id > last_id
ORDERBY product_id
LIMIT25
-- Page 1SELECT * FROM products
ORDERBY product_id
LIMIT25;
-- Page 2 (where last_id from previous page)SELECT * FROM products
WHERE product_id > last_id
ORDERBY product_id
LIMIT25
-- Page 1SELECT * FROM products
ORDERBY product_id
LIMIT25;
-- Page 2 (where last_id from previous page)SELECT * FROM products
WHERE product_id > last_id
ORDERBY product_id
LIMIT25
Performance Considerations
DISTINCT:
Requires sorting or hashing (expensive)
Index on DISTINCT columns helps
Consider GROUP BY for complex queries
LIMIT:
Very efficient (stops after n rows)
Must include ORDER BY for deterministic results
Use for sampling and testing
OFFSET:
Scans and discards offset rows (slow for large offsets)
O(n) performance where n = offset
Consider keyset pagination for large offsets
Common Use Cases
DISTINCT:
Unique values: Get all unique categories, tags
Deduplication: Remove duplicate records
Data quality: Find unique combinations
Reporting: Count distinct customers, products
LIMIT:
Top N queries: Best sellers, highest scores
Sampling: Test queries on subset
Performance: Limit result size
Dashboards: Show recent items
OFFSET:
Pagination: Web applications, reports
Batch processing: Process data in chunks
APIs: Return pages of results
Best Practices
Always use ORDER BY with LIMIT: Ensure deterministic results
Index columns in ORDER BY: Improves LIMIT performance
Avoid large OFFSET: Use keyset pagination instead
Test DISTINCT performance: May be expensive on large tables
Use LIMIT for testing: Preview results before full query
Consider COUNT(DISTINCT): More efficient than DISTINCT + COUNT
Document pagination logic: Make page calculations clear
Interested to Learn More? Try Out the Free 14-Days Trial
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.