dbt™ and Amazon Athena on Paradime: Setup Guide for Reliable Scheduled Runs

Feb 26, 2026

The Complete dbt™ Athena Guide: Setup, Gotchas, and Production Checklists on Paradime

If you've used dbt™ with Snowflake or BigQuery, plugging it into Amazon Athena can feel deceptively similar — until it isn't. Athena is serverless and cost-effective, but it introduces a unique combination of AWS services — S3 for storage, the Glue Data Catalog for metadata, and workgroups for governance — that create configuration surface area you simply don't encounter with traditional cloud warehouses.

This dbt™ Athena guide walks you through everything: what makes Athena different, the AWS prerequisites you need before writing a single model, how to wire it up in Paradime, and the troubleshooting playbook you'll need when things go sideways.

Athena + dbt™: What's Different

Before diving into configuration, it's worth understanding why Athena requires more deliberate setup than a typical warehouse. The core difference is architectural: Athena doesn't own your data or metadata. It reads from S3 and looks up schemas in the AWS Glue Data Catalog. This separation of storage, metadata, and compute means your dbt™ project must be configured to talk to all three correctly.

How dbt™ interacts with Athena's distributed architecture: adapter queries flow through Athena, which reads metadata from Glue and data from S3, all governed by workgroup settings.

S3 Output Location and Workgroups

Every Athena query writes results to an S3 location. In dbt™, this is controlled by two profile settings:

  • s3_staging_dir — Where Athena stores query results and metadata (required).

  • s3_data_dir — Where dbt™ materializes table data in S3 (optional but strongly recommended).

Here's the gotcha: if your Athena workgroup has a default output location configured and the "Override client-side settings" option is enabled, the workgroup's S3 location takes precedence. Your s3_staging_dir and s3_data_naming settings in profiles.yml are silently ignored. This is one of the most common causes of "it works on my machine" bugs.

Additionally, the s3_data_naming parameter controls how dbt™ generates S3 paths for table data. The options include:

| s3_data_naming Value | S3 Path Pattern | Best For |
| --- | --- | --- |
| schema_table_unique (default) | {s3_data_dir}/{schema}/{table}/{uuid}/ | Avoiding path collisions during full-refresh |
| schema_table | {s3_data_dir}/{schema}/{table}/ | Predictable paths for external tooling |
| table_unique | {s3_data_dir}/{table}/{uuid}/ | Simple setups with a single schema |
| table | {s3_data_dir}/{table}/ | Minimal path structure |

⚠️ Key Gotcha: If the table name in your dbt™ model matches an existing table in the Glue Catalog, the adapter deletes the files in that table's S3 location before recreating it. Always use schema_table_unique in shared environments.
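
To pin the safe default explicitly, s3_data_naming can also be set per model in the config block. A minimal sketch (the model name and columns are illustrative):

```sql
-- models/marts/fct_orders.sql (illustrative model)
{{ config(
    materialized='table',
    s3_data_naming='schema_table_unique'
) }}

select order_id, customer_id, order_total
from {{ ref('stg_orders') }}
```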

Glue Data Catalog and Schema Management

In Athena, there is no separate "database server" — the Glue Data Catalog is the metadata layer. When you set database: awsdatacatalog and schema: analytics in your profile, dbt™ maps these to a Glue catalog and a Glue database respectively.

Key gotchas to know:

  • Tables, schemas, and databases must be lowercase. The Glue Data Catalog is case-insensitive, but the dbt-athena adapter enforces lowercase. Mixed-case identifiers will cause silent failures.

  • Schema changes behave differently. The on_schema_change config supports ignore, fail, append_new_columns, and sync_all_columns — but for Iceberg incremental models, sync_all_columns cannot remove columns used for partitioning. You must do a full-refresh.

  • Dropping tables has side effects. By default, the adapter deletes both the Glue catalog entry and the associated S3 data files. Enable native_drop if you want SQL-only DROP TABLE that leaves S3 data intact (except for Iceberg tables).

Prerequisites in AWS

Before configuring anything in dbt™ or Paradime, you need four things set up in your AWS account.

Athena Workgroup Configuration

Create at least two Athena workgroups — one for development and one for production:

  1. Navigate to Athena Console → Workgroups → Create workgroup.

  2. Set the Query result location to an S3 bucket (e.g., s3://my-company-athena-results/dev/).

  3. Enable "Override client-side settings" to enforce the output location at the workgroup level. This prevents individual developers from accidentally writing results to the wrong bucket.

  4. Configure query data usage controls (bytes scanned limits) appropriate for each environment.

💡 Tip: Workgroup-level enforcement is the simplest way to guarantee consistent S3 output locations across your team. When this is enabled, the s3_data_naming and bucket settings in profiles.yml are ignored in favor of the workgroup's configuration.

S3 Query Results Bucket + Permissions

Create dedicated S3 buckets (or prefixes) for query results and table data:
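
For example, with the AWS CLI (bucket names and region are placeholders; adapt to your naming conventions):

```
aws s3 mb s3://my-company-athena-results --region us-east-1
aws s3 mb s3://my-company-dbt-data --region us-east-1
```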

Critical rules:

  • All S3 paths must end with a trailing slash (/).

  • The results bucket must be in the same AWS region as your Athena workgroup. Cross-region configurations will fail with InvalidRequestException.

  • Ensure bucket policies don't have restrictive Deny statements that block Athena's service principal.

Glue Data Catalog Database Setup

Create Glue databases for each environment. You can do this via the Athena console, the Glue console, or SQL:
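
A minimal example in the Athena query editor (the database name and LOCATION path are placeholders):

```sql
CREATE DATABASE IF NOT EXISTS analytics_dev
LOCATION 's3://my-company-dbt-data/dev/';
```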

Note: If the database doesn't exist, dbt™ will attempt to create it automatically — but this requires glue:CreateDatabase IAM permissions, which you may not want to grant in production.

IAM Policy Checklist

The IAM role or user used by dbt™ needs permissions across three AWS services. Here's the minimum viable policy:

✅ Athena Permissions

  • athena:StartQueryExecution

  • athena:GetQueryExecution

  • athena:GetQueryResults

  • athena:StopQueryExecution

  • athena:GetWorkGroup

  • athena:GetDataCatalog

✅ Glue Permissions (Read)

  • glue:GetDatabase / glue:GetDatabases

  • glue:GetTable / glue:GetTables

  • glue:GetPartition / glue:GetPartitions

  • glue:GetTableVersions

✅ Glue Permissions (Write)

  • glue:CreateDatabase (if dbt™ should auto-create schemas)

  • glue:CreateTable / glue:UpdateTable / glue:DeleteTable

  • glue:CreatePartition / glue:UpdatePartition / glue:DeletePartition

  • glue:BatchCreatePartition / glue:BatchDeletePartition

  • glue:BatchDeleteTable / glue:BatchDeleteTableVersion

✅ S3 Permissions (for both s3_staging_dir and s3_data_dir buckets)

  • s3:GetObject / s3:PutObject / s3:DeleteObject

  • s3:ListBucket / s3:GetBucketLocation

  • s3:ListBucketMultipartUploads / s3:ListMultipartUploadParts

  • s3:AbortMultipartUpload

✅ Conditional Permissions

  • kms:GenerateDataKey*, kms:DescribeKey, kms:Decrypt — only if S3 buckets use KMS encryption

  • lakeformation:GetDataAccess — only if using AWS Lake Formation
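
A condensed policy sketch combining the lists above (bucket names are placeholders; in production, tighten the Resource ARNs for Athena and Glue rather than using "*"):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaAccess",
      "Effect": "Allow",
      "Action": ["athena:StartQueryExecution", "athena:GetQueryExecution",
                 "athena:GetQueryResults", "athena:StopQueryExecution",
                 "athena:GetWorkGroup", "athena:GetDataCatalog"],
      "Resource": "*"
    },
    {
      "Sid": "GlueAccess",
      "Effect": "Allow",
      "Action": ["glue:Get*", "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
                 "glue:CreatePartition", "glue:UpdatePartition", "glue:DeletePartition",
                 "glue:BatchCreatePartition", "glue:BatchDeletePartition",
                 "glue:BatchDeleteTable", "glue:BatchDeleteTableVersion"],
      "Resource": "*"
    },
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:ListBucket", "s3:GetBucketLocation",
                 "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts",
                 "s3:AbortMultipartUpload"],
      "Resource": ["arn:aws:s3:::my-company-athena-results",
                   "arn:aws:s3:::my-company-athena-results/*",
                   "arn:aws:s3:::my-company-dbt-data",
                   "arn:aws:s3:::my-company-dbt-data/*"]
    }
  ]
}
```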

Create an Athena Connection in Paradime

With AWS prerequisites in place, you can now wire dbt™ to Athena through Paradime. You'll create two connections: one for the Code IDE (development) and one for Bolt Scheduler (production).

Settings → Connections

  1. Click Settings in the top menu bar.

  2. In the left sidebar, click Connections.

  3. You'll see two sections: Code IDE and Bolt Schedules.

  4. Click Add New next to the appropriate section.

  5. Select Amazon Athena from the warehouse list.

Connection setup flow in Paradime: choose your environment, select Athena, and configure the YAML profile.

Region, Workgroup, S3 Output Location

In the Profile Configuration field, enter the YAML configuration that maps to your dbt™ profiles.yml:
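
A sketch of a dev output for this profile (bucket names, region, and workgroup are placeholders):

```yaml
dev:
  type: athena
  s3_staging_dir: s3://my-company-athena-results/dev/
  s3_data_dir: s3://my-company-dbt-data/dev/
  s3_data_naming: schema_table_unique
  region_name: us-east-1
  database: awsdatacatalog
  schema: analytics_dev
  work_group: dbt-dev
  num_retries: 5
  threads: 4
```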

| Field | What It Does | Required |
| --- | --- | --- |
| s3_staging_dir | S3 path for Athena query results | ✅ |
| s3_data_dir | S3 path where dbt™ materializes tables | ❌ (recommended) |
| s3_data_naming | Controls S3 path structure for tables | ❌ (default: schema_table_unique) |
| region_name | AWS region of your Athena instance | ✅ |
| database | Glue Data Catalog name | ✅ |
| work_group | Athena workgroup identifier | ❌ (default: primary) |
| num_retries | Retries for failed queries | ❌ (default: 5) |

Then fill in the UI fields:

  • dbt™ Profile Name — Must match the profile value in your dbt_project.yml.

  • Target — Set to dev for Code IDE, prod for Bolt.

  • Schema — Your default Glue database (e.g., analytics_dev).

  • Threads — Start with 4 for dev; see the production section for tuning advice.

Auth Approach and Secret Storage

Paradime supports multiple authentication methods for AWS:

  1. AWS Access Keys — Add aws_access_key_id and aws_secret_access_key directly in the profile configuration. Simple but less secure.

  2. AWS Profile Name — Reference a named profile via aws_profile_name if using shared credentials.

  3. IAM Roles (recommended) — Use IAM roles with environment variables. Store sensitive values in Paradime's Environment Variables (Settings → Workspaces → Environment Variables).

Best practice: Never put credentials directly in profile YAML. Use Paradime Environment Variables to store AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, then reference them in your connection. Bolt Scheduler and Code IDE each have their own environment variable scopes.
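
For example, with access-key auth the credentials can be referenced with dbt's env_var() function instead of being hardcoded (variable names assume you created them in Paradime's Environment Variables):

```yaml
aws_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
aws_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
```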

Configure Development (Code IDE)

Dev Database/Schema Strategy

In Athena, schema maps to a Glue database. Your development strategy should give each developer an isolated namespace to avoid stepping on each other's tables:
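
One sketch of a per-developer dev target (database, path, and workgroup names are placeholders):

```yaml
dev:
  type: athena
  schema: analytics_dev_jane           # one Glue database per developer
  s3_data_dir: s3://dbt-data/dev/jane/ # isolated S3 prefix per developer
  work_group: dbt-dev
```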

Alternatively, use a shared dev database with the schema_table_unique naming convention — the UUID suffix prevents S3 path collisions during concurrent development.

A common pattern:

| Environment | Glue Database | S3 Data Path | Workgroup |
| --- | --- | --- | --- |
| Dev (per-user) | analytics_dev_jane | s3://dbt-data/dev/jane/ | dbt-dev |
| CI | analytics_ci | s3://dbt-data/ci/ | dbt-ci |
| Prod | analytics_prod | s3://dbt-data/prod/ | dbt-prod |

Handling Multiple Data Lake Zones (Raw / Stage / Mart)

In a data lake architecture, you typically have source data in one S3 location and transformed data in another. Use dbt™'s sources to point at existing Glue tables over raw data, and materialize staging and mart models into separate Glue databases:

Define sources that reference existing Glue catalog tables:
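
A minimal sources sketch, assuming a Glue database named raw with an existing page_views table:

```yaml
# models/staging/sources.yml (illustrative)
version: 2

sources:
  - name: raw_events
    database: awsdatacatalog
    schema: raw
    tables:
      - name: page_views
```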

Data lake zone mapping: raw data stays untouched in source S3 paths, while dbt™ materializes staging and mart models into separate Glue databases and S3 prefixes.

Configure Production (Bolt Scheduler)

Prod Workgroup Isolation

Production dbt™ runs should use a dedicated Athena workgroup that is separate from development. This gives you:

  • Cost isolation — Track production query costs independently.

  • Enforced output locations — Enable "Override client-side settings" to guarantee results go to the right S3 bucket.

  • Query limits — Set per-query data scanning limits to prevent runaway costs.

  • Separate IAM controls — Restrict who can execute in the production workgroup.

In your Paradime Bolt Scheduler connection, set a production-specific profile:
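
A production output sketch (bucket names, region, and workgroup are placeholders):

```yaml
prod:
  type: athena
  s3_staging_dir: s3://my-company-athena-results/prod/
  s3_data_dir: s3://dbt-data/prod/
  region_name: us-east-1
  database: awsdatacatalog
  schema: analytics_prod
  work_group: dbt-prod
  num_retries: 5
  threads: 4
```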

Set the Target to prod and the Schema to your production Glue database (e.g., analytics_prod).

Retry / Backoff Considerations for Athena

Athena is a shared, serverless service — meaning throttling is real. The dbt-athena adapter uses exponential backoff via the tenacity library:

  • num_retries controls how many times a failed query is retried (default: 5, meaning up to 6 total attempts).

  • Backoff uses exponential wait: wait_exponential(multiplier=attempt, max=100, exp_base=2).

  • The adapter will not retry TOO_MANY_OPEN_PARTITIONS errors to avoid cascading Iceberg filesystem errors.
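
To see how that schedule grows, here is a small standalone sketch of the wait formula described above, min(attempt * exp_base ** attempt, max_wait); the adapter's exact tenacity configuration may differ, this only illustrates the shape of the curve:

```python
def backoff_schedule(num_retries: int = 5, max_wait: float = 100.0,
                     exp_base: float = 2.0) -> list[float]:
    """Illustrative wait (in seconds) before each retry:
    wait = min(attempt * exp_base ** attempt, max_wait)."""
    return [min(attempt * exp_base ** attempt, max_wait)
            for attempt in range(1, num_retries + 1)]

# With the defaults, waits grow quickly and then cap at max_wait:
print(backoff_schedule())  # [2.0, 8.0, 24.0, 64.0, 100.0]
```

The cap matters: without max_wait, a fifth retry would already wait 160 seconds, so long-running schedules benefit from bounding the backoff.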

Production tuning checklist:

  • Set num_retries: 5 (or higher for critical pipelines).

  • Keep threads moderate (4–8). High thread counts trigger ThrottlingException on the GetWorkGroup API, which has a lower rate limit (5 calls/sec) than StartQueryExecution (20 calls/sec, burst 80/sec).

  • If you see ThrottlingException errors, reduce threads first before increasing retries. The adapter caches GetWorkGroup results, but extremely high concurrency can still exceed limits.

  • Use Paradime Bolt's schedule-level environment variable overrides to adjust num_retries or threads per schedule without modifying the global connection.

Validate with SQL Scratchpad

Before running dbt build, use Paradime's Scratchpad to confirm your connection is wired correctly.

Confirm Workgroup + Output Location Works

Open the Scratchpad in the Code IDE (it auto-creates a paradime_scratch folder with gitignored files) and run:
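
The simplest smoke test is a constant select, which still exercises the workgroup, IAM permissions, and the S3 results path:

```sql
select 1 as connection_check;
```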

If this returns a result, your Athena workgroup, IAM permissions, and S3 output location are all functioning. If it fails, check:

  1. Is the s3_staging_dir accessible to your IAM role?

  2. Is the region_name correct?

  3. Does the workgroup exist and have a valid output location configured?

Run a Query Against a Known Glue Table

Next, verify that your Glue catalog integration works:
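
For example (the database and table names are placeholders for your own):

```sql
-- Confirm the Glue database is visible to Athena
show tables in analytics_dev;

-- Then read from a table you know exists in the catalog
select * from analytics_dev.my_existing_table limit 10;
```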

You can also use the Data Explorer tab in the command panel to preview compiled SQL and query results with row limits (1–1,000 rows), download as CSV, and inspect individual CTEs by highlighting SQL segments.

Step-by-step validation flow using the Paradime Scratchpad to confirm every layer of the Athena connection.

Troubleshooting

AccessDenied for S3 Output

Symptom: Access Denied (Service: Amazon S3; Status Code: 403) when running any query.

Root causes and fixes:

| Cause | Fix |
| --- | --- |
| IAM role lacks s3:PutObject on the results bucket | Add S3 write permissions for the s3_staging_dir bucket |
| S3 bucket is in a different region than Athena | Ensure region_name matches the bucket's region |
| Bucket has a restrictive Deny policy | Review the bucket policy for explicit denies that block Athena's access |
| Workgroup enforces a different output location | Check workgroup settings: if "Override client-side settings" is enabled, your s3_staging_dir is ignored |
| KMS-encrypted bucket without KMS permissions | Add kms:GenerateDataKey*, kms:DescribeKey, kms:Decrypt to the IAM policy |

Quick diagnostic:
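
Two quick checks from a machine using the same credentials (the bucket name is a placeholder):

```
# Can these credentials write to the staging location?
aws s3 cp ./test.txt s3://my-company-athena-results/dev/test.txt

# Is the bucket in the region you configured in the profile?
aws s3api get-bucket-location --bucket my-company-athena-results
```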

Glue Database Not Found

Symptom: SCHEMA_NOT_FOUND or Database not found errors during dbt run.

Common causes:

  1. Case mismatch — The Glue database name is case-sensitive in the adapter. Ensure your schema value in the connection profile is all lowercase.

  2. Wrong catalog — If database in your profile doesn't match your Glue Data Catalog name (default is awsdatacatalog), dbt™ will look in the wrong catalog.

  3. Database doesn't exist — If your IAM role doesn't have glue:CreateDatabase permission, dbt™ can't auto-create the schema. Create it manually:

  4. Missing TableType in Glue — If tables were created via the Glue API or CloudFormation without specifying TableType (e.g., EXTERNAL_TABLE), Athena won't recognize them. Re-create the table with the property set.
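
For cause 3 above, the database can be created manually from the Athena query editor (the name is a placeholder):

```sql
create database if not exists analytics_dev;
```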

Slow or Throttled Queries

Symptom: Queries take much longer than expected, or you see ThrottlingException / Rate exceeded errors.

Throttling sources and mitigations:

| Service | Error | Mitigation |
| --- | --- | --- |
| Athena API | ThrottlingException on GetWorkGroup | Reduce threads (4–8 is safe); upgrade to dbt-athena ≥ 1.7, which caches GetWorkGroup calls |
| AWS Glue | AWSGlueException: Rate exceeded | Reduce concurrency; batch partition operations; avoid re-running unchanged models (use dbt build --select state:modified+) |
| Amazon S3 | AmazonS3Exception: Please reduce your request rate | Use S3 prefix partitioning; avoid very small files; compact data with Glue ETL |
| AWS KMS | You have exceeded the rate at which you may call KMS | Reduce concurrent queries against KMS-encrypted buckets; request a KMS rate limit increase |

Performance checklist:

  • Use Parquet (columnar) file format for table materializations.

  • Partition tables on low-cardinality columns that your queries filter on (note: partitions are limited to 100 per partitioned_by write; use force_batch for more).

  • Prefer view materialization for staging models — views are free in Athena.

  • Avoid very large seeds — the seed_by_insert method is capped at Athena's 262,144 bytes limit.

  • Consider Iceberg table format for incremental models that need merge semantics.

  • If using dbt test, run tests in parallel but respect the thread limit to avoid Glue API throttling.

Wrapping Up

Athena is a powerful, serverless query engine — but the distributed nature of its architecture (S3 + Glue + workgroups) means there's more surface area to get wrong compared to a monolithic warehouse. The most common pitfalls are almost always one of three things: S3 permissions, Glue catalog mismatches, or workgroup configuration overrides.

By following the checklists in this guide — setting up isolated workgroups, configuring IAM policies correctly, validating with the Scratchpad before running dbt build, and tuning retry/thread settings for production — you'll avoid the hours of debugging that typically accompany a first-time Athena setup.


Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
