dbt™ and Amazon Athena on Paradime: Setup Guide for Reliable Scheduled Runs

Feb 26, 2026

The Complete dbt™ Athena Guide: Setup, Gotchas, and Production Checklists on Paradime

If you've used dbt™ with Snowflake or BigQuery, plugging it into Amazon Athena can feel deceptively similar — until it isn't. Athena is serverless and cost-effective, but it introduces a unique combination of AWS services — S3 for storage, the Glue Data Catalog for metadata, and workgroups for governance — that create configuration surface area you simply don't encounter with traditional cloud warehouses.

This dbt™ Athena guide walks you through everything: what makes Athena different, the AWS prerequisites you need before writing a single model, how to wire it up in Paradime, and the troubleshooting playbook you'll need when things go sideways.

Athena + dbt™: What's Different

Before diving into configuration, it's worth understanding why Athena requires more deliberate setup than a typical warehouse. The core difference is architectural: Athena doesn't own your data or metadata. It reads from S3 and looks up schemas in the AWS Glue Data Catalog. This separation of storage, metadata, and compute means your dbt™ project must be configured to talk to all three correctly.

How dbt™ interacts with Athena's distributed architecture: adapter queries flow through Athena, which reads metadata from Glue and data from S3, all governed by workgroup settings.

S3 Output Location and Workgroups

Every Athena query writes results to an S3 location. In dbt™, this is controlled by two profile settings:

  • s3_staging_dir — Where Athena stores query results and metadata (required).

  • s3_data_dir — Where dbt™ materializes table data in S3 (optional but strongly recommended).

Here's the gotcha: if your Athena workgroup has a default output location configured and the "Override client-side settings" option is enabled, the workgroup's S3 location takes precedence. Your s3_staging_dir and s3_data_naming settings in profiles.yml are silently ignored. This is one of the most common causes of "it works on my machine" bugs.

Additionally, the s3_data_naming parameter controls how dbt™ generates S3 paths for table data. The options include:

| s3_data_naming Value | S3 Path Pattern | Best For |
| --- | --- | --- |
| schema_table_unique (default) | {s3_data_dir}/{schema}/{table}/{uuid}/ | Avoiding path collisions during full-refresh |
| schema_table | {s3_data_dir}/{schema}/{table}/ | Predictable paths for external tooling |
| table_unique | {s3_data_dir}/{table}/{uuid}/ | Simple setups with a single schema |
| table | {s3_data_dir}/{table}/ | Minimal path structure |

⚠️ Key Gotcha: If the table name in your dbt™ model matches an existing table in the Glue Catalog, the adapter deletes the files in that table's S3 location before recreating it. Always use schema_table_unique in shared environments.
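
To pin the safe default explicitly, s3_data_naming can also be set per model in the config block. A minimal sketch (the model name and columns are illustrative):

```sql
-- models/marts/fct_orders.sql (illustrative model)
{{ config(
    materialized='table',
    s3_data_naming='schema_table_unique'
) }}

select order_id, customer_id, order_total
from {{ ref('stg_orders') }}
```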

Glue Data Catalog and Schema Management

In Athena, there is no separate "database server" — the Glue Data Catalog is the metadata layer. When you set database: awsdatacatalog and schema: analytics in your profile, dbt™ maps these to a Glue catalog and a Glue database respectively.

Key gotchas to know:

  • Tables, schemas, and databases must be lowercase. The Glue Data Catalog is case-insensitive, but the dbt-athena adapter enforces lowercase. Mixed-case identifiers will cause silent failures.

  • Schema changes behave differently. The on_schema_change config supports ignore, fail, append_new_columns, and sync_all_columns — but for Iceberg incremental models, sync_all_columns cannot remove columns used for partitioning. You must do a full-refresh.

  • Dropping tables has side effects. By default, the adapter deletes both the Glue catalog entry and the associated S3 data files. Enable native_drop if you want SQL-only DROP TABLE that leaves S3 data intact (except for Iceberg tables).

Prerequisites in AWS

Before configuring anything in dbt™ or Paradime, you need four things set up in your AWS account.

Athena Workgroup Configuration

Create at least two Athena workgroups — one for development and one for production:

  1. Navigate to Athena Console → Workgroups → Create workgroup.

  2. Set the Query result location to an S3 bucket (e.g., s3://my-company-athena-results/dev/).

  3. Enable "Override client-side settings" to enforce the output location at the workgroup level. This prevents individual developers from accidentally writing results to the wrong bucket.

  4. Configure query data usage controls (bytes scanned limits) appropriate for each environment.

💡 Tip: Workgroup-level enforcement is the simplest way to guarantee consistent S3 output locations across your team. When this is enabled, the s3_data_naming and bucket settings in profiles.yml are ignored in favor of the workgroup's configuration.

S3 Query Results Bucket + Permissions

Create dedicated S3 buckets (or prefixes) for query results and table data:
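
For example, with the AWS CLI (bucket names and region are placeholders; adapt to your naming conventions):

```
aws s3 mb s3://my-company-athena-results --region us-east-1
aws s3 mb s3://my-company-dbt-data --region us-east-1
```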

Critical rules:

  • All S3 paths must end with a trailing slash (/).

  • The results bucket must be in the same AWS region as your Athena workgroup. Cross-region configurations will fail with InvalidRequestException.

  • Ensure bucket policies don't have restrictive Deny statements that block Athena's service principal.

Glue Data Catalog Database Setup

Create Glue databases for each environment. You can do this via the Athena console, the Glue console, or SQL:
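
A minimal example in the Athena query editor (the database name and LOCATION path are placeholders):

```sql
CREATE DATABASE IF NOT EXISTS analytics_dev
LOCATION 's3://my-company-dbt-data/dev/';
```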

Note: If the database doesn't exist, dbt™ will attempt to create it automatically — but this requires glue:CreateDatabase IAM permissions, which you may not want to grant in production.

IAM Policy Checklist

The IAM role or user used by dbt™ needs permissions across three AWS services. Here's the minimum viable policy:

✅ Athena Permissions

  • athena:StartQueryExecution

  • athena:GetQueryExecution

  • athena:GetQueryResults

  • athena:StopQueryExecution

  • athena:GetWorkGroup

  • athena:GetDataCatalog

✅ Glue Permissions (Read)

  • glue:GetDatabase / glue:GetDatabases

  • glue:GetTable / glue:GetTables

  • glue:GetPartition / glue:GetPartitions

  • glue:GetTableVersions

✅ Glue Permissions (Write)

  • glue:CreateDatabase (if dbt™ should auto-create schemas)

  • glue:CreateTable / glue:UpdateTable / glue:DeleteTable

  • glue:CreatePartition / glue:UpdatePartition / glue:DeletePartition

  • glue:BatchCreatePartition / glue:BatchDeletePartition

  • glue:BatchDeleteTable / glue:BatchDeleteTableVersion

✅ S3 Permissions (for both s3_staging_dir and s3_data_dir buckets)

  • s3:GetObject / s3:PutObject / s3:DeleteObject

  • s3:ListBucket / s3:GetBucketLocation

  • s3:ListBucketMultipartUploads / s3:ListMultipartUploadParts

  • s3:AbortMultipartUpload

✅ Conditional Permissions

  • kms:GenerateDataKey*, kms:DescribeKey, kms:Decrypt — only if S3 buckets use KMS encryption

  • lakeformation:GetDataAccess — only if using AWS Lake Formation
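
A condensed policy sketch combining the lists above (bucket names are placeholders; in production, tighten the Resource ARNs for Athena and Glue rather than using "*"):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaAccess",
      "Effect": "Allow",
      "Action": ["athena:StartQueryExecution", "athena:GetQueryExecution",
                 "athena:GetQueryResults", "athena:StopQueryExecution",
                 "athena:GetWorkGroup", "athena:GetDataCatalog"],
      "Resource": "*"
    },
    {
      "Sid": "GlueAccess",
      "Effect": "Allow",
      "Action": ["glue:Get*", "glue:CreateTable", "glue:UpdateTable", "glue:DeleteTable",
                 "glue:CreatePartition", "glue:UpdatePartition", "glue:DeletePartition",
                 "glue:BatchCreatePartition", "glue:BatchDeletePartition",
                 "glue:BatchDeleteTable", "glue:BatchDeleteTableVersion"],
      "Resource": "*"
    },
    {
      "Sid": "S3Access",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:ListBucket", "s3:GetBucketLocation",
                 "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts",
                 "s3:AbortMultipartUpload"],
      "Resource": ["arn:aws:s3:::my-company-athena-results",
                   "arn:aws:s3:::my-company-athena-results/*",
                   "arn:aws:s3:::my-company-dbt-data",
                   "arn:aws:s3:::my-company-dbt-data/*"]
    }
  ]
}
```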

Create an Athena Connection in Paradime

With AWS prerequisites in place, you can now wire dbt™ to Athena through Paradime. You'll create two connections: one for the Code IDE (development) and one for Bolt Scheduler (production).

Settings → Connections

  1. Click Settings in the top menu bar.

  2. In the left sidebar, click Connections.

  3. You'll see two sections: Code IDE and Bolt Schedules.

  4. Click Add New next to the appropriate section.

  5. Select Amazon Athena from the warehouse list.

Connection setup flow in Paradime: choose your environment, select Athena, and configure the YAML profile.

Region, Workgroup, S3 Output Location

In the Profile Configuration field, enter the YAML configuration that maps to your dbt™ profiles.yml:
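
A sketch of a dev output for this profile (bucket names, region, and workgroup are placeholders):

```yaml
dev:
  type: athena
  s3_staging_dir: s3://my-company-athena-results/dev/
  s3_data_dir: s3://my-company-dbt-data/dev/
  s3_data_naming: schema_table_unique
  region_name: us-east-1
  database: awsdatacatalog
  schema: analytics_dev
  work_group: dbt-dev
  num_retries: 5
  threads: 4
```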

| Field | What It Does | Required |
| --- | --- | --- |
| s3_staging_dir | S3 path for Athena query results | ✅ |
| s3_data_dir | S3 path where dbt™ materializes tables | ❌ (recommended) |
| s3_data_naming | Controls S3 path structure for tables | ❌ (default: schema_table_unique) |
| region_name | AWS region of your Athena instance | ✅ |
| database | Glue Data Catalog name | ✅ |
| work_group | Athena workgroup identifier | ❌ (default: primary) |
| num_retries | Retries for failed queries | ❌ (default: 5) |

Then fill in the UI fields:

  • dbt™ Profile Name — Must match the profile value in your dbt_project.yml.

  • Target — Set to dev for Code IDE, prod for Bolt.

  • Schema — Your default Glue database (e.g., analytics_dev).

  • Threads — Start with 4 for dev; see the production section for tuning advice.

Auth Approach and Secret Storage

Paradime supports multiple authentication methods for AWS:

  1. AWS Access Keys — Add aws_access_key_id and aws_secret_access_key directly in the profile configuration. Simple but less secure.

  2. AWS Profile Name — Reference a named profile via aws_profile_name if using shared credentials.

  3. IAM Roles (recommended) — Use IAM roles with environment variables. Store sensitive values in Paradime's Environment Variables (Settings → Workspaces → Environment Variables).

Best practice: Never put credentials directly in profile YAML. Use Paradime Environment Variables to store AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, then reference them in your connection. Bolt Scheduler and Code IDE each have their own environment variable scopes.
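
For example, with access-key auth the credentials can be referenced with dbt's env_var() function instead of being hardcoded (variable names assume you created them in Paradime's Environment Variables):

```yaml
aws_access_key_id: "{{ env_var('AWS_ACCESS_KEY_ID') }}"
aws_secret_access_key: "{{ env_var('AWS_SECRET_ACCESS_KEY') }}"
```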

Configure Development (Code IDE)

Dev Database/Schema Strategy

In Athena, schema maps to a Glue database. Your development strategy should give each developer an isolated namespace to avoid stepping on each other's tables:
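
One sketch of a per-developer dev target (database, path, and workgroup names are placeholders):

```yaml
dev:
  type: athena
  schema: analytics_dev_jane           # one Glue database per developer
  s3_data_dir: s3://dbt-data/dev/jane/ # isolated S3 prefix per developer
  work_group: dbt-dev
```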

Alternatively, use a shared dev database with the schema_table_unique naming convention — the UUID suffix prevents S3 path collisions during concurrent development.

A common pattern:

| Environment | Glue Database | S3 Data Path | Workgroup |
| --- | --- | --- | --- |
| Dev (per-user) | analytics_dev_jane | s3://dbt-data/dev/jane/ | dbt-dev |
| CI | analytics_ci | s3://dbt-data/ci/ | dbt-ci |
| Prod | analytics_prod | s3://dbt-data/prod/ | dbt-prod |

Handling Multiple Data Lake Zones (Raw / Stage / Mart)

In a data lake architecture, you typically have source data in one S3 location and transformed data in another. Use dbt™'s sources to point at existing Glue tables over raw data, and materialize staging and mart models into separate Glue databases:

Define sources that reference existing Glue catalog tables:
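
A minimal sources sketch, assuming a Glue database named raw with an existing page_views table:

```yaml
# models/staging/sources.yml (illustrative)
version: 2

sources:
  - name: raw_events
    database: awsdatacatalog
    schema: raw
    tables:
      - name: page_views
```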

Data lake zone mapping: raw data stays untouched in source S3 paths, while dbt™ materializes staging and mart models into separate Glue databases and S3 prefixes.

Configure Production (Bolt Scheduler)

Prod Workgroup Isolation

Production dbt™ runs should use a dedicated Athena workgroup that is separate from development. This gives you:

  • Cost isolation — Track production query costs independently.

  • Enforced output locations — Enable "Override client-side settings" to guarantee results go to the right S3 bucket.

  • Query limits — Set per-query data scanning limits to prevent runaway costs.

  • Separate IAM controls — Restrict who can execute in the production workgroup.

In your Paradime Bolt Scheduler connection, set a production-specific profile:
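
A production output sketch (bucket names, region, and workgroup are placeholders):

```yaml
prod:
  type: athena
  s3_staging_dir: s3://my-company-athena-results/prod/
  s3_data_dir: s3://dbt-data/prod/
  region_name: us-east-1
  database: awsdatacatalog
  schema: analytics_prod
  work_group: dbt-prod
  num_retries: 5
  threads: 4
```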

Set the Target to prod and the Schema to your production Glue database (e.g., analytics_prod).

Retry / Backoff Considerations for Athena

Athena is a shared, serverless service — meaning throttling is real. The dbt-athena adapter uses exponential backoff via the tenacity library:

  • num_retries controls how many times a failed query is retried (default: 5, meaning up to 6 total attempts).

  • Backoff uses exponential wait: wait_exponential(multiplier=attempt, max=100, exp_base=2).

  • The adapter will not retry TOO_MANY_OPEN_PARTITIONS errors to avoid cascading Iceberg filesystem errors.
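
To see how that schedule grows, here is a small standalone sketch of the wait formula described above, min(attempt * exp_base ** attempt, max_wait); the adapter's exact tenacity configuration may differ, this only illustrates the shape of the curve:

```python
def backoff_schedule(num_retries: int = 5, max_wait: float = 100.0,
                     exp_base: float = 2.0) -> list[float]:
    """Illustrative wait (in seconds) before each retry:
    wait = min(attempt * exp_base ** attempt, max_wait)."""
    return [min(attempt * exp_base ** attempt, max_wait)
            for attempt in range(1, num_retries + 1)]

# With the defaults, waits grow quickly and then cap at max_wait:
print(backoff_schedule())  # [2.0, 8.0, 24.0, 64.0, 100.0]
```

The cap matters: without max_wait, a fifth retry would already wait 160 seconds, so long-running schedules benefit from bounding the backoff.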

Production tuning checklist:

  • Set num_retries: 5 (or higher for critical pipelines).

  • Keep threads moderate (4–8). High thread counts trigger ThrottlingException on the GetWorkGroup API, which has a lower rate limit (5 calls/sec) than StartQueryExecution (20 calls/sec, burst 80/sec).

  • If you see ThrottlingException errors, reduce threads first before increasing retries. The adapter caches GetWorkGroup results, but extremely high concurrency can still exceed limits.

  • Use Paradime Bolt's schedule-level environment variable overrides to adjust num_retries or threads per schedule without modifying the global connection.

Validate with SQL Scratchpad

Before running dbt build, use Paradime's Scratchpad to confirm your connection is wired correctly.

Confirm Workgroup + Output Location Works

Open the Scratchpad in the Code IDE (it auto-creates a paradime_scratch folder with gitignored files) and run:
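
The simplest smoke test is a constant select, which still exercises the workgroup, IAM permissions, and the S3 results path:

```sql
select 1 as connection_check;
```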

If this returns a result, your Athena workgroup, IAM permissions, and S3 output location are all functioning. If it fails, check:

  1. Is the s3_staging_dir accessible to your IAM role?

  2. Is the region_name correct?

  3. Does the workgroup exist and have a valid output location configured?

Run a Query Against a Known Glue Table

Next, verify that your Glue catalog integration works:
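
For example (the database and table names are placeholders for your own):

```sql
-- Confirm the Glue database is visible to Athena
show tables in analytics_dev;

-- Then read from a table you know exists in the catalog
select * from analytics_dev.my_existing_table limit 10;
```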

You can also use the Data Explorer tab in the command panel to preview compiled SQL and query results with row limits (1–1,000 rows), download as CSV, and inspect individual CTEs by highlighting SQL segments.

Step-by-step validation flow using the Paradime Scratchpad to confirm every layer of the Athena connection.

Troubleshooting

AccessDenied for S3 Output

Symptom: Access Denied (Service: Amazon S3; Status Code: 403) when running any query.

Root causes and fixes:

| Cause | Fix |
| --- | --- |
| IAM role lacks s3:PutObject on the results bucket | Add S3 write permissions for the s3_staging_dir bucket |
| S3 bucket is in a different region than Athena | Ensure region_name matches the bucket's region |
| Bucket has a restrictive Deny policy | Review the bucket policy for explicit denies that block Athena's access |
| Workgroup enforces a different output location | Check workgroup settings: if "Override client-side settings" is enabled, your s3_staging_dir is ignored |
| KMS-encrypted bucket without KMS permissions | Add kms:GenerateDataKey*, kms:DescribeKey, kms:Decrypt to the IAM policy |

Quick diagnostic:
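
Two quick checks from a machine using the same credentials (the bucket name is a placeholder):

```
# Can these credentials write to the staging location?
aws s3 cp ./test.txt s3://my-company-athena-results/dev/test.txt

# Is the bucket in the region you configured in the profile?
aws s3api get-bucket-location --bucket my-company-athena-results
```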

Glue Database Not Found

Symptom: SCHEMA_NOT_FOUND or Database not found errors during dbt run.

Common causes:

  1. Case mismatch — The Glue database name is case-sensitive in the adapter. Ensure your schema value in the connection profile is all lowercase.

  2. Wrong catalog — If database in your profile doesn't match your Glue Data Catalog name (default is awsdatacatalog), dbt™ will look in the wrong catalog.

  3. Database doesn't exist — If your IAM role doesn't have glue:CreateDatabase permission, dbt™ can't auto-create the schema. Create it manually:

  4. Missing TableType in Glue — If tables were created via the Glue API or CloudFormation without specifying TableType (e.g., EXTERNAL_TABLE), Athena won't recognize them. Re-create the table with the property set.
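
For cause 3 above, the database can be created manually from the Athena query editor (the name is a placeholder):

```sql
create database if not exists analytics_dev;
```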

Slow or Throttled Queries

Symptom: Queries take much longer than expected, or you see ThrottlingException / Rate exceeded errors.

Throttling sources and mitigations:

| Service | Error | Mitigation |
| --- | --- | --- |
| Athena API | ThrottlingException on GetWorkGroup | Reduce threads (4–8 is safe); upgrade to dbt-athena ≥ 1.7, which caches GetWorkGroup calls |
| AWS Glue | AWSGlueException: Rate exceeded | Reduce concurrency; batch partition operations; avoid re-running unchanged models (use dbt build --select state:modified+) |
| Amazon S3 | AmazonS3Exception: Please reduce your request rate | Use S3 prefix partitioning; avoid very small files; compact data with Glue ETL |
| AWS KMS | You have exceeded the rate at which you may call KMS | Reduce concurrent queries against KMS-encrypted buckets; request a KMS rate limit increase |

Performance checklist:

  • Use Parquet (columnar) file format for table materializations.

  • Partition tables on low-cardinality columns that your queries filter on (note: partitions are limited to 100 per partitioned_by write; use force_batch for more).

  • Prefer view materialization for staging models — views are free in Athena.

  • Avoid very large seeds — the seed_by_insert method is capped at Athena's 262,144 bytes limit.

  • Consider Iceberg table format for incremental models that need merge semantics.

  • If using dbt test, run tests in parallel but respect the thread limit to avoid Glue API throttling.

Wrapping Up

Athena is a powerful, serverless query engine — but the distributed nature of its architecture (S3 + Glue + workgroups) means there's more surface area to get wrong compared to a monolithic warehouse. The most common pitfalls are almost always one of three things: S3 permissions, Glue catalog mismatches, or workgroup configuration overrides.

By following the checklists in this guide — setting up isolated workgroups, configuring IAM policies correctly, validating with the Scratchpad before running dbt build, and tuning retry/thread settings for production — you'll avoid the hours of debugging that typically accompany a first-time Athena setup.


Copyright © 2026 Paradime Labs, Inc.

Made with ❤️ in San Francisco ・ London

*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.
