dbt™ and Amazon Athena on Paradime: Setup Guide for Reliable Scheduled Runs
Feb 26, 2026
The Complete dbt™ Athena Guide: Setup, Gotchas, and Production Checklists on Paradime
If you've used dbt™ with Snowflake or BigQuery, plugging it into Amazon Athena can feel deceptively similar — until it isn't. Athena is serverless and cost-effective, but it introduces a unique combination of AWS services — S3 for storage, the Glue Data Catalog for metadata, and workgroups for governance — that create configuration surface area you simply don't encounter with traditional cloud warehouses.
This dbt™ Athena guide walks you through everything: what makes Athena different, the AWS prerequisites you need before writing a single model, how to wire it up in Paradime, and the troubleshooting playbook you'll need when things go sideways.
Athena + dbt™: What's Different
Before diving into configuration, it's worth understanding why Athena requires more deliberate setup than a typical warehouse. The core difference is architectural: Athena doesn't own your data or metadata. It reads from S3 and looks up schemas in the AWS Glue Data Catalog. This separation of storage, metadata, and compute means your dbt™ project must be configured to talk to all three correctly.
How dbt™ interacts with Athena's distributed architecture: adapter queries flow through Athena, which reads metadata from Glue and data from S3, all governed by workgroup settings.
S3 Output Location and Workgroups
Every Athena query writes results to an S3 location. In dbt™, this is controlled by two profile settings:
- `s3_staging_dir` — Where Athena stores query results and metadata (required).
- `s3_data_dir` — Where dbt™ materializes table data in S3 (optional but strongly recommended).
Here's the gotcha: if your Athena workgroup has a default output location configured and the "Override client-side settings" option is enabled, the workgroup's S3 location takes precedence. Your s3_staging_dir and s3_data_naming settings in profiles.yml are silently ignored. This is one of the most common causes of "it works on my machine" bugs.
Additionally, the s3_data_naming parameter controls how dbt™ generates S3 paths for table data. The options include:
| Option | S3 Path Pattern | Best For |
|---|---|---|
| `schema_table_unique` (default) | `{s3_data_dir}/{schema}/{table}/{uuid}/` | Avoiding path collisions during full-refresh |
| `schema_table` | `{s3_data_dir}/{schema}/{table}/` | Predictable paths for external tooling |
| `table` | `{s3_data_dir}/{table}/` | Simple setups with a single schema |
| `unique` | `{s3_data_dir}/{uuid}/` | Minimal path structure |
⚠️ Key Gotcha: If the table name in your dbt™ model matches an existing table in the Glue Catalog, the adapter deletes the files in that table's S3 location before recreating it. Always use `schema_table_unique` in shared environments.
Glue Data Catalog and Schema Management
In Athena, there is no separate "database server" — the Glue Data Catalog is the metadata layer. When you set database: awsdatacatalog and schema: analytics in your profile, dbt™ maps these to a Glue catalog and a Glue database respectively.
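As a minimal sketch, the mapping looks like this in a profile target (values illustrative):

```yaml
# profiles.yml target (illustrative values)
dev:
  type: athena
  database: awsdatacatalog   # the Glue *catalog* dbt connects to
  schema: analytics          # the Glue *database* where models land
```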
Key gotchas to know:
- **Tables, schemas, and databases must be lowercase.** The Glue Data Catalog is case-insensitive, but the dbt-athena adapter enforces lowercase. Mixed-case identifiers will cause silent failures.
- **Schema changes behave differently.** The `on_schema_change` config supports `ignore`, `fail`, `append_new_columns`, and `sync_all_columns` — but for Iceberg incremental models, `sync_all_columns` cannot remove columns used for partitioning. You must do a full refresh instead.
- **Dropping tables has side effects.** By default, the adapter deletes both the Glue catalog entry and the associated S3 data files. Enable `native_drop` if you want a SQL-only `DROP TABLE` that leaves S3 data intact (except for Iceberg tables).
Prerequisites in AWS
Before configuring anything in dbt™ or Paradime, you need four things set up in your AWS account.
Athena Workgroup Configuration
Create at least two Athena workgroups — one for development and one for production:
1. Navigate to **Athena Console → Workgroups → Create workgroup**.
2. Set the **Query result location** to an S3 bucket (e.g., `s3://my-company-athena-results/dev/`).
3. Enable **"Override client-side settings"** to enforce the output location at the workgroup level. This prevents individual developers from accidentally writing results to the wrong bucket.
4. Configure **query data usage controls** (bytes scanned limits) appropriate for each environment.

💡 Tip: Workgroup-level enforcement is the simplest way to guarantee consistent S3 output locations across your team. When this is enabled, the `s3_data_naming` and bucket settings in `profiles.yml` are ignored in favor of the workgroup's configuration.
S3 Query Results Bucket + Permissions
Create dedicated S3 buckets (or prefixes) for query results and table data:
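One workable layout separates query results from table data, with one prefix per environment (bucket names are illustrative):

```
s3://my-company-athena-results/    # query results  → s3_staging_dir
├── dev/
└── prod/

s3://my-company-dbt-data/          # materialized table data → s3_data_dir
├── dev/
└── prod/
```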
Critical rules:
- All S3 paths must end with a trailing slash (`/`).
- The results bucket must be in the same AWS region as your Athena workgroup. Cross-region configurations will fail with `InvalidRequestException`.
- Ensure bucket policies don't have restrictive `Deny` statements that block Athena's service principal.
Glue Data Catalog Database Setup
Create Glue databases for each environment. You can do this via the Athena console, the Glue console, or SQL:
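For example, from the Athena query editor (database names and S3 paths are illustrative):

```sql
-- One Glue database per environment
CREATE DATABASE IF NOT EXISTS analytics_dev
LOCATION 's3://my-company-dbt-data/dev/';

CREATE DATABASE IF NOT EXISTS analytics_prod
LOCATION 's3://my-company-dbt-data/prod/';
```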
Note: If the database doesn't exist, dbt™ will attempt to create it automatically — but this requires the `glue:CreateDatabase` IAM permission, which you may not want to grant in production.
IAM Policy Checklist
The IAM role or user used by dbt™ needs permissions across three AWS services. Here's the minimum viable policy:
✅ Athena Permissions
- `athena:StartQueryExecution`
- `athena:GetQueryExecution`
- `athena:GetQueryResults`
- `athena:StopQueryExecution`
- `athena:GetWorkGroup`
- `athena:GetDataCatalog`
✅ Glue Permissions (Read)
- `glue:GetDatabase` / `glue:GetDatabases`
- `glue:GetTable` / `glue:GetTables`
- `glue:GetPartition` / `glue:GetPartitions`
- `glue:GetTableVersions`
✅ Glue Permissions (Write)
- `glue:CreateDatabase` (if dbt™ should auto-create schemas)
- `glue:CreateTable` / `glue:UpdateTable` / `glue:DeleteTable`
- `glue:CreatePartition` / `glue:UpdatePartition` / `glue:DeletePartition`
- `glue:BatchCreatePartition` / `glue:BatchDeletePartition`
- `glue:BatchDeleteTable` / `glue:BatchDeleteTableVersion`
✅ S3 Permissions (for both s3_staging_dir and s3_data_dir buckets)
- `s3:GetObject` / `s3:PutObject` / `s3:DeleteObject`
- `s3:ListBucket` / `s3:GetBucketLocation`
- `s3:ListBucketMultipartUploads` / `s3:ListMultipartUploadParts`
- `s3:AbortMultipartUpload`
✅ Conditional Permissions
- `kms:GenerateDataKey*`, `kms:DescribeKey`, `kms:Decrypt` — only if S3 buckets use KMS encryption
- `lakeformation:GetDataAccess` — only if using AWS Lake Formation
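Stitched together, a minimal policy document might look like the sketch below. The ARNs, account ID, and bucket names are illustrative; scope them to your own workgroups and buckets, and note that `glue:Get*` is a shorthand that covers the read permissions listed above:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaQueries",
      "Effect": "Allow",
      "Action": ["athena:StartQueryExecution", "athena:GetQueryExecution",
                 "athena:GetQueryResults", "athena:StopQueryExecution",
                 "athena:GetWorkGroup", "athena:GetDataCatalog"],
      "Resource": "arn:aws:athena:us-east-1:123456789012:workgroup/dbt-*"
    },
    {
      "Sid": "GlueCatalog",
      "Effect": "Allow",
      "Action": ["glue:Get*", "glue:CreateDatabase", "glue:CreateTable",
                 "glue:UpdateTable", "glue:DeleteTable",
                 "glue:CreatePartition", "glue:UpdatePartition", "glue:DeletePartition",
                 "glue:BatchCreatePartition", "glue:BatchDeletePartition",
                 "glue:BatchDeleteTable", "glue:BatchDeleteTableVersion"],
      "Resource": "*"
    },
    {
      "Sid": "S3StagingAndData",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:ListBucket", "s3:GetBucketLocation",
                 "s3:ListBucketMultipartUploads", "s3:ListMultipartUploadParts",
                 "s3:AbortMultipartUpload"],
      "Resource": ["arn:aws:s3:::my-company-athena-results",
                   "arn:aws:s3:::my-company-athena-results/*",
                   "arn:aws:s3:::my-company-dbt-data",
                   "arn:aws:s3:::my-company-dbt-data/*"]
    }
  ]
}
```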
Create an Athena Connection in Paradime
With AWS prerequisites in place, you can now wire dbt™ to Athena through Paradime. You'll create two connections: one for the Code IDE (development) and one for Bolt Scheduler (production).
Settings → Connections
Click Settings in the top menu bar.
In the left sidebar, click Connections.
You'll see two sections: Code IDE and Bolt Schedules.
Click Add New next to the appropriate section.
Select Amazon Athena from the warehouse list.
Connection setup flow in Paradime: choose your environment, select Athena, and configure the YAML profile.
Region, Workgroup, S3 Output Location
In the Profile Configuration field, enter the YAML configuration that maps to your dbt™ profiles.yml:
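A representative configuration, assuming the illustrative bucket names, region, and workgroup used throughout this guide:

```yaml
# Profile Configuration (values illustrative; adjust to your AWS account)
type: athena
s3_staging_dir: s3://my-company-athena-results/dev/
s3_data_dir: s3://my-company-dbt-data/dev/
s3_data_naming: schema_table_unique
region_name: us-east-1
database: awsdatacatalog
work_group: dbt-dev
num_retries: 5
```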
| Field | What It Does | Required |
|---|---|---|
| `s3_staging_dir` | S3 path for Athena query results | ✅ |
| `s3_data_dir` | S3 path where dbt™ materializes tables | ✅ |
| `s3_data_naming` | Controls S3 path structure for tables | ❌ (default: `schema_table_unique`) |
| `region_name` | AWS region of your Athena instance | ✅ |
| `database` | Glue Data Catalog name | ✅ |
| `work_group` | Athena workgroup identifier | ❌ (default: `primary`) |
| `num_retries` | Retries for failed queries | ❌ (default: `5`) |
Then fill in the UI fields:
- **dbt™ Profile Name** — Must match the `profile` value in your `dbt_project.yml`.
- **Target** — Set to `dev` for Code IDE, `prod` for Bolt.
- **Schema** — Your default Glue database (e.g., `analytics_dev`).
- **Threads** — Start with `4` for dev; see the production section for tuning advice.
Auth Approach and Secret Storage
Paradime supports multiple authentication methods for AWS:
- **AWS Access Keys** — Add `aws_access_key_id` and `aws_secret_access_key` directly in the profile configuration. Simple but less secure.
- **AWS Profile Name** — Reference a named profile via `aws_profile_name` if using shared credentials.
- **IAM Roles (recommended)** — Use IAM roles with environment variables. Store sensitive values in Paradime's Environment Variables (Settings → Workspaces → Environment Variables).

Best practice: Never put credentials directly in profile YAML. Use Paradime Environment Variables to store `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, then reference them in your connection. Bolt Scheduler and Code IDE each have their own environment variable scopes.
Configure Development (Code IDE)
Dev Database/Schema Strategy
In Athena, schema maps to a Glue database. Your development strategy should give each developer an isolated namespace to avoid stepping on each other's tables:
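One way to sketch this is to derive the schema from an environment variable, so each developer's models land in their own Glue database (the variable name is an assumption; use whatever your workspace defines):

```yaml
# Code IDE dev target: per-developer Glue database (env var name illustrative)
dev:
  type: athena
  schema: "dbt_{{ env_var('DBT_USER', 'default') }}"   # e.g. dbt_jane
```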
Alternatively, use a shared dev database with the schema_table_unique naming convention — the UUID suffix prevents S3 path collisions during concurrent development.
A common pattern (names are illustrative):

| Environment | Glue Database | S3 Data Path | Workgroup |
|---|---|---|---|
| Dev (per-user) | `dbt_jane` | `s3://my-company-dbt-data/dev/jane/` | `dbt-dev` |
| CI | `dbt_ci` | `s3://my-company-dbt-data/ci/` | `dbt-dev` |
| Prod | `analytics_prod` | `s3://my-company-dbt-data/prod/` | `dbt-prod` |
Handling Multiple Data Lake Zones (Raw / Stage / Mart)
In a data lake architecture, you typically have source data in one S3 location and transformed data in another. Use dbt™'s sources to point at existing Glue tables over raw data, and materialize staging and mart models into separate Glue databases:
Define sources that reference existing Glue catalog tables:
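A minimal sketch, assuming a Glue database named `raw` already holds the source tables (all names illustrative):

```yaml
# models/staging/_sources.yml
version: 2

sources:
  - name: raw_events
    database: awsdatacatalog   # Glue catalog
    schema: raw                # Glue database over the raw S3 zone
    tables:
      - name: page_views
      - name: orders
```

Staging models then select from `{{ source('raw_events', 'orders') }}` and materialize into their own Glue database, leaving the raw zone untouched.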
Data lake zone mapping: raw data stays untouched in source S3 paths, while dbt™ materializes staging and mart models into separate Glue databases and S3 prefixes.
Configure Production (Bolt Scheduler)
Prod Workgroup Isolation
Production dbt™ runs should use a dedicated Athena workgroup that is separate from development. This gives you:
Cost isolation — Track production query costs independently.
Enforced output locations — Enable "Override client-side settings" to guarantee results go to the right S3 bucket.
Query limits — Set per-query data scanning limits to prevent runaway costs.
Separate IAM controls — Restrict who can execute in the production workgroup.
In your Paradime Bolt Scheduler connection, set a production-specific profile:
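For example (bucket names, region, and workgroup illustrative):

```yaml
# Bolt Scheduler profile configuration for production
type: athena
s3_staging_dir: s3://my-company-athena-results/prod/
s3_data_dir: s3://my-company-dbt-data/prod/
s3_data_naming: schema_table_unique
region_name: us-east-1
database: awsdatacatalog
work_group: dbt-prod
num_retries: 5
```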
Set the Target to prod and the Schema to your production Glue database (e.g., analytics_prod).
Retry / Backoff Considerations for Athena
Athena is a shared, serverless service — meaning throttling is real. The dbt-athena adapter uses exponential backoff via the tenacity library:
- `num_retries` controls how many times a failed query is retried (default: `5`, meaning up to 6 total attempts).
- Backoff uses exponential wait: `wait_exponential(multiplier=attempt, max=100, exp_base=2)`.
- The adapter will not retry `TOO_MANY_OPEN_PARTITIONS` errors, to avoid cascading Iceberg filesystem errors.
Production tuning checklist:
- Set `num_retries: 5` (or higher for critical pipelines).
- Keep `threads` moderate (4–8). High thread counts trigger `ThrottlingException` on the `GetWorkGroup` API, which has a lower rate limit (5 calls/sec) than `StartQueryExecution` (20 calls/sec, burst 80/sec).
- If you see `ThrottlingException` errors, reduce threads first before increasing retries. The adapter caches `GetWorkGroup` results, but extremely high concurrency can still exceed limits.
- Use Paradime Bolt's schedule-level environment variable overrides to adjust `num_retries` or `threads` per schedule without modifying the global connection.
Validate with SQL Scratchpad
Before running dbt build, use Paradime's Scratchpad to confirm your connection is wired correctly.
Confirm Workgroup + Output Location Works
Open the Scratchpad in the Code IDE (it auto-creates a paradime_scratch folder with gitignored files) and run:
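A trivial query is enough to exercise the whole path from workgroup to S3:

```sql
-- Smoke test: touches the workgroup, IAM permissions, and S3 output location
SELECT 1 AS connection_check;
```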
If this returns a result, your Athena workgroup, IAM permissions, and S3 output location are all functioning. If it fails, check:
- Is the `s3_staging_dir` accessible to your IAM role?
- Is the `region_name` correct?
- Does the workgroup exist and have a valid output location configured?
Run a Query Against a Known Glue Table
Next, verify that your Glue catalog integration works:
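For example (database and table names illustrative):

```sql
-- List what the Glue catalog exposes in your target database
SHOW TABLES IN analytics_dev;

-- Then sample a table you know exists
SELECT * FROM analytics_dev.stg_orders LIMIT 10;
```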
You can also use the Data Explorer tab in the command panel to preview compiled SQL and query results with row limits (1–1,000 rows), download as CSV, and inspect individual CTEs by highlighting SQL segments.
Step-by-step validation flow using the Paradime Scratchpad to confirm every layer of the Athena connection.
Troubleshooting
AccessDenied for S3 Output
Symptom: Access Denied (Service: Amazon S3; Status Code: 403) when running any query.
Root causes and fixes:
| Cause | Fix |
|---|---|
| IAM role lacks `s3:PutObject` on the results bucket | Add S3 write permissions for the `s3_staging_dir` path |
| S3 bucket is in a different region than Athena | Ensure `region_name` matches the bucket's region |
| Bucket has a restrictive `Deny` policy | Review the bucket policy for explicit denies that block Athena's access |
| Workgroup enforces a different output location | Check workgroup settings — if "Override client-side settings" is enabled, your `s3_staging_dir` is ignored |
| KMS-encrypted bucket without KMS permissions | Add `kms:Decrypt` and `kms:GenerateDataKey*` to the IAM policy |
Quick diagnostic:
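A few AWS CLI checks can isolate the failure quickly (paths illustrative; requires the AWS CLI configured with the same credentials dbt uses):

```shell
# Which identity is dbt actually using?
aws sts get-caller-identity

# Can that identity list the staging path?
aws s3 ls s3://my-company-athena-results/dev/

# Can it write to the staging path?
echo ok | aws s3 cp - s3://my-company-athena-results/dev/_write_check.txt
```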
Glue Database Not Found
Symptom: SCHEMA_NOT_FOUND or Database not found errors during dbt run.
Common causes:
- **Case mismatch** — The adapter requires lowercase names. Ensure your `schema` value in the connection profile is all lowercase.
- **Wrong catalog** — If `database` in your profile doesn't match your Glue Data Catalog name (the default is `awsdatacatalog`), dbt™ will look in the wrong catalog.
- **Database doesn't exist** — If your IAM role doesn't have the `glue:CreateDatabase` permission, dbt™ can't auto-create the schema. Create it manually first (see "Glue Data Catalog Database Setup" above).
- **Missing `TableType` in Glue** — If tables were created via the Glue API or CloudFormation without specifying `TableType` (e.g., `EXTERNAL_TABLE`), Athena won't recognize them. Re-create the table with the property set.
Slow or Throttled Queries
Symptom: Queries take much longer than expected, or you see ThrottlingException / Rate exceeded errors.
Throttling sources and mitigations:
| Service | Error | Mitigation |
|---|---|---|
| Athena API | `TooManyRequestsException` | Reduce `threads`; stagger schedule start times |
| AWS Glue | `ThrottlingException` | Reduce concurrency; batch partition operations; avoid unnecessary `GetTable` calls |
| Amazon S3 | `503 Slow Down` | Use S3 prefix partitioning; avoid very small files; compact data with Glue ETL |
| AWS KMS | `ThrottlingException` | Reduce concurrent queries against KMS-encrypted buckets; request a KMS rate limit increase |
Performance checklist:
- Use Parquet (columnar) file format for table materializations.
- Partition tables on low-cardinality columns that appear in query filters (note: partitions are limited to 100 per `partitioned_by` — use `force_batch` for more).
- Prefer the `view` materialization for staging models — views are free in Athena.
- Avoid very large seeds — the `seed_by_insert` method is capped by Athena's 262,144-byte query limit.
- Consider the Iceberg table format for incremental models that need `merge` semantics.
- If using `dbt test`, run tests in parallel but respect the thread limit to avoid Glue API throttling.
Wrapping Up
Athena is a powerful, serverless query engine — but the distributed nature of its architecture (S3 + Glue + workgroups) means there's more surface area to get wrong compared to a monolithic warehouse. The most common pitfalls are almost always one of three things: S3 permissions, Glue catalog mismatches, or workgroup configuration overrides.
By following the checklists in this guide — setting up isolated workgroups, configuring IAM policies correctly, validating with the Scratchpad before running dbt build, and tuning retry/thread settings for production — you'll avoid the hours of debugging that typically accompany a first-time Athena setup.