The Data-Driven Advantage in Betting & Gambling.
A practical guide to the analytics, metrics, and AI infrastructure behind the operators winning a global market on track for $187B by 2030, and how Paradime's agentic OS turns data engineering from a bottleneck into a moat.
What's inside.
Six sections. Start anywhere — operators reading for benchmarks should jump to §3, data leaders evaluating tooling to §5.
The market has doubled. The winners aren't the biggest — they're the best with data.
The global betting and gambling market will exceed $125 billion in 2026 and is on track to reach ~$187 billion by 2030. In the US alone, legal wagers crossed $170 billion in 2025, with three-quarters now placed on mobile. The UK's mature market continues to grow remote gross gambling yield ahead of land-based, and live in-play wagering now represents the majority of online activity in Europe.
Underneath the headline numbers, the economics have changed. Customer acquisition costs have climbed past $300 per FTD in saturated US states. Hold percentages are creeping up — but only for operators who can price, personalize, and react in real time. Regulatory pressure on responsible gambling, AML, and affordability checks is converting data quality from a back-office concern into an existential one.
The operators winning the next decade won't be the ones with the most products or the largest marketing budgets. They will be the ones whose data engineering moves at the speed of their traders, marketers, and risk teams.
This whitepaper covers four things:
- Where the market is. Concrete figures on US, UK, and global growth, the online/live-betting shift, and the M&A and regulatory forces reshaping competitive position.
- What to measure. The financial, player-centric, and marketing KPIs that separate compounding operators from cash-burning ones — including the metrics that legacy BI reports usually miss.
- What to do with it. Personalization, churn prediction, and CLV optimization patterns that have lifted retention by 10–30% in production deployments.
- How to ship it faster. Why the analytics layer — not the model layer — is now the bottleneck, and how Paradime's agentic OS for data engineering compresses ticket-to-production from weeks to minutes.
A market reshaped by mobile, live betting, and consolidation.
Gambling is no longer a single business — it is a portfolio of fast-moving, regulated, increasingly digital sub-markets. Operators who treat it as one homogeneous category are already losing share.
The US: a land grab past its early innings
Since the 2018 Supreme Court ruling that opened sports betting to state-level regulation, 38 states plus DC have legalized it in some form. The American Gaming Association estimates US legal wagers exceeded $170 billion in 2025, with the industry's blended hold percentage climbing from roughly 8.1% in 2022 to 9.1% in 2023 as operators improved pricing and product mix.
The market is concentrating. FanDuel and DraftKings together control roughly 70% of online sports betting GGR in most legal states, with BetMGM, Caesars Digital, Fanatics, and ESPN BET competing for the long tail. iGaming — legal in only seven states — already produces hold percentages two to three times higher than sportsbook on the same handle, and is the single largest growth lever available to operators today.
The UK: maturity, remote dominance, and tightening rules
The UK Gambling Commission's most recent industry statistics show total gross gambling yield (GGY) of around £15.6 billion, with the remote sector contributing the largest single share at over £6.9 billion. Land-based casinos and bingo are still rebuilding toward their pre-pandemic footprints; remote betting and remote casino are not. The 2023 White Paper review introduced affordability checks, stake limits on online slots, and a statutory levy on operators, and every one of these converts directly into a data infrastructure requirement.
Global: $125B today, $187B by 2030
Six trends every operator's data team is reacting to
Mobile-first is the default
Online accounts for ~75% of US legal handle; in established UK and EU markets, the figure exceeds 80%. Retail still matters for brand and cross-sell, but the data, the personalization, and the margin are mobile.
Live in-play takes over
Live betting now drives 60–65% of online sports volume in mature European markets and is climbing fast in North America. It demands sub-second pricing, low-latency feeds, and feature stores that update in real time.
iGaming is the margin engine
Where legal, online casino delivers 2–3x the hold of sportsbook on similar handle. The seven US states where it is legal are a leading indicator; cross-sell from sportsbook to iGaming is the single biggest CLV unlock available.
Affordability and AML
UK affordability checks, EU responsible gambling rules, and US state-level RG mandates require operator-side data on session intensity, deposit velocity, and behavioral risk — at write speed, not weekly.
Consolidation accelerates
Flutter, Entain, MGM, and Caesars continue to absorb mid-tier operators across Europe and the Americas. Successful integration depends almost entirely on whether data platforms can be unified post-deal.
Acquisition keeps getting harder
CPA in saturated US states has crossed $300 per FTD. The path to positive unit economics is no longer "more spend, more channels" — it is better targeting, faster reactivation, and longer player lifetimes.
Three pillars hold up every modern operator.
Every data initiative inside a betting business — no matter how it is named on the org chart — ultimately serves one of three commercial outcomes: grow NGR, cut churn, or extend CLV. Use this as a triage filter for every roadmap.
Grow Net Gaming Revenue
Pricing accuracy, product mix, cross-sell from sportsbook to iGaming, and live in-play coverage. The instrumented operator wins by knowing which margin lever to pull, by segment, in real time.
Reduce Churn
Behavioural early-warning systems that catch session intensity drops, balance withdrawals without redeposits, and bonus-only patterns — early enough to intervene with the right offer, not the same offer.
Maximize CLV
Segment-aware lifetime modelling, VIP detection well before the first whale-sized deposit, and reactivation campaigns priced against true incremental margin — not last-touch attribution.
The five building blocks
To move on any of those three pillars, a data team needs five capabilities working together. Most operators have at least two of them in some form. The leaders have all five, integrated, and they ship changes to them weekly — not quarterly.
- Unified data foundation. Every wager, deposit, withdrawal, session, marketing touch, and RG signal in one warehouse, modelled consistently. Without this, every other capability degrades into a slower, less reliable version of itself.
- Real-time event streaming. Sub-second feeds for live betting, fraud, and session-level RG checks. Batch is acceptable for finance reporting; it is not acceptable for in-play pricing.
- Production-grade ML. Models for churn, CLV, fraud, RG triggers, and recommendation — versioned, monitored, and tied to clearly defined business KPIs rather than F1 scores in a notebook.
- Self-serve BI. Marketing, trading, and RG teams answering 80% of their own questions without a Jira ticket. The remaining 20% is where the data team adds real value.
- Governance and lineage. Column-level lineage, PII controls, and full audit trails — required by regulators, depended on by every other team when something breaks.
The modern data stack for gambling
The warehouse and BI layers have largely commoditized. The orchestration and engineering layer in the middle has not — and that is where most operators still measure cycle time in weeks. Every hour spent there is an hour the trading team waits for a new market, the marketing team waits for a new segment, and the RG team waits for a new threshold rule.
What to measure, why it matters, and where most teams get it wrong.
Three families of metrics — financial, player, and marketing — sit at the heart of every operator's reporting. A surprising number of teams track all of them and still fail to act on them, because the metrics are not connected to a single underlying model of the player.
Financial metrics
| Metric | What it measures | Why it matters |
|---|---|---|
| Handle | Total amount wagered across products and channels. | The top of the funnel. Useful as a volume signal — but a vanity metric on its own. |
| GGR | Gross Gaming Revenue: handle minus winnings paid out. | The truer revenue line. Must be split by product, channel, and live vs pre-match. |
| NGR | GGR minus bonuses, free bets, and promotional cost. | The number that pays salaries. The gap between GGR and NGR is where promotional discipline lives or dies. |
| Hold % | GGR as a percentage of handle. | The clearest single measure of pricing and product-mix quality. US blended hold moved from ~8.1% in 2022 to ~9.1% in 2023. |
| EBITDA margin | Operating profitability after marketing, gaming taxes, and platform costs. | The end-state metric the market cares about. Separates winners from acquisition-burning runners-up. |
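The definitions in the table chain together, so a short worked example can make the arithmetic concrete. The Python sketch below is illustrative only: the `PeriodTotals` class, its field names, and the sample figures are assumptions made for this example, not an operator schema or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodTotals:
    """Aggregated totals for one reporting period (hypothetical field names)."""
    handle: float         # total amount wagered
    winnings_paid: float  # winnings paid out to players
    promo_cost: float     # bonuses, free bets, and other promotional cost


def ggr(t: PeriodTotals) -> float:
    """Gross Gaming Revenue: handle minus winnings paid out."""
    return t.handle - t.winnings_paid


def ngr(t: PeriodTotals) -> float:
    """Net Gaming Revenue: GGR minus promotional cost."""
    return ggr(t) - t.promo_cost


def hold_pct(t: PeriodTotals) -> float:
    """Hold: GGR as a percentage of handle (0.0 when there is no handle)."""
    return 100.0 * ggr(t) / t.handle if t.handle else 0.0


# Illustrative numbers only, not operator data.
week = PeriodTotals(handle=1_000_000.0, winnings_paid=910_000.0, promo_cost=30_000.0)
print(ggr(week), ngr(week), round(hold_pct(week), 2))
```

In practice the same three functions would be applied per product, per channel, and per live vs pre-match split, since a blended figure can hide which segment is actually moving.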
Operator scale, 2025 — for context
Player-centric metrics
| Metric | What it measures | Why it matters |
|---|---|---|
| FTD | First-Time Depositor — the moment of monetary commitment. | The conversion event that ties acquisition spend to revenue. Track time-to-FTD from registration too. |
| ARPU / ARPPU | Average revenue per user (ARPU) and per paying user (ARPPU), often with a third cut per active user. | Three different cuts. The gap between ARPU and ARPPU tells you how much of your base is freeloading on free bets. |
| CLV | Customer Lifetime Value — predicted, not historical. | The single most valuable per-player number. Powers offer caps, VIP routing, and acquisition bidding. |
| Churn rate | % of previously active players with no activity over a defined window (typically 30, 60, or 90 days). | Easy to compute. Easy to misinterpret. Pair with churn probability per player to act on it. |
| Session intensity | Frequency, duration, and stake velocity within and across sessions. | The leading indicator for both retention drops and RG triggers. Tracking this in batch is too late. |
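Two of the rows above are simple enough to pin down in code. The sketch below assumes one possible reading of each definition: revenue averaged over registered, active, and paying bases, and churn measured as players active in a prior window who show no activity in the current one. Function names, parameter names, and sample values are hypothetical.

```python
def per_user_averages(revenue: float, registered: int,
                      active: int, paying: int) -> dict[str, float]:
    """Average the same revenue over three different player bases."""
    def avg(count: int) -> float:
        return revenue / count if count else 0.0

    return {
        "arpu": avg(registered),     # per registered user
        "arpu_active": avg(active),  # per active user
        "arppu": avg(paying),        # per paying user
    }


def churn_rate(active_prior: set[str], active_current: set[str]) -> float:
    """Percent of players active in the prior window but not in the current one."""
    if not active_prior:
        return 0.0
    churned = active_prior - active_current
    return 100.0 * len(churned) / len(active_prior)


# Illustrative values only.
averages = per_user_averages(60_000.0, registered=10_000, active=4_000, paying=1_500)
rate = churn_rate({"p1", "p2", "p3", "p4"}, {"p1", "p3", "p5"})
print(averages, rate)
```

Note that a new player such as `p5` does not reduce the churn rate here: the denominator is the prior-window base only, which keeps acquisition from masking losses.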
Marketing metrics
| Metric | What it measures | Why it matters |
|---|---|---|
| CPA / CAC | Cost per acquisition / cost to acquire a customer (often per FTD). | Now exceeds $300 in saturated US states. The metric most likely to determine whether a state is profitable. |
| Bonus cost ratio | Promotional spend as a % of GGR. | The discipline metric. A ratio creeping up without NGR following is a red flag for incentive-only behaviour. |
| CLV : CAC | Predicted lifetime value divided by cost to acquire. | The unit-economics ratio. Anything below 3:1 is a slow-motion problem; below 1:1 is an emergency. |
| Reactivation rate | % of dormant players returned to active by a campaign. | Often the highest-ROI lever available. Cheaper than acquisition; bigger than upsell. |
Handle is interesting. NGR is real. CLV : CAC is the metric your CFO and your CMO can finally agree on — and it cannot be computed without all three categories above modelled in the same place.
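As a rough illustration of how the two ratio metrics in the table could be computed and read, the sketch below uses the 3:1 and 1:1 thresholds stated above. The function names, the "healthy" label for ratios at or above 3:1, and the sample figures are assumptions for this example.

```python
def clv_to_cac(predicted_clv: float, cac: float) -> float:
    """Predicted lifetime value divided by cost to acquire."""
    if cac <= 0:
        raise ValueError("cac must be positive")
    return predicted_clv / cac


def classify_unit_economics(ratio: float) -> str:
    """Apply the thresholds from the table: below 1:1, below 3:1, otherwise fine."""
    if ratio < 1.0:
        return "emergency"
    if ratio < 3.0:
        return "slow-motion problem"
    return "healthy"  # label is an assumption; the table only names the two failure bands


def bonus_cost_ratio(promo_spend: float, ggr: float) -> float:
    """Promotional spend as a percentage of GGR."""
    if ggr <= 0:
        raise ValueError("ggr must be positive")
    return 100.0 * promo_spend / ggr


# Illustrative values only: a $750 predicted CLV against a $300 CAC.
ratio = clv_to_cac(750.0, 300.0)
print(ratio, classify_unit_economics(ratio))
print(round(bonus_cost_ratio(30_000.0, 90_000.0), 1))
```

The inputs span all three metric families: predicted CLV comes from the player model, CAC from marketing spend, and GGR from the financial layer, which is why the ratio is only as reliable as the shared model underneath it.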
Three plays that move the needle — when the data layer can keep up.
The patterns below are not novel. What separates the operators getting 10–30% retention lifts from the ones running the same plays for flat results is execution speed, segment granularity, and the willingness to ship and measure weekly.
Personalization at the player level
Generic offers, generic homepages, and generic push sequences are the single largest source of avoidable spend in this industry. The operators who have moved past it run a closed loop: behavioural signal in, segment assignment, offer or content variant, measured outcome, model update — repeated daily.
Concretely, that looks like:
- Sport and market preferences driving homepage layout and notification content per player.
- Stake-size segmentation driving promotional caps so that a £10 weekend bettor isn't offered the same boost as a four-figure VIP.
- Time-of-day modelling determining when push notifications are sent — and, more importantly, when they aren't.
- Cross-product nudging from sportsbook to iGaming when player models suggest receptiveness, with hold uplift typically 2–3x the source product.
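The stake-size bullet above can be sketched as a simple tier lookup. Everything specific here is hypothetical: the tier names, the stake boundaries, and the boost caps are placeholders chosen for illustration, and a production version would more likely derive tiers from a player model than from fixed cut-offs.

```python
from bisect import bisect_right

# Hypothetical boundaries on average stake (GBP) and the tiers they separate.
STAKE_BOUNDS = [25.0, 250.0, 1000.0]
TIERS = ["casual", "regular", "high", "vip"]

# Hypothetical maximum boost value (GBP) per tier.
BOOST_CAP = {"casual": 5.0, "regular": 25.0, "high": 100.0, "vip": 500.0}


def stake_tier(avg_stake: float) -> str:
    """Map an average stake to a tier; a stake equal to a bound falls in the higher tier."""
    return TIERS[bisect_right(STAKE_BOUNDS, avg_stake)]


def capped_boost(avg_stake: float, proposed_boost: float) -> float:
    """Clamp a proposed boost to the cap for the player's stake tier."""
    return min(proposed_boost, BOOST_CAP[stake_tier(avg_stake)])


# A £10 weekend bettor and a four-figure VIP offered the same £200 boost.
print(capped_boost(10.0, 200.0), capped_boost(1500.0, 200.0))
```

Keeping the bounds and caps in data rather than in branching logic means the marketing team can change them without touching the lookup itself.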
Churn prevention before it shows up in a dashboard
Most operators detect churn by definition: 30 days inactive. That is not detection; that is documentation. The leaders model churn probability per player on rolling daily windows, using session intensity, deposit patterns, withdrawal-without-redeposit signals, and bonus-only behaviour — and they trigger interventions when probability rises, not when activity has already stopped.
Done well, this produces three classes of intervention:
- Soft — a feature highlight, a relevant content recommendation, a free token on their preferred sport. Cheap, high-frequency.
- Medium — a personalised boost or matched-bet offer sized against predicted CLV, not a flat dollar amount.
- Hard — a VIP host outreach for high-value players whose churn probability has crossed an action threshold. Expensive, high-conversion, and lethal when done at scale by automation.
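One way to picture the three intervention classes is as a routing function over churn probability and predicted CLV. The sketch below is a minimal illustration under stated assumptions: the probability thresholds, the VIP cut-off, the offer rate and ceiling, and the rule that responsible-gambling-flagged players receive no retention offer are all hypothetical choices, not values taken from any operator.

```python
from typing import Optional


def choose_intervention(churn_prob: float, predicted_clv: float, *,
                        rg_flagged: bool = False,
                        soft_at: float = 0.3, medium_at: float = 0.5,
                        hard_at: float = 0.7, vip_clv: float = 5_000.0) -> Optional[str]:
    """Return "soft", "medium", "hard", or None when no intervention is warranted."""
    if rg_flagged:
        return None  # assumption: RG-flagged players are never sent retention offers
    if churn_prob >= hard_at and predicted_clv >= vip_clv:
        return "hard"    # VIP host outreach, reserved for high-value players
    if churn_prob >= medium_at:
        return "medium"  # personalised boost sized against predicted CLV
    if churn_prob >= soft_at:
        return "soft"    # feature highlight or content recommendation
    return None


def medium_offer_value(predicted_clv: float, rate: float = 0.02,
                       ceiling: float = 200.0) -> float:
    """Size a medium-tier offer as a share of predicted CLV, up to a ceiling."""
    return min(max(0.0, predicted_clv) * rate, ceiling)


print(choose_intervention(0.8, 12_000.0))  # high risk, high value
print(choose_intervention(0.8, 300.0))     # high risk, low value
print(medium_offer_value(1_000.0))
```

Because the function is driven by probability rather than by days of inactivity, it can run on every daily model refresh and fire before the player has actually lapsed.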
CLV optimization and the whale economics problem
Player value distribution in this industry is extreme. A small percentage of players typically drive the majority of NGR, and that distribution gets more skewed in iGaming than in sportsbook. Three operational consequences follow:
- VIP detection has to be predictive, not reactive. By the time someone has deposited £50,000, every operator's CRM knows. The moat is detecting the signal months earlier — in stake escalation patterns, session frequency, and product breadth — and routing the right host.
- Reactivation pricing has to be CLV-aware. Spending the same reactivation budget on a £20 bettor and a £2,000 bettor is the most common waste pattern in marketing spend. Predicted CLV should set the cap.
- Responsible gambling controls protect CLV — they don't undermine it. The same models that detect harmful patterns protect players who would otherwise self-exclude or cool off permanently. The teams that internalise this earn long-term player trust as a moat, not a compliance burden.
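The second point, that predicted CLV should set the reactivation cap, reduces to a small piece of arithmetic. The sketch below assumes one plausible formulation: the cap is the predicted CLV, multiplied by the campaign's estimated incremental chance of bringing the player back, multiplied by the share of that expected value the operator is willing to spend. The function name, parameters, and sample values are hypothetical.

```python
def reactivation_cap(predicted_clv: float, uplift: float,
                     spend_share: float = 0.5) -> float:
    """Maximum reactivation spend for one dormant player.

    predicted_clv: model-predicted future value if the player returns
    uplift: estimated incremental probability of return due to the campaign (0 to 1)
    spend_share: fraction of the expected incremental value available to spend
    """
    if not 0.0 <= uplift <= 1.0:
        raise ValueError("uplift must be between 0 and 1")
    if not 0.0 <= spend_share <= 1.0:
        raise ValueError("spend_share must be between 0 and 1")
    return max(0.0, predicted_clv) * uplift * spend_share


# Illustrative values only: two dormant players with very different predicted CLV.
low_value_cap = reactivation_cap(40.0, uplift=0.10)
high_value_cap = reactivation_cap(4_000.0, uplift=0.10)
print(round(low_value_cap, 2), round(high_value_cap, 2))
```

With the same uplift, the cap scales linearly with predicted CLV, which is the opposite of spending one flat budget per dormant player.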
Underneath all three: governance you can move with
Personalization, churn modelling, and CLV optimization all touch sensitive data. They are also all subject to escalating regulatory scrutiny — affordability, AML, RG, and standard data protection. The operators who handle this best treat governance as infrastructure, not a periodic audit. Column-level lineage, PII tagging, automated test coverage on every model, and a full audit trail on every change to a production pipeline are the baseline.
The reward for getting governance right isn't just lower regulatory risk — it's higher data team velocity. When every model has tested lineage, change becomes safe; when change is safe, it becomes frequent; when it becomes frequent, the analytics layer stops being the bottleneck.
The agentic OS for data engineering.
Paradime moves the analytics layer at the speed the rest of a betting business already moves. Three things make that possible: an agent (DinoAI) that does the engineering, a CI/CD platform (Bolt) that ships it safely, and a workspace built around dbt™, Spark, and Python that the team already knows.
DinoAI — the first end-to-end data engineering agent
Most "AI assistants" for data work autocomplete a SELECT statement. DinoAI takes a Slack or Teams ticket and ships a tested, reviewed pull request. v3.0 introduced background agent mode: a marketing analyst pings a channel, DinoAI scopes the change, writes the dbt™ model, generates the tests, opens the PR, and tags the engineer who owns the domain. Median ticket-to-PR time drops from 10–20 days to under 5 minutes.
The reason it works in production where other agents don't is that DinoAI has been built for data engineering, not for general code. It understands dbt™ project structure, lineage, and macro patterns. It runs under the user's warehouse permissions, so it tests the SQL against real data, and it has been benchmarked accordingly.
Bolt — the orchestration and CI/CD layer that catches breakages before they ship
Bolt is Paradime's purpose-built orchestration and CI/CD layer for analytics work. It runs on the same dbt™ project the team already maintains, with column-level lineage diffs on every PR and TurboCI for testing partial pipelines instead of rebuilding the whole DAG. The headline numbers: 99.9% uptime, 7M+ monthly model builds across customers, and a 70% MTTR reduction on the breakages that do happen.
For a betting operator, the Bolt math is simple: every minute the trading or RG team waits on a model change is a minute of either lost revenue or unmanaged risk. Compressing PR review and deployment from days to minutes converts directly into faster pricing changes, faster RG rule rollouts, and faster integration of acquired books.
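The general idea behind testing a partial pipeline rather than the whole DAG can be shown with a generic graph traversal. The sketch below is not Paradime's TurboCI implementation, and it does not use any Paradime or dbt™ API; it only illustrates selecting the changed models plus everything downstream of them. The model names and the `children` mapping are hypothetical.

```python
from collections import deque


def affected_models(children: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return the changed models plus every model downstream of them."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical project graph: each key lists the models that depend on it.
dag = {
    "stg_wagers": ["fct_ggr", "player_sessions"],
    "stg_deposits": ["player_sessions"],
    "player_sessions": ["churn_features"],
    "fct_ggr": [],
    "churn_features": [],
}

to_test = affected_models(dag, {"stg_deposits"})
print(sorted(to_test))
```

In this toy graph a change to `stg_deposits` touches three of the five models, so `stg_wagers` and `fct_ggr` would not need rebuilding to validate the change.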
How it maps to the three pillars
| Pillar | What you need | How Paradime delivers it |
|---|---|---|
| Grow NGR | Live pricing models, cross-sell logic, fast iteration on segment definitions | DinoAI ships new dbt™ models from a Slack ticket; Bolt validates lineage and rolls out within minutes — not a sprint. |
| Reduce Churn | Daily-refreshed churn probabilities, intervention triggers, A/B test pipelines | Test scaffolding is generated automatically; column-level lineage means a churn model change is provably safe before it ships. |
| Maximize CLV | Predicted CLV per player, VIP detection, segment-aware reactivation | Agent-built feature pipelines stay in sync with source systems; governance and PII controls let teams move fast without risk. |
Why generic AI doesn't solve this
A general-purpose code assistant in an IDE has no model of the warehouse, no understanding of the dbt™ project graph, no test execution against real data, and no concept of CI/CD. It can suggest SQL. It cannot ship a tested, reviewed pipeline change. The 30-point gap on ADE-Bench between DinoAI and a vanilla agent is not a tuning gap — it is a system design gap.
A 7-day rollout, not a re-platform
Connect & observe
Paradime connects to the existing warehouse and dbt™ project. DinoAI ingests lineage and starts surfacing model debt, performance issues, and test gaps. No code change required to start.
Ship the first agent PRs
The data team starts routing tickets to DinoAI from Slack. First production PRs typically merge on the second day. Bolt CI/CD takes over PR validation with column-level lineage diffs.
Scale to teams
Marketing, trading, and RG teams begin self-service ticketing into the agent. Median ticket-to-PR time drops from days to minutes. The team's roadmap stops being capacity-bound.
Median ticket-to-PR under five minutes. Agent acceptance rate above 90%. Lineage coverage at 100%. Data downtime down by an order of magnitude. The data team finally working on the problems that need humans.
The data-driven future is already here. The question is who can ship it.
Every operator at the top of the league table this decade will have made the same three bets. They will have built a unified data foundation that treats every wager, deposit, session, and marketing touch as one stream. They will have moved from descriptive dashboards to predictive models that act before churn, before RG triggers, before the trading team is caught short. And they will have rebuilt their data engineering layer so that "ship a pipeline change" is measured in minutes, not sprints.
The third bet is the one most operators have not yet placed. It is also the one that determines whether the first two pay off.
See DinoAI on your own dbt™ project.
Paradime connects to your warehouse in under an hour. The first agent-built PR usually lands within a week. We'll show you both, on your code, with no slide deck required.
Platform
Resources
ADD-ONs
Industries
Copyright © 2026 Paradime Labs, Inc. Made with ❤️ in San Francisco ・ London
*dbt® and dbt Core® are federally registered trademarks of dbt Labs, Inc. in the United States and various jurisdictions around the world. Paradime is not a partner of dbt Labs. All rights therein are reserved to dbt Labs. Paradime is not a product or service of or endorsed by dbt Labs, Inc.