Abstract
Firms increasingly rely on A/B testing to evaluate marketing strategies, yet most experiments are analyzed in isolation, which limits insight into why effectiveness varies and how repeated exposure shapes outcomes. We develop a hierarchical Bayesian framework that jointly analyzes randomized marketing interventions and decomposes treatment-effect heterogeneity into three components: customer responsiveness, campaign design, and contextual timing. The model is scalable, captures both observed and unobserved sources of variation, and links estimation directly to policy evaluation with full uncertainty quantification.
Using data from large-scale field experiments involving nearly half a million customers, we document three main findings. First, unobserved customer-level heterogeneity accounts for roughly two-thirds of the total dispersion in treatment effects, far exceeding the contributions of campaign design and timing. Second, we find strong evidence of intervention fatigue: responsiveness declines with repeated exposure and recovers only slowly. Third, in a held-out policy evaluation, a model-based targeting strategy increases average revenue while substantially reducing the share of customers targeted, with revenue gains exceeding 10% in settings where the experimental variation is most informative. Approximately 80% of these gains stem from personalization based on latent customer responsiveness rather than on observable characteristics.
Together, these results demonstrate how repeated experimentation can be leveraged to explain variation in marketing effectiveness and to support scalable targeting and personalization.