In A/B testing, speed and accuracy are a balancing act. Wait too long, and you lose momentum; move too fast, and you risk chasing false positives. PDQ’s CUPED implementation tilts the balance in your favor.
CUPED, short for Controlled-experiment Using Pre-Experiment Data, is a proven statistical technique that reduces variance in your results by factoring in each visitor’s behavior before they’re exposed to the test. At PDQ, we’ve taken CUPED further by adding a dynamic, per-shop optimization layer that maximizes variance reduction while avoiding overfitting. The result is cleaner insights in less time, helping you iterate quickly without cutting statistical corners.
2. PDQ’s Unique “Dynamic-Outlier” Mode
Unlike standard CUPED implementations, which typically apply a fixed covariate and static trimming rules, PDQ optimizes CUPED configuration per shop using dynamic outlier removal:
1. Pre-Experiment Window: 60 days of recent pre-test data.
2. Outlier-Removal Options Tested Per Shop:
Trim at the 99th percentile (TP 99%)
Trim at the 99.9th percentile (TP 99.9%)
Remove values beyond 3 standard deviations (3SD)
3. Rule Selection: Choose the rule yielding the highest correlation (ρ) between the pre-experiment covariate and the outcome, subject to guardrails.
4. Qualification: Apply CUPED only if ρ > 10%.
5. Lock-In: Store the selected rule and its ρ in a configuration table before the test starts, for reproducibility.
This adaptive approach maximizes variance reduction while minimizing the risk of overfitting or unstable adjustments.
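The selection steps above can be sketched in a few lines of Python. This is an illustrative sketch, not PDQ's stats-engine code: the function names, the nearest-rank percentile, the choice to compute cutoffs on the covariate X, and the omission of the guardrails are all assumptions made for the example.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(q / 100 * len(ordered))))
    return ordered[rank - 1]


def candidate_bounds(xs):
    """(low, high) keep-range per candidate rule, computed on X (assumption)."""
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return {
        "TP 99%": (-math.inf, nearest_rank_percentile(xs, 99)),
        "TP 99.9%": (-math.inf, nearest_rank_percentile(xs, 99.9)),
        "3SD": (mean - 3 * sd, mean + 3 * sd),
    }


def select_rule(xs, ys, min_corr=0.10):
    """Pick the rule with the highest X-Y correlation; qualify only above min_corr."""
    best = {"rule": None, "corr": -math.inf}
    for rule, (low, high) in candidate_bounds(xs).items():
        kept = [(x, y) for x, y in zip(xs, ys) if low <= x <= high]
        corr = pearson([x for x, _ in kept], [y for _, y in kept])
        if corr > best["corr"]:
            best = {"rule": rule, "corr": corr}
    best["use_cuped"] = best["corr"] > min_corr
    return best


# Made-up pre-experiment data: subtotals 1..100, every other session converts,
# plus one very large abandoned cart acting as an outlier.
pre_x = [float(i % 100 + 1) for i in range(1000)]
pre_y = [x if i % 2 == 0 else 0.0 for i, x in enumerate(pre_x)]
pre_x[0], pre_y[0] = 5000.0, 0.0
config = select_rule(pre_x, pre_y)
print(config)
```

The returned dictionary corresponds to the row that would be locked in before the test starts: the chosen rule, its correlation, and whether the shop qualifies for CUPED.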
3. Pre-Experiment Covariate at PDQ
Initial checkout subtotal before discounts
Definition: Checkout subtotal before discounts, captured at the very first moment the checkout loads, when Shopify sends the initial payload containing the cart value.
Why it’s valid: This is recorded before any exposure to the A/B test, ensuring it is a true pre-treatment measurement.
Unique PDQ advantage: Because it’s collected a split second before treatment assignment, this covariate is available for both first-time shoppers and returning customers, a capability many CUPED setups lack. The model therefore draws on a covariate that covers 100% of traffic, which enhances statistical power.
Why it’s powerful: Strongly correlated with ARPC (Average Revenue Per Checkout) for converters.
Advantages:
True pre-treatment measurement.
High-resolution, user-level granularity.
Not restricted to returning user cohorts.
Challenges:
Zero inflation: Many sessions have a recorded pre-experiment subtotal (X) but zero post-experiment ARPC (Y) due to abandonment.
Mixture distribution: Population consists of two distinct subgroups — converters (positive spend) and non-converters (zero spend) — producing a nonlinear X–Y relationship beyond zero inflation alone.
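To make the two challenges concrete, here is a small simulation, offered as a sketch on made-up data. The 40% conversion rate, the lognormal subtotals, and the ±10% spread between subtotal and final revenue are assumptions for illustration only. It compares the X–Y correlation across all sessions with the correlation among converters.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable
subtotals, revenue = [], []
for _ in range(5000):
    x = rng.lognormvariate(4.0, 0.5)      # initial checkout subtotal (X)
    converted = rng.random() < 0.4        # assumed conversion rate
    # Converters spend roughly their subtotal; abandoners contribute zero.
    y = x * rng.uniform(0.9, 1.1) if converted else 0.0
    subtotals.append(x)
    revenue.append(y)

corr_all = pearson(subtotals, revenue)
converters = [(x, y) for x, y in zip(subtotals, revenue) if y > 0]
corr_converters = pearson([x for x, _ in converters], [y for _, y in converters])

print(f"correlation, all sessions:    {corr_all:.2f}")
print(f"correlation, converters only: {corr_converters:.2f}")
```

In this toy setup the zeros from abandoned checkouts pull the all-session correlation well below the converter-only figure, which is the effect the two challenges describe.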
4. CUPED Adjustment Formula
Yadj = Y − θ(X − E[X])
Where: Y = Post-experiment ARPC
X = Pre-experiment covariate (initial checkout subtotal before discounts)
θ = Cov(Y, X) / Var(X)
The parameter θ sets the strength of the adjustment. With θ chosen this way, Var(Yadj) = Var(Y)(1 − ρ²), where ρ is the correlation between X and Y, so ρ² is the fraction of variance removed. Shops can request this value from their CSM; excluding outliers can sometimes double it, indicating substantially higher noise control.
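A minimal Python sketch of the adjustment follows. The data are invented, and the example assumes θ and E[X] are estimated from the pooled sample; the production computation runs in PDQ's backend and may differ in detail.

```python
import statistics


def cuped_adjust(y, x):
    """Return (theta, adjusted outcomes) with Yadj = Y - theta * (X - mean(X))."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    n = len(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    theta = cov_xy / var_x
    return theta, [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


# Toy data (made up): X = initial checkout subtotal, Y = revenue per checkout.
x = [20.0, 35.0, 50.0, 80.0, 120.0, 45.0]
y = [0.0, 33.0, 0.0, 78.0, 115.0, 47.0]

theta, y_adj = cuped_adjust(y, x)
var_before = statistics.pvariance(y)
var_after = statistics.pvariance(y_adj)

print(f"theta = {theta:.3f}")
print(f"mean before = {statistics.fmean(y):.2f}, mean after = {statistics.fmean(y_adj):.2f}")
print(f"variance ratio after/before = {var_after / var_before:.3f}")
```

The adjusted series keeps the same mean as the original, so the estimated effect is unchanged; only the variance shrinks, by a factor of 1 − ρ².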
5. Business Impact at PDQ
Speed: Up to ~15 days saved in sequential-bound tests.
Efficiency: Cuts idea-to-decision cycles nearly in half, expanding experimentation bandwidth.
Rigor: Sequential α-spending ensures Type-I error remains controlled even with variance reduction.
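For readers unfamiliar with α-spending, the sketch below shows one common spending function, the Lan-DeMets O'Brien-Fleming-type form. This article does not say which spending function PDQ uses, so this choice, the function name, and the interim-look schedule are assumptions for illustration only.

```python
from statistics import NormalDist


def obf_alpha_spent(info_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction (0 < t <= 1)."""
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must be in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / info_fraction ** 0.5))


# Hypothetical schedule of interim looks at 25%, 50%, 75%, and 100% of the data.
for t in (0.25, 0.5, 0.75, 1.0):
    print(f"information fraction {t:.2f}: cumulative alpha spent = {obf_alpha_spent(t):.5f}")
```

Very little of the error budget is spent at early looks, and the full α is reached only at the final analysis. That is how repeated checks can keep the overall Type-I error at the nominal level.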
6. Current Limitations During Transition
Backend-Only Computation: CUPED adjustments, dynamic outlier removal, and sequential bounds are computed in the stats-engine backend. Raw exports alone cannot replicate official results.
BI Reporting Outlier Definition Mismatch:
Test Results Dashboard: Outliers trimmed using initial cart value at checkout start.
Deep A/B Test Analysis (legacy): Outliers trimmed using final order subtotal.
This difference can cause inconsistent numbers between views.
Recommendation: Use the Test Results Dashboard for authoritative CUPED figures.
Deep A/B Test Analysis “Slice and Dice” Tool Limitations: This tool is now positioned for exploratory slicing and filtering. It provides no statistical significance and cannot replicate the main CUPED model.
You Might See High Significance in ARPC but Not in Its Components: Two small, individually non-significant shifts (Δp, Δm) can combine (plus their covariance) to make Δ(ARPC) statistically significant, especially when ARPC’s variance is reduced via CUPED.
ARPC = p × m
where p = conversion rate and m = average order value (among converters):
Δ(ARPC) = p1·m1 − p0·m0 = m0·Δp + p0·Δm + Δp·Δm
Here the subscripts 0 and 1 denote control and variant, Δp = p1 − p0, and Δm = m1 − m0. Even if conversion rate (p) and order value (m) individually don’t show significance, their combined effect on ARPC might, especially after variance reduction.
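A quick numeric check of the decomposition, using made-up control and variant figures (not PDQ data):

```python
# Hypothetical control (0) and variant (1) values, for illustration only.
p0, m0 = 0.50, 80.00   # conversion rate, average order value
p1, m1 = 0.51, 81.00

dp, dm = p1 - p0, m1 - m0
delta_arpc = p1 * m1 - p0 * m0

conversion_part = m0 * dp    # lift from more checkouts converting
order_value_part = p0 * dm   # lift from larger orders
interaction_part = dp * dm   # joint term

print(f"delta ARPC       = {delta_arpc:.2f}")
print(f"conversion part  = {conversion_part:.2f}")
print(f"order value part = {order_value_part:.2f}")
print(f"interaction part = {interaction_part:.2f}")
```

With these numbers the ARPC lift of 1.31 splits into 0.80 from conversion, 0.50 from order value, and 0.01 from the interaction. Neither component carries the whole effect on its own, which is why ARPC can clear a significance threshold when its parts do not. The sketch only decomposes the point estimate; it does not compute significance.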
7. Future Direction
CUPED V2: Multi-covariate CUPED to capture more variance.
CUPED isn’t about changing your results; it’s about revealing them faster and with greater clarity. PDQ’s adaptive, dynamic-outlier approach is designed to give each shop the setup that maximizes variance reduction while keeping bias minimal.
For merchants running multiple tests per quarter, the time savings can be game-changing, freeing up capacity to validate more ideas, faster.
Next step: If you’d like CUPED enabled for your upcoming tests or want to review your shop’s CUPED configuration, reach out to your PDQ CSM.