Controlled-experiment Using Pre-Experiment Data (CUPED) and how we use it

In A/B testing, speed and accuracy are a balancing act. Wait too long, and you lose momentum; move too fast, and you risk chasing false positives. PDQ’s CUPED implementation tilts the balance in your favor.

CUPED, short for Controlled-experiment Using Pre-Experiment Data, is a proven statistical technique that reduces variance in your results by factoring in visitor behavior recorded before visitors are exposed to the test. At PDQ, we’ve taken CUPED further by adding a dynamic, per-shop optimization layer that maximizes variance reduction while avoiding overfitting. The result is cleaner insights in less time, helping you iterate quickly without cutting statistical corners.

2. PDQ’s Unique “Dynamic-Outlier” Mode

Unlike standard CUPED implementations, which typically apply a fixed covariate and static trimming rules, PDQ optimizes CUPED configuration per shop using dynamic outlier removal:

1. Pre-Experiment Window: 60 days of recent pre-test data.

2. Outlier-Removal Options Tested Per Shop:

- Trim at the 99th percentile (TP 99%)
- Trim at the 99.9th percentile (TP 99.9%)
- Remove values beyond 3 standard deviations (3SD)

3. Rule Selection: Choose the rule yielding the highest correlation (𝜃) between pre-experiment covariate and outcome, subject to guardrails.

4. Qualification: Apply CUPED only if 𝜃 > 10%.

5. Lock-In: Store selected rule and 𝜃 in a configuration table before test start for reproducibility.

This adaptive approach maximizes variance reduction while minimizing the risk of overfitting or unstable adjustments.
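
The selection steps above can be sketched roughly as follows. This is a minimal illustration, not PDQ's production logic: the function names (`percentile_value`, `bounds_for`, `select_rule`) are hypothetical, trimming is assumed to be applied to the pre-experiment covariate, the 3SD rule is assumed to be two-sided, and the guardrails mentioned in step 3 are not specified in this article, so they are left out.

```python
import math
import statistics

RULES = ("TP 99%", "TP 99.9%", "3SD")

def percentile_value(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(pct / 100 * len(ordered))))
    return ordered[rank - 1]

def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def bounds_for(rule, xs):
    """Return the (low, high) range of covariate values a rule keeps."""
    if rule == "TP 99%":
        return (-math.inf, percentile_value(xs, 99))
    if rule == "TP 99.9%":
        return (-math.inf, percentile_value(xs, 99.9))
    if rule == "3SD":
        mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
        return (mean - 3 * sd, mean + 3 * sd)
    raise ValueError(f"unknown rule: {rule}")

def select_rule(x, y, min_corr=0.10):
    """Pick the rule with the highest X-Y correlation, then apply the 10% gate."""
    best = None
    for rule in RULES:
        low, high = bounds_for(rule, x)
        kept = [(xi, yi) for xi, yi in zip(x, y) if low <= xi <= high]
        corr = correlation([k[0] for k in kept], [k[1] for k in kept])
        if best is None or corr > best["corr"]:
            best = {"rule": rule, "corr": corr}
    # Qualification step: CUPED is only applied above the threshold.
    best["apply_cuped"] = best["corr"] > min_corr
    return best
```

The returned dictionary corresponds to what the lock-in step would store before the test starts: the chosen rule, the measured correlation, and whether CUPED qualifies.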

3. Pre-Experiment Covariate at PDQ

Initial checkout subtotal before discounts

Definition: Checkout subtotal before discounts, captured at the very first moment the checkout loads, when Shopify sends the initial payload containing the cart value.
Why it’s valid: This is recorded before any exposure to the A/B test, ensuring it is a true pre-treatment measurement.
Unique PDQ advantage: Because it’s collected a split second before treatment assignment, this covariate is available for both first-time shoppers and returning customers, a capability many CUPED setups lack. The model therefore draws on a data source that covers 100% of traffic, which improves statistical power.
Why it’s powerful: Strongly correlated with ARPC (Average Revenue Per Checkout) for converters.
Advantages:
- True pre-treatment measurement.
- High-resolution, user-level granularity.
- Not restricted to returning user cohorts.
Challenges:
- Zero inflation: Many sessions have a recorded pre-experiment subtotal (X) but zero post-experiment ARPC (Y) due to abandonment.
- Mixture distribution: Population consists of two distinct subgroups — converters (positive spend) and non-converters (zero spend) — producing a nonlinear X–Y relationship beyond zero inflation alone.
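
To make the two challenges concrete, here is a small simulation on synthetic data. Everything in it is assumed for illustration (the lognormal subtotal distribution, the 50% conversion rate, the noise level); none of it is PDQ data. It compares the X–Y correlation across all sessions with the correlation among converters only.

```python
import math
import random
import statistics

def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)   # fixed seed so the sketch is reproducible
n = 5000
conversion_rate = 0.5     # hypothetical share of sessions that convert

# X: synthetic initial checkout subtotal for every session.
x = [rng.lognormvariate(4.0, 0.5) for _ in range(n)]
converted = [rng.random() < conversion_rate for _ in range(n)]
# Y: revenue close to X for converters, exactly zero for abandoned sessions.
y = [xi + rng.gauss(0, 5) if c else 0.0 for xi, c in zip(x, converted)]

corr_all = correlation(x, y)
corr_converters = correlation(
    [xi for xi, c in zip(x, converted) if c],
    [yi for yi, c in zip(y, converted) if c],
)
print(f"all sessions:    {corr_all:.2f}")
print(f"converters only: {corr_converters:.2f}")
```

In this synthetic setup the correlation over all sessions comes out well below the converters-only correlation, which is the effect zero inflation has on the variance a single linear adjustment can remove.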

4. CUPED Adjustment Formula

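Yadj = Y − 𝜃(X − E[X])

Where: Y = Post-experiment ARPC

X = Pre-experiment covariate (initial checkout subtotal before discounts)

E[X] = Mean of the pre-experiment covariate

𝜃 = Cov(Y, X) / Var(X)

The parameter 𝜃 governs how much variance is removed: with this choice of 𝜃, the variance of Yadj equals (1 − ρ²) × Var(Y), where ρ is the correlation between X and Y. Shops can request this parameter from their CSM; excluding outliers can sometimes double this value, indicating substantially higher noise control.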
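
As a rough sketch, the adjustment can be computed as follows. The function name `cuped_adjust` and the sample values are hypothetical, and the sketch assumes the slope is estimated once on the data passed in (for example, pooled across both test arms); it is not PDQ's stats-engine code.

```python
import statistics

def cuped_adjust(y, x):
    """Return (y_adj, theta) for paired lists y (outcome) and x (covariate)."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_yx = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    theta = cov_yx / var_x  # the shared 1/(n - 1) factors cancel out
    # Subtract the part of Y that is explained by the centered covariate.
    y_adj = [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]
    return y_adj, theta

# Hypothetical sessions: x = initial subtotal, y = revenue (0.0 = abandoned).
x = [20.0, 35.0, 50.0, 80.0, 120.0, 60.0]
y = [0.0, 35.0, 48.0, 0.0, 115.0, 62.0]
y_adj, theta = cuped_adjust(y, x)

print("mean before/after:    ", statistics.fmean(y), statistics.fmean(y_adj))
print("variance before/after:", statistics.variance(y), statistics.variance(y_adj))
```

Because the covariate is centered on its mean, the adjusted metric keeps the same average as the raw metric; only its variance shrinks.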

5. Business Impact at PDQ

Speed: Up to about 15 days saved in sequential-bound tests.
Efficiency: Cuts idea-to-decision cycles nearly in half, expanding experimentation bandwidth.
Rigor: Sequential α-spending ensures Type-I error remains controlled even with variance reduction.
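
For intuition on where time savings come from, here is a back-of-the-envelope sketch. It assumes a simplified fixed-horizon view in which required sample size (and so duration at constant traffic) scales with metric variance, so CUPED leaves a share of 1 − ρ² of the original duration. Sequential tests will not follow this exactly, and the inputs below are hypothetical, not PDQ measurements.

```python
def days_with_cuped(baseline_days, correlation):
    """Scale test duration by the remaining variance share (1 - rho^2)."""
    if not -1.0 < correlation < 1.0:
        raise ValueError("correlation must be strictly between -1 and 1")
    return baseline_days * (1.0 - correlation ** 2)

baseline_days = 28.0  # hypothetical duration without CUPED
rho = 0.6             # hypothetical X-Y correlation for a shop
adjusted = days_with_cuped(baseline_days, rho)
print(f"{adjusted:.1f} days with CUPED, {baseline_days - adjusted:.1f} days saved")
```

The savings grow quickly with the correlation, which is why the per-shop choice of outlier rule matters.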

6. Current Limitations During Transition

Backend-Only Computation: CUPED adjustments, dynamic outlier removal, and sequential bounds are computed in the stats-engine backend. Raw exports alone cannot replicate official results.
BI Reporting Outlier Definition Mismatch:

- Test Results Dashboard: Outliers trimmed using initial cart value at checkout start.
- Deep A/B Test Analysis (legacy): Outliers trimmed using final order subtotal.

This difference can cause inconsistent numbers between views. Recommendation: Use the Test Results Dashboard for authoritative CUPED figures.

Deep A/B Test Analysis “Slice and Dice” Tool Limitations: Now positioned for exploratory slicing/filtering. No statistical significance is provided; it cannot replicate the main CUPED model.
You Might See High Significance in ARPC but Not in Its Components: Two small, individually non-significant shifts (Δp, Δm) can combine (plus their covariance) to make Δ(ARPC) statistically significant, especially when ARPC’s variance is reduced via CUPED.

ARPC = p × m

where p = conversion rate and m = average order value (among converters). With subscript 0 for control and 1 for the variant, Δp = p1 − p0 and Δm = m1 − m0:

Δ(ARPC) = p1m1 − p0m0 = m0Δp + p0Δm + ΔpΔm

Even if conversion rate (p) and order value (m) individually don’t show significance, their combined effect on ARPC might, especially after variance reduction.
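
A quick numeric check of the decomposition, with made-up numbers (p0, m0 for control and p1, m1 for the variant are hypothetical). The function name `decompose_arpc_delta` is illustrative. The sketch only verifies the arithmetic identity; it does not run a significance test.

```python
def decompose_arpc_delta(p0, m0, p1, m1):
    """Split the ARPC change into conversion, order-value, and interaction terms."""
    dp, dm = p1 - p0, m1 - m0
    return {
        "conversion_term": m0 * dp,   # change from conversion rate alone
        "order_value_term": p0 * dm,  # change from order value alone
        "interaction_term": dp * dm,  # joint change, usually small
        "total": p1 * m1 - p0 * m0,
    }

# Hypothetical test: conversion 50% -> 51%, average order value 100 -> 101.
parts = decompose_arpc_delta(p0=0.50, m0=100.0, p1=0.51, m1=101.0)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```

Here a 2% relative lift in conversion and a 1% lift in order value add up to roughly a 3% lift in ARPC, so the combined metric moves more than either component does on its own.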

7. Future Direction

CUPED V2: Multi-covariate CUPED to capture more variance.

CUPED isn’t about changing your results; it’s about revealing them faster and with greater clarity. PDQ’s adaptive, dynamic-outlier approach ensures each shop gets the optimal setup for maximum variance reduction and minimal bias.

For merchants running multiple tests per quarter, the time savings can be game-changing, freeing up capacity to validate more ideas, faster.

Next step: If you’d like CUPED enabled for your upcoming tests or want to review your shop’s CUPED configuration, reach out to your PDQ CSM.