Controlled-experiment Using Pre-Experiment Data (CUPED) and how we use it

How PDQ sets parameters & prepares data for tests

Written by Haim
Updated over 3 weeks ago

In A/B testing, speed and accuracy are a balancing act. Wait too long, and you lose momentum; move too fast, and you risk chasing false positives. PDQ’s CUPED implementation tilts the balance in your favor.

CUPED, short for Controlled-experiment Using Pre-Experiment Data, is a proven statistical technique that reduces variance in your results by factoring in visitor behavior before they're exposed to the test. At PDQ, we've taken CUPED further by adding a dynamic, per-shop optimization layer that maximizes variance reduction while avoiding overfitting. The result is cleaner insights in less time, helping you iterate quickly without cutting statistical corners.

2. PDQ’s Unique “Dynamic-Outlier” Mode

Unlike standard CUPED implementations, which typically apply a fixed covariate and static trimming rules, PDQ optimizes CUPED configuration per shop using dynamic outlier removal:

1. Pre-Experiment Window: 60 days of recent pre-test data.

2. Outlier-Removal Options Tested Per Shop:

  • Trim at the 99th percentile (TP 99%)

  • Trim at the 99.9th percentile (TP 99.9%)

  • Remove values beyond 3 standard deviations (3SD)

3. Rule Selection: Choose the rule yielding the highest correlation (θ) between the pre-experiment covariate and the outcome, subject to guardrails.

4. Qualification: Apply CUPED only if θ > 10%.

5. Lock-In: Store the selected rule and θ in a configuration table before the test starts, for reproducibility.

This adaptive approach maximizes variance reduction while minimizing the risk of overfitting or unstable adjustments.
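
To make the selection flow concrete, here is a minimal Python sketch of the per-shop rule selection described above. The column names (pre_subtotal, post_arpc), the helper functions, and the use of Pearson correlation for θ are illustrative assumptions only, not PDQ's actual stats-engine code.

```python
from typing import Optional

import pandas as pd

# Illustrative sketch only: column names and helpers are hypothetical,
# not PDQ's production stats-engine implementation.

def keep_mask(values: pd.Series, rule: str) -> pd.Series:
    """Boolean mask of rows that survive a given outlier-removal rule."""
    if rule == "TP99":        # trim at the 99th percentile
        return values <= values.quantile(0.99)
    if rule == "TP99.9":      # trim at the 99.9th percentile
        return values <= values.quantile(0.999)
    if rule == "3SD":         # drop values beyond 3 standard deviations
        return (values - values.mean()).abs() <= 3 * values.std()
    raise ValueError(f"unknown rule: {rule}")

def select_cuped_config(df: pd.DataFrame, min_theta: float = 0.10) -> Optional[dict]:
    """Pick the rule with the strongest pre/post relationship, with a guardrail."""
    best = None
    for rule in ("TP99", "TP99.9", "3SD"):
        mask = keep_mask(df["pre_subtotal"], rule)
        theta = df.loc[mask, "pre_subtotal"].corr(df.loc[mask, "post_arpc"])
        if best is None or theta > best["theta"]:
            best = {"rule": rule, "theta": theta}
    # Qualification: apply CUPED only if the relationship clears the threshold.
    # The winning rule and theta would then be locked into a configuration
    # table before the test starts.
    return best if best is not None and best["theta"] > min_theta else None
```

In this sketch the input would be the 60-day pre-experiment window for the shop, and the returned rule/θ pair is what gets locked in before the test begins.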


3. Pre-Experiment Covariate at PDQ

Initial checkout subtotal before discounts

  • Definition: Checkout subtotal before discounts, captured at the very first moment the checkout loads, when Shopify sends the initial payload containing the cart value.

  • Why it’s valid: This is recorded before any exposure to the A/B test, ensuring it is a true pre-treatment measurement.

  • Unique PDQ advantage: Because it's collected a split second before treatment assignment, this covariate is available for both first-time shoppers and returning customers, a capability many CUPED setups lack. This means the model benefits from a richer data source that applies to 100% of traffic, enhancing statistical power.

  • Why it’s powerful: Strongly correlated with ARPC (Average Revenue Per Checkout) for converters.

  • Advantages:

    • True pre-treatment measurement.

    • High-resolution, user-level granularity.

    • Not restricted to returning user cohorts.

  • Challenges:

    • Zero inflation: Many sessions have a recorded pre-experiment subtotal (X) but zero post-experiment ARPC (Y) due to abandonment.

    • Mixture distribution: Population consists of two distinct subgroups — converters (positive spend) and non-converters (zero spend) — producing a nonlinear X–Y relationship beyond zero inflation alone.
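
To illustrate those two challenges, here is a small simulation sketch (all distributions and numbers are invented for illustration) that generates a zero-inflated, mixture-shaped dataset: every session has a pre-experiment subtotal, but only converters have a positive ARPC.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Pre-experiment covariate: every session has an initial checkout subtotal.
x = rng.lognormal(mean=4.0, sigma=0.6, size=n)

# Outcome is a mixture: most sessions abandon (Y = 0); converters spend an
# amount loosely tied to their initial subtotal.
converted = rng.random(n) < 0.35
y = np.where(converted, x * rng.normal(1.0, 0.25, size=n), 0.0)

print("share of zero-ARPC sessions:", round(1 - converted.mean(), 3))
print("corr(X, Y), all sessions:   ", round(float(np.corrcoef(x, y)[0, 1]), 3))
print("corr(X, Y), converters only:", round(float(np.corrcoef(x[converted], y[converted])[0, 1]), 3))
```

In data shaped like this, the correlation among converters is much stronger than the correlation across all sessions, which is why the X–Y relationship is described as nonlinear rather than simply noisy.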

4. CUPED Adjustment Formula

Yadj = Y − θ(X − E[X])

Where:

Y = post-experiment ARPC

X = pre-experiment covariate (initial checkout subtotal before discounts)

θ = Cov(Y, X) / Var(X)

The stronger the relationship between X and Y, the larger the fraction of variance removed (the reduction equals the squared correlation between the two). Shops can request their shop's figure from their CSM; excluding outliers can sometimes double it, indicating substantially higher noise control.
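
As a rough sketch of how this formula plays out on raw arrays of per-session values (illustrative only; not the backend computation):

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return Yadj = Y - theta * (X - E[X]) with theta = Cov(Y, X) / Var(X)."""
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

# The adjusted metric keeps the same mean as Y but (usually) much lower variance:
# variance_removed = 1 - np.var(cuped_adjust(y, x)) / np.var(y)
```

Because only θ times a mean-centered, pre-treatment quantity is subtracted, the adjustment leaves treatment-versus-control comparisons unbiased while shrinking their noise.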


5. Business Impact at PDQ

  • Speed: Up to ~15 days saved in sequential-bound tests

  • Efficiency: Cuts idea-to-decision cycles nearly in half, expanding experimentation bandwidth.

  • Rigor: Sequential α-spending ensures Type-I error remains controlled even with variance reduction.


6. Current Limitations During Transition

  1. Backend-Only Computation: CUPED adjustments, dynamic outlier removal, and sequential bounds are computed in the stats-engine backend. Raw exports alone cannot replicate official results.

  2. BI Reporting Outlier Definition Mismatch:

  • Test Results Dashboard: Outliers trimmed using initial cart value at checkout start.

  • Deep A/B Test Analysis (legacy): Outliers trimmed using final order subtotal. This difference can cause inconsistent numbers between views. Recommendation: Use the Test Results Dashboard for authoritative CUPED figures.

  3. Deep A/B Test Analysis “Slice and Dice” Tool Limitations: The tool is now positioned for exploratory slicing and filtering. No statistical significance is provided, and it cannot replicate the main CUPED model.

  4. You Might See High Significance in ARPC but Not in Its Components: Two small, individually non-significant shifts (Δp, Δm) can combine (plus their covariance) to make Δ(ARPC) statistically significant, especially when ARPC’s variance is reduced via CUPED.

ARPC = p × m

where p = conversion rate and m = average order value (among converters):

Δ(ARPC) = p₁m₁ − p₀m₀ ≈ m·Δp + p·Δm + Δp·Δm

Even if the conversion rate (p) and the order value (m) don't show significance individually, their combined effect on ARPC might, especially after variance reduction.
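
A tiny worked example with hypothetical numbers shows how two small shifts combine:

```python
# Hypothetical numbers, purely for illustration.
p0, m0 = 0.040, 85.0   # control: 4.0% conversion, $85 average order value
p1, m1 = 0.041, 86.5   # variant: +2.5% relative lift in p, +1.8% in m

dp, dm = p1 - p0, m1 - m0
exact  = p1 * m1 - p0 * m0               # Δ(ARPC)
approx = m0 * dp + p0 * dm + dp * dm     # m·Δp + p·Δm + Δp·Δm

print(p0 * m0, p1 * m1)   # ARPC moves from ~3.40 to ~3.55, about +4.3%
print(exact, approx)      # both ~0.147: the decomposition matches
```

With these numbers, neither component shift is dramatic on its own, yet ARPC moves by roughly +4.3%, which a variance-reduced metric can detect.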


7. Future Direction

  • CUPED V2: Multi-covariate CUPED to capture more variance.

CUPED isn’t about changing your results; it’s about revealing them faster and with greater clarity. PDQ’s adaptive, dynamic-outlier approach ensures each shop gets the optimal setup for maximum variance reduction and minimal bias.

For merchants running multiple tests per quarter, the time savings can be game-changing, freeing up capacity to validate more ideas, faster.

Next step: If you’d like CUPED enabled for your upcoming tests or want to review your shop’s CUPED configuration, reach out to your PDQ CSM.
