The experiment detail page shows results as soon as the test starts collecting data. This page explains each section, how to read the statistics honestly, and how to end the test by declaring a winner or stopping it. All result sections appear once the experiment is past Draft. Until then there is nothing to measure.

How much data you have

Before any lift or confidence number means anything, both variants need enough enrollments. The dashboard tracks three states:
| State | Condition | What you see |
| --- | --- | --- |
| No data | No enrollments yet | Metrics show a dash; “results will appear here as users enroll” |
| Low data | Fewer than 30 enrollments in either variant | Counts may show, but lift and confidence are suppressed |
| Sufficient | At least 30 enrollments in both variants | Lift and confidence are shown |
Thirty enrollments per variant is the floor below which the underlying statistical test is unreliable, so the dashboard refuses to show a confidence number until both variants clear it.
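The three states can be sketched as a small classifier. This is an illustration only, not FlowPilot's code: the names `DataState` and `data_state` are hypothetical, and it assumes "No enrollments yet" means zero enrollments across both variants.

```python
from enum import Enum

# Floor stated in the table above.
MIN_ENROLLMENTS_PER_VARIANT = 30


class DataState(Enum):
    NO_DATA = "No data"
    LOW_DATA = "Low data"
    SUFFICIENT = "Sufficient"


def data_state(control_enrolled: int, challenger_enrolled: int) -> DataState:
    """Classify an experiment by how many users each variant has enrolled."""
    # Assumption: "no data" means nobody has enrolled in either variant.
    if control_enrolled == 0 and challenger_enrolled == 0:
        return DataState.NO_DATA
    # The smaller arm decides: either variant under the floor means low data.
    if min(control_enrolled, challenger_enrolled) < MIN_ENROLLMENTS_PER_VARIANT:
        return DataState.LOW_DATA
    return DataState.SUFFICIENT
```

Note that a test with 45 enrollments in one variant and 29 in the other is still Low data, because the floor applies to each variant separately.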

Results summary

For each variant the Results Summary card shows:
  • Users Enrolled: how many users were assigned to this variant. Assignment is sticky per user: each user is bucketed once from a stable per-install id and stays in that variant (see Variables and SDK context).
  • Completions: how many of those users finished the flow.
  • Completion Rate: completions divided by enrollments.
  • The variant’s traffic share.
Once you have sufficient data, a banner shows the lift: how much higher (or lower) the challenger’s completion rate is compared to the control, as a relative percentage. Lift is the challenger rate minus the control rate, divided by the control rate. A positive lift means the challenger is doing better; a negative lift means the control is ahead.
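As a worked example of the lift definition, here is a minimal sketch. The function name `relative_lift` and the sample counts are made up for illustration.

```python
from typing import Optional


def relative_lift(
    control_completions: int,
    control_enrolled: int,
    challenger_completions: int,
    challenger_enrolled: int,
) -> Optional[float]:
    """Relative lift: (challenger rate - control rate) / control rate."""
    if control_enrolled == 0 or challenger_enrolled == 0:
        return None  # no completion rate to compare yet
    control_rate = control_completions / control_enrolled
    challenger_rate = challenger_completions / challenger_enrolled
    if control_rate == 0:
        return None  # relative lift is undefined against a zero baseline
    return (challenger_rate - control_rate) / control_rate


# Control completes 40 of 200 (20%); challenger completes 50 of 200 (25%).
# The absolute gap is 5 percentage points, but the relative lift is about +25%.
lift = relative_lift(40, 200, 50, 200)
```

The distinction matters when reading the banner: a jump from 20% to 25% is reported as roughly +25%, not +5%.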

Statistical confidence

The Statistical Confidence card compares your current confidence to the target you set (90%, 95%, or 99%). FlowPilot computes confidence with a two-proportion z-test (pooled, two-tailed) comparing the challenger’s completion rate to the control’s. The reported confidence is 1 minus the two-tailed p-value. An experiment is treated as significant only when both of these hold:
  1. Each variant has at least 30 enrollments.
  2. Confidence has reached your target.
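The calculation described above can be sketched with the standard library. This is a textbook pooled two-proportion z-test written for illustration; the function names are hypothetical, and FlowPilot's own implementation may differ in edge-case handling.

```python
import math

MIN_ENROLLMENTS_PER_VARIANT = 30


def confidence(control_completions: int, control_enrolled: int,
               challenger_completions: int, challenger_enrolled: int) -> float:
    """Return 1 minus the two-tailed p-value of a pooled two-proportion z-test."""
    control_rate = control_completions / control_enrolled
    challenger_rate = challenger_completions / challenger_enrolled
    # Pooled rate: treat both variants as one sample under the null hypothesis.
    pooled = (control_completions + challenger_completions) / (
        control_enrolled + challenger_enrolled
    )
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / control_enrolled + 1 / challenger_enrolled)
    )
    if std_err == 0:
        return 0.0  # every user (or no user) completed; nothing to compare
    z = (challenger_rate - control_rate) / std_err
    # Two-tailed p-value for a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return 1 - p_value


def is_significant(control_completions: int, control_enrolled: int,
                   challenger_completions: int, challenger_enrolled: int,
                   target: float) -> bool:
    """Both conditions must hold: the enrollment floor and the confidence target."""
    if min(control_enrolled, challenger_enrolled) < MIN_ENROLLMENTS_PER_VARIANT:
        return False
    return confidence(control_completions, control_enrolled,
                      challenger_completions, challenger_enrolled) >= target
```

For example, 40 of 200 completions in control against 60 of 200 in the challenger gives a confidence of roughly 0.98, which clears a 95% target but not a 99% one.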
The card also shows an estimated days to significance: a projection of how much longer the test needs at its current effect size and enrollment rate (assuming 80% statistical power). Treat it as a rough guide, not a promise. It is undefined when there is no measurable difference yet or no enrollments, and it disappears once the test is already significant.
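The exact projection formula is not documented here. One plausible way to produce such an estimate, shown purely as an assumption, is the standard sample-size formula for comparing two proportions at 80% power, divided by the current enrollment rate. All names and parameters below are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def estimated_days_to_significance(
    control_rate: float,
    challenger_rate: float,
    enrolled_per_variant: int,
    daily_enrollments_per_variant: float,
    target_confidence: float = 0.95,
    power: float = 0.80,
) -> Optional[int]:
    """Rough projection; None when the estimate is undefined."""
    difference = abs(challenger_rate - control_rate)
    # Undefined with no measurable difference or no incoming enrollments.
    if difference == 0 or daily_enrollments_per_variant <= 0:
        return None
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - (1 - target_confidence) / 2)  # two-tailed
    z_beta = normal.inv_cdf(power)
    variance = (control_rate * (1 - control_rate)
                + challenger_rate * (1 - challenger_rate))
    # Assumed textbook formula for required users per variant.
    required_per_variant = (z_alpha + z_beta) ** 2 * variance / difference ** 2
    remaining = max(0.0, required_per_variant - enrolled_per_variant)
    return math.ceil(remaining / daily_enrollments_per_variant)


# 20% vs 25% completion, 200 users per variant so far, 50 new users per
# variant per day, 95% target.
days = estimated_days_to_significance(0.20, 0.25, 200, 50)
```

Because the required sample scales with one over the squared difference, a small shift in the observed effect size moves the estimate a lot, which is why the number should be read as a rough guide.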
The card phrases confidence as “X% probability that Variant B truly outperforms control.” That is a plain-language simplification of a frequentist test, not a Bayesian probability. Read it as “if the two variants truly performed the same, a gap this large would appear by chance only (100 − X)% of the time.” The honest takeaway is the same: do not act until confidence reaches your target and both variants have enough data.
The card surfaces a recommendation that tracks the data:
  • Below the data floor: “Awaiting first enrollments” or “Need at least 30 enrollments per variant”.
  • Climbing but not there yet: “Keep collecting data for reliable results”, and near the end “Approaching significance, recommend waiting”.
  • At or above target: “High confidence achieved, safe to declare winner”.

Performance chart

The Performance chart plots each variant’s cumulative completion rate over time. Cumulative (rather than per-day) rates are used because daily rates jitter heavily early on; the cumulative lines settle as data accumulates and make the trend easy to read.
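To see why cumulative rates are steadier, the sketch below computes both from the same daily counts. The function name and the sample numbers are illustrative only.

```python
from itertools import accumulate
from typing import List, Optional


def cumulative_completion_rates(
    daily_enrolled: List[int], daily_completed: List[int]
) -> List[Optional[float]]:
    """Completion rate over all data up to and including each day."""
    total_enrolled = accumulate(daily_enrolled)
    total_completed = accumulate(daily_completed)
    return [
        completed / enrolled if enrolled else None
        for enrolled, completed in zip(total_enrolled, total_completed)
    ]


daily_enrolled = [10, 10, 20, 40]
daily_completed = [1, 5, 4, 12]

# Per-day rates swing between 10% and 50% on these small samples.
daily_rates = [c / e for c, e in zip(daily_completed, daily_enrolled)]

# Cumulative rates move in a much narrower band as the totals grow.
cumulative_rates = cumulative_completion_rates(daily_enrolled, daily_completed)
```

Here the daily rates are 0.1, 0.5, 0.2, 0.3, while the cumulative rates are 0.1, 0.3, 0.25, 0.275.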

Funnel comparison

The Funnel Comparison card puts the two variants’ per-screen funnels side by side. For each screen it shows the pass-through rate (the share of starters who reached that screen) and the drop-off at that step. This tells you not just which variant wins overall, but where one variant loses or keeps users that the other does not.
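A minimal sketch of the two per-screen numbers follows. It assumes the first screen's count is the number of starters, and that drop-off at a step means the share of users on the previous screen who did not reach this one; the card's exact drop-off definition is not spelled out here, so treat that part as an assumption.

```python
from typing import List, Tuple


def funnel_rates(screen_reach_counts: List[int]) -> List[Tuple[float, float]]:
    """Return (pass_through, drop_off) for each screen, in funnel order."""
    if not screen_reach_counts or screen_reach_counts[0] == 0:
        return []
    starters = screen_reach_counts[0]
    rows = []
    previous = starters
    for reached in screen_reach_counts:
        # Pass-through: share of starters who reached this screen.
        pass_through = reached / starters
        # Assumed drop-off: share of the previous screen's users lost here.
        drop_off = (previous - reached) / previous if previous else 0.0
        rows.append((pass_through, drop_off))
        previous = reached
    return rows


# Users reaching screens 1 to 4 of one variant (made-up counts).
variant_funnel = funnel_rates([100, 80, 60, 45])
```

Running the same function on each variant's counts and comparing the rows screen by screen shows where one flow holds users that the other loses.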

Make a decision

While an experiment is running or paused, the Make a Decision card offers three choices.

Declare Winner

End the test and apply the winning variant’s flow as the placement default.

Stop Test

End without a winner. The placement reverts to its default flow.

Keep Running

Continue collecting data. The card estimates how much longer you need.
Declare Winner is disabled until you have sufficient data, there is a clear leader, and confidence has reached about 90% of your target. It is marked Recommended only once confidence reaches the target in full. If you declare while still under target, the card warns that the conclusion may be wrong.
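The gating rules can be sketched as follows. Two points are assumptions rather than documented behavior: "about 90% of your target" is read as 0.9 times the target (about 85.5% confidence for a 95% target), and "a clear leader" is read as the two completion rates being different. The function and state names are hypothetical.

```python
MIN_ENROLLMENTS_PER_VARIANT = 30
NEAR_TARGET_FRACTION = 0.9  # assumed reading of "about 90% of your target"


def declare_winner_state(
    control_enrolled: int,
    challenger_enrolled: int,
    control_rate: float,
    challenger_rate: float,
    confidence: float,
    target: float,
) -> str:
    """Return 'disabled', 'enabled_with_warning', or 'recommended'."""
    sufficient = (
        min(control_enrolled, challenger_enrolled) >= MIN_ENROLLMENTS_PER_VARIANT
    )
    clear_leader = control_rate != challenger_rate  # assumed definition
    near_target = confidence >= NEAR_TARGET_FRACTION * target
    if not (sufficient and clear_leader and near_target):
        return "disabled"
    if confidence >= target:
        return "recommended"
    # Allowed, but the card warns that the conclusion may be wrong.
    return "enabled_with_warning"
```

With a 95% target, a confidence of 0.88 would enable the button with a warning, while 0.96 would mark it Recommended.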

Declaring a winner

Choosing Declare Winner opens a dialog where you pick which variant won. It restates what happens next:
  • The experiment ends immediately and is marked Completed · Winner.
  • The winning flow becomes the default at the placement.
  • All users see the winning flow from then on.
  • Results are preserved for reference.
If your current confidence is below target, the dialog shows a Low Statistical Confidence warning before you confirm.

Stopping without a winner

Choosing Stop Test ends the experiment with no winner. The placement reverts to its default flow (not to any variant). This is irreversible: you cannot resume or collect more data, though the results are kept. See Running an experiment for how Stop differs from Pause.

After completion

A completed experiment shows its final state in the status badge:
  • Completed · Winner when you declared a winner. The winning flow is now the placement default.
  • Completed · No winner when you stopped the test (or it ended without one declared).

Common mistakes

  • Peeking and stopping early. Looking at results repeatedly and stopping the moment a variant looks ahead inflates false positives. Wait for your confidence target and an adequate sample.
  • Declaring on lift alone. A big lift with low confidence is often noise. Confidence, not lift, tells you whether the difference is real.
  • Comparing variants with very unequal traffic. A lopsided split (for example 90/10) makes the smaller arm take much longer to reach significance. A 50/50 split is fastest for most tests.
  • Trusting the days-to-significance estimate as a deadline. It is a projection from current data and shifts as the effect size and traffic change.
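The cost of a lopsided split can be made concrete with the standard error of the difference between two proportions. The sketch below holds total traffic and the baseline completion rate fixed and compares a 50/50 split with a 90/10 split; the function name and the numbers are illustrative.

```python
import math


def standard_error(rate: float, total_users: int, challenger_share: float) -> float:
    """Standard error of the rate difference for a given traffic split."""
    challenger_users = total_users * challenger_share
    control_users = total_users * (1 - challenger_share)
    return math.sqrt(
        rate * (1 - rate) * (1 / control_users + 1 / challenger_users)
    )


even_split = standard_error(0.25, 1000, 0.5)
lopsided_split = standard_error(0.25, 1000, 0.1)

# Standard error shrinks with the square root of traffic, so matching the
# precision of the even split takes (ratio of standard errors) squared
# times as many total users.
traffic_multiplier = (lopsided_split / even_split) ** 2
```

Under these assumptions a 90/10 split needs about 2.8 times the total traffic of a 50/50 split to detect the same difference with the same precision.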