How much data you have
Before any lift or confidence number means anything, both variants need enough enrollments. The dashboard tracks three states:
| State | Condition | What you see |
|---|---|---|
| No data | No enrollments yet | Metrics show a dash; “results will appear here as users enroll” |
| Low data | Fewer than 30 enrollments in either variant | Counts may show, but lift and confidence are suppressed |
| Sufficient | At least 30 enrollments in both variants | Lift and confidence are shown |
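The three states come down to a check on the two enrollment counts. Below is a minimal sketch in Python. It assumes “No enrollments yet” means both variants are at zero. `DataState` and `data_state` are hypothetical names, not a FlowPilot API.

```python
from enum import Enum

MIN_ENROLLMENTS = 30  # per-variant floor stated in this guide


class DataState(Enum):
    NO_DATA = "No data"
    LOW_DATA = "Low data"
    SUFFICIENT = "Sufficient"


def data_state(control_enrolled: int, challenger_enrolled: int) -> DataState:
    """Classify an experiment by how much data it has (hypothetical helper)."""
    # Assumption: "no enrollments yet" means neither variant has any.
    if control_enrolled == 0 and challenger_enrolled == 0:
        return DataState.NO_DATA
    # Either variant below the floor suppresses lift and confidence.
    if min(control_enrolled, challenger_enrolled) < MIN_ENROLLMENTS:
        return DataState.LOW_DATA
    return DataState.SUFFICIENT
```

The floor applies to the smaller variant. A test with 500 enrollments in one variant and 12 in the other is still “Low data”.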
Results summary
For each variant, the Results Summary card shows:
- Users Enrolled: how many users were assigned to this variant. Assignment is sticky per user: each user is bucketed once from a stable per-install id and stays in that variant (see Variables and SDK context).
- Completions: how many of those users finished the flow.
- Completion Rate: completions divided by enrollments.
- The variant’s traffic share.
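Sticky assignment does not require storing anything per user. Hashing the stable id is enough. This guide does not document FlowPilot's actual bucketing scheme, so the sketch below is illustrative only. `assign_variant`, `experiment_id`, and the choice of SHA-256 with 10,000 buckets are all assumptions. The sketch also shows the completion-rate arithmetic from the list above.

```python
import hashlib
from typing import Optional


def assign_variant(install_id: str, experiment_id: str, control_share: float = 0.5) -> str:
    """Hypothetical sticky bucketing: the same install always gets the same variant."""
    key = f"{experiment_id}:{install_id}".encode("utf-8")
    # Hash to a stable integer bucket in [0, 10000); no per-user state is stored.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 10_000
    return "control" if bucket < control_share * 10_000 else "challenger"


def completion_rate(completions: int, enrolled: int) -> Optional[float]:
    """Completions divided by enrollments; None while nobody has enrolled."""
    return completions / enrolled if enrolled else None
```

Salting the hash with the experiment id keeps an install's bucket fixed within one experiment. It also makes the bucket independent across different experiments.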
Statistical confidence
The Statistical Confidence card compares your current confidence to the target you set (90%, 95%, or 99%). FlowPilot computes confidence with a two-proportion z-test (pooled, two-tailed) comparing the challenger’s completion rate to the control’s. The reported confidence is 1 minus the two-tailed p-value. An experiment is treated as significant only when both of these hold:
- Each variant has at least 30 enrollments.
- Confidence has reached your target.
The card's status message shows where the experiment stands:
- Below the data floor: “Awaiting first enrollments” or “Need at least 30 enrollments per variant”.
- Climbing but not there yet: “Keep collecting data for reliable results”, and, as confidence nears the target, “Approaching significance, recommend waiting”.
- At or above target: “High confidence achieved, safe to declare winner”.
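The test described above fits in a few lines of Python. This is a sketch of the standard pooled, two-tailed two-proportion z-test, not FlowPilot's source. The function names are hypothetical.

```python
import math

MIN_ENROLLMENTS = 30


def two_proportion_confidence(x_control: int, n_control: int,
                              x_challenger: int, n_challenger: int) -> float:
    """Pooled, two-tailed two-proportion z-test; returns 1 - p-value."""
    p_control = x_control / n_control
    p_challenger = x_challenger / n_challenger
    # Pooling assumes both arms share one true completion rate (the null hypothesis).
    p_pool = (x_control + x_challenger) / (n_control + n_challenger)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_challenger))
    if se == 0:
        return 0.0  # both arms at 0% or 100%: no evidence of any difference
    z = (p_challenger - p_control) / se
    # Two-tailed p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return 1 - p_value


def is_significant(n_control: int, n_challenger: int,
                   confidence: float, target: float = 0.95) -> bool:
    """Both conditions from the card: the per-variant floor and the target."""
    return min(n_control, n_challenger) >= MIN_ENROLLMENTS and confidence >= target


conf = two_proportion_confidence(200, 500, 240, 500)  # 40% vs 48% completion
print(round(conf, 3))  # prints 0.989
```

With 500 users per variant, an 8-point difference clears a 95% target but not a 99% one. This is why the target you pick changes how long a test has to run.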
Performance chart
The Performance chart plots each variant’s cumulative completion rate over time. Cumulative (rather than per-day) rates are used because daily rates jitter heavily early on; the cumulative lines settle as data accumulates and make the trend easy to read.
Funnel comparison
The Funnel Comparison card puts the two variants’ per-screen funnels side by side. For each screen it shows the pass-through rate (the share of starters who reached that screen) and the drop-off at that step. This tells you not just which variant wins overall, but at which screens one variant keeps users that the other loses.
Make a decision
While an experiment is running or paused, the Make a Decision card offers three choices.
Declare Winner
End the test and apply the winning variant’s flow as the placement default.
Stop Test
End without a winner. The placement reverts to its default flow.
Keep Running
Continue collecting data. The card estimates how much longer you need.
Declaring a winner
Choosing Declare Winner opens a dialog where you pick which variant won. It restates what happens next:
- The experiment ends immediately and is marked Completed · Winner.
- The winning flow becomes the default at the placement.
- All users see the winning flow from then on.
- Results are preserved for reference.
Stopping without a winner
Choosing Stop Test ends the experiment with no winner. The placement reverts to its default flow (not to any variant). This is irreversible: you cannot resume or collect more data, though the results are kept. See Running an experiment for how Stop differs from Pause.
After completion
A completed experiment shows its final state in the status badge:
- Completed · Winner when you declared a winner. The winning flow is now the placement default.
- Completed · No winner when you stopped the test (or it ended without one declared).
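The decision rules above amount to a small state model. The Python sketch below captures what each choice does to the status badge and the placement default, and shows that completion is final. All names (`Experiment`, `declare_winner`, `stop`, `post_test_flow`, and the flow ids) are hypothetical, not a FlowPilot API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Experiment:
    """Hypothetical model of how an experiment ends; not a FlowPilot API."""
    placement_default: str          # flow the placement shows outside the test
    variants: Dict[str, str]        # variant name -> flow id
    status: str = "Running"
    winner: Optional[str] = None

    def _require_open(self) -> None:
        # Decisions are offered only while running or paused; completion is final.
        if self.status not in ("Running", "Paused"):
            raise RuntimeError("experiment is completed and cannot be resumed")

    def declare_winner(self, variant: str) -> None:
        self._require_open()
        self.winner = variant
        # The winning variant's flow becomes the placement default for all users.
        self.placement_default = self.variants[variant]
        self.status = "Completed · Winner"

    def stop(self) -> None:
        self._require_open()
        # No winner: placement_default is left untouched, so the placement
        # reverts to its own default flow rather than to any variant.
        self.status = "Completed · No winner"

    def post_test_flow(self) -> str:
        """Flow every user sees once the experiment has completed."""
        if self.status in ("Running", "Paused"):
            raise RuntimeError("still in progress; users see their assigned variant")
        return self.placement_default


exp = Experiment("onboarding_default", {"control": "onboarding_a", "challenger": "onboarding_b"})
exp.declare_winner("challenger")
print(exp.status, exp.post_test_flow())  # Completed · Winner onboarding_b
```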
Common mistakes
- Peeking and stopping early. Looking at results repeatedly and stopping the moment a variant looks ahead inflates false positives. Wait for your confidence target and an adequate sample.
- Declaring on lift alone. A big lift with low confidence is often noise. Confidence, not lift, tells you whether the difference is real.
- Comparing variants with very unequal traffic. A lopsided split (for example 90/10) makes the smaller arm take much longer to reach significance. A 50/50 split is fastest for most tests.
- Trusting the days-to-significance estimate as a deadline. It is a projection from current data and shifts as the effect size and traffic change.
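The last two points can be made concrete. This guide does not document how FlowPilot projects days to significance, so the sketch below shows one simple, assumed approach. It asks how many more enrollments the pooled z-test would need if the observed rates and traffic split held exactly. `projected_days` and its parameters are hypothetical. The sketch also shows why a lopsided split is slow. For a fixed total N with control share s, the standard error scales with 1/sqrt(N * s * (1 - s)), and s * (1 - s) is largest at s = 0.5.

```python
import math
from statistics import NormalDist

MIN_ENROLLMENTS = 30


def projected_days(p_control: float, p_challenger: float, control_share: float,
                   enrolled_so_far: int, daily_enrollments: float,
                   target: float = 0.95) -> float:
    """Hypothetical projection: assumes today's rates and split never change."""
    if not 0 < control_share < 1:
        raise ValueError("control_share must be strictly between 0 and 1")
    diff = abs(p_challenger - p_control)
    if diff == 0:
        return math.inf  # no observed difference, so no projected end date
    s = control_share
    p_pool = s * p_control + (1 - s) * p_challenger
    z_target = NormalDist().inv_cdf(1 - (1 - target) / 2)  # about 1.96 for 95%
    # Pooled z = diff * sqrt(N * s * (1 - s) / (p_pool * (1 - p_pool))); solve for N.
    n_needed = z_target ** 2 * p_pool * (1 - p_pool) / (diff ** 2 * s * (1 - s))
    # The smaller arm must also clear the 30-enrollment floor.
    n_needed = max(n_needed, MIN_ENROLLMENTS / min(s, 1 - s))
    return max(0.0, n_needed - enrolled_so_far) / daily_enrollments


even = projected_days(0.40, 0.44, 0.5, enrolled_so_far=1000, daily_enrollments=200)
lopsided = projected_days(0.40, 0.44, 0.9, enrolled_so_far=1000, daily_enrollments=200)
print(f"50/50: {even:.1f} days, 90/10: {lopsided:.1f} days")  # roughly 6.7 vs 27.1
```

Because the required sample scales with 1/diff², a small drift in the observed difference moves the projected date a lot. A projection like this is a moving estimate, not a deadline.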