A/B Test Results That Tell You What to Ship

Declaring a test winner based on a raw conversion rate difference without checking sample size, segment effects, or guardrail metrics is how teams ship regressions they celebrate as wins.

The problem

→ Most teams look at the headline metric lift and declare a winner without calculating whether the result is statistically significant at their actual sample size.
→ Segment-level effects, where a variant wins for one user group and loses for another, are invisible in aggregate results and only surface when someone slices the data manually.
→ Guardrail metrics that should not move (page load time, support ticket volume, error rates) are rarely checked in the same analysis pass as the primary metric, so regressions go undetected until they show up in customer complaints.
→ Experiment data from feature flag tools like LaunchDarkly or Optimizely lives separately from product event data and business outcome data, so a complete analysis requires joins across three systems.

Why the usual approach breaks down

Significance testing is easy to get wrong and hard to explain

Choosing between a t-test, a z-test, a chi-squared test, or a Mann-Whitney test depends on the metric type and distribution. Using the wrong test produces a p-value that looks meaningful but is not. Explaining the result to a non-technical stakeholder adds another layer of difficulty that often leads teams to skip the explanation entirely.

Segment-level analysis multiplies the number of comparisons and the chance of false positives

Running significance tests on ten segments without a multiple-comparison correction inflates the chance of a false positive substantially. Most ad-hoc segment analyses in spreadsheets skip this correction, leading to confident but wrong conclusions about which segments the variant helps.
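To put a rough number on that inflation: if the ten segment tests were independent and each used a 0.05 threshold, the chance of at least one false positive would be 1 - 0.95^10, or about 40%. The sketch below shows that arithmetic. It assumes independent tests, and the function names are illustrative, not part of any tool.

```python
def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests,
    each run at significance level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / num_tests


# One test keeps the nominal 5%; ten uncorrected tests do not.
for k in (1, 3, 10):
    print(k, round(family_wise_error_rate(k), 3), bonferroni_threshold(k))
```

Real segments overlap and correlate, so the true rate will differ, but the direction of the problem is the same.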

Experiment data is siloed from the metrics it is supposed to move

LaunchDarkly or your feature flag system knows who was in which variant. Your product database knows what those users did. Your data warehouse or billing system knows the revenue impact. Joining these three sources to get a complete picture requires either a pre-built data pipeline or a data team engagement.

Guardrail metric checks are skipped because they require extra queries

Checking whether the variant degraded page load time, error rates, or churn alongside the primary metric means running additional queries on additional tables. Under time pressure, this step is routinely skipped — until a shipped variant is discovered to have caused a silent regression.

How AnalityQa solves it

Upload your data — or connect it live — and ask in plain English.

Upload your experiment data and get a statistically rigorous result in one query

Upload a CSV with variant assignment and metric outcomes, or connect your product database. AnalityQa selects the appropriate test for your metric type, calculates the lift with confidence intervals, and states plainly whether the result is significant at your chosen threshold.

Segment-level effect heatmap with multiple-comparison correction

Ask for a segment breakdown by plan tier, acquisition channel, or device type and AnalityQa applies a Bonferroni or Benjamini-Hochberg correction automatically, so the segments flagged as significant are actually worth acting on.

Guardrail metric check in the same pass as the primary metric

Specify your guardrail metrics — latency, error rate, unsubscribe rate — and AnalityQa evaluates them alongside the primary metric in one session. Any guardrail that moves significantly in the wrong direction is flagged before you ship.

Auto-join experiment assignments to product and business outcome data

Upload your variant assignment file and your event or revenue data separately. AnalityQa joins them on user ID and makes the merged dataset available for all queries — primary metric, guardrails, and segment analysis — without any manual data preparation.
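For readers who want to see what a join on user ID involves, here is a minimal sketch in plain Python. It is not AnalityQa's implementation. The field names (`user_id`, `variant`, `converted`) are hypothetical, and two choices are assumptions of this sketch: users without an outcome row count as non-converted, and users seen in more than one variant are dropped.

```python
def join_assignments_to_outcomes(assignments, outcomes):
    """Left-join variant assignments to outcome rows on user_id.

    Returns (merged_rows, conflicted_user_ids).
    """
    variant_by_user = {}
    conflicted = set()
    for row in assignments:
        uid, variant = row["user_id"], row["variant"]
        # A user seen in two different variants contaminates both arms.
        if uid in variant_by_user and variant_by_user[uid] != variant:
            conflicted.add(uid)
        variant_by_user.setdefault(uid, variant)

    converted_by_user = {row["user_id"]: row["converted"] for row in outcomes}

    merged = [
        {
            "user_id": uid,
            "variant": variant,
            # Assumption: no outcome row means no conversion was recorded.
            "converted": converted_by_user.get(uid, 0),
        }
        for uid, variant in variant_by_user.items()
        if uid not in conflicted
    ]
    return merged, conflicted
```

Outcome rows for users who were never assigned to a variant are ignored, which is usually what you want for an experiment readout.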

Plain-English results summary for non-technical stakeholders

After the statistical analysis, ask for a summary you can paste into a product review. AnalityQa writes a concise, jargon-free recommendation stating the lift, the confidence level, the segment findings, and the guardrail status — no translation required.

Figure: a dashboard built in AnalityQa, from question to chart, no SQL.

Real examples

Paste your data. Ask. Ship.

You

Calculate the lift in 7-day conversion rate for variant B vs. control, with 95% confidence intervals.

AnalityQa runs a two-proportion z-test, calculates the absolute and relative lift, computes the 95% CI, and states whether the result clears the significance threshold at your sample size.

Summary table: conversion rate by variant, absolute lift, relative lift, 95% CI, p-value
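The calculation described above can be sketched with the Python standard library. This is an illustrative version of a two-proportion z-test, not AnalityQa's own code. The function name, argument names, and example counts are hypothetical. The pooled standard error is used for the test and the unpooled one for the interval, which is a common convention but not the only one.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_control, n_control, conv_variant, n_variant,
                        confidence=0.95):
    """Two-sided two-proportion z-test with a CI on the absolute lift."""
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    abs_lift = p_v - p_c
    rel_lift = abs_lift / p_c

    # Pooled standard error: valid under the null of equal rates.
    p_pool = (conv_control + conv_variant) / (n_control + n_variant)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_variant))
    z = abs_lift / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se = sqrt(p_c * (1 - p_c) / n_control + p_v * (1 - p_v) / n_variant)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (abs_lift - z_crit * se, abs_lift + z_crit * se)

    return {"control_rate": p_c, "variant_rate": p_v, "abs_lift": abs_lift,
            "rel_lift": rel_lift, "ci": ci, "p_value": p_value}


# Hypothetical counts: 120/1000 conversions in control, 150/1000 in variant B.
result = two_proportion_test(120, 1000, 150, 1000)
print(result)
```

The normal approximation is reasonable when each arm has plenty of conversions and non-conversions. With very small counts an exact test is the safer choice.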

You

Show me a segment-level effect heatmap for plan tier and device type, with multiple-comparison correction.

It computes the variant effect for each segment combination, applies a Benjamini-Hochberg correction, and renders a heatmap where cells are shaded by effect size and marked significant or not.

Heatmap: variant lift by plan tier x device type, with significance markers
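For reference, the Benjamini-Hochberg step-up procedure is short enough to write out. The sketch below takes one p-value per segment cell and returns which cells stay significant at a chosen false discovery rate. The function name and example p-values are hypothetical, and it assumes the usual conditions for the procedure (independent or positively dependent tests).

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, one per input p-value, marking which
    hypotheses are rejected under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose sorted p-value is <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Step-up: everything at or below that rank is rejected, even if an
    # individual p-value missed its own threshold.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


# Hypothetical p-values for three segment cells.
print(benjamini_hochberg([0.02, 0.03, 0.035]))
```

In this example a Bonferroni threshold of 0.05 / 3 would reject nothing, while Benjamini-Hochberg keeps all three cells. That is the trade-off between controlling any false positive and controlling the share of false positives among flagged cells.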

You

Check our guardrail metrics — page error rate, session length, and 30-day churn — for any regressions from the variant.

AnalityQa runs significance tests on each guardrail metric and produces a status table showing the direction of change, the magnitude, and whether it is statistically significant.

Table: guardrail metric check — direction, magnitude, significance, ship/hold flag
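One row of such a status table could be produced along these lines. This is a sketch under stated assumptions, not the product's logic: it applies a Welch-style test with a normal approximation (suitable for large samples of a continuous metric), the `higher_is_better` argument and the ship/hold rule are hypothetical, and the data is made up.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def check_guardrail(control, variant, higher_is_better, alpha=0.05):
    """Compare a continuous guardrail metric between arms and flag a hold
    only when the variant is significantly worse."""
    diff = mean(variant) - mean(control)
    # Unequal-variance standard error of the difference in means.
    se = sqrt(variance(control) / len(control)
              + variance(variant) / len(variant))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    significant = p_value < alpha
    worse = diff < 0 if higher_is_better else diff > 0
    direction = "up" if diff > 0 else "down" if diff < 0 else "flat"
    return {"direction": direction, "magnitude": diff, "p_value": p_value,
            "significant": significant,
            "flag": "hold" if (significant and worse) else "ship"}


# Made-up latency samples in ms; lower is better for latency.
control_latency = [10.0, 11.0, 9.0, 10.0] * 50
variant_latency = [12.0, 13.0, 11.0, 12.0] * 50
print(check_guardrail(control_latency, variant_latency, higher_is_better=False))
```

A "ship" flag here only means this one guardrail did not regress significantly. An underpowered guardrail check can miss a real regression, so the magnitude column deserves a look either way.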

You

How long would I need to run the test to detect a 5% relative lift in activation rate with 80% power?

It computes the required sample size based on your baseline activation rate, desired lift, and power threshold, then estimates the days to reach that sample at your current traffic volume.

Sample size and runtime estimate: days to 80% power for 5% relative lift
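The estimate described above follows from the standard sample size formula for comparing two proportions. A minimal sketch, assuming a two-sided test at alpha = 0.05, equal-sized arms, and a hypothetical 20% baseline activation rate with 2,000 users entering the test per day (none of these figures come from the article):

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_per_arm(baseline_rate, relative_lift,
                            alpha=0.05, power=0.80):
    """Users needed in each arm to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def days_to_reach(sample_per_arm, daily_users_in_test, arms=2):
    """Days of traffic needed to fill every arm."""
    return ceil(sample_per_arm * arms / daily_users_in_test)


n = required_sample_per_arm(baseline_rate=0.20, relative_lift=0.05)
print(n, days_to_reach(n, daily_users_in_test=2000))
```

With these hypothetical inputs the requirement comes out on the order of 25,000 users per arm, or about 26 days of traffic. Halving the lift you want to detect roughly quadruples the sample, which is why small lifts need long tests.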

You

Write a product review summary of the test results — lift, confidence, segment findings, guardrail status.

AnalityQa produces a concise paragraph summarising the headline result, the most notable segment effect, and the guardrail check outcome, written for a non-technical audience.

Text: product review summary paragraph, ready to paste

Note from Alex

The first time I ran an A/B test at a previous company, I declared a winner because the conversion rate was up 4%. A week later someone asked me the sample size — it was 300 users. We had shipped a change that was almost certainly noise. That embarrassment is why I made rigorous significance testing the first thing we built in AnalityQa. Now the tool just refuses to let you call a winner without showing you the confidence interval and the minimum detectable effect at your actual traffic volume. It won't stop you from shipping, but at least you'll know what you're betting on.

— Alex, Co-founder, AnalityQa

What teams get out of it

✓ Teams catch false positives from underpowered tests before shipping decisions are made.

✓ Segment-level heatmaps with multiple-comparison correction surface actionable heterogeneous effects that aggregate results hide.

✓ Guardrail metric checks in every analysis pass prevent silent regressions from reaching production.

✓ Plain-English summaries reduce the time from test completion to stakeholder decision from days to the same afternoon.

Frequently asked questions

Which statistical tests does AnalityQa use for A/B test analysis?

The test is selected based on your metric type. Conversion rates and proportions use a two-proportion z-test or chi-squared test. Continuous metrics like revenue per user or session length use a t-test or Mann-Whitney test depending on the distribution. You can override the selection if your team has a preferred method.

How does it handle multiple testing when I analyse several segments?

By default, AnalityQa applies a Benjamini-Hochberg false discovery rate correction when you request a segment breakdown with more than three segments. You can also choose Bonferroni correction or no correction if you prefer, with a note in the output about the implications.

Can it detect network effects or interference between variants in experiments with social features?

AnalityQa does not model network interference automatically, as this requires cluster-randomised experiment designs that depend on your specific product graph. You can upload cluster-level data and it will run appropriate cluster-level tests, but detecting interference from individual-level data is not supported.

How is experiment data containing user IDs handled?

AnalityQa does not use uploaded data for model training, and supports pseudonymised or hashed user IDs if you prefer not to upload raw identifiers.

Can it analyse experiments where the randomisation unit is not the user, for example page views or sessions?

Yes. Specify the randomisation unit when you upload the data and AnalityQa adjusts the variance calculation accordingly. Analysing at a finer unit than the randomisation unit (for example, treating each page view as an independent observation when users were randomised) understates the variance and inflates false positives, and AnalityQa will flag this if it detects the mismatch.

How do I connect experiment assignment data from LaunchDarkly or Optimizely to my product event data?

Export your variant assignment log from your feature flag tool and your event or outcome data as separate CSV files, or connect the database tables directly. AnalityQa joins on user ID and makes the merged dataset available for the full analysis — primary metric, segments, and guardrails — in one session.

What plan do I need to run guardrail metric checks alongside primary metric analysis?

Multi-metric analysis — primary metric plus guardrails in the same session — is available on all plans including the free tier. Scheduled experiment monitoring with automated alerts is available on Pro and Business plans.


Run this analysis on your own data.

Upload a file or connect your database — 100 credits free, no credit card. Your first dashboard ships in under 5 minutes.

