Sample Size Estimator

Estimate required sample size for statistical significance. Free online sample size calculator. No signup, 100% private, browser-based.

How it works

Sample size calculation determines the minimum number of observations needed to detect a specified effect with a given statistical power and significance level. Performing this calculation before an experiment prevents two costly failure modes: underpowering (missing a real effect) and overpowering (wasting resources detecting trivially small effects).

**Key parameters**
- Significance level (α): the probability of a false positive (incorrectly rejecting the null). Typically 0.05 (5%).
- Power (1−β): the probability of detecting a true effect. Typically 0.80 (80%) or 0.90.
- Effect size: the minimum meaningful difference to detect, defined by domain knowledge rather than by the data. Smaller effect sizes require larger samples.

**Formula for two-sample t-test** n per group = 2 × (z_α/2 + z_β)² × σ² / δ², where δ is the mean difference to detect and σ is the estimated standard deviation (two-tailed test). For α=0.05, power=0.80: (1.96 + 0.842)² ≈ 7.85. To detect a 5-unit difference with σ=15: n = 2 × 7.85 × (15/5)² = 2 × 7.85 × 9 ≈ 141.3, so round up to 142 per group.
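
If you prefer to script the calculation, here is a minimal sketch of the formula above in Python (the function name `n_per_group_means` is just illustrative, and scipy is assumed for the normal quantiles):

```python
from math import ceil
from scipy.stats import norm

def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sample comparison of means
    (two-tailed, normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 0.842 for power = 0.80
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)                      # always round up

# Worked example from above: detect a 5-unit difference with sigma = 15
print(n_per_group_means(delta=5, sigma=15))  # -> 142
```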

**Practical considerations** Always inflate the calculated n by 10–20% to account for dropouts, missing data, and protocol deviations. For proportions (A/B testing conversion rates), use the two-proportion z-test formula: n = 2p̄(1−p̄)(z_α/2+z_β)² / (p1−p2)², where p̄ is the pooled proportion. Effect size should be specified before seeing data — using observed data to set the effect size inflates the false positive rate.
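
The two-proportion formula can be scripted the same way. A sketch, again with an illustrative function name and scipy assumed, that also applies the dropout inflation as a final step:

```python
from math import ceil
from scipy.stats import norm

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, dropout=0.0):
    """Per-group n for comparing two proportions (two-tailed,
    normal approximation), inflated for expected dropout."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2                     # pooled proportion
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return ceil(n / (1 - dropout))            # account for attrition

# Example: detect a lift from a 10% to a 12% conversion rate,
# planning for 15% dropout
print(n_per_group_proportions(0.10, 0.12, dropout=0.15))  # -> 4521
```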

Frequently Asked Questions

What is statistical power and why is 80% a common threshold?
Statistical power (1 − β) is the probability of correctly detecting a true effect when it exists. With 80% power: if the true effect size is what you specified, there is an 80% chance your test will find it statistically significant. The 20% failure rate (β = 0.20, Type II error) is a conventional compromise between sample size cost and detection reliability. High-stakes research (clinical trials) often uses 90% power. Exploratory research might accept 70%. Power below 60% means a large fraction of real effects will go undetected.
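
The same normal approximation can be run in reverse to estimate the power a given per-group n actually buys. A sketch (illustrative helper name, scipy assumed), reusing the worked example from "How it works":

```python
from scipy.stats import norm

def achieved_power(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sample, two-tailed comparison of means
    (normal approximation; ignores the negligible opposite tail)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5   # standard error of the difference
    return norm.cdf(delta / se - z_alpha)

print(round(achieved_power(142, delta=5, sigma=15), 2))  # -> 0.8
print(round(achieved_power(50, delta=5, sigma=15), 2))   # -> 0.38, underpowered
```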
How do I specify the effect size if I don't know it in advance?
Three approaches: Prior research — use effect sizes from published studies on similar interventions. Minimum meaningful effect — define the smallest difference that would matter in practice (e.g., 'a 2% absolute increase in conversion rate is our minimum for a business decision'). Pilot study — run a small-scale study to estimate the effect size before the main study. Do NOT use your observed pilot data to set the effect size for the main study power calculation — this inflates the false positive rate.
What happens if I run an underpowered study?
An underpowered study that finds p < 0.05 is actually more likely to be a false positive than a well-powered study. The effect sizes reported by underpowered significant studies are systematically inflated ('winner's curse' — only the largest random fluctuations cross the significance threshold). This is a major driver of the replication crisis in psychology and medicine: many small underpowered studies with inflated effect sizes failed to replicate in larger well-powered studies.
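
A quick Monte Carlo sketch (illustrative numbers, numpy and scipy assumed) makes the winner's curse concrete: among simulated underpowered studies that happen to reach p < 0.05, the average estimated effect is well above the true effect used to generate the data:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
true_delta, sigma, n = 5.0, 15.0, 20     # deliberately underpowered (~17% power)

significant_effects = []
for _ in range(5000):
    control = rng.normal(0.0, sigma, n)
    treatment = rng.normal(true_delta, sigma, n)
    if ttest_ind(treatment, control).pvalue < 0.05:   # only "significant" studies survive
        significant_effects.append(treatment.mean() - control.mean())

print("true effect:", true_delta)
print("mean estimate among significant studies:", round(np.mean(significant_effects), 1))
# The surviving estimates are systematically inflated relative to the true effect of 5.
```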
How does the significance level (α) affect required sample size?
Lower α (stricter threshold, e.g., α = 0.01 instead of 0.05) requires a larger sample to achieve the same power. Using the normal-approximation formula above with 80% power and effect size d = 0.5: α = 0.05 gives n ≈ 63 per group, while α = 0.01 gives n ≈ 94 per group (an exact t-based calculation gives slightly larger values). The relationship works through the z critical value: z_{0.025} = 1.96 (two-tailed α = 0.05) versus z_{0.005} = 2.576 (two-tailed α = 0.01). For multiple comparisons, a Bonferroni correction requires α_per_test = α_family / k, which substantially increases the required sample size.
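
A sketch of the comparison using the same normal-approximation formula (function name illustrative, scipy assumed), including a Bonferroni-corrected per-test α:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for standardized effect size d (two-tailed, normal approx.)."""
    return ceil(2 * ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / d) ** 2)

d = 0.5
print(n_per_group(d, alpha=0.05))       # -> 63
print(n_per_group(d, alpha=0.01))       # -> 94
print(n_per_group(d, alpha=0.05 / 4))   # Bonferroni, k = 4 tests -> 90
```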