Sample Size Estimator

Estimate required sample size for statistical significance. Free online sample size calculator. No signup, 100% private, browser-based.

How it works

Sample size calculation determines the minimum number of observations needed to detect a specified effect with a given statistical power and significance level. Performing this calculation before an experiment prevents two costly failure modes: underpowering (missing a real effect) and overpowering (wasting resources detecting trivially small effects).

**Key parameters**
- Significance level (α): the probability of a false positive (incorrectly rejecting the null). Typically 0.05 (5%).
- Power (1−β): the probability of detecting a true effect. Typically 0.80 (80%) or 0.90.
- Effect size: the minimum meaningful difference to detect, defined by domain knowledge rather than by the data. Smaller effect sizes require larger samples.

**Formula for two-sample t-test** n per group = 2 × (z_α/2 + z_β)² × σ² / δ², where δ is the mean difference to detect and σ is the estimated standard deviation (two-tailed test). For α=0.05, power=0.80: (1.96 + 0.842)² ≈ 7.85. To detect a 5-unit difference with σ=15: n = 2 × 7.85 × (15/5)² = 2 × 7.85 × 9 ≈ 141.3, so round up to 142 per group.
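
If you prefer to script the calculation, here is a minimal sketch of the formula above in Python (the function name `n_per_group_means` is just illustrative, and scipy is assumed for the normal quantiles):

```python
from math import ceil
from scipy.stats import norm

def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sample comparison of means
    (two-tailed, normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # 0.842 for power = 0.80
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return ceil(n)                      # always round up

# Worked example from above: detect a 5-unit difference with sigma = 15
print(n_per_group_means(delta=5, sigma=15))  # -> 142
```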

**Practical considerations** Always inflate the calculated n by 10–20% to account for dropouts, missing data, and protocol deviations. For proportions (A/B testing conversion rates), use the two-proportion z-test formula: n = 2p̄(1−p̄)(z_α/2+z_β)² / (p1−p2)², where p̄ is the pooled proportion. Effect size should be specified before seeing data — using observed data to set the effect size inflates the false positive rate.
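
The two-proportion formula can be scripted the same way. A sketch, again with an illustrative function name and scipy assumed, that also applies the dropout inflation as a final step:

```python
from math import ceil
from scipy.stats import norm

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, dropout=0.0):
    """Per-group n for comparing two proportions (two-tailed,
    normal approximation), inflated for expected dropout."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2                     # pooled proportion
    n = 2 * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    return ceil(n / (1 - dropout))            # account for attrition

# Example: detect a lift from a 10% to a 12% conversion rate,
# planning for 15% dropout
print(n_per_group_proportions(0.10, 0.12, dropout=0.15))  # -> 4521
```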

Frequently Asked Questions

What is statistical power and why is 80% a common threshold?
Statistical power (1 − β) is the probability of correctly detecting a true effect when it exists. With 80% power: if the true effect size is what you specified, there is an 80% chance your test will find it statistically significant. The 20% failure rate (β = 0.20, Type II error) is a conventional compromise between sample size cost and detection reliability. High-stakes research (clinical trials) often uses 90% power. Exploratory research might accept 70%. Power below 60% means a large fraction of real effects will go undetected.
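
The same normal approximation can be run in reverse to estimate the power a given per-group n actually buys. A sketch (illustrative helper name, scipy assumed), reusing the worked example from "How it works":

```python
from scipy.stats import norm

def achieved_power(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sample, two-tailed comparison of means
    (normal approximation; ignores the negligible opposite tail)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    se = sigma * (2 / n_per_group) ** 0.5   # standard error of the difference
    return norm.cdf(delta / se - z_alpha)

print(round(achieved_power(142, delta=5, sigma=15), 2))  # -> 0.8
print(round(achieved_power(50, delta=5, sigma=15), 2))   # -> 0.38, underpowered
```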
How do I specify the effect size if I don't know it in advance?
Three approaches: Prior research — use effect sizes from published studies on similar interventions. Minimum meaningful effect — define the smallest difference that would matter in practice (e.g., 'a 2% absolute increase in conversion rate is our minimum for a business decision'). Pilot study — run a small-scale study to estimate the effect size before the main study. Do NOT use your observed pilot data to set the effect size for the main study power calculation — this inflates the false positive rate.
What happens if I run an underpowered study?
An underpowered study that finds p < 0.05 is actually more likely to be a false positive than a well-powered study. The effect sizes reported by underpowered significant studies are systematically inflated ('winner's curse' — only the largest random fluctuations cross the significance threshold). This is a major driver of the replication crisis in psychology and medicine: many small underpowered studies with inflated effect sizes failed to replicate in larger well-powered studies.
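
A quick Monte Carlo sketch (illustrative numbers, numpy and scipy assumed) makes the winner's curse concrete: among simulated underpowered studies that happen to reach p < 0.05, the average estimated effect is well above the true effect used to generate the data:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
true_delta, sigma, n = 5.0, 15.0, 20     # deliberately underpowered (~17% power)

significant_effects = []
for _ in range(5000):
    control = rng.normal(0.0, sigma, n)
    treatment = rng.normal(true_delta, sigma, n)
    if ttest_ind(treatment, control).pvalue < 0.05:   # only "significant" studies survive
        significant_effects.append(treatment.mean() - control.mean())

print("true effect:", true_delta)
print("mean estimate among significant studies:", round(np.mean(significant_effects), 1))
# The surviving estimates are systematically inflated relative to the true effect of 5.
```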
How does the significance level (α) affect required sample size?
Lower α (stricter threshold, e.g., α = 0.01 instead of 0.05) requires a larger sample to achieve the same power. Using the normal-approximation formula above with 80% power and effect size d = 0.5: α = 0.05 gives n ≈ 63 per group, while α = 0.01 gives n ≈ 94 per group (an exact t-based calculation gives slightly larger values). The relationship works through the z critical value: z_{0.025} = 1.96 (two-tailed α = 0.05) versus z_{0.005} = 2.576 (two-tailed α = 0.01). For multiple comparisons, a Bonferroni correction requires α_per_test = α_family / k, which substantially increases the required sample size.
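
A sketch of the comparison using the same normal-approximation formula (function name illustrative, scipy assumed), including a Bonferroni-corrected per-test α:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for standardized effect size d (two-tailed, normal approx.)."""
    return ceil(2 * ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / d) ** 2)

d = 0.5
print(n_per_group(d, alpha=0.05))       # -> 63
print(n_per_group(d, alpha=0.01))       # -> 94
print(n_per_group(d, alpha=0.05 / 4))   # Bonferroni, k = 4 tests -> 90
```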