T-Test Two Sample (Simple)

Run a two-sample t-test on two data sets. Free online t-test calculator. No signup, 100% private, browser-based.

How it works

The two-sample t-test checks whether the means of two independent groups differ by more than chance alone would explain. The classic form is Student's t-test, which assumes equal variances; Welch's variant drops that assumption. It is the foundational test for A/B experiment analysis, clinical trial comparisons, and before/after comparisons made on separate groups.

**When to use a two-sample t-test**
- Independent samples (not paired; use a paired t-test for before/after measurements on the same subjects).
- Continuous numeric outcome variable.
- Each group is approximately normally distributed (or n > 30 per group, where the Central Limit Theorem makes the sample means approximately normal).
- Reasonably equal variances, which Levene's test can check. If the variances are unequal, use Welch's t-test; this tool uses Welch's by default because it is more robust.
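As a rough illustration of what Welch's test computes, the sketch below derives the t-statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name `welch_t` and the sample data are hypothetical, and this is not necessarily how this tool implements the test.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n1, n2 = len(a), len(b)
    # Squared standard error contributed by each group (sample variance / n).
    se1 = variance(a) / n1
    se2 = variance(b) / n2
    t = (mean(a) - mean(b)) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Made-up data, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.8, 5.0, 4.4, 4.6]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The degrees of freedom always fall between the smaller group's n − 1 and the pooled value n1 + n2 − 2, which is why Welch's test is slightly more conservative than Student's when variances differ.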

**Interpreting results**
- t-statistic: the difference between the two means, expressed in units of its standard error.
- p-value: the probability of observing a difference at least this large if the null hypothesis (equal means) is true. p < 0.05 is the conventional threshold for rejecting the null, though the choice of threshold is a design decision, not a universal law.
- Effect size (Cohen's d): (mean1 − mean2) / pooled_std, where d=0.2 is small, d=0.5 is medium, and d=0.8 is large.
- Statistical significance ≠ practical significance: with large samples, trivially small differences become statistically significant.
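The quantities above can be sketched in a few lines of standard-library Python. The helper names (`cohens_d`, `t_pdf`, `two_sided_p`) are hypothetical, and the p-value is obtained by numerically integrating the t-distribution density with Simpson's rule, which is an assumption made here to avoid external dependencies rather than a description of this tool's internals.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Effect size: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-tailed p-value: P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total  # probability that |T| <= |t|
    return max(0.0, 1.0 - central)

# With 10 degrees of freedom, t near 2.228 sits right at the 0.05 threshold.
print(round(two_sided_p(2.228, 10), 3))
```

In practice a statistics library would supply the t-distribution tail probability directly; the integration here only makes the definition of the p-value concrete.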

**Common mistakes**
- Running multiple t-tests on the same dataset without p-value correction. Multiple comparisons inflate the false positive rate; use the Bonferroni or Benjamini-Hochberg correction.
- Confusing one-tailed and two-tailed p-values. Two-tailed is appropriate unless the direction of the difference is pre-specified before seeing the data.
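The two corrections named above can be sketched as follows. Both functions return adjusted p-values that are compared against the usual threshold; the function names and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values from four separate t-tests.
raw = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected share of false positives among the rejected hypotheses, so it usually retains more power when many tests are run.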

Frequently Asked Questions

What does a p-value of 0.03 actually mean?
A p-value of 0.03 means: assuming the null hypothesis is true (the two group means are equal), there is a 3% probability of observing a difference at least as large as this one purely by chance. It does NOT mean 'there is a 97% probability the groups are truly different,' or 'the alternative hypothesis is 97% likely.' P-values quantify evidence against the null hypothesis, not probability that the alternative is true.
When should I use a paired t-test instead of a two-sample t-test?
Use a paired t-test when the same subjects (or matched pairs) are measured under both conditions: before-and-after measurements on the same patients, the same users seeing version A then version B, matched case-control studies. Pairing removes between-subject variability, increasing statistical power. A two-sample t-test is appropriate only when the two groups are genuinely independent — different subjects with no pairing relationship.
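To make the power gain from pairing concrete, the sketch below computes a paired t-statistic on per-subject differences and, for comparison, an unpaired Welch statistic on the same numbers. The function names and the before/after values are made up for illustration.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """Return (t, df) for a paired t-test on per-subject differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

def unpaired_t(a, b):
    """Welch's t-statistic, treating the two samples as independent."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

# Hypothetical measurements on the same five subjects.
before = [10, 12, 9, 11, 13]
after = [11, 13, 10, 13, 14]

t_paired, df = paired_t(before, after)
print(f"paired t = {t_paired:.2f} (df = {df})")
print(f"unpaired t = {unpaired_t(before, after):.2f}")
```

Every subject improves by a similar amount, so the differences have little spread and the paired statistic is large, while the unpaired statistic is swamped by the variation between subjects.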
What is Cohen's d and when does statistical significance not imply practical significance?
Cohen's d = (mean1 − mean2) / pooled_std is the effect size: a standardized measure of how large the difference is, in units of the pooled standard deviation. d=0.2: small, d=0.5: medium, d=0.8: large. With large samples, trivially small effects become statistically significant: at n=10,000 per group, d=0.05 gives t ≈ 3.5 (p < 0.001), and even d=0.01 reaches p<0.05 once each group has roughly 77,000 observations. Conversely, a meaningful effect (d=0.5) in a small sample (n=10 per group) may not be statistically significant. Always report both p-value and effect size.
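The link between effect size, sample size, and significance follows from a simple relation: with two equal groups of n observations each, the t-statistic is approximately d × sqrt(n / 2). The sketch below uses that relation with a normal approximation to the critical value, which is an assumption that holds well for large n and only roughly for small n. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_from_effect_size(d, n_per_group):
    """Approximate t-statistic for an observed effect size d, equal group sizes."""
    return d * math.sqrt(n_per_group / 2)

def min_n_per_group(d, alpha=0.05):
    """Smallest n per group at which an observed d reaches two-sided significance.

    Uses the normal critical value in place of the t critical value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(2 * (z / d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {min_n_per_group(d)} observations per group")

# A medium effect in a small sample falls short of the ~1.96 threshold.
print(round(t_from_effect_size(0.5, 10), 2))
```

Because the required n scales with 1 / d², halving the effect size quadruples the sample needed to detect it, and any nonzero effect eventually becomes significant if the sample is large enough.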
What assumptions does the two-sample t-test make?
- Independence: observations within and between groups are independent.
- Normality: each group is approximately normally distributed (relaxed by the Central Limit Theorem for n>30).
- Equal variances: Welch's t-test relaxes this assumption, and this tool uses Welch's by default.
- Continuous outcome variable.

Violations of independence (correlated samples, repeated measures) are the most serious. For small samples from non-normal distributions, use the Mann-Whitney U test, the non-parametric alternative.
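For reference, a minimal sketch of the Mann-Whitney U test mentioned above is shown here. It counts, over all cross-group pairs, how often a value from one group exceeds a value from the other, then converts U to a p-value with a normal approximation and no tie correction. That approximation is an assumption of this sketch; for very small samples an exact test from a statistics library is the better choice. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def mann_whitney_u(a, b):
    """Return (U, approximate two-sided p-value) for two independent samples."""
    n1, n2 = len(a), len(b)
    # U for group a: pairs where a's value wins, with ties counted as half.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u = min(u_a, n1 * n2 - u_a)  # report the smaller of the two U values
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma  # z <= 0 because u is the smaller U
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p

# Made-up skewed data, for illustration only.
u, p = mann_whitney_u([1.2, 0.8, 1.5, 9.7, 1.1], [2.9, 3.4, 2.2, 15.0, 3.1])
print(f"U = {u}, p ≈ {p:.3f}")
```

Because the test depends only on the ordering of the values, a single extreme observation has far less influence on it than on the t-test.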