T-Test Two Sample (Simple)
How it works
The two-sample t-test (Student's t-test) asks whether the means of two independent groups differ by more than chance alone would explain. It is the foundational test for A/B experiment analysis, clinical trial comparisons, and before/after measurements on separate groups.
**When to use a two-sample t-test**
- Independent samples (not paired; use a paired t-test for before/after measurements on the same subjects).
- Continuous numeric outcome variable.
- Each group approximately normally distributed, or n > 30 per group so the Central Limit Theorem applies.
- Reasonably equal variances (check with Levene's test). If variances are unequal, use Welch's t-test; this tool uses Welch's by default because it is more robust.
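The Welch version of the test can be made concrete. Below is a minimal sketch of the Welch t-statistic and its Welch-Satterthwaite degrees of freedom using only the standard library; the group data are made-up illustrative numbers, not output from this tool:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)   # sample variances (n - 1 denominator)
    se2 = v1 / n1 + v2 / n2             # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Illustrative measurements from two independent groups
group_a = [12.1, 11.8, 12.5, 12.3, 11.9]
group_b = [11.2, 11.5, 11.0, 11.7, 11.3]
t, df = welch_t(group_a, group_b)
```

In practice, `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` computes the same statistic along with the p-value from the t-distribution with the fractional degrees of freedom above.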
**Interpreting results**
- t-statistic: the standardized difference between the two means, in units of standard error.
- p-value: the probability of observing a difference this large (or larger) if the null hypothesis is true (the means are equal). p < 0.05 is the conventional threshold for rejecting the null, though the choice of threshold is a design decision, not a universal law.
- Effect size (Cohen's d): (mean1 − mean2) / pooled_std. By convention, d = 0.2 is small, d = 0.5 is medium, and d = 0.8 is large.
- Statistical significance ≠ practical significance: with large samples, trivially small differences become statistically significant.
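The Cohen's d formula above translates directly into code. A minimal sketch with the pooled standard deviation (the data are illustrative, not output from this tool):

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

d = cohens_d([12.1, 11.8, 12.5, 12.3, 11.9],
             [11.2, 11.5, 11.0, 11.7, 11.3])
```

Note that with tiny samples like these, d can come out very large even though the p-value carries little certainty, which is exactly why both numbers should be reported together.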
**Common mistakes**
- Running multiple t-tests on the same dataset without correcting the p-values: multiple comparisons inflate the false positive rate, so apply a Bonferroni or Benjamini-Hochberg correction.
- Confusing one-tailed and two-tailed p-values: two-tailed is appropriate unless the direction of the difference was pre-specified before seeing the data.
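The two corrections above behave quite differently. A minimal sketch with four illustrative p-values at alpha = 0.05: Bonferroni compares each p-value against alpha/m, while Benjamini-Hochberg compares the k-th smallest against k/m × alpha:

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k_max smallest p-values, where k_max is the largest rank k
    with p_(k) <= k/m * alpha (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

pvals = [0.010, 0.020, 0.030, 0.040]      # illustrative p-values
bon = bonferroni(pvals)                    # only the smallest survives
bh = benjamini_hochberg(pvals)             # all four survive at FDR 0.05
```

Bonferroni is stricter (it guards against even one false positive), while Benjamini-Hochberg tolerates a controlled fraction of false discoveries in exchange for more power.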
Frequently Asked Questions
- **What does a p-value of 0.03 mean?** Assuming the null hypothesis is true (the two group means are equal), there is a 3% probability of observing a difference at least as large as this one purely by chance. It does NOT mean 'there is a 97% probability the groups are truly different,' or 'the alternative hypothesis is 97% likely.' P-values quantify evidence against the null hypothesis, not the probability that the alternative is true.
- **When should I use a paired t-test instead?** Use a paired t-test when the same subjects (or matched pairs) are measured under both conditions: before-and-after measurements on the same patients, the same users seeing version A and then version B, or matched case-control studies. Pairing removes between-subject variability, increasing statistical power. A two-sample t-test is appropriate only when the two groups are genuinely independent, i.e. different subjects with no pairing relationship.
- **What is the effect size, and why report it?** Cohen's d = (mean1 − mean2) / pooled_std is a standardized, scale-free measure of how large the difference is relative to the variability in the data: d = 0.2 is small, d = 0.5 is medium, d = 0.8 is large. With large samples (n > 10,000), even d = 0.01 (a trivially small effect) becomes statistically significant (p < 0.05). Conversely, a meaningful effect (d = 0.5) in a small sample (n = 10) may not reach statistical significance. Always report both the p-value and the effect size.
- **What assumptions does the test make, and what if they are violated?** Independence: observations within and between groups are independent. Normality: each group is approximately normally distributed (relaxed by the CLT for n > 30). Equal variances: Welch's t-test relaxes this assumption, and this tool uses Welch's by default. Continuous outcome variable. Violations of independence (correlated samples, repeated measures) are the most serious. For small samples from non-normal distributions, use the Mann-Whitney U test, the non-parametric alternative.
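For the non-parametric route mentioned above, here is a minimal sketch of the Mann-Whitney U statistic with a two-sided normal-approximation p-value; the data are illustrative, and the approximation is rough for samples this small. Production code should use `scipy.stats.mannwhitneyu`, which also offers exact p-values and tie corrections:

```python
import math

def mann_whitney_u(x, y):
    """U statistic (pairs where x_i > y_j, ties count 0.5) and a two-sided
    p-value from the normal approximation, without tie correction."""
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2                                   # mean of U under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # std of U under H0
    z = (u - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u, p

# Completely separated illustrative groups: U hits its minimum of 0
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Because the test ranks the data rather than averaging it, it makes no normality assumption, at the cost of testing a shift in distribution rather than a difference in means.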