Descriptive Statistics Batch
How it works
Descriptive statistics summarize the central tendency, dispersion, and shape of a numeric dataset. Running a batch analysis on several datasets at once (pasting multiple columns together) lets you compare distributions quickly, without writing Python or R code or building spreadsheet formulas.
**Central tendency measures**
- Mean: sum divided by count; sensitive to outliers.
- Median: the middle value (P50); robust to outliers.
- Mode: the most frequent value; most useful for discrete data.
- Geometric mean: the nth root of the product of n values; appropriate for growth rates, ratios, and log-normally distributed data. It is defined only for positive values.
- Trimmed mean (10% trim): discard the top and bottom 10% of values, then average the rest; a robust estimate for data contaminated by a few outliers.
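The central-tendency measures above can be sketched with Python's standard-library `statistics` module (Python 3.8 or later). This is a minimal illustration, not the calculator's own implementation: the helper names `trimmed_mean` and `central_tendency` are hypothetical, and cutting floor(10% × n) values from each end is an assumed convention (it matches `scipy.stats.trim_mean`; other tools may round differently).

```python
import statistics


def trimmed_mean(values, proportion=0.10):
    """Mean after dropping `proportion` of the sorted values from each end.

    Assumption: floor(proportion * n) values are cut from each tail, the
    convention used by scipy.stats.trim_mean; other tools may round differently.
    """
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number of values removed per tail
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


def central_tendency(values):
    """Return the central-tendency measures listed above for one column."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # multimode lists every value tied for most frequent
        "mode": statistics.multimode(values),
        # only meaningful for positive data; negative input raises StatisticsError
        "geometric_mean": statistics.geometric_mean(values),
        "trimmed_mean_10": trimmed_mean(values, 0.10),
    }


data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical column with one outlier
for name, value in central_tendency(data).items():
    print(name, value)

# Growth factors for +50% then -33%, as in the portfolio example:
growth = [1.5, 0.67]
print("arithmetic:", statistics.fmean(growth))          # about 1.085
print("geometric:", statistics.geometric_mean(growth))  # about 1.0025
```

With one outlier (100) among ten values, the mean is pulled up to 14.7, while the median (5.5) and the 10% trimmed mean (5.625) stay close to the bulk of the data.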
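**Dispersion measures**
- Variance: the average squared deviation from the mean. The population formula divides by n; the sample formula divides by n − 1.
- Standard deviation (SD): the square root of the variance, expressed in the same units as the data.
- Coefficient of variation (CV): SD / mean; a unitless measure of dispersion that allows comparison across variables on different scales. It is meaningful only for ratio-scale data with a positive mean.
- Range: max − min.
- IQR: P75 − P25; a range that is robust to outliers.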
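A minimal sketch of the dispersion measures, again using only Python's `statistics` module. The `dispersion` helper is hypothetical, and whether a given tool reports the sample (n − 1) or population (n) version is an assumption to verify. Quartiles also have several competing conventions; `method="inclusive"` is one choice, so IQR values can differ slightly between tools.

```python
import statistics


def dispersion(values, sample=True):
    """Dispersion measures for one column of numbers.

    sample=True divides by n - 1 (Bessel's correction); sample=False
    divides by n (population formula).
    """
    if sample:
        var = statistics.variance(values)
        sd = statistics.stdev(values)
    else:
        var = statistics.pvariance(values)
        sd = statistics.pstdev(values)
    mean = statistics.fmean(values)
    # Quartile cut points; "inclusive" is one of several quantile conventions.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "variance": var,
        "sd": sd,
        "cv": sd / mean,  # only meaningful for ratio-scale data with mean > 0
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


data = [10, 20, 30, 40, 50]  # hypothetical example column
print(dispersion(data, sample=False))  # population SD = sqrt(200), about 14.14
print(dispersion(data, sample=True))   # sample SD = sqrt(250), about 15.81
```

For this column the range is 40 and the IQR is 20 under either formula, but the SD depends on the denominator, so it is worth checking which one a report uses.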
**Shape measures**
- Skewness: asymmetry of the distribution. Positive skew means the right tail is longer (e.g. income data); negative skew means the left tail is longer (e.g. exam scores clustered near the maximum).
- Kurtosis: tail heaviness relative to a normal distribution. Excess kurtosis is kurtosis minus 3, so a normal distribution scores 0. Excess kurtosis > 0 (leptokurtic): more extreme values than a normal distribution (e.g. financial returns). Excess kurtosis < 0 (platykurtic): fewer extreme values.
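The shape measures can be sketched from central moments. This minimal version, with a hypothetical `shape` helper, uses the plain moment ratios g1 = m3 / m2^1.5 and g2 = m4 / m2^2 − 3 without small-sample bias correction; spreadsheets and some libraries report adjusted versions that differ noticeably for small n, so treat this as an illustration of the definitions rather than a match for any particular tool.

```python
import statistics


def shape(values):
    """Moment-based skewness (g1) and excess kurtosis (g2), no bias correction."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal scores 0
    return skewness, excess_kurtosis


right_skewed = [1, 1, 2, 2, 2, 3, 3, 4, 8, 15]  # hypothetical, long right tail
symmetric = [1, 2, 3, 4, 5]
print(shape(right_skewed))  # skewness is positive
print(shape(symmetric))     # skewness 0, excess kurtosis -1.3
```

The evenly spaced column 1 to 5 has no asymmetry and lighter tails than a normal distribution, so it comes out platykurtic (excess kurtosis −1.3).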
Frequently Asked Questions
- Standard deviation (SD) measures spread in the data — how much individual values vary from the mean. It describes the distribution of your sample. Standard error of the mean (SEM) = SD / √n — it measures precision of the sample mean as an estimate of the population mean. SEM shrinks with larger samples; SD does not. Use SD when describing the variability of individual observations; use SEM (or confidence intervals) when reporting the precision of an estimated mean.
- Use the geometric mean for ratios, growth rates, and log-normally distributed data. Suppose a portfolio grew 50% one year and fell 33% the next. The arithmetic mean return is (50% − 33%) / 2 = 8.5% per year, which suggests roughly 17% growth over two years, but the actual growth factor is 1.5 × 0.67 = 1.005 (0.5% in total). The geometric mean, √(1.5 × 0.67) ≈ 1.0025 per year (about 0.25%), correctly reflects the compound growth rate. For data generated by multiplicative processes (compound interest, growth factors, concentration ratios), the geometric mean is appropriate.
- Excess kurtosis > 0 (leptokurtic): the distribution has heavier tails than a normal distribution, often with a sharper peak, so extreme values occur more often than a normal model predicts. Financial return distributions are famously leptokurtic (fat tails), which is why models that assume normality underestimate crash risk. Excess kurtosis < 0 (platykurtic): lighter tails and fewer extreme values, often with a flatter peak. Excess kurtosis = 0 (mesokurtic): tail weight comparable to a normal distribution; the normal distribution has excess kurtosis 0, but a value of 0 on its own does not prove normality. For risk analysis, high positive kurtosis means worst-case scenarios occur more frequently than a normal model suggests.
- Missing values (empty cells, NA, null) should be excluded from the count and from all statistics: they represent 'no measurement,' not a value of zero. Report the number of valid (non-null) values separately from the total number of rows so readers can see the missing-data rate. Columns with more than 20% missing values warrant investigation before analysis: are the values missing completely at random (MCAR), or does missingness depend on other variables (MAR) or on the unobserved values themselves (MNAR)? Anything other than MCAR can bias statistics computed from the valid values alone.
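The FAQ points on missing values and SEM can be combined in a small batch sketch: each pasted column is cleaned, missing cells are dropped rather than counted as zero, and the valid count, missing rate, SD, and SEM = SD / √n are reported. The set of missing-value tokens, the `summarize_column` helper, and the sample columns are all hypothetical assumptions for illustration.

```python
import math
import statistics

# Assumed spellings of a missing cell; adjust to match the data source.
MISSING = {"", "na", "n/a", "nan", "null"}


def summarize_column(name, raw_cells):
    """Summarize one pasted column, excluding missing cells from every statistic."""
    values = [float(c) for c in raw_cells if c.strip().lower() not in MISSING]
    total, valid = len(raw_cells), len(values)
    sd = statistics.stdev(values)  # sample SD; needs at least 2 valid values
    return {
        "column": name,
        "rows": total,
        "valid": valid,
        "missing_rate": (total - valid) / total,
        "mean": statistics.fmean(values),
        "sd": sd,
        "sem": sd / math.sqrt(valid),  # SEM = SD / sqrt(n)
    }


# Hypothetical batch: two columns pasted side by side, with gaps.
columns = {
    "A": ["12", "15", "", "11", "NA", "14"],
    "B": ["3.1", "2.9", "3.4", "null", "3.0", "3.2"],
}
for col, cells in columns.items():
    summary = summarize_column(col, cells)
    if summary["missing_rate"] > 0.20:  # the 20% threshold from the FAQ
        print(f"check missingness in column {col} before analysis")
    print(summary)
```

Column A has 4 valid values out of 6 rows, so its missing rate (one third) triggers the warning; its mean is computed over those 4 values only. Because SEM divides by √n, collecting more valid values narrows the SEM even when the SD stays about the same.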