Zero to per-column running stats on a real CSV. Real prompts, real output.
$ brew install gawk        # macOS
$ sudo apt install gawk    # debian/ubuntu
$ curl -sL https://raw.githubusercontent.com/timm/awk/master/dotcols/dotcols -o dotcols
$ chmod +x dotcols
$ ./dotcols --demo stats | head -3
column           n          mid       spread
AGE            303       54.366        9.082
sex            303         male        0.624
One file. It bundles dot's runtime, Num/Sym/Data, and the stats demo with its sample CSV.
$ ./dotcols --get-data
fetching 10 classify -> data/classify/
  iris wine ...
fetching 10 regression -> data/regression/
  ...
done. 30 files in ./data/
$ cat > tour1.awk <<'EOF'
BEGIN { N = new("num"); S = new("sym")
        printf "%-12s %5s %12s %12s\n", "column", "n", "mid", "spread" }
      { add(N, $1, 1); add(S, $2, 1) }
END   { printf "%-12s %5d %12.3f %12.3f\n", "AGE", .N.n, mid(N), var(N)
        printf "%-12s %5d %12s %12.3f\n", "color", .S.n, mid(S), var(S) }
EOF
$ printf "10 red\n20 blue\n30 blue\n40 red\n50 blue\n" | ./dotcols tour1.awk
column           n          mid       spread
AGE              5       30.000       15.811
color            5         blue        0.673
Same add, mid, var calls dispatch to num_* or sym_* via the .it.is tag set by new(). The output shape matches the bundled stats demo — same column / n / mid / spread layout.
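A rough Python sketch of that tag-based dispatch, for readers who want to see the idea outside awk. It is not dotcols' code: the dict layout, the "is" key, and the VERBS table are assumptions made for illustration, and it stores every value instead of keeping running totals, purely for brevity.

```python
import math
import statistics
from collections import Counter

def new(kind):
    # Hypothetical record: a dict whose "is" field names its kind.
    return {"is": kind, "seen": []}

# Per-kind implementations, named kind_verb to mirror num_* / sym_*.
def num_add(it, x): it["seen"].append(float(x))
def num_mid(it):    return statistics.mean(it["seen"])
def num_var(it):    return statistics.stdev(it["seen"])   # sample stdev

def sym_add(it, x): it["seen"].append(x)
def sym_mid(it):    return Counter(it["seen"]).most_common(1)[0][0]  # mode
def sym_var(it):
    # Entropy in nats (natural log).
    n = len(it["seen"])
    return -sum(c / n * math.log(c / n) for c in Counter(it["seen"]).values())

# Lookup table keyed by the tag, then by the verb.
VERBS = {
    "num": {"add": num_add, "mid": num_mid, "var": num_var},
    "sym": {"add": sym_add, "mid": sym_mid, "var": sym_var},
}

# Generic verbs: read the tag, then call the matching implementation.
def add(it, x): VERBS[it["is"]]["add"](it, x)
def mid(it):    return VERBS[it["is"]]["mid"](it)
def var(it):    return VERBS[it["is"]]["var"](it)

N, S = new("num"), new("sym")
for age, color in [(10, "red"), (20, "blue"), (30, "blue"), (40, "red"), (50, "blue")]:
    add(N, age)
    add(S, color)

print(f"{mid(N):.3f} {var(N):.3f}")   # 30.000 15.811
print(f"{mid(S)} {var(S):.3f}")       # blue 0.673
```

The caller never checks the kind itself; adding a third kind would mean adding one more entry to the table.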
$ ./dotcols --demo stats data/classify/iris.csv
column           n          mid       spread
SEPALLENGTH    150        5.843        0.828
SEPALWIDTH     150        3.054        0.434
PETALLENGTH    150        3.759        1.764
PETALWIDTH     150        1.199        0.763
class          150  Iris-setosa        1.099
UPPER columns get Num (mean + stdev); lowercase columns get Sym (mode + entropy). One pass, O(1) per row. Same shape as step 4 (tour1.awk), just done for every column at once.
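One common way to get mean and stdev in a single pass with constant work per value is Welford's update; running symbol counts give mode and entropy. The Python sketch below is an assumption about how such summaries can be kept, not a transcription of dotcols' Num and Sym, and the field names (mu, m2, has) are made up. Entropy uses natural logs, which is consistent with the 1.099 (about ln 3) reported for iris's three equally sized classes.

```python
import math

class Num:
    """Running mean and sample stdev via Welford's update."""
    def __init__(self):
        self.n, self.mu, self.m2 = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        d = x - self.mu
        self.mu += d / self.n
        self.m2 += d * (x - self.mu)   # uses the already-updated mean

    def mid(self):
        return self.mu

    def spread(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

class Sym:
    """Running symbol counts; mode and entropy are computed on demand."""
    def __init__(self):
        self.n, self.has = 0, {}

    def add(self, x):
        self.n += 1
        self.has[x] = self.has.get(x, 0) + 1

    def mid(self):
        return max(self.has, key=self.has.get)

    def spread(self):
        return -sum(c / self.n * math.log(c / self.n) for c in self.has.values())

num, sym = Num(), Sym()
for x in [10, 20, 30, 40, 50]:
    num.add(x)
for s in ["a"] * 50 + ["b"] * 50 + ["c"] * 50:
    sym.add(s)

print(f"{num.mid():.3f} {num.spread():.3f}")   # 30.000 15.811
print(f"{sym.spread():.3f}")                   # 1.099
```

Neither class keeps the rows it has seen, so memory stays constant for Num and grows only with the number of distinct symbols for Sym.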
$ cat > tour2.awk <<'EOF'
BEGIN { D = new("data"); FS = " *, *" }
NR==1 { data_head(D, $0); next }
      { data_read(D, 1) }
END   { printf "rows=%d cols=%d ykind=%s\n\n", .D.nrows, .D.nc, .D.ykind
        printf "%-3s %-14s %-3s %-3s\n", "i", "name", "kind", "y?"
        for (i=1; i<=.D.nc; i++)
          printf "%-3d %-14s %-3s %-3s\n", i, .D.hdr[i], (.D.nump[i] ? "num" : "sym"), ((i in .D.y) ? "y" : "-") }
EOF
$ ./dotcols tour2.awk data/regression/housing.csv
rows=506 cols=14 ykind=num

i   name           kind y?
1   CRIM           num -
2   ZN             num -
3   INDUS          num -
4   CHAS           num -
5   NOX            num -
6   RM             num -
7   AGE            num -
8   DIS            num -
9   RAD            num -
10  TAX            num -
11  PTRATIO        num -
12  B              num -
13  LSTAT          num -
14  MEDV+          num y
data_head parses the header, sigils included (a trailing + marks a y-goal to maximize); each subsequent data_read feeds one row's cells into the matching column object. After ingest, .D.cols[i] is a Num or Sym you can call mid()/var() on directly.
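The header rules seen so far (an upper-case name means numeric, a lower-case name means symbolic, a trailing + marks a goal to maximize) can be sketched in Python as below. The function names head and read, the dict fields, and the sample header are made up for illustration. Testing only the first letter is an assumption, and any sigils beyond + are not covered here.

```python
import re

SEP = re.compile(r" *, *")   # same pattern as FS in tour2.awk

def head(line):
    """Turn a header line into one description per column."""
    cols = []
    for i, name in enumerate(SEP.split(line.strip()), start=1):
        cols.append({
            "i": i,
            "name": name,
            # Assumption: only the first letter's case decides the kind.
            "kind": "num" if name[:1].isupper() else "sym",
            # A trailing "+" marks a y-goal to maximize.
            "y": name.endswith("+"),
        })
    return cols

def read(cols, line):
    """Split one data row and coerce cells by column kind.
    Missing-value markers are not handled in this sketch."""
    cells = SEP.split(line.strip())
    return [float(c) if col["kind"] == "num" else c
            for col, c in zip(cols, cells)]

cols = head("AGE, sex, MEDV+")          # hypothetical header
for col in cols:
    print(col["i"], col["name"], col["kind"], "y" if col["y"] else "-")
print(read(cols, "54, male, 21.6"))     # hypothetical row
```

Each coerced cell would then be handed to that column's own summary object, which is the role data_read plays above.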
Example — stats.awk walkthrough with code.
Manual — Num, Sym, Data API; dispatch; conventions.
Tests — annotated test suite, doubles as 17 mini usage snippets.
dotlearn — ML on top: trees, naive Bayes, active learning.