dotcols adds three things on top of dot: Num (running mean + stdev via Welford), Sym (counts table, mode, entropy), and Data (a CSV table that auto-builds one Num or Sym per column).
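As a rough illustration of what Num and Sym keep track of, here is a minimal sketch in plain awk. It is not the dot dialect and not dotcols' source. Every function and array name below is hypothetical, and the sample standard deviation (dividing by n - 1) is an assumption about which variant is wanted.

```awk
# Sketch only: plain awk, hypothetical names, not dotcols' own code.

# Welford update: fold x into the running count, mean, and M2 for column c.
function welford_add(c, x,    d) {
  N[c]++
  d      = x - MU[c]
  MU[c] += d / N[c]
  M2[c] += d * (x - MU[c])
}

# Sample standard deviation (assumed); 0 when fewer than two values were seen.
function welford_sd(c) {
  return N[c] < 2 ? 0 : sqrt(M2[c] / (N[c] - 1))
}

# Count one symbol for column c and remember the most frequent one so far.
function count_add(c, x) {
  TOTAL[c]++
  if (++SEEN[c, x] > MOST[c]) { MOST[c] = SEEN[c, x]; MODE[c] = x }
}

# Shannon entropy, in bits, over the symbols counted for column c.
function count_entropy(c,    k, part, p, e) {
  for (k in SEEN) {
    split(k, part, SUBSEP)
    if (part[1] == c) {
      p  = SEEN[k] / TOTAL[c]
      e -= p * log(p) / log(2)
    }
  }
  return e + 0
}

BEGIN {
  n = split("2 4 4 4 5 5 7 9", xs, " ")
  for (i = 1; i <= n; i++) welford_add("age", xs[i])
  printf "mean=%g sd=%.4f\n", MU["age"], welford_sd("age")

  n = split("a a b b", ys, " ")
  for (i = 1; i <= n; i++) count_add("grade", ys[i])
  printf "mode=%s entropy=%g\n", MODE["grade"], count_entropy("grade")
}
```

Both updates are single-pass and need constant memory per numeric column, which is what makes a streaming CSV summary practical.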
A single add(c, x, train) call dispatches to num_add or sym_add via .c.is.
The dotcols binary bundles all of dot's runtime plus these libs, so installation is a single file.
dotcols --demo stats data.csv prints a per-column summary.

# a single CSV ingest using dotcols
function header(    i) {
  for (i = 1; i <= NF; i++) {
    NAME[i] = $i
    COL[i]  = new($i ~ /^[A-Z]/ ? "num" : "sym") } }  # new from dot
function ingest(    i) {
  for (i = 1; i <= NF; i++) add(COL[i], $i, 1) }      # add from dotcols
END {
  for (i = 1; i <= length(NAME); i++)
    printf "%s\tn=%d\tmid=%s\tspread=%g\n",
      NAME[i], .COL[i].n, mid(COL[i]), var(COL[i]) }  # mid/var from dotcols
The same add(), mid(), and var() calls work whether the column is a Num or a Sym. Dispatch is one line per op: concatenate the type tag with the op name, then make an indirect call via @fn.
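The dispatch idea can be sketched in GNU awk, whose indirect function calls use the same @name form. This is an illustration under assumptions, not dotcols' implementation. The IS[] array stands in for the type tag that dotcols reads from .c.is, num_mid and sym_mid are hypothetical names, and the train argument of add() is left out because its meaning is not described here.

```awk
# Sketch only, for gawk: indirect calls (@name) are a gawk extension.
# IS[] is a hypothetical stand-in for the type tag dotcols keeps in .c.is.

function num_add(c, x) { CNT[c]++; SUM[c] += x }
function sym_add(c, x) {
  if (++SEEN[c, x] > MOST[c]) { MOST[c] = SEEN[c, x]; MODE[c] = x }
}
function num_mid(c) { return SUM[c] / CNT[c] }   # mean of the numbers seen
function sym_mid(c) { return MODE[c] }           # most frequent symbol

# One line of dispatch per op: build the function name, then call it.
function add(c, x,    fn) { fn = IS[c] "_add"; return @fn(c, x) }
function mid(c,       fn) { fn = IS[c] "_mid"; return @fn(c) }

BEGIN {
  IS["age"] = "num"; IS["grade"] = "sym"
  add("age", 10);    add("age", 20)
  add("grade", "b"); add("grade", "a"); add("grade", "b")
  print mid("age"), mid("grade")
}
```

Adding a new column type under this scheme means defining its prefixed functions. The callers of add() and mid() stay unchanged.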
curl -sL https://raw.githubusercontent.com/timm/awk/master/dotcols/dotcols -o dotcols
chmod +x dotcols
./dotcols --get-data # 30 curated CSVs into ./data/
./dotcols --demo stats # bundled stats demo on its sample.csv
Install · Stats walkthrough · Manual
Need ML on top? See dotlearn: trees, naive Bayes, active learning.