Annotated tests

17 smoke tests for dotcols. They verify that dot's runtime is bundled correctly and exercise Num, Sym, Data, dispatch, and the bundled stats demo.

cd dotcols
./tests/test.sh
# All 17 tests passed.

1. Build & help

contains "build.sh runs"     "built dotcols" "$out"
contains "no args -> usage"  "Usage:"        "$($DC 2>&1)"
contains "--help has --demo"  "--demo"       "$($DC --help)"
contains "--help has --get-data" "--get-data" "$($DC --help)"
contains "--demos lists stats" "stats"       "$($DC --demos)"

Asserts: the binary builds; the CLI dispatcher is wired (no-args usage, --help, and --demos all respond, and --help lists --demo and --get-data); --demos auto-discovers demos/stats/.
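The checks above all take a test name, an expected value, and an actual value. A minimal sketch of what such helpers could look like follows; the bodies of eq and contains, the pass/fail counters, and the sample strings are assumptions for illustration, not the actual contents of tests/test.sh.

```shell
# Hypothetical stand-ins for the eq / contains helpers (not the real test.sh).
pass=0; fail=0

# eq NAME EXPECTED ACTUAL: pass when the two strings are identical.
eq() {
  if [ "$2" = "$3" ]; then
    pass=$((pass + 1))
  else
    fail=$((fail + 1))
    printf 'FAIL %s: want "%s", got "%s"\n' "$1" "$2" "$3"
  fi
}

# contains NAME NEEDLE HAYSTACK: pass when NEEDLE occurs anywhere in HAYSTACK.
contains() {
  case "$3" in
    *"$2"*) pass=$((pass + 1)) ;;
    *)      fail=$((fail + 1)); printf 'FAIL %s: "%s" not found\n' "$1" "$2" ;;
  esac
}

# Made-up sample strings, only to exercise both helpers.
contains "usage shown" "Usage:" "Usage: tool [options]"
eq       "exact match" "7"      "7"
printf '%d passed, %d failed\n' "$pass" "$fail"   # prints: 2 passed, 0 failed
```

Using a case pattern for the substring check keeps the helper in plain POSIX sh and avoids treating the needle (for example "--demo") as a grep option.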

2. Inherits dot's runtime

dotcols's binary embeds dot's lib/*.awk. These tests prove it.

eq "new + .field still works"  "7"  "$(run_inline 'BEGIN{N=new("plain"); .N.n=7; printf "%d\n", .N.n}')"
eq "o(5.0) still collapses"    "5"  "$(run_inline 'BEGIN{o(5.0); print ""}')"

Asserts: new(), .field sugar, and the printer o all work inside dotcols. If the build forgot to include dot's libs, these fail with "function not defined".

3. Num type — Welford running mean + stdev

eq "num via add()" "30 15.811" "$(run_inline '
  BEGIN { N = new("num")
          add(N,10,1); add(N,20,1); add(N,30,1); add(N,40,1); add(N,50,1)
          printf "%d %.3f\n", mid(N), var(N) }')"

eq "num mid alias" "30"        "$(run_inline '
  BEGIN { N = new("num")
          for (i=1;i<=5;i++) add(N,i*10,1)
          printf "%d\n", mid(N) }')"

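Asserts: after add() has seen 10, 20, 30, 40, 50, mid(N) returns the mean (30) and var(N) returns 15.811, the sample standard deviation of those five values. The second test feeds the same values from a loop and checks mid(N) alone.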
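The expected "30 15.811" can be reproduced with Welford's online update in plain awk. This is a sketch of the technique named in the heading, not a copy of lib/numsym.awk; the function name welford and the variables n, mu, m2 are illustrative, and the n - 1 divisor is an assumption that the expected 15.811 happens to agree with.

```shell
# Welford's online mean / variance in plain awk (illustrative, not lib/numsym.awk).
# Prints the mean (as an integer) and the sample standard deviation.
welford() {
  printf '%s\n' "$@" | awk '
    { n++                       # count of values seen so far
      d   = $1 - mu             # distance from the old mean
      mu += d / n               # updated running mean
      m2 += d * ($1 - mu) }     # running sum of squared deviations
    END { sd = (n > 1) ? sqrt(m2 / (n - 1)) : 0
          printf "%d %.3f\n", mu, sd }'
}

welford 10 20 30 40 50   # prints: 30 15.811
```

The update needs only three numbers per column, so no value has to be stored after it has been added.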

4. Sym type — counts table, mode, entropy

eq "sym mode"      "b"   "$(run_inline 'BEGIN{S=new("sym"); add(S,"a",1); add(S,"b",1); add(S,"b",1); printf "%s\n", mid(S)}')"
eq "sym entropy>0" "yes" "$(run_inline 'BEGIN{S=new("sym"); add(S,"a",1); add(S,"b",1); printf "%s\n", (var(S)>0?"yes":"no")}')"

Asserts: Sym's mode picks the most-frequent key (b appeared twice); entropy is positive when the distribution isn't degenerate. Same dispatch shape as Num — same add / mid / var calls, different per-type implementations.
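For symbols, both statistics fall out of a single counts table. The sketch below is a standalone awk illustration; the function name sym_stats is hypothetical, and the base-2 logarithm is an assumption, since the tests only require the entropy to be positive.

```shell
# Mode and Shannon entropy from a counts table (illustrative sketch).
# Base-2 logs are an assumption; the tests above only check entropy > 0.
sym_stats() {
  printf '%s\n' "$@" | awk '
    { count[$1]++; n++ }
    END {
      best = 0
      for (k in count) {
        if (count[k] > best) { best = count[k]; mode = k }   # most frequent key
        p  = count[k] / n
        e -= p * log(p) / log(2)                             # -sum p*log2(p)
      }
      printf "%s %.3f\n", mode, e
    }'
}

sym_stats a b b   # prints: b 0.918
```

With counts a=1 and b=2 the entropy is (1/3)·log2(3) + (2/3)·log2(3/2) ≈ 0.918 bits. When two keys tie for the highest count, the mode reported by this sketch depends on awk's array iteration order.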

5. Data type — CSV ingest

eq "data ingests rows" "303" "$(run_inline '
  BEGIN { d = new("data"); FS = " *, *" }
  NR==1 { data_head(d, $0); next }
        { data_read(d, 1) }
  END   { printf "%d\n", .d.nrows }
  ' demos/stats/sample.csv)"

Asserts: Data parses the heart-disease CSV header (14 columns, with num! as the classification target) and ingests all 303 data rows. .d.nrows is updated in place by data_read. A cell landing in a wrong-typed column would crash before the row count is printed.
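The FS = " *, *" setting is what lets cells carry stray spaces around the commas. A small standalone sketch of the same header-then-rows shape follows; csv_shape is a hypothetical helper, and the three-line sample (including the sex column and all values) is made up for illustration rather than taken from demos/stats/sample.csv.

```shell
# Count data rows and header columns of a CSV whose cells may be space-padded.
# Illustrative only; data_head / data_read in dotcols do more than count.
csv_shape() {
  awk 'BEGIN   { FS = " *, *" }         # comma with optional spaces either side
       NR == 1 { ncols = NF; next }     # header row: remember the column count
       NF      { nrows++ }              # each non-empty later line is a data row
       END     { printf "%d %d\n", nrows, ncols }' "$1"
}

tmp=$(mktemp)
printf 'AGE , sex, num!\n63, 1, 0\n67 ,1 , 2\n' > "$tmp"   # made-up sample
csv_shape "$tmp"   # prints: 2 3
rm -f "$tmp"
```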

6. The bundled stats demo end-to-end

out=$($DC --demo stats)
contains "stats demo header"   "column"    "$out"
contains "stats demo body"     "AGE"       "$out"
contains "stats demo bottom"   "num!"      "$out"

Asserts: the bundled demo produces the expected per-column summary. First line has "column" header, body includes "AGE" (an UPPER numeric column → mean+stdev), bottom has "num!" (the classification target). End-to-end smoke for: dispatcher → build_gawk_args → preprocess + load → gawk runs over sample.csv → output.

7. Errors

out=$($DC --demo nonesuch 2>&1 || true)
contains "missing demo errors" "no such demo" "$out"

Asserts: bad demo name fails with a readable error, not a gawk crash.
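The 2>&1 || true in the capture matters: it folds stderr into the captured text and keeps a non-zero exit from aborting the script. The sketch below shows the pattern in isolation; fake_cli and check_error are hypothetical stand-ins, and running under set -e is an assumption about how a test script like this might be configured.

```shell
# Hypothetical CLI that rejects an unknown demo name on stderr.
fake_cli() {
  echo "no such demo: $1" >&2
  return 1
}

# Runs in a subshell so set -e stays local to this check.
check_error() (
  set -e                                # assumption: tests abort on failure
  out=$(fake_cli "$1" 2>&1 || true)     # capture stderr, swallow exit status
  case "$out" in
    *"no such demo"*) echo "ok" ;;
    *)                echo "unexpected: $out" ;;
  esac
)

check_error nonesuch   # prints: ok
```

Without the || true, the failing command substitution would stop a set -e script at the assignment, before the message could be inspected.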

What this proves

If you change anything in lib/numsym.awk or lib/data.awk, run ./tests/test.sh. The round trip takes two minutes.