Manual

The CLI, preprocessor, runtime, helpers, and conventions. (For Num/Sym/Data column types, see dotcols.)

dot — the CLI

One self-contained bash script (~280 lines). It bundles the preprocessor, runtime, and helpers as embedded heredocs. No directory, no Makefile.

dot FILE.awk [DATA...]    # run rewritten FILE.awk on DATA (or stdin)
dot a.awk b.awk DATA      # multi-file run; .awk args go through prep
cat DATA | dot FILE.awk   # stdin works too
dot -c FILE.awk           # print rewritten source only
dot --demo NAME [DATA]    # run demos/NAME/*.awk on DATA (or sample.*; pass `-` for stdin)
dot --demos               # list available demos under ./demos/
dot --show                # dump bundled lib/*.awk (post-prep)
dot --help                # full help

Arg dispatch: any *.awk argument is preprocessed and added as a -f source; everything else becomes a data file. The bundled runtime is always loaded first. User .awk files are written to ${TMPDIR}/dot.XXXX/<basename> before gawk -f picks them up — that way error messages preserve the original filename and line number.

Built from the modular sources by build.sh. Edit a source file, run ./build.sh, get a rebuilt dot.

Preprocessor

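Two rewrites are applied to the source before gawk sees it. Which one applies depends on the character just before the dot. A "value-char" is any character that can end a value: a letter, digit, or underscore, or a closing ] or ) (the class [A-Za-z0-9_\])]). So the .x in it.x is rewritten as a field access, while a .x at line start, which has no value-char before it, is not.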

Source              After prep
.it.cols            it["cols"]
.d.rows[r][i]       d["rows"][r][i]
.d                  HEAP[d]
new(.d.ykind)       new(HEAP[d]["ykind"])

Run the preprocessor manually with dot -c file.awk (one file) or dot --show (bundled libs).
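
The rewrite rules can be sketched in a few lines of gawk. This is a hypothetical single-pass version, not the bundled lib/prep.awk: the function name prep, and the restriction that a field name starts with a letter or underscore, are assumptions made for illustration. It uses a match/substr loop and chooses the rewrite by looking at the last character already emitted before the dot.

```awk
# Hypothetical sketch of the dot rewrites (not the bundled lib/prep.awk).
# ".name" after a value-char becomes ["name"]; any other ".name" becomes HEAP[name].
function prep(s,    out, name, prev) {
  out = ""
  while (match(s, /\.[A-Za-z_][A-Za-z0-9_]*/)) {
    out  = out substr(s, 1, RSTART - 1)          # copy the text before the dot
    name = substr(s, RSTART + 1, RLENGTH - 1)    # identifier after the dot
    prev = (out == "") ? "" : substr(out, length(out), 1)
    if (prev ~ /^[A-Za-z0-9_\])]$/) out = out "[\"" name "\"]"
    else                            out = out "HEAP[" name "]"
    s = substr(s, RSTART + RLENGTH)              # continue after the match
  }
  return out s
}

BEGIN {
  print prep("it.cols")          # it["cols"]
  print prep(".d")               # HEAP[d]
  print prep("new(.d.ykind)")    # new(HEAP[d]["ykind"])
}
```

Because HEAP[d] ends in ], the .ykind that follows it sees a value-char and becomes a plain subscript. Like any regex-only approach, the sketch knows nothing about strings or comments.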

new

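Allocate an object, then dispatch to its initializer. new(t) increments NID, sets .it.is = t, and calls t_init (the type name followed by "_init") if that function is defined. It returns whatever the initializer returns, so an initializer should end with return it; with no initializer, new returns the new ID directly.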

function new(t,    it, fn) {
  it = ++NID;  .it.is = t
  fn = t "_init"
  return (fn in FUNCTAB) ? @fn(it) : it }

Storage lives in HEAP[id]["field"].
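
A minimal sketch of new() in use, written in post-prep form so it runs under plain gawk. The counter type, its counter_init function, and the bag type are hypothetical names invented for this example.

```awk
# Post-prep form of new(): .it.is is written out as HEAP[it]["is"].
function new(t,    it, fn) {
  it = ++NID;  HEAP[it]["is"] = t
  fn = t "_init"
  return (fn in FUNCTAB) ? @fn(it) : it }

# Hypothetical type "counter". Its initializer returns the id,
# because new() returns whatever the initializer returns.
function counter_init(it) { HEAP[it]["n"] = 0; return it }

function demo(    c, b) {
  c = new("counter")          # counter_init exists, so it is called
  b = new("bag")              # no bag_init: new() just returns the id
  print c, HEAP[c]["is"], HEAP[c]["n"]    # 1 counter 0
  print b, HEAP[b]["is"]                  # 2 bag
}

BEGIN { demo() }
```

An initializer that forgets its return statement would make new() return an empty value, so the caller would lose the id.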

arr

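Force a value to be an array. gawk creates arrays (and subarrays) lazily, on first subscript, so a variable that has never been indexed is not yet an array. arr(x) is the standard idiom: assign to a dummy key, then delete it. After that, x is an empty array, and for (k in x) is safe even if nothing is ever added to it.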

function arr(x) { x[""] = 0; delete x[""] }
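
A short usage sketch, assuming a fresh local named bag (a name chosen here for illustration). After arr(bag) the variable is an array with no elements.

```awk
function arr(x) { x[""] = 0; delete x[""] }

function demo(    bag, k, n) {
  arr(bag)                          # bag is now an empty array
  print isarray(bag), length(bag)   # 1 0
  for (k in bag) n++                # loop body never runs
  print n + 0                       # 0
}

BEGIN { demo() }
```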

zap

Drop one HEAP slot. Use when the caller is done with an object and wants the memory back. No GC, no refcount — zap is the only memory hook.

function zap(i) { delete HEAP[i] }

Pattern: build a temporary aggregate, extract its scalar, drop it.

function spread(d, rows,    y, v) {
  y = ycol(d, rows); v = var(y); zap(y); return v }
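
The same pattern as a self-contained sketch. The helper total and its sum field are hypothetical; the slot is allocated by hand here rather than through new() to keep the example short.

```awk
function zap(i) { delete HEAP[i] }

# Hypothetical helper: build a temporary object, read one scalar, drop it.
function total(a, b,    t, s) {
  t = ++NID                  # allocate a slot by hand for this sketch
  HEAP[t]["sum"] = a + b
  s = HEAP[t]["sum"]
  zap(t)                     # the slot is gone; the id is not reused
  return s
}

BEGIN {
  print total(2, 3)          # 5
  print (1 in HEAP), NID     # 0 1
}
```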

Polymorphic dispatch

Each object carries its type string in .it.is (set by new()). A polymorphic call concatenates that string with the operation name and uses gawk's indirect-call (@fn) syntax.

function add(k, x, train, w,  fn) { fn = .k.is "_add"
                                    return @fn(k, x, train, w) }
function like(k, x, p, m,     fn) { fn = .k.is "_like"
                                    return @fn(k, x, p, m) }
function var(it,    fn)  { fn = .it.is "_var"; return @fn(it) }
function mid(it,    fn)  { fn = .it.is "_mid"; return @fn(it) }

The dispatch pattern (one line builds the name, one line calls it) is the only thing dot itself defines. Concrete types like num and sym with their num_add/sym_add/num_mid/sym_mid functions live in the dotcols layer.
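
To make the pattern concrete, here is a sketch in post-prep form with two made-up types. The names area, square_area, and rect_area are hypothetical and belong to neither dot nor dotcols.

```awk
function new(t,    it, fn) {
  it = ++NID;  HEAP[it]["is"] = t
  fn = t "_init"
  return (fn in FUNCTAB) ? @fn(it) : it }

# Generic entry point: build the function name from the type tag, then call it.
function area(it,    fn) { fn = HEAP[it]["is"] "_area"; return @fn(it) }

# Two hypothetical types, each supplying its own <type>_area.
function square_area(it) { return HEAP[it]["side"] ^ 2 }
function rect_area(it)   { return HEAP[it]["w"] * HEAP[it]["h"] }

function demo(    s, r) {
  s = new("square"); HEAP[s]["side"] = 3
  r = new("rect");   HEAP[r]["w"] = 2; HEAP[r]["h"] = 5
  print area(s), area(r)     # 9 10
}

BEGIN { demo() }
```

Adding a third type needs only a new <type>_area function; area() itself does not change.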

o, _oo — recursive pretty-printers

One entry point. o(x) dispatches on shape: array vs scalar, list vs dict, integer-valued float vs other number, else string. Recursion is automatic — the workhorse _oo calls o() on each element, so nested mixes (list-of-dicts, dict-of-lists, anything) just work.

function o(x) {
  if (isarray(x)) {
    if (1 in x) _oo(x, "[", "]", "@ind_num_asc", 0)
    else        _oo(x, "{", "}", "@ind_str_asc", 1)
  } else if (x ~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) {
    if (x == int(x)) printf "%d", x
    else             printf "%G", x }
  else printf "%s", x }

function _oo(a, lhs, rhs, how, withkey,    n, i, k, sep, sorted) {
  printf "%s", lhs
  n = asorti(a, sorted, how)
  sep = ""
  for (i = 1; i <= n; i++) {
    k = sorted[i]
    printf "%s", sep
    if (withkey) printf "%s: ", k
    o(a[k])
    sep = ", " }
  printf "%s", rhs }

Defaults: list keys are sorted numerically, dict keys are sorted as strings, integer-valued floats print as %d (so 5.0000 prints as 5), and other numbers print as %G (so 1e-7 prints as 1E-07). For custom brackets, call _oo directly:

_oo(a, "(", ")", "@ind_num_asc", 0)   # tuple style
_oo(a, "<", ">", "@ind_str_asc", 1)   # angle-bracket dict

The "@ind_num_asc" string names a built-in asorti() comparator. The @ prefix is required — without it gawk treats the string as a user-defined comparison function name and aborts with a fatal error.
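
A usage sketch for o() on a nested value. The two functions are repeated from above so the block runs on its own; the sample data (a dict holding a string and a list) is invented for the example.

```awk
function o(x) {
  if (isarray(x)) {
    if (1 in x) _oo(x, "[", "]", "@ind_num_asc", 0)
    else        _oo(x, "{", "}", "@ind_str_asc", 1)
  } else if (x ~ /^-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/) {
    if (x == int(x)) printf "%d", x
    else             printf "%G", x }
  else printf "%s", x }

function _oo(a, lhs, rhs, how, withkey,    n, i, k, sep, sorted) {
  printf "%s", lhs
  n = asorti(a, sorted, how)
  sep = ""
  for (i = 1; i <= n; i++) {
    k = sorted[i]
    printf "%s", sep
    if (withkey) printf "%s: ", k
    o(a[k])
    sep = ", " }
  printf "%s", rhs }

function demo(    a) {
  a["name"]    = "dot"       # string value
  a["dims"][1] = 3           # nested list, integer element
  a["dims"][2] = 4.5         # nested list, non-integer element
  o(a); print ""             # {dims: [3, 4.5], name: dot}
}

BEGIN { demo() }
```

o() writes no trailing newline, so the caller adds one with print "".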

Conventions

  • UPPER = global. HEAP, NID, FUNCTAB.
  • lowercase = local. Declare in the gawk pseudo-arg slot (extra spaces in arg list).
  • Every object sets .it.is to its type string.
  • Run rogues() at END to flag any lowercase name leaked into SYMTAB.
  • BEGIN/END blocks have no local scope — move logic to functions if you need locals there.

rogues

End-of-run leak detector. Lives in dotlib.awk. Walks SYMTAB (gawk's table of all global names) and prints any starting with a lowercase letter — the convention says lowercase is local, so any leaked lowercase global is a bug.

function rogues(    i) {
  for (i in SYMTAB)
    if (i ~ /^[a-z]/) print "leak:", i > "/dev/stderr" }

Call once at END. Silent if clean; one line per leak otherwise.
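
A small sketch of what a leak looks like. The functions sum3 and sum2 are hypothetical, and rogues() is called from BEGIN here only so the example needs no input.

```awk
function rogues(    i) {
  for (i in SYMTAB)
    if (i ~ /^[a-z]/) print "leak:", i > "/dev/stderr" }

# Clean: acc is declared in the extra-argument slot, so it is local.
function sum3(a, b, c,    acc) { acc = a + b + c; return acc }

# Buggy on purpose: tmp is never declared, so it becomes a global.
function sum2(a, b) { tmp = a + b; return tmp }

BEGIN {
  sum3(1, 2, 3); sum2(1, 2)
  rogues()                     # stderr: leak: tmp
}
```

Moving tmp into sum2's argument list, as in sum3, silences the report.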

Files

What the user installs is exactly one file: dot. Everything else lives in the repo for editing — build.sh embeds each .awk source as a heredoc inside the bundled binary. At runtime, dot writes those heredocs to a tempdir (${TMPDIR}/dot.XXXX/) on demand, then gawk -f's them. So once you have dot, no other file is required.

Shipped (the install)

File   Lines   Purpose
dot    ~280    self-contained bash binary; bundles every .awk below as heredocs

In the repo (for editing)

File                     Lines   Purpose                                     How it ships
build.sh                 ~180    assembles dot from lib/                     not bundled (build-time only)
lib/prep.awk             10      preprocessor (one rule, two regex sweeps)   heredoc inside dot
lib/dot.awk              14      runtime: new, arr, zap                      heredoc inside dot
lib/dotlib.awk           34      helpers: rogues, o, _oo                     heredoc inside dot
demos/hello/hello.awk    6       smallest example: running mean              shipped in repo, runs via dot --demo hello
demos/hello/sample.txt   5       auto-loaded sample for hello demo           shipped in repo

So: the install is one file (dot). The repo holds lib/ sources, build.sh, and any number of demos/NAME/ directories that dot --demo NAME can run. Higher layers (dotcols, dotlearn) ship their own binaries that bundle dot's lib/ plus their own.

Limitations

Read these before deploying to anything you care about.

  • Preprocessor is regex, not a parser. A literal ".x" inside a string or regex constant gets rewritten too. So does printf "%.3f" if a value-char precedes the dot. Workaround: split the dot across strings so it has a quote on each side — printf "%" "." "3f", or build basenames as name "." ext. (Quotes are not value-chars, so the regex skips it.) For object code, rename the field or write raw HEAP[x] directly.
  • Comments are not stripped. A .x inside a # ... comment also rewrites. Harmless but ugly in dot --show output.
  • .it.is is a reserved field name. Used internally for type dispatch. Don't shadow it.
  • zap(i) is the only memory hook. No GC, no refcount — you decide when an object is done with and call zap(i) to clear its HEAP slot.
  • Monotonic IDs, not recycled. NID only ever increments; zap deletes a slot's contents but never reuses its id. Not a big deal — an id is just an int, so accumulation costs effectively nothing. The slot contents stay zapped, no leak.
  • No type checking. add(num_obj, "abc", 1) coerces silently per awk rules.
  • Indirect call cost is real but small. Every polymorphic add() does a string concat + FUNCTAB lookup per row. gawk is interpreted, so the per-call overhead is already in microseconds — the indirect step adds a fraction of that. Hot inner loops can still call num_add directly if you want the last bit.
  • gawk 5.4.0 bug: gensub's second match returns empty captures. The preprocessor uses a match/substr loop instead.
  • Bash required to run dot. The binary uses heredocs, arrays, and process substitution <(...). On macOS, /bin/sh is bash 3.x running in POSIX mode, where process substitution is not available, so run bash dot ... (or invoke dot from zsh), not /bin/sh dot.
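
The split-dot workaround from the first limitation can be checked in plain gawk. This sketch uses made-up values and only shows that the concatenated pieces behave like the original literal.

```awk
# Each "." sits between two quotes, so no value-char precedes it in the source.
function demo(    name, ext) {
  printf "%" "." "3f\n", 3.14159     # 3.142 (format is "%.3f\n" after concatenation)
  name = "report"; ext = "csv"
  print name "." ext                 # report.csv
}

BEGIN { demo() }
```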