One section per app. Each shows real input, real output.
Decision tree on heart-disease data. Binary cuts, ezr-style.
AGE,sex,cp,TRESTBPS,CHOL,fbs,restecg,THALACH,exang,OLDPEAK,slope,CA,thal,num!
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,<50
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,>50_1
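The header encodes column roles, ezr-style: an uppercase first letter marks a numeric column (AGE, CHOL), a trailing `!` marks the class (num!), and trailing `-`/`+` mark minimize/maximize goals (Lbs-, Mpg+ in the acquire demo later). A minimal sketch of that convention, assuming those are the only markers (`col_kind` is an illustrative name, not dotlearn's API):

```python
def col_kind(name):
    """Classify an ezr-style column header (illustrative, not dotlearn's API)."""
    if name.endswith("!"): return "klass"        # class label, e.g. num!
    if name.endswith("-"): return "minimize"     # goal to shrink, e.g. Lbs-
    if name.endswith("+"): return "maximize"     # goal to grow,   e.g. Mpg+
    return "num" if name[0].isupper() else "sym" # AGE numeric, sex symbolic

header = "AGE,sex,cp,TRESTBPS,CHOL,fbs,restecg,THALACH,exang,OLDPEAK,slope,CA,thal,num!"
kinds = {c: col_kind(c) for c in header.split(",")}
```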
dotlearn --demo tree # uses bundled sample (heart.c.csv)
dotlearn --demo tree data/classify/heart.c.csv # explicit data file
dotlearn --demo tree | gawk -f tools/metrics.awk # add metrics summary
<50,<50
<50,<50
>50_1,>50_1
<50,>50_1
... (one row per held-out test row)
Pipe to tools/metrics.awk for per-class summary:
n pd pf prec acc class
48 0.625 0.145 0.789 0.748 >50_1
55 0.855 0.375 0.723 0.748 <50
n = test count, pd = recall (probability of detection), pf = false-alarm rate, prec = precision, acc = overall accuracy. The same metrics apply to classification and regression; with continuous targets, the class column shows raw distance values.
Naive Bayes. Same shape as tree.
dotlearn --demo nb # uses bundled sample (heart.c.csv)
dotlearn --demo nb | gawk -f tools/metrics.awk # add metrics summary
n pd pf prec acc class
48 0.729 0.091 0.875 0.825 >50_1
55 0.909 0.271 0.794 0.825 <50
acc=0.825 on heart.c (vs tree=0.748). Naive Bayes wins here because heart has many low-cardinality symbolic columns.
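A sketch of why: with few distinct symbolic values per column, each class-conditional frequency is estimated from many rows, so the likelihoods are stable. A minimal symbolic-only Naive Bayes with simple additive smoothing (`nb_train`/`nb_classify` are illustrative names, not dotlearn's internals):

```python
from collections import Counter, defaultdict
import math

def nb_train(rows, klass_idx):
    """Count class priors and per-class value frequencies (symbolic columns only)."""
    counts, klasses = defaultdict(Counter), Counter()
    for row in rows:
        k = row[klass_idx]
        klasses[k] += 1
        for c, v in enumerate(row):
            if c != klass_idx:
                counts[(k, c)][v] += 1
    return counts, klasses

def nb_classify(row, counts, klasses, klass_idx, m=1):
    """Pick the class maximizing log-likelihood; m is crude additive smoothing."""
    def loglike(k):
        s = math.log(klasses[k] / sum(klasses.values()))
        for c, v in enumerate(row):
            if c != klass_idx:
                # "+ 2*m" assumes roughly two values per column -- crude Laplace
                s += math.log((counts[(k, c)][v] + m) / (klasses[k] + 2 * m))
        return s
    return max(klasses, key=loglike)
```

On heart.c.csv-style data, columns like sex, fbs, and exang have only two or three values, so these counts stay well supported even with a few hundred rows.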
Active learning on auto93 (multi-objective: Lbs-, Acc+, Mpg+).
Setup, matching ezr (no info leak):
- lo, med, sd of disty over the full data. Used later to score the final pick.
- dt: same column structure but only train rows fed in. All subsequent stats come from dt, not the full data.
- Labelled rows split into best and rest.
- Acquire loop: score candidates by distx(best_centroid) - distx(rest_centroid). Pick the lowest score, add it to best. Cap best at sqrt(|labelled|); evict the worst back to rest.

dotlearn --demo acquire # uses bundled sample (auto93.csv)
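One acquire iteration — score each candidate by its distx to the best centroid minus its distx to the rest centroid, label the lowest scorer, then re-cap best at sqrt(|labelled|) — can be sketched as below (names illustrative; centroid and distx are passed in, not dotlearn's internals):

```python
import math

def acquire_score(row, best_c, rest_c, distx):
    """Lower is better: near best's centroid, far from rest's."""
    return distx(row, best_c) - distx(row, rest_c)

def acquire_step(candidates, best, rest, centroid, distx):
    """One loop iteration: label the most promising candidate, then re-cap best."""
    b, r = centroid(best), centroid(rest)
    pick = min(candidates, key=lambda row: acquire_score(row, b, r, distx))
    best.append(pick)
    cap = max(1, math.isqrt(len(best) + len(rest)))  # |best| <= sqrt(|labelled|)
    if len(best) > cap:
        worst = max(best, key=lambda row: acquire_score(row, b, r, distx))
        best.remove(worst)
        rest.append(worst)
    return pick
```

For a quick mental test, 1-D rows work fine: let centroid be the mean and distx the absolute difference.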
dotlearn --demo acquire data/regression/auto93.csv # explicit data file
wins lo=0.075 med=0.535 sd=0.266
=== TREE on 54 labelled rows ===
rule d2h n Lbs- Acc+ Mpg+
ROOT 0.348 54 2285.63 16.49 29.63
| Clndrs > 4 0.725 6 3348.50 13.47 18.33
| | Clndrs > 6 0.842 3 3818.00 13.07 13.33
| | Clndrs <= 6 0.608 3 2879.00 13.87 23.33
| Clndrs <= 4 0.300 48 2152.77 16.87 31.04
| | HpX > 70 0.378 23 2314.83 15.84 28.70
| | | origin != 1 0.413 15 2295.00 15.59 27.33
| | | | Volume > 108 0.465 7 2455.29 15.97 24.29
| | | | Volume <= 108 0.367 8 2154.75 15.26 30.00
| | | origin == 1 0.312 8 2352.00 16.31 31.25
| | HpX <= 70 0.229 25 2003.68 17.81 33.20
| | | Volume > 91 0.283 12 2120.67 17.71 30.83
| | | Volume <= 91 0.180 13 1895.69 17.91 35.38
=== RESULT ===
labelled : 54 (start=4, budget=50)
test rows : 199
top 5 guess actual-disty=0.158 win=100/100
Each row of the tree shows: rule, d2h (mean distance-to-heaven for rows in the node), n (row count), then per-y-goal column means. The deepest leaves with low d2h represent the "best" corner of design space — light cars (Lbs ~1900), high acceleration (Acc 17-18), high MPG (35).
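d2h reads as "distance to heaven": normalize each goal column to 0..1, take each goal's ideal (0 for minimize goals like Lbs-, 1 for maximize goals like Acc+ and Mpg+), and measure the normalized Euclidean distance from those ideals. A sketch under that assumption, with illustrative lo/hi ranges (not computed from the data):

```python
import math

def d2h(row, goals, lo, hi):
    """Normalized distance to heaven: 0 is ideal for '-' goals, 1 for '+' goals."""
    ds = []
    for c, want in goals.items():
        x = (row[c] - lo[c]) / (hi[c] - lo[c] + 1e-32)  # min-max normalize
        heaven = 0.0 if want == "-" else 1.0
        ds.append(abs(heaven - x))
    return math.sqrt(sum(d * d for d in ds)) / math.sqrt(len(ds))

goals = {"Lbs": "-", "Acc": "+", "Mpg": "+"}
lo = {"Lbs": 1613, "Acc": 8.0, "Mpg": 10}       # illustrative ranges, not from the data
hi = {"Lbs": 5140, "Acc": 24.8, "Mpg": 50}
light_car = {"Lbs": 1800, "Acc": 20.0, "Mpg": 40}
print(round(d2h(light_car, goals, lo, hi), 3))  # near 0 = near heaven
```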
30 seeds on auto93 (default config: start=4, budget=50, check=5):
all 30 seeds: win=100/100
pick disty: range [0.075, 0.160], all inside lo + 0.35*sd cushion (0.168)
Roughly: 14% labels + 5 oracle calls at end → consistently top-tier picks.
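The cushion check is plain arithmetic on the full-data disty stats printed by the demo:

```python
lo, sd = 0.075, 0.266            # disty stats over the full data (from the demo output)
cushion = lo + 0.35 * sd         # 0.075 + 0.0931 = 0.1681
picks = [0.075, 0.158, 0.160]    # reported range endpoints plus the sample pick
print(round(cushion, 4), all(p <= cushion for p in picks))
```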
| Key | Default | Meaning |
|---|---|---|
| seed | 1 | random seed (Fisher-Yates shuffle) |
| p | 2 | Minkowski exponent (1=Manhattan, 2=Euclidean) |
| few | 128 | cap on train pool size |
| start | 4 | warm-start labels |
| budget | 50 | acquire-loop iterations |
| check | 5 | top-N predictions to evaluate at end |
| leaf | 3 | min rows per tree leaf |
| maxd | 8 | max tree depth |
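The p key is the Minkowski exponent used for distances. A minimal illustration of how p changes the metric (not dotlearn's distx):

```python
def minkowski(xs, ys, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(xs, ys)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, p=1), minkowski(a, b, p=2))  # 7.0 5.0
```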