WILLIAM Full
Responsible: Ben Goertzel, Arthur Franz (original WILLIAM concept)
Papers: Goertzel (2025), Hyperon for AGI→ASI Whitepaper, §5.12, §7.6; Franz, A., WILLIAM: Adaptive Compression for AGI
Status: Experimental. Core compression-based pattern detection is under development. MORK trie instrumentation is being implemented. Full integration with PLN scheduling, ECAN attention, and neural acceleration remains a research goal.
This card provides technical depth beyond the concise WILLIAM index card. WILLIAM is an adaptive-compression-based approach to cognitive pattern discovery, now being integrated into MORK's trie infrastructure to serve as a real-time guide for both symbolic reasoning and neural processing.
Core Principle
WILLIAM embodies a fundamental insight: the patterns worth remembering are those that compress experience most effectively. It acts as a cognitive feature detector that continuously asks "what's the simplest explanation that captures what I'm seeing?" The weakness prior provides the theoretical foundation: simpler patterns generalize better, and compression naturally identifies what matters without hard-coded heuristics.
Compression Gain
Each pattern application is scored by the description-length reduction it achieves:
Formal definition:
\[\text{gain}(r, S) = L(S) - L(S') - C(r)\]
- Variables: \(r\) = template/rewrite rule; \(S\) = current state; \(S'\) = state after applying \(r\); \(L(\cdot)\) = description length; \(C(r)\) = dictionary cost of template \(r\)
- Domain: \(\text{gain} \in \mathbb{R}\); positive values indicate worthwhile compression
- Assumptions: Under the Compositional Description of Data (CoDD) framework, each template application strictly strengthens the hypothesis, so weakness decreases monotonically and MDL gain is also monotone
- Meaning: Keep branches where cumulative gain \(G_t = \sum_{i \leq t} \text{gain}(r_i, S_i) > 0\)
- Source: Franz, A., WILLIAM on MORK clarifications
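The sketch below shows the gain computation. It assumes a toy setting that is not the CoDD framework. The state \(S\) is a token list and \(L(S)\) counts its tokens. Applying a template replaces each non-overlapping occurrence with a fresh symbol. \(C(r)\) is the template length plus one unit for that symbol. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Template = Tuple[str, ...]


def description_length(state: Sequence[str]) -> int:
    # Toy L(S): one unit per token (an assumption, not the CoDD measure).
    return len(state)


def dictionary_cost(template: Template) -> int:
    # Toy C(r): the template body plus one unit for its new symbol.
    return len(template) + 1


def apply_template(state: Sequence[str], template: Template, symbol: str) -> List[str]:
    # Replace non-overlapping occurrences of `template`, scanning left to right.
    out: List[str] = []
    i, n = 0, len(template)
    while i < len(state):
        if tuple(state[i:i + n]) == template:
            out.append(symbol)
            i += n
        else:
            out.append(state[i])
            i += 1
    return out


def gain(state: Sequence[str], template: Template, symbol: str) -> Tuple[int, List[str]]:
    # gain(r, S) = L(S) - L(S') - C(r); also return S' for the next step.
    new_state = apply_template(state, template, symbol)
    g = description_length(state) - description_length(new_state) - dictionary_cost(template)
    return g, new_state


def cumulative_gains(state: Sequence[str], templates: List[Template]) -> List[int]:
    # G_t = running sum of per-step gains; a branch is kept only while G_t > 0.
    totals: List[int] = []
    running = 0
    for t, template in enumerate(templates, start=1):
        g, state = gain(state, template, f"R{t}")
        running += g
        totals.append(running)
    return totals


history = cumulative_gains(list("abcabcabcx"), [("a", "b", "c"), ("R1", "R1")])
print(history, "keep branch:", history[-1] > 0)  # [2, 0] keep branch: False
```

In this toy run the first template earns a gain of 2. The second costs more to store than it saves, which pulls \(G_2\) back to 0, so the branch would be pruned.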
Theoretical Guarantees (Hierarchical Priors)
When data follows a hierarchical generative process with bounded-size reusable templates, reuse probability \(\rho > 0\), and heavy-hitter separability margin \(\gamma\):
\[K(x) = \sum_i \ell(f_i) + K(r_s) + O\bigl((\log \ell(x))^2\bigr)\]
- Variables: \(K(x)\) = Kolmogorov complexity of input \(x\); \(\ell(f_i)\) = length of template \(f_i\); \(K(r_s)\) = complexity of the residual \(r_s\); \(\ell(x)\) = input length
- Meaning: WILLIAM-on-MORK with top-\(k\) beam search (where \(k \in \{2, 3\}\) suffices) achieves near-optimal compression in \(O(\log \ell(x))\) steps, each costing \(O(T_{\text{lookup}})\) via MORK trie traversal
- Source: Franz, A., WILLIAM on MORK clarifications, Theorem 1
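The toy below illustrates top-\(k\) beam search over template applications. It assumes pairwise templates only, token-count description length, and a dictionary cost of 3 per rule. `candidate_pairs` is a hypothetical stand-in for a MORK top-\(k\) lookup. It counts pairs with a linear scan, so it does not reflect the \(O(T_{\text{lookup}})\) per-step cost. A branch is kept only while its cumulative gain stays positive, and only the \(k\) best branches survive each step.

```python
from collections import Counter
from typing import List, Tuple

State = Tuple[str, ...]
Rule = Tuple[str, Tuple[str, str]]
Beam = Tuple[int, State, List[Rule]]  # (cumulative gain G_t, state, rules so far)


def replace_pair(state: State, pair: Tuple[str, str], symbol: str) -> State:
    # Replace non-overlapping occurrences of an adjacent pair, left to right.
    out, i = [], 0
    while i < len(state):
        if i + 1 < len(state) and (state[i], state[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(state[i])
            i += 1
    return tuple(out)


def candidate_pairs(state: State, limit: int) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for a trie top-k query: frequent pairs seen at least twice.
    counts = Counter(zip(state, state[1:]))
    return [pair for pair, c in counts.most_common(limit) if c >= 2]


def beam_compress(state: State, k: int = 2, max_steps: int = 20) -> Beam:
    beam: List[Beam] = [(0, state, [])]
    best = beam[0]
    for _ in range(max_steps):
        expansions: List[Beam] = []
        for total, s, rules in beam:
            for pair in candidate_pairs(s, limit=k + 1):
                symbol = f"R{len(rules) + 1}"
                new_s = replace_pair(s, pair, symbol)
                g = len(s) - len(new_s) - 3  # toy C(r) = 2 body symbols + 1 new symbol
                if total + g > 0:  # keep the branch only while G_t > 0
                    expansions.append((total + g, new_s, rules + [(symbol, pair)]))
        if not expansions:
            break
        expansions.sort(key=lambda e: e[0], reverse=True)
        beam = expansions[:k]  # retain only the top-k branches
        if beam[0][0] > best[0]:
            best = beam[0]
    return best


total, final_state, rules = beam_compress(tuple("ab" * 8), k=2)
print(total, final_state, rules)
```

On `"ab" * 8` with \(k = 2\), the best branch first merges `a b` into `R1` and then `R1 R1` into `R2`, for a cumulative gain of 6. Further merges only lower the running total.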
MORK Trie Instrumentation
The key implementation step is adding instrumentation directly to MORK's trie nodes. Each node carries:
| Field | Purpose |
|---|---|
| Local occurrence counts | How often this exact node is accessed |
| Subtree totals | Aggregate weight counters across all descendants |
| Compression-gain sums | Cumulative compression benefit when this pattern is applied |
| Top-\(k\) children rankings | Ranked lists of most important children by various metrics |
This instrumentation enables weighted iterators that return heavy subpatterns directly from any point in the graph, with no global scans required.
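The table's fields can be sketched as a Python trie node. The sketch assumes string labels and integer counts, and the class and function names are hypothetical, not MORK's API. Each node's subtree total bounds every count beneath it. That lets a best-first walk over a heap yield heavy patterns in descending count order and stop after the first few without visiting the whole trie. Top-\(k\) children are ranked on demand here rather than maintained incrementally.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Path = Tuple[str, ...]


@dataclass
class Node:
    count: int = 0           # local occurrence count of the pattern ending here
    subtree_total: int = 0   # this node's count plus every descendant's count
    gain_sum: float = 0.0    # cumulative compression gain (not used by the walk below)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def top_children(self, k: int) -> List[Tuple[str, "Node"]]:
        # Ranked on demand here; an instrumented trie would keep this list updated.
        return heapq.nlargest(k, self.children.items(), key=lambda kv: kv[1].subtree_total)


def record(root: Node, path: Path, times: int = 1) -> None:
    # Bump subtree totals along the path and the local count at its end.
    node = root
    node.subtree_total += times
    for label in path:
        node = node.children.setdefault(label, Node())
        node.subtree_total += times
    node.count += times


def heavy_patterns(start: Node, prefix: Path = ()) -> Iterator[Tuple[Path, int]]:
    # Best-first walk: a popped "exact" entry outranks every unexplored subtree,
    # because subtree_total is an upper bound on any count below a node.
    tie = itertools.count()
    heap = [(-start.subtree_total, next(tie), False, prefix, start)]
    while heap:
        _, _, exact, path, node = heapq.heappop(heap)
        if exact:
            yield path, node.count
            continue
        if node.count > 0:
            heapq.heappush(heap, (-node.count, next(tie), True, path, node))
        for label, child in node.children.items():
            heapq.heappush(heap, (-child.subtree_total, next(tie), False, path + (label,), child))


root = Node()
for p, times in [(("a", "b"), 5), (("a", "c"), 2), (("d",), 3), (("a",), 1)]:
    record(root, p, times)
print(list(itertools.islice(heavy_patterns(root), 3)))
# [(('a', 'b'), 5), (('d',), 3), (('a', 'c'), 2)]
```

Starting the walk at the node for a given prefix instead of the root gives the prefix-restricted variant.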
API Surface
The API remains minimal:
- `iter_prefix_topk(prefix, k)`: returns the \(k\) most important patterns under a given prefix
- `iter_any_topk(k)`: finds globally significant patterns across the entire graph
- A validation API that records actual compression gains when patterns are applied, closing the feedback loop between prediction and outcome
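The API contract can be illustrated with a flat stand-in. The method names `iter_prefix_topk` and `iter_any_topk` come from the API described above. Three pieces are assumptions: the validation method name `record_gain`, the `observe` helper, and the importance score (frequency times mean validated gain). Unlike the trie-backed design, this sketch scans every stored pattern on each query.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterator, Tuple

Pattern = Tuple[str, ...]


class PatternIndex:
    """Flat, scan-based stand-in that only illustrates the API contract (hypothetical)."""

    def __init__(self) -> None:
        self.counts: Dict[Pattern, int] = defaultdict(int)
        self.gain_sum: Dict[Pattern, float] = defaultdict(float)
        self.validations: Dict[Pattern, int] = defaultdict(int)

    def observe(self, pattern: Pattern, times: int = 1) -> None:
        self.counts[pattern] += times

    def record_gain(self, pattern: Pattern, actual_gain: float) -> None:
        # Validation hook (hypothetical name): log the gain measured on application.
        self.gain_sum[pattern] += actual_gain
        self.validations[pattern] += 1

    def importance(self, pattern: Pattern) -> float:
        # Assumed score: frequency x mean validated gain (1.0 until validated).
        n = self.validations.get(pattern, 0)
        mean_gain = self.gain_sum[pattern] / n if n else 1.0
        return self.counts.get(pattern, 0) * mean_gain

    def iter_prefix_topk(self, prefix: Pattern, k: int) -> Iterator[Pattern]:
        matches = (p for p in self.counts if p[: len(prefix)] == prefix)
        return iter(heapq.nlargest(k, matches, key=self.importance))

    def iter_any_topk(self, k: int) -> Iterator[Pattern]:
        return self.iter_prefix_topk((), k)


index = PatternIndex()
index.observe(("a", "b"), 4)
index.observe(("a", "c"), 3)
index.observe(("d",), 2)
print(list(index.iter_any_topk(2)))             # [('a', 'b'), ('a', 'c')]
index.record_gain(("a", "b"), 0.25)             # applying ('a', 'b') compressed poorly
print(list(index.iter_any_topk(2)))             # [('a', 'c'), ('d',)]
print(list(index.iter_prefix_topk(("a",), 1)))  # [('a', 'c')]
```

Recording a poor measured gain for `('a', 'b')` demotes it from first to out of the top two. This is the prediction-versus-outcome loop the validation API is meant to close.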
Concurrency Model
The implementation handles concurrent access through per-core write buffers for counts, with wait-free readers using snapshot/RCU (Read-Copy-Update) techniques. This ensures that pattern discovery does not block the main reasoning pipeline.
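Below is a minimal Python sketch of the write-buffer and snapshot pattern. Threads stand in for cores, and the class and method names are hypothetical. Writers update a thread-local `Counter` and hand it off. A publisher copies the current snapshot, folds in the pending buffers, and swaps in a new read-only mapping. Readers just take the current reference. Under CPython the reference swap is atomic, so this shows the copy-then-swap shape of RCU. It is not wait-free in the hardware sense, and the garbage collector stands in for grace-period reclamation of old snapshots.

```python
import queue
import threading
from collections import Counter
from types import MappingProxyType
from typing import Mapping


class SnapshotCounts:
    """Per-thread write buffers with copy-then-swap snapshots (RCU-style sketch)."""

    def __init__(self) -> None:
        self._snapshot: Mapping[str, int] = MappingProxyType({})
        self._local = threading.local()
        self._pending = queue.SimpleQueue()  # buffers handed off by writer threads
        self._publish_lock = threading.Lock()

    def increment(self, key: str, n: int = 1) -> None:
        # Hot path: only the calling thread touches its own buffer, so no shared lock.
        buf = getattr(self._local, "buf", None)
        if buf is None:
            buf = self._local.buf = Counter()
        buf[key] += n

    def flush_local(self) -> None:
        # Called by the owning thread: hand its buffer to the publisher, start a new one.
        buf = getattr(self._local, "buf", None)
        if buf:
            self._local.buf = Counter()
            self._pending.put(buf)

    def publish(self) -> None:
        # Copy the current snapshot, fold in pending buffers, then swap the reference.
        with self._publish_lock:
            merged = Counter(dict(self._snapshot))
            while True:
                try:
                    merged.update(self._pending.get_nowait())
                except queue.Empty:
                    break
            self._snapshot = MappingProxyType(dict(merged))

    def read(self) -> Mapping[str, int]:
        # Readers take no lock; they keep whatever snapshot they grabbed.
        return self._snapshot


counts = SnapshotCounts()
before = counts.read()


def worker() -> None:
    for _ in range(1000):
        counts.increment("ab")
    counts.flush_local()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
counts.publish()
print(counts.read()["ab"], dict(before))  # 4000 {}
```

A reader that grabbed the snapshot before `publish` keeps seeing the old, empty mapping, while new readers see the merged totals.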
Consumer Integration
WILLIAM's iterators serve multiple cognitive processes simultaneously:
| Consumer | How It Uses WILLIAM |
|---|---|
| PLN | Prioritizes inference on high-value subgraphs; follows "heavy edges" during backward chaining |
| Schedulers | Allocates resources based on compression-adjusted priorities |
| ECAN | Receives compression-driven importance signals for attention allocation |
| Pattern mining | Uses heavy subpatterns as seeds for deeper structural discovery |
| Symbolic Heads | Template library creation: mines frequent subgraphs from training text for key-value template stores |
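The scheduler row can be illustrated with a proportional budget split. The source does not specify how priorities are compression-adjusted. The sketch assumes a multiplicative boost of \(1 + \max(\text{gain}, 0)\) and uses largest-remainder rounding so the integer shares sum to the budget. The task names and the `allocate` function are hypothetical.

```python
from math import floor
from typing import Dict


def allocate(budget: int, base: Dict[str, float], gain: Dict[str, float]) -> Dict[str, int]:
    # Assumed compression-adjusted priority: base * (1 + positive gain).
    adjusted = {task: w * (1.0 + max(gain.get(task, 0.0), 0.0)) for task, w in base.items()}
    total = sum(adjusted.values())
    if total <= 0:
        return {task: 0 for task in base}
    exact = {task: budget * w / total for task, w in adjusted.items()}
    alloc = {task: floor(x) for task, x in exact.items()}
    # Largest-remainder rounding: hand leftover units to the largest fractional parts.
    leftover = budget - sum(alloc.values())
    for task in sorted(exact, key=lambda t: exact[t] - alloc[t], reverse=True)[:leftover]:
        alloc[task] += 1
    return alloc


print(allocate(100, {"pln": 1.0, "mining": 1.0, "ecan": 1.0}, {"pln": 3.0, "mining": 1.0}))
# {'pln': 57, 'mining': 29, 'ecan': 14}
```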
Neural Network Acceleration
WILLIAM's integration with neural networks, particularly transformers and predictive coding networks, uses compression metrics to guide computation allocation. Applied to neural internals (especially networks using local learning, which are more prone to forming compositional representations), WILLIAM finds "heavy-hitter features" to:
- Prioritize attention heads and tokens: Route computation to where it matters most, guided by compression metrics rather than ad-hoc sparsification rules
- Trigger on-demand refinement: Expand computation only where uncertainty justifies the cost
- Prune low-value frequency bands: Remove computation that contributes little to compression, while preserving model capability
The weakness prior provides the theoretical foundation for all three operations: the same quantale-based framework that guides symbolic pattern selection also guides neural sparsification, ensuring a uniform simplicity bias across both domains.
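The three operations can be sketched as selection rules over scores. The sketch assumes the per-head and per-band scores are already available from WILLIAM; the source does not say how they are computed from network internals. For on-demand refinement it swaps in predictive entropy as the uncertainty signal. That choice, the thresholds, and all function names are assumptions.

```python
import math
from typing import List, Sequence


def keep_top(scores: Sequence[float], keep: int) -> List[int]:
    # Indices of the `keep` highest-scoring heads, returned in original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy in nats; a stand-in uncertainty signal.
    return -sum(p * math.log(p) for p in probs if p > 0)


def tokens_to_refine(token_probs: Sequence[Sequence[float]], threshold: float) -> List[int]:
    # Expand computation only at positions whose predictive entropy exceeds the threshold.
    return [i for i, probs in enumerate(token_probs) if entropy(probs) > threshold]


def bands_to_keep(band_gain: Sequence[float], min_gain: float) -> List[int]:
    # Drop frequency bands whose estimated compression contribution is below min_gain.
    return [i for i, g in enumerate(band_gain) if g >= min_gain]


print(keep_top([0.9, 0.05, 0.4, 0.01], keep=2))                                       # [0, 2]
print(tokens_to_refine([[0.98, 0.01, 0.01], [0.4, 0.3, 0.3], [1.0, 0.0, 0.0]], 0.5))  # [1]
print(bands_to_keep([0.3, 0.02, 0.15, 0.001], min_gain=0.05))                         # [0, 2]
```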
Role in the PRIMUS Cognitive Cycle
In the whitepaper's §5.13 integration picture, WILLIAM occupies a specific position:
- Weakness-based control selects regions needing attention
- WILLIAM identifies compression-worthy patterns in those regions
- Fluid-dynamic ECAN routes attention optimally
- PLN and ActPC-Chem refine understanding
- MetaMo/SubRep evaluate discovered options
- TransWeave determines what can be reused
Patterns discovered by WILLIAM feed directly into ActPC-Chem as chemical rules, and WILLIAM-discovered structures can be transferred to new domains via TransWeave.
Key References
- Goertzel, B. (2025). Hyperon for AGI→ASI Whitepaper, §5.12: WILLIAM-on-MORK, §7.6: WILLIAM-Guided Neural Efficiency
- Franz, A. WILLIAM: Adaptive Compression for AGI
- Franz, A. WILLIAM on MORK clarifications (Theorem 1: hierarchical prior guarantees)
Related cards: PRIMUS Full · MORK Full · ECAN Full