Formal Foundations and Indexing
The Selectivity Theorem
The "MORK theory" paper provides a formal analysis of when ZAM/MORK yields substantial advantage. The selectivity exponent:
\[\gamma(p) = -\log_N\!\left(\frac{|P(p)|}{N}\right)\]- Variables: \(p\) = a prefix in the trie, \(|P(p)|\) = size of the posting list for prefix \(p\), \(N\) = total number of atoms
- Meaning: Measures the normalized information content of a prefix — how much it narrows the search space
- Source: Goertzel (2025), Articulating Conditions Where ZAM/MORK Yield Benefit
For a \(k\)-leg join over prefixes \(p_1, \ldots, p_k\), under \(\varepsilon\)-independence assumptions:
- If \(\sum \gamma(p_i) > 1\): the selectivity analysis predicts \(O(1)\) expected candidate intersections — ZAM feeds a constant number of candidates to the unifier
- If \(\sum \gamma(p_i) < 1\): the expected intersection grows as \(N^{1-\sum\gamma}\), still sublinear
Under a hierarchical generative model (tree-structured conditionals with exponential decay of prefix probabilities), even moderately deep bindings across a few legs push the selectivity sum past 1. Real-world AtomSpaces — logic proofs, program ASTs, semantic parses, knowledge graphs — are approximately hierarchical: symbols follow heavy-tailed distributions, structures are compositionally generated, and query legs bind weakly dependent positions.
Slot-Centric Indexing and the Break-Even Rule
Instead of \(k!\) permutations, MORK maintains at most \(k\) slot-centric prefix views — one per argument position — plus selectively promoted "hot" pair views. Key structure (binary relation):
- Canonical:
R/_1/<EXPR_ID>/_2/<C_ID>→ payload - Flip view:
R/_2/<C_ID>/_1/<EXPR_ID>→ pointer to canonical
The probabilistic break-even rule:
\[p_{s+1}(M - L) > \alpha\]- Variables: \(p_{s+1}\) = probability a query anchors on the \((s{+}1)\)-th slot; \(M\) = mining cost when anchor is missing; \(L\) = direct prefix-view hit cost; \(\alpha\) = storage cost of one additional view
- Meaning: Materialize the next slot view only when expected cost reduction exceeds storage cost
Worked example: with \(p_1=0.4, p_2=0.6, L=1, M=50\), the flip view yields a 29.4× expected speedup for roughly 2× index entries. For hot multi-slot queries, the general problem is submodular coverage under a storage budget, solvable greedily. (Source: Goertzel 2025, MORK Slots)