MORK Full

Draft — This content has not been approved for publication.

Responsible: Adam Vandervorst, Luke Peterson, Remy Clarke

Papers: Goertzel (2025), Hyperon Whitepaper §2.3, §3.6; Goertzel (2025), Articulating Conditions Where ZAM/MORK Yield Benefit: A Selectivity Theorem; Goertzel (2025), From Path Algebra in MORK to Tensor Logic on GPUs; Goertzel (2025), Slot-Centric Indexing vs. Permutation Explosion; Peyton Jones et al., Triemaps that Match (arXiv:2302.08775)

Status: Current. MORK is operational: the 8-crate Rust workspace compiles on nightly, with working triemap storage, the Zipper Abstract Machine, bidirectional pattern matching, and MM2 execution. PeTTa/MORK integration has been demonstrated at up to 400M atoms in RAM (per the AtomSpace cluster pilot reconciliation of 2026-04-29; the earlier "500M" figure was wiki drift). MORK-native PLN, WILLIAM integration, and distributed multi-machine execution are under development. PathMap is a sibling-repo dependency (a separate codebase), not a MORK subcrate, and the MORK server branch is versioned independently of the main library.

This card provides technical depth beyond the concise MORK index card. MORK (MeTTa Optimal Reduction Kernel) is Hyperon's high-performance hypergraph engine — a specialized in-RAM processing kernel designed for large speedups over previous AtomSpace implementations. It provides the computational substrate on which PRIMUS's cognitive algorithms execute at scale.

Cluster-pilot context: The AtomSpace Backend Integration cluster pilot (closed 2026-04-29) locked in a four-layer taxonomy in which MORK is Layer 4. The Architecture and Ecosystem and Status and Resources subcards below give the per-corner reconciled findings: the 8-crate workspace, server-branch separation, 400M-atom RAM scaling, the MorkDB link-delete blocker for primary-store promotion, and the reframing of weighted-atom-sweep as an adjacent experimental analogy rather than a strict-literal ECAN port.

Related cards: AtomSpace Full (abstract concept), DAS Full (distributed complement), PathMap (foundational trie substrate, sibling-repo dependency), PLN Full (chaining + factor graphs on MORK), ECAN Full (weighted atom sweeps), MORK Theory Publication Map

Core Mechanisms

PathMap (Triemap Storage)

MORK's foundational data structure is a prefix-compressed triemap (radix tree) that stores S-expressions as paths. Where a traditional graph database scatters nodes and links across memory, MORK organizes them into a structured hierarchy where shared prefixes are stored once. This provides:

Content addressing: Each atom is identified by its path in the trie (hash-consing), yielding structural sharing and $O(1)$ pointer equality
Near-constant-time neighbor lookups: PathMap supplies a path algebra (prefix scans, union/intersection of posting lists, anti-join) that directly supports join planning and streaming candidate generation
Automatic deduplication: Identical subexpressions share storage automatically
Slot-centric views: For $k$-ary relations, MORK maintains at most $k$ slot-anchored prefix views rather than $k!$ permutations
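To make the shared-prefix idea concrete, here is a minimal Python sketch of a path trie. It is not PathMap's API: the class names `TrieNode` and `PathTrie`, the flattening of an S-expression into a tuple of tokens, and the example atoms are all assumptions made for illustration, and the sketch omits prefix compression, hash-consing, and the path algebra.

```python
from typing import Dict, Iterator, Tuple

Path = Tuple[str, ...]


class TrieNode:
    """One trie node; children are keyed by path token."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.terminal = False  # True when a stored path ends at this node


class PathTrie:
    """Toy path trie (hypothetical): each shared prefix is stored once."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 1  # counts the root

    def insert(self, path: Path) -> None:
        node = self.root
        for token in path:
            if token not in node.children:
                node.children[token] = TrieNode()
                self.node_count += 1
            node = node.children[token]
        node.terminal = True  # inserting the same path twice changes nothing

    def scan(self, prefix: Path) -> Iterator[Path]:
        """Yield every stored path that starts with `prefix`."""
        node = self.root
        for token in prefix:
            if token not in node.children:
                return
            node = node.children[token]
        stack = [(node, prefix)]
        while stack:
            current, path = stack.pop()
            if current.terminal:
                yield path
            for token, child in current.children.items():
                stack.append((child, path + (token,)))


trie = PathTrie()
trie.insert(("Inheritance", "cat", "mammal"))
trie.insert(("Inheritance", "dog", "mammal"))
trie.insert(("Inheritance", "cat", "mammal"))  # duplicate: adds no nodes
trie.insert(("Likes", "cat", "fish"))
inheritance = sorted(trie.scan(("Inheritance",)))
```

The four inserts produce nine nodes (root included): the two `Inheritance` paths share their first node, and the duplicate insert adds nothing. The prefix scan touches only the `Inheritance` subtree.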

Zipper Abstract Machine (ZAM)

The execution model that exploits PathMap: given a query with one or more legs (rows of guarded reads), ZAM chooses selective path prefixes, fetches posting lists from the trie, computes intersections incrementally, and streams only the compatible frontier to the unifier. This pushes most pruning before unification. ZAM uses zippers (cursor-based navigation) for efficient, parallel logical inference with near-linear performance scaling across cores.
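The "intersect first, unify last" order described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not ZAM itself: the posting lists are a hypothetical in-memory dictionary, every leg is assumed to constrain the same variable (so each posting list is a set of candidate atom ids), and `unify` stands in for whatever unifier the caller supplies.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical posting lists: prefix -> ids of atoms stored under that prefix.
POSTINGS: Dict[str, Set[int]] = {
    "Inheritance/_2/mammal": {1, 2, 3, 4, 5, 6},
    "Likes/_2/fish": {2, 4, 9},
    "Color/_2/black": {4, 7, 9},
}


def candidate_frontier(legs: List[str], postings: Dict[str, Set[int]]) -> Set[int]:
    """Intersect posting lists incrementally, most selective (smallest) first."""
    ordered = sorted(legs, key=lambda leg: len(postings.get(leg, ())))
    frontier: Optional[Set[int]] = None
    for leg in ordered:
        ids = postings.get(leg, set())
        frontier = set(ids) if frontier is None else frontier & ids
        if not frontier:
            return set()  # nothing can survive, so stop early
    return frontier if frontier is not None else set()


def run_query(
    legs: List[str],
    postings: Dict[str, Set[int]],
    unify: Callable[[int], bool],
) -> List[int]:
    """Send only the surviving candidates to the (expensive) unifier."""
    return [atom for atom in sorted(candidate_frontier(legs, postings)) if unify(atom)]
```

With the three example legs, the two short lists are intersected first (leaving `{4, 9}`), the long list prunes that to `{4}`, and the unifier is called once instead of six times.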

MM2 (Minimal MeTTa 2)

A low-level dataflow language for defining performance-critical components directly on MORK's data structures. MM2 uses a "Gather-Process-Scatter" paradigm with priority-based execution, sources (inputs), and sinks (outputs). The language is formally decomposed into four semantic segments: a monotonic base (metagraph rewrites where automatic pattern reordering is safe — enabling 1000× speedups without changing semantics), sinks (non-monotonic output operations), sources (input/data loading), and syncs (conjunctions of sources and sinks). Operations include fork/join patterns, set operations, and macro-based partial evaluation. MM2 is where PLN factor-graph message passing, ECAN attention sweeps, pattern mining iterations, and tensor operations run at database-engine speeds.

MORKL

A declarative query language that gives "bare metal" access to MORK's trie structures, using S-expression syntax for structural manipulation. Think of it as SQL for MORK, whereas MM2 targets dynamic algorithms at scale.

Formal Foundations and Indexing

The Selectivity Theorem

The "MORK theory" paper provides a formal analysis of when ZAM/MORK yields a substantial advantage. It defines the selectivity exponent of a prefix $p$ as:

\[\gamma(p) = -\log_N\!\left(\frac{|P(p)|}{N}\right)\]

Variables: $p$ = a prefix in the trie, $|P(p)|$ = size of the posting list for prefix $p$, $N$ = total number of atoms
Meaning: Measures the normalized information content of a prefix — how much it narrows the search space
Source: Goertzel (2025), Articulating Conditions Where ZAM/MORK Yield Benefit

For a $k$-leg join over prefixes $p_1, \ldots, p_k$, under $\varepsilon$-independence assumptions:

If $\sum \gamma(p_i) > 1$: the selectivity analysis predicts $O(1)$ expected candidate intersections — ZAM feeds a constant number of candidates to the unifier
If $\sum \gamma(p_i) < 1$: the expected intersection grows as $N^{1-\sum\gamma}$, still sublinear
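Both cases follow from one step. The definition gives $|P(p)| = N^{1-\gamma(p)}$, so treating the legs as independent makes the expected intersection size $N \prod_i |P(p_i)|/N = N^{1-\sum_i \gamma(p_i)}$. The Python sketch below evaluates that expression for a few made-up posting-list sizes; the function names and the numbers are illustrative assumptions, not values from the paper.

```python
import math
from typing import Iterable


def selectivity_exponent(posting_size: int, total_atoms: int) -> float:
    """gamma(p) = -log_N(|P(p)| / N)."""
    return -math.log(posting_size / total_atoms, total_atoms)


def expected_candidates(posting_sizes: Iterable[int], total_atoms: int) -> float:
    """N ** (1 - sum of gammas), valid under the independence assumption."""
    total_gamma = sum(selectivity_exponent(s, total_atoms) for s in posting_sizes)
    return total_atoms ** (1.0 - total_gamma)


N = 1_000_000

# One leg matching 1,000 of 1,000,000 atoms has gamma of about 0.5.
gamma_single = selectivity_exponent(1_000, N)

# Sum of gammas above 1: fewer than one expected candidate (about 0.1 here).
selective_join = expected_candidates([1_000, 100], N)

# Sum of gammas below 1: about 10,000 candidates, still far below N.
loose_join = expected_candidates([100_000, 100_000], N)
```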

Under a hierarchical generative model (tree-structured conditionals with exponential decay of prefix probabilities), even moderately deep bindings across a few legs push the selectivity sum past 1. Real-world AtomSpaces — logic proofs, program ASTs, semantic parses, knowledge graphs — are approximately hierarchical: symbols follow heavy-tailed distributions, structures are compositionally generated, and query legs bind weakly dependent positions.

Slot-Centric Indexing and the Break-Even Rule

Instead of $k!$ permutations, MORK maintains at most $k$ slot-centric prefix views — one per argument position — plus selectively promoted "hot" pair views. Key structure (binary relation):

Canonical: R/_1/<EXPR_ID>/_2/<C_ID> → payload
Flip view: R/_2/<C_ID>/_1/<EXPR_ID> → pointer to canonical
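As a rough illustration of the key layout, the Python sketch below builds one key per anchor slot. The function names, the `("ptr", ...)` pointer encoding, and the ordering of the non-anchor slots for relations with more than two arguments are assumptions; the card only fixes the binary case.

```python
import math
from typing import Dict, List, Sequence


def slot_views(relation: str, args: Sequence[str]) -> List[str]:
    """One key per anchor slot: the anchor slot first, the other slots in order."""
    keys = []
    for anchor in range(len(args)):
        order = [anchor] + [i for i in range(len(args)) if i != anchor]
        parts = [relation]
        for i in order:
            parts += [f"_{i + 1}", args[i]]
        keys.append("/".join(parts))
    return keys


def build_index(relation: str, args: Sequence[str], payload: object) -> Dict[str, object]:
    """Store the payload under the canonical key; other views point back to it."""
    canonical, *flips = slot_views(relation, args)
    index: Dict[str, object] = {canonical: payload}
    for key in flips:
        index[key] = ("ptr", canonical)
    return index


binary_index = build_index("R", ["e17", "c3"], "payload")
views_for_4ary = len(slot_views("Q", ["a", "b", "c", "d"]))  # 4 slot views
permutations_for_4ary = math.factorial(4)                    # versus 24 orderings
```

For the binary case this yields exactly the two keys shown above, `R/_1/e17/_2/c3` and `R/_2/c3/_1/e17`, with the second holding a pointer to the first.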

The probabilistic break-even rule:

\[p_{s+1}(M - L) > \alpha\]

Variables: $p_{s+1}$ = probability a query anchors on the $(s{+}1)$-th slot; $M$ = mining cost when anchor is missing; $L$ = direct prefix-view hit cost; $\alpha$ = storage cost of one additional view
Meaning: Materialize the next slot view only when expected cost reduction exceeds storage cost

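Worked example: with $p_1=0.4$, $p_2=0.6$, $L=1$, $M=50$, the expected query cost without the flip view is $p_1 L + p_2 M = 30.4$; with it, every query costs $L = 1$. The flip view therefore saves $p_2(M-L) = 29.4$ cost units per query in expectation (roughly a 30× speedup) for roughly 2× index entries. For hot multi-slot queries, the general problem is submodular coverage under a storage budget, solvable greedily. (Source: Goertzel 2025, MORK Slots)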
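The break-even test and the arithmetic of the worked example can be checked with a short Python sketch. The function names are hypothetical, and the storage cost $\alpha = 10$ used at the end is an assumed value, since the card does not give one.

```python
from typing import Sequence


def should_materialize(p_next: float, miss_cost: float, hit_cost: float, storage_cost: float) -> bool:
    """Break-even rule: p_{s+1} * (M - L) > alpha."""
    return p_next * (miss_cost - hit_cost) > storage_cost


def expected_cost(
    anchor_probs: Sequence[float],
    has_view: Sequence[bool],
    hit_cost: float,
    miss_cost: float,
) -> float:
    """Expected query cost: L when the anchored slot has a view, M when it does not."""
    return sum(p * (hit_cost if view else miss_cost) for p, view in zip(anchor_probs, has_view))


probs = [0.4, 0.6]  # p_1, p_2 from the worked example
cost_without_flip = expected_cost(probs, [True, False], hit_cost=1, miss_cost=50)
cost_with_flip = expected_cost(probs, [True, True], hit_cost=1, miss_cost=50)
expected_saving = cost_without_flip - cost_with_flip  # equals p_2 * (M - L)

# alpha = 10 is an assumed storage cost, used only to exercise the rule.
worth_it = should_materialize(p_next=0.6, miss_cost=50, hit_cost=1, storage_cost=10)
```

Here the expected cost is about 30.4 without the flip view and 1 with it, so the expected saving is $p_2(M-L) = 29.4$ cost units, and the view is worth materializing for any $\alpha$ below that.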

Architecture and Ecosystem

Crate Structure

MORK is an 8-member Rust workspace (verified at HEAD 4cef6f7 against MORK/Cargo.toml:3-12; nightly toolchain required for generators, coroutines, SIMD):

interning/ — Symbol interning with lock-free handles (128 concurrent writers)
expr/ — S-expression types, binary encoding, macros
frontend/ — Multiple parsers (CZ2, CZ3, HE, Rosetta, bytestring formats)
kernel/ — Core Space implementation, sinks, sources, pure reduction engine
experiments/eval/ — Exploratory MM2 evaluator
experiments/eval-ffi/ — FFI-side evaluator integration
experiments/eval-examples/ — Example workloads
experiments/unification_test_laws/ — MORK ↔ SWI-Prolog unification correctness audit (PR #49 Prolog-as-oracle for the unification subset only)

Foundational dependency: PathMap — sibling repo authored by Luke Peterson; declared at MORK/Cargo.toml:28-32 as ../PathMap/ with jemalloc, arena_compact, nightly features. PathMap is the low-level trie substrate (key-value store with prefix compression, structural sharing, algebraic operations); MORK's path-algebra and zipper machinery sit on top of it. See PathMap for substrate details.

Server branch: The mork-server deployment is maintained on a separate server branch, distinct from main. DAS pins MORK 578a759 (2025-07-21) via das/src/docker/mork/Dockerfile.server; local origin/server HEAD as of 2026-04-29 is 5b04a1d (2026-04-18), 49 commits ahead of the DAS pin with deadlock and UTF-8 fixes. See Status for drift detail.

System Interfaces

PeTTa: Primary MeTTa compiler connecting via FFI. PeTTa alone handles 50–100M atoms; PeTTa/MORK has been benchmarked up to 400M atoms in RAM (mork_ffi/example_space.metta:13-17 documents successful 100M/200M/300M/400M loads; 500M ran out of memory at the same site). Earlier wiki text quoting "500M+ atoms in RAM" treated the OOM ceiling as demonstrated capacity — corrected.
mork_ffi: Rust FFI bindings for SWI-Prolog. Benchmarked at 65 s for 1M atoms, where the predicate store ran out of memory. Prolog plays two distinct roles around MORK: (1) PR #49 unification oracle in experiments/unification_test_laws/; (2) PeTTa runtime bridge via SWI-Prolog predicate mork/3 at mork_ffi/mork.c:33-36 and morkspaces.pl:7-32.
faiss_ffi: FAISS vector similarity FFI for Prolog, enabling structural random indexing.
MeTTa-IL: Compilation target — routes execution to MORK for local low-latency reasoning.
DAS: DAS contains a code-real MorkDB AtomDB backend (subclass of RedisMongoDB) that talks to a MORK HTTP server (server-branch deployment) and Mongo-side metadata. Link/S-expression delete is hard-failed at MorkDB.cc:268-270 ("MORKDB does not support deleting links") — DAS-as-MORK-backend is integration-ready for loads and queries, NOT a Decko-compatible mutable store. See DAS Full.
ByteFlow and Tensor Logic: Adaptive block packing for dense numerical data. Relations become sparse matrices, joins become matrix products. Operations generalize across semirings.
ShardZipper: Merkle-based distributed state management. RAPTL extends it with a triple quantale $(\varphi, \alpha, r)$ and confidence-weighted scoring: $\text{partition\_score}(s) = \text{avg\_confidence}(s) / (\text{predicted\_cost}(s) \cdot \text{locality}(s))$. (Goertzel 2025, RAPTL ShardZipper §3.2–3.3)
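The partition score in the ShardZipper entry is a plain ratio, transcribed below as a Python sketch. The `ShardStats` container and the guard against a non-positive denominator are assumptions added for illustration; the card does not say how the three inputs are measured or scaled.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    """Hypothetical per-shard inputs to the score."""

    avg_confidence: float
    predicted_cost: float
    locality: float


def partition_score(stats: ShardStats) -> float:
    """avg_confidence / (predicted_cost * locality), as written in the card."""
    denominator = stats.predicted_cost * stats.locality
    if denominator <= 0:
        raise ValueError("predicted_cost * locality must be positive")
    return stats.avg_confidence / denominator


example_score = partition_score(ShardStats(avg_confidence=0.8, predicted_cost=2.0, locality=0.5))
```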

MORK Special Forms (MeTTaTron)

MeTTaTron provides four special forms that bridge high-level MeTTa and low-level MORK execution. All use uniform conjunction semantics — the (,) wrapper makes result cardinality explicit and enables meta-programming:

exec (<priority> <antecedent> <consequent>) — Rule execution with conjunction antecedents. All antecedent goals must match (left-to-right, variable bindings threaded through). Consequents can be conjunction results or space-modifying Operations (O (+ fact) (- fact)). Priority determines execution order. Non-deterministic: multiple antecedent solutions produce multiple consequent evaluations.
coalg (<pattern> <templates>) — Coalgebra patterns for tree transformations. Template conjunction cardinality determines result count: (,) = zero results (termination), (, t) = one result, (, t1 t2) = unfold to two. Enables hierarchical decomposition (e.g., tree → contexts → leaf values via lift/explode/drop stages).
lookup (<pattern> <success-goals> <failure-goals>) — Conditional fact queries with branching. Variables bound during pattern match are available in the success branch. Nestable for priority chains.
rulify ($name <pattern> <templates> <antecedent> <consequent>) — Meta-programming: generates exec rules from coalgebra definitions by pattern matching on template arity. Enables runtime rule generation from declarative specifications.

The conjunction pattern provides ~36% parser code reduction, ~40% evaluator simplification, and ~80% fewer edge-case bugs, with negligible runtime overhead (~2 bytes per conjunction, ~10ns per goal evaluation).

(Provenance: repo-doc, MeTTa-Compiler MORK special forms documentation)
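The conjunction semantics described for exec (goals matched left to right, bindings threaded through, one consequent per solution) can be illustrated with a small Python sketch. It is not MeTTaTron code and does not use MeTTaTron syntax: terms are flat tuples, variables are strings starting with `$`, and all function names and facts are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Term = Tuple[str, ...]        # flat fact or pattern, e.g. ("parent", "$x", "bob")
Bindings = Dict[str, str]


def match(pattern: Term, fact: Term, bindings: Bindings) -> Optional[Bindings]:
    """Match one goal against one fact, extending the bindings or failing."""
    if len(pattern) != len(fact):
        return None
    out = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("$"):
            if out.setdefault(p, f) != f:
                return None  # variable already bound to something else
        elif p != f:
            return None
    return out


def solve(goals: List[Term], facts: List[Term], bindings: Bindings) -> Iterator[Bindings]:
    """Solve goals left to right, threading bindings into the later goals."""
    if not goals:
        yield bindings
        return
    first, rest = goals[0], goals[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from solve(rest, facts, extended)


def substitute(template: Term, bindings: Bindings) -> Term:
    """Instantiate a consequent template with one solution's bindings."""
    return tuple(bindings.get(token, token) for token in template)


facts = [("parent", "alice", "bob"), ("parent", "bob", "carol"), ("parent", "bob", "dave")]
antecedent = [("parent", "$x", "$y"), ("parent", "$y", "$z")]
consequent = ("grandparent", "$x", "$z")
derived = [substitute(consequent, b) for b in solve(antecedent, facts, {})]
```

The antecedent has two solutions, so the consequent is produced twice: once for `carol` and once for `dave`.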

Key Cognitive Algorithm Integrations

PLN: Backward chaining (HeadIndex/FactIndex/UnifyIndex) and factor-graph belief propagation (FactorAtom/VariableAtom with near-constant-time neighbor lookups) — paper/proposal/benchmark-only at this snapshot. PLN cluster pilot Sources 8/11 and AtomSpace cluster pilot Source 3 confirm no code-real FactorGraph PLN over MORK in inspected primary repos (MORK/, PathMap/, mork_ffi/, mork-rust-sdk/, mork-ts-sdk/: 0 hits for FactorGraph/factor_graph/belief_propagation/pln/wmpln/lib_pln/AttentionBank/cog-av-sti).
WILLIAM: Trie nodes carry occurrence counts, subtree totals, compression-gain sums for real-time pattern detection
Weighted Atom Sweeps (implementation analogy, NOT ECAN currency): The weighted-atom-sweep repo (HEAD 1471ff2c, 2026-03-03) is a separate adjacent experimental crate outside the canonical MORK 8-member workspace, depending on PathMap and MORK expr. Its AtomHeader is a generic trait — NOT an STI/LTI/TruthValue/AttentionValue — and its match counter is an integer, not an ECAN attention currency. The pattern (aggregate weight counters in trie nodes for importance-proportional sampling) is repurposable for recency/priority/card-version metadata if a Decko-specific AtomHeader and traversal policy are defined, but NOT a code-real ECAN bridge at this snapshot.
MOSES/GEO-EVO: Program templates as content-addressed atoms; near-identical candidates deduplicated automatically
Pattern Mining: Patterns as subtree traversals; counts as capsule summaries at nodes
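Several entries above rely on aggregate counters stored in trie nodes. A minimal Python sketch of that pattern follows: each node keeps a subtree weight total, and sampling walks down the trie choosing branches in proportion to those totals. This is not the weighted-atom-sweep crate's API and carries no ECAN semantics; the class, the functions, and the integer weights are assumptions for illustration.

```python
import random
from typing import Dict, Tuple

Path = Tuple[str, ...]


class WeightedNode:
    """Trie node carrying its own weight and the total weight of its subtree."""

    def __init__(self) -> None:
        self.children: Dict[str, "WeightedNode"] = {}
        self.weight = 0    # weight of the path that ends exactly here
        self.subtree = 0   # self.weight plus all descendants' weights


def add(root: WeightedNode, path: Path, weight: int) -> None:
    """Insert a path and update the subtree total of every node along it."""
    node = root
    node.subtree += weight
    for token in path:
        node = node.children.setdefault(token, WeightedNode())
        node.subtree += weight
    node.weight += weight


def sample(root: WeightedNode, rng: random.Random) -> Path:
    """Return a stored path with probability weight / root.subtree."""
    if root.subtree <= 0:
        raise ValueError("nothing to sample")
    node, path = root, ()
    while True:
        r = rng.randrange(node.subtree)  # uniform integer in [0, subtree)
        if r < node.weight:
            return path
        r -= node.weight
        for token, child in node.children.items():
            if r < child.subtree:
                node, path = child, path + (token,)
                break
            r -= child.subtree
        else:
            raise RuntimeError("subtree totals are inconsistent")
```

Because each step only compares against the children of one node, a draw costs time proportional to the path depth times the branching factor, not to the number of stored atoms.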

Implementation Anchors

MORK (primary kernel) — 8-member Rust workspace at HEAD 4cef6f7; nightly toolchain required.
PathMap (foundational substrate, sibling repo) — Luke Peterson's prefix-compressed triemap; declared as MORK ../PathMap/ dep. See PathMap.
MORK server branch (deployment line) — mork-server at origin/server; DAS pin and image-tag reconciliation discussed at Status.
CZ2 — Scala 3 triemap toolkit and scaling experiments.
mork_ffi — Rust FFI bindings for SWI-Prolog/MeTTa integration.
weighted-atom-sweep (adjacent experimental crate, NOT in canonical MORK workspace) — Rust weighted atom sweep on PathMap; implementation analogy for ECAN-style sampling but no ECAN/AtomSpace Value bridge code at this snapshot.
MM2_Structuring_Code — Comprehensive MM2 tutorial (28 examples).

Status and Resources

Current Status

Operational: Triemap storage, ZAM execution, bidirectional pattern matching, MM2 language, PeTTa/MORK integration, mork_ffi for Prolog bridging
Under development: MORK-native PLN (backward chaining + factor graphs — paper/proposal/benchmark-only, no code-real FactorGraph PLN at this snapshot per AtomSpace cluster pilot Source 3), WILLIAM trie instrumentation, ByteFlow GPU acceleration, ShardZipper distributed state, streaming fusion optimization
Proposed: Multi-machine distributed processing, QuantiMORK (neural tensor encoding), WASM edge deployment, native MeTTa-to-machine-code compiler

Implementation Findings (transcript-backed, MORKification Weekly Aug 2025–Apr 2026)

MM2 scale: ~350 grounded functions as of Jan 2026. A single MM2 step can trigger billions of rewrites, executed in parallel.
Sources/sinks architecture: Three-layer resource abstraction — resources, sources (readers), sinks (writers). This is the integration surface for ECAN weights, hypervectors, and external systems. Note: CountSink in the kernel (MORK/kernel/src/sinks.rs:512-591) is an MM2 query/reduction primitive (per-execution accumulator), not a persistent revision log — do not target it as a Decko card-history counter.
Compression benchmarks: PathMap uses seven levels of nested shared patterns to represent all 64-bit integers in 8 nodes. JSON import: 20× reduction (780 GB JSON → 40 GB ACT).
RAM scaling benchmark: PeTTa/MORK has been demonstrated up to 400M atoms in RAM (mork_ffi/example_space.metta:13-17: successful 100M/200M/300M/400M; 500M ran out of memory at the same site). Earlier "500M+ atoms in RAM" wiki text treated the OOM ceiling as demonstrated capacity — corrected.
Streaming fusion investigation: Six implementations tested — all ~10× slower than binary operations due to branch traversal overhead. Active investigation with database-inspired query optimization.
Concurrency advantage: MORK surpasses ATRIUM (20,000 threads) on its own benchmarks because it coordinates threads sequentially instead of having them contend for shared memory.
Applications: 4×4 Sudoku (~2 ms), CTL model checking, decision tree learning, Blocks World / PDDL planning.
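The "all 64-bit integers in 8 nodes" figure in the compression entry can be read as structural sharing taken to its limit: one node per byte position, with all 256 branches of each node pointing at the same child, so seven of the eight nodes are shared. The Python sketch below illustrates that reading only. It is an assumed interpretation, not PathMap's actual node layout, and every name in it is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ByteNode:
    """A trie level whose 256 byte branches all reference one shared child."""

    def __init__(self, child: Optional["ByteNode"]) -> None:
        self.children: List[Optional["ByteNode"]] = [child] * 256


def build_all_u64() -> Tuple[ByteNode, List[ByteNode]]:
    """Build the 8-node structure, leaf level first; return (root, all nodes)."""
    nodes: List[ByteNode] = []
    child: Optional[ByteNode] = None  # None marks the end of an 8-byte path
    for _ in range(8):
        child = ByteNode(child)
        nodes.append(child)
    return nodes[-1], nodes


def count_paths(node: Optional[ByteNode], memo: Optional[Dict[int, int]] = None) -> int:
    """Count root-to-end paths, visiting each shared node only once."""
    if node is None:
        return 1
    if memo is None:
        memo = {}
    if id(node) not in memo:
        memo[id(node)] = sum(count_paths(c, memo) for c in node.children)
    return memo[id(node)]


def contains(root: ByteNode, value: int) -> bool:
    """Follow the 8 big-endian bytes of `value` down from the root."""
    node: Optional[ByteNode] = root
    for byte in value.to_bytes(8, "big"):
        node = node.children[byte]
    return node is None


root, nodes = build_all_u64()
total_paths = count_paths(root)
```

Eight node objects encode $256^8 = 2^{64}$ distinct paths, which is why the stored size is independent of the number of integers represented.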

Known Limitations (discussion-backed, MORK Mattermost)

Concurrency ceiling: MORK crashes at ~200 concurrent users (Rejuve.Bio load test, Mar 2026) rather than degrading gracefully.
No automatic persistence: Data lost on restart. Manual save/restore via paths_export() / paths_import().
Negative querying unsound: Removed from MM2. Use != or nested if/not/find instead.
Memory multiplier: ~64 bytes/atom. String-heavy datasets need interning (7 GB → 1.9 GB vs 31 GB default).
WASM deprecated: ~15× overhead. Pure Rust grounded functions now default.
Server-branch versioning: The mork-server deployment line is maintained on a separate server branch. As of 2026-04-29, three references are not reconciled: das-toolbox CLI defaults to image tags trueagi/das:mork-server-1.0.5 + mork-loader-1.0.5; das/src/docker/mork/Dockerfile.server pins MORK commit 578a759 (2025-07-21); local origin/server HEAD is 5b04a1d (2026-04-18) — 49 commits ahead of the DAS pin with deadlock and UTF-8 fixes. Production deployment must reconcile all three references. Notable post-pin fixes in the gap: server shutdown deadlock (5b04a1d), user-status-map cleanup (08116b0), lock-held-too-long deadlock (205dd91), UTF-8 validation for symbol pathway (f284ff6), edge-case + malformed-symbol test coverage (7872975).
Link/S-expression delete unsupported in DAS-MorkDB backend: das/src/atomdb/morkdb/MorkDB.cc:268-270 hard-fails. Node delete works (inherited from RedisMongoDB); link delete does not. flush_pattern + re_index_patterns provide batch-rebuild workarounds, NOT live mutable-store CRUD. See DAS Full.

Open Problems / Research Directions

Multi-machine distribution — scaling across clusters while preserving PathMap locality
QuantiMORK — wavelet/multiresolution DAG encoding for neural structures
GPU/TPU acceleration via ByteFlow for dense numerical kernels
Community and third-party package ecosystem
Formal verification of ZAM correctness properties
Decko-compatible mutable-backend semantics — link delete, transactional history, RichText/file/permission mappings; MORK alone does not provide them, an adapter layer is required (AtomSpace cluster pilot Source 3 R3.G2)

Primary Sources

Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §2.3, §3.6.
Goertzel, B. (2025). Articulating Conditions Where ZAM/MORK Yield Benefit. RawData.
Goertzel, B. (2025). From Path Algebra in MORK to Tensor Logic on GPUs. RawData.
Goertzel, B. (2025). Slot-Centric Indexing vs. Permutation Explosion. RawData.
Peyton Jones, S. et al. Triemaps that Match. arXiv:2302.08775.
See also: MORK Theory Publication Map.
AtomSpace Backend Integration Cluster Pilot (2026-04-29) — cluster archive at scripts/archive/atomspace_pilot/; Source 3 reconciliation is the canonical record for the corrections on this card.