Core Concept and Data Model

The Metagraph

Formal definition:

\[\mathcal{M} = (\mathcal{A},\; \mathcal{T},\; \tau)\]

Variables: $\mathcal{A}$ = set of atoms, $\mathcal{T}$ = type lattice, $\tau : \mathcal{A} \to \mathcal{T}$ = typing function
Meaning: A metagraph generalizes hypergraphs by allowing edges to contain other edges recursively — any atom can appear inside any expression, enabling arbitrary nesting depth
Source: Goertzel (2025), Hyperon Whitepaper §2.1; Vepstas, AtomSpace Design Notes

Atoms come in four variants (in Hyperon Experimental's formalization):

Symbol: Named, globally unique identifiers (e.g., Human, Mortal, +)
Variable: Bindable placeholders prefixed with $ (e.g., $x, $result)
Expression: Ordered tuples of atoms $(a_1, a_2, \ldots, a_n)$ where each $a_i \in \mathcal{A}$ — this recursion is what makes it a metagraph rather than a flat graph
Grounded: Opaque handles wrapping external data or callable code (Python objects, neural model references, file handles)
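
The four variants can be sketched as a small recursive data type. This is a minimal illustration only, not the Hyperon Experimental API: the class names mirror the list above, while the field names and the `depth` helper are assumptions introduced here. The typing function $\tau$ is omitted.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Symbol:
    name: str  # e.g. "Human", "Mortal", "+"


@dataclass(frozen=True)
class Variable:
    name: str  # stored without the leading "$" (a choice made for this sketch)


@dataclass(frozen=True)
class Grounded:
    value: Any  # opaque payload: a Python object, a callable, a handle


@dataclass(frozen=True)
class Expression:
    children: Tuple["Atom", ...]  # ordered; children may themselves be Expressions


Atom = Union[Symbol, Variable, Grounded, Expression]


def depth(atom: Atom) -> int:
    """Nesting depth: 0 for non-expression atoms, 1 + deepest child otherwise."""
    if isinstance(atom, Expression):
        return 1 + max((depth(c) for c in atom.children), default=0)
    return 0


# (Implies (Human $x) (Mortal $x)): an expression whose members are expressions.
x = Variable("x")
rule = Expression((
    Symbol("Implies"),
    Expression((Symbol("Human"), x)),
    Expression((Symbol("Mortal"), x)),
))
```

Because `Expression` holds a tuple of arbitrary atoms, including other expressions, nesting depth is unbounded, which is the property the definition above relies on.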

Foundational Design Principle: Total State Visibility

The AtomSpace was designed around a single organizing constraint: "all OpenCog state is in the Atomspace. There isn't any state that isn't in the AtomSpace; it can't be found under a rock, or tucked away in some object." This principle, that all state should be visible to all algorithms, extends distributed-computing discipline (where state must be locatable and transportable) to AI reasoning systems. Learning algorithms, logical inference engines, and data mining processes all access a unified, visible state container rather than maintaining hidden internal caches. The schema-free "anything goes" hypergraph structure contrasts with SQL's pre-declared tables, though schemas can optionally be declared via the type system. (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)

Content Addressing and Self-Normalization

Each atom is identified by what it contains: structurally identical atoms are the same atom. Vepstas (2023) demonstrates formally that this content-addressed s-expression representation is roughly 4× more compact than a UUID-based in-RAM pointer representation (48 bytes vs. 184 bytes for a representative metatree). He rejects UUIDs as fundamentally flawed for distributed metagraph storage: they require either a centralized issuing authority (a bottleneck) or cryptographic hashes (128–192 bits, expensive to compute). S-expressions, by contrast, are self-identifying: "anyone can mint it at any time, at very low cost", with no centralized authority needed. Compressed with standard algorithms, s-expression files outperform UUID-based formats by a wide margin. A further formal result is that metagraphs are self-normalizing: the normalization problem that consumes vast effort in relational database design "comes for free" with metatrees, because the hierarchical structure inherently avoids the duplication that SQL normalization addresses. (Provenance: publication, Vepstas, "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023)

Implementations realize content addressing differently: MORK uses trie paths (hash-consing), while the classical AtomSpace uses a global atom table with UUID indexing. Content addressing enables automatic deduplication: identical subexpressions are stored once and referenced many times.
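
The deduplication effect can be illustrated with a toy interning table. This is a sketch under stated assumptions, not how MORK or the classical AtomSpace is implemented: atoms are modeled as plain Python strings (symbols) and tuples (expressions), and the `AtomTable` class and its `intern` method are hypothetical names introduced for this example.

```python
class AtomTable:
    """Toy content-addressed store: structurally identical atoms share one entry."""

    def __init__(self):
        # Maps an atom's content to its single canonical instance.
        self._by_content = {}

    def intern(self, atom):
        """Return the canonical instance for `atom`, interning subexpressions first."""
        if isinstance(atom, tuple):
            atom = tuple(self.intern(child) for child in atom)
        # setdefault stores the atom on first sight and returns the stored
        # instance on every later lookup with equal content.
        return self._by_content.setdefault(atom, atom)

    def __len__(self):
        return len(self._by_content)


table = AtomTable()
a = table.intern(("Implies", ("Human", "Socrates"), ("Mortal", "Socrates")))
b = table.intern(("Likes", "Plato", ("Human", "Socrates")))

# The shared subexpression (Human Socrates) is one object referenced twice.
shared = a[1] is b[2]
```

Here the key is the atom's own content, so no identifier has to be issued by anyone: two parties that build the same expression independently arrive at the same key.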

Code Is Data

Because MeTTa programs are themselves expressions in the AtomSpace, there is no distinction between code and data. Atomese was explicitly designed to be "used by algorithms, not by humans", like GIMPLE/GIL/LLVM IR but general purpose. "It's like a macro language that you can expand over and over" (Linas Vepstas). Rules and the knowledge representation (KR) language are the same language; rules can modify rules; and the query language is itself a graph stored in the database. This design philosophy, which favors algorithmic consumption over human readability, is the core reason MeTTa exists as a human-facing layer atop the graph substrate. (mailing-list-backed: Code-as-data-programs-universal-knowledge-base, 2016)
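
A small sketch can make "rules can modify rules" concrete. It is not MeTTa or Atomese semantics: atoms are modeled as Python strings and tuples, strings starting with `$` are treated as variables, and a rule is just an atom of the form `("=", lhs, rhs)` stored alongside the facts. The helpers `match`, `substitute`, and `step` are hypothetical names for this illustration.

```python
def match(pattern, term, bindings):
    """Return bindings extended so that pattern equals term, or None on failure."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for p, t in zip(pattern, term):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def substitute(template, bindings):
    """Replace bound variables in template with their values."""
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return bindings.get(template, template)


def step(space):
    """Apply every rule atom in the space to every atom once; return new atoms."""
    rules = [a for a in space
             if isinstance(a, tuple) and len(a) == 3 and a[0] == "="]
    derived = set()
    for _, lhs, rhs in rules:
        for atom in space:  # rules are atoms too, so they can be rewritten
            found = match(lhs, atom, {})
            if found is not None:
                derived.add(substitute(rhs, found))
    return derived - space


space = {
    ("Human", "Socrates"),
    # Object-level rule: (Human $x) rewrites to (Mortal $x).
    ("=", ("Human", "$x"), ("Mortal", "$x")),
    # Meta-level rule: rewrites any rule about Human into one about Person.
    ("=", ("=", ("Human", "$v"), "$rhs"), ("=", ("Person", "$v"), "$rhs")),
}
new_atoms = step(space)
```

In this run the first rule derives a fact, while the second matches the first rule itself and produces a new rule. Nothing in `step` distinguishes the two cases, which is the point of keeping rules and data in one representation.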

Hypergraph Indexing Advantage

Hypergraph storage is demonstrably more RAM-efficient than ordinary graph storage (Neo4j, property graphs). The key insight is that in SQL/NoSQL databases, indexes are hidden and inaccessible, reserved for the database's internal use. In the AtomSpace, the incoming and outgoing sets are the indexes, and they are user-visible and walkable: "When you use a graph DB, you get direct access to 'indexes' as user-visible and user-controllable objects." The Zipfian square-root profile of real datasets (genomics, Wikipedia) amplifies this advantage. (mailing-list-backed: Atomspace-RAM-and-CPU-usage, 2014)
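
The idea that incoming and outgoing sets double as indexes can be sketched as follows. This is an illustrative toy, not the AtomSpace API: atoms are modeled as Python strings and tuples, and the `IndexedSpace` class with its `add`, `incoming_set`, and `outgoing_set` methods is a hypothetical construction for this example.

```python
from collections import defaultdict


class IndexedSpace:
    """Toy store in which the incoming set of each atom is a walkable index."""

    def __init__(self):
        # atom -> set of expressions that directly contain it
        self._incoming = defaultdict(set)

    def add(self, atom):
        """Insert an atom and register it in the incoming set of each child."""
        if isinstance(atom, tuple):
            for child in atom:
                self.add(child)
                self._incoming[child].add(atom)
        return atom

    def incoming_set(self, atom):
        """Expressions that directly contain `atom` (walk upward)."""
        return set(self._incoming.get(atom, ()))

    def outgoing_set(self, atom):
        """Direct members of `atom` (walk downward); empty for non-expressions."""
        return atom if isinstance(atom, tuple) else ()


space = IndexedSpace()
space.add(("Implies", ("Human", "Socrates"), ("Mortal", "Socrates")))
space.add(("Likes", "Plato", "Socrates"))

# "Where is Socrates used?" is answered by reading the index directly,
# without scanning the whole store.
uses = space.incoming_set("Socrates")
```

A query such as "which expressions mention this atom" becomes a direct lookup in a structure the user can see and traverse, rather than a request to a hidden database index.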