AtomSpace Deep Dive

Approved by Ursula Addison on 2026-05-07

AtomSpace is Hyperon's universal knowledge substrate: a typed, content-addressed metagraph that co-locates symbols, tensors, truth values, motives, and edit operations, so that distinct cognitive processes can interoperate over one shared memory and control plane. This card provides technical depth beyond the concise AtomSpace index card. One thing to know up front: “AtomSpace” is not a single piece of software but a concept implemented across four layers (classical C++, the Rust Hyperon Space, the distributed DAS, and the high-performance MORK kernel); the Implementations section below sorts out which is which.

Authors:

Contributors:

Papers: Goertzel (2025), Hyperon Whitepaper §2.1–2.2; Vepstas (2023), Graphs, Metagraphs, RAM, CPU v2.1.1 (also TODS 2024 submission); Vepstas, Sheaves series (sheaf docs)

Status: Current. The AtomSpace concept is implemented across a four-layer taxonomy: Layer 1 Classical AtomSpace (opencog/atomspace, mature C++/Scheme/Python, v5.0+) maintained independently by Linas Vepstas; Layer 2 Hyperon Space (trueagi-io/hyperon-experimental, Rust reimplementation with deep Python integration); Layer 3 DAS (distributed AtomSpace, see DAS Deep Dive); Layer 4 MORK (high-performance triemap kernel, see MORK Deep Dive). The abstract Space API enabling multiple backend implementations is operational. See the Implementations subcard below for the per-layer detail.

Related cards: MORK Deep Dive (high-performance backend), DAS Deep Dive (distributed backend), PathMap (foundational trie substrate), AtomSpace Backend Integration (synthesis / Phase-3-and-Phase-4 integration plan), PLN Deep Dive (reasoning over AtomSpace), MeTTa Deep Dive (execution language), History (historical evolution)

Core Concept and Data Model

The Metagraph

Formal definition:

\[\mathcal{M} = (\mathcal{A},\; \mathcal{T},\; \tau)\]
  • Variables: \(\mathcal{A}\) = set of atoms, \(\mathcal{T}\) = type lattice, \(\tau : \mathcal{A} \to \mathcal{T}\) = typing function
  • Meaning: A metagraph generalizes hypergraphs by allowing edges to contain other edges recursively: any atom can appear inside any expression, enabling arbitrary nesting depth
  • Source: Goertzel (2025), Hyperon Whitepaper §2.1; Vepstas, AtomSpace Design Notes

Atoms come in four variants (in Hyperon Experimental's formalization):

  • Symbol: Named, globally unique identifiers (e.g., Human, Mortal, +)
  • Variable: Bindable placeholders prefixed with $ (e.g., $x, $result)
  • Expression: Ordered tuples of atoms \((a_1, a_2, \ldots, a_n)\) where each \(a_i \in \mathcal{A}\); this recursion is what makes it a metagraph rather than a flat graph
  • Grounded: Opaque handles wrapping external data or callable code (Python objects, neural model references, file handles)
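The four variants can be modeled as a small algebraic data type. The sketch below is a toy written in plain Python. It assumes nothing about the hyperon package: the class names simply mirror the list above, and `depth` is a hypothetical helper that shows how expressions nest inside expressions.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union

@dataclass(frozen=True)
class Symbol:
    name: str  # e.g. "Human", "+"

@dataclass(frozen=True)
class Variable:
    name: str  # written with a leading $ in MeTTa, e.g. $x

@dataclass(frozen=True)
class Expression:
    children: Tuple["Atom", ...]  # may contain further Expressions

# eq=False: grounded atoms compare by identity, since payloads may be unhashable.
@dataclass(frozen=True, eq=False)
class Grounded:
    payload: Any  # opaque handle to external data or callable code

Atom = Union[Symbol, Variable, Expression, Grounded]

def depth(atom: Atom) -> int:
    """0 for leaf atoms; 1 + deepest child for expressions."""
    if isinstance(atom, Expression):
        return 1 + max((depth(c) for c in atom.children), default=0)
    return 0

# (Implication (Inheritance $x Human) (Inheritance $x Mortal))
rule = Expression((
    Symbol("Implication"),
    Expression((Symbol("Inheritance"), Variable("x"), Symbol("Human"))),
    Expression((Symbol("Inheritance"), Variable("x"), Symbol("Mortal"))),
))
print(depth(rule))  # 2
```

Freezing the structural variants makes structurally identical atoms compare equal and hash alike, which anticipates the content-addressing property described next.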

Foundational Design Principle: Total State Visibility

The AtomSpace was designed around a single organizing constraint: "all OpenCog state is in the Atomspace. There isn't any state that isn't in the AtomSpace; it can't be found under a rock, or tucked away in some object." This principle (that all state should be visible to all algorithms) extends distributed-computing discipline, where state must be locatable and transportable, to AI reasoning systems. Learning algorithms, logical inference engines, and data mining processes all access a unified, visible state container rather than maintaining hidden internal caches. The schema-free "anything goes" hypergraph structure contrasts with SQL's pre-declared tables, though schemas can optionally be declared via the type system. (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)

Content Addressing and Self-Normalization

Each atom is identified by what it contains: structurally identical atoms are the same atom. Vepstas (2023) demonstrates formally that this content-addressed s-expression representation is roughly 4× more compact than a UUID-based in-RAM pointer representation (48 bytes vs. 184 bytes for a representative metatree, a ratio of about 3.8). UUIDs are rejected as fundamentally flawed for distributed metagraph storage. They require either a centralized issuing authority (a bottleneck) or cryptographic hashes (128–192 bits, expensive to compute). S-expressions, by contrast, are self-identifying: "anyone can mint it at any time, at very low cost," with no centralized authority needed. Compressed with standard algorithms, s-expression files outperform UUID-based formats by a wide margin. A further formal result is that metagraphs are self-normalizing. The normalization problem that consumes vast effort in relational database design "comes for free" with metatrees, because the hierarchical structure inherently avoids the duplication that SQL normalization addresses. (Provenance: publication, Vepstas, "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023)

In MORK, content addressing is realized via trie paths (hash-consing); in the classical AtomSpace, via a global atom table with UUID indexing. Content addressing enables automatic deduplication: identical subexpressions are stored once and referenced many times.
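Hash-consing can be sketched in a few lines. Each child is interned first, and an expression is then keyed by the tuple of its children's IDs, so a repeated subexpression resolves to the entry that already exists. This is a minimal illustration under simplifying assumptions: symbols are Python strings and expressions are tuples. `InternTable` is hypothetical and does not reflect MORK's trie layout or the classical atom table.

```python
from typing import Dict, List, Tuple, Union

# A symbol's key is its string; an expression's key is its children's IDs.
Key = Union[str, Tuple[int, ...]]

class InternTable:
    """Hypothetical hash-consing table: identical content -> identical ID."""

    def __init__(self) -> None:
        self._ids: Dict[Key, int] = {}
        self._keys: List[Key] = []

    def intern(self, atom) -> int:
        # Intern children first so identical subexpressions collapse to one ID.
        if isinstance(atom, tuple):
            key: Key = tuple(self.intern(child) for child in atom)
        else:
            key = atom
        if key not in self._ids:
            self._ids[key] = len(self._keys)
            self._keys.append(key)
        return self._ids[key]

    def __len__(self) -> int:
        return len(self._keys)

table = InternTable()
a = table.intern(("Inheritance", "Socrates", "Human"))
b = table.intern(("Inheritance", "Socrates", "Human"))  # same content, same ID
c = table.intern(("Member", ("Inheritance", "Socrates", "Human"), "Facts"))
print(a == b, len(table))  # True 7
```

The third call stores only three new entries ("Member", "Facts", and the outer expression) because the nested `Inheritance` expression is already present.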

Code Is Data

Because MeTTa programs are themselves expressions in AtomSpace, there is no distinction between code and data. Atomese was explicitly designed to be "used by algorithms, not by humans," like GIMPLE/GIL/LLVM IR but general purpose. "It's like a macro language that you can expand over and over" (Linas Vepstas). Rules and the KR language are the same language; rules can modify rules; the query language is itself a graph stored in the database. This design philosophy (algorithmic consumption over human readability) is the core reason MeTTa exists as a human-facing layer atop the graph substrate. (mailing-list-backed: Code-as-data-programs-universal-knowledge-base, 2016)
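The sketch below shows what "rules can modify rules" means in practice. It assumes a rule is encoded as a nested Python tuple in the style of a MeTTa `(= head body)` equation. `substitute` and the set-based `space` are hypothetical stand-ins; real MeTTa would perform both steps with its own matching and rewriting.

```python
# A rule held as plain nested data: (= (double $x) (+ $x $x))
double_rule = ("=", ("double", "$x"), ("+", "$x", "$x"))

def substitute(expr, old, new):
    """Return a copy of expr with every occurrence of symbol `old` replaced."""
    if isinstance(expr, tuple):
        return tuple(substitute(e, old, new) for e in expr)
    return new if expr == old else expr

# A "rule about rules": derive a renamed copy of an existing rule.
twice_rule = substitute(double_rule, "double", "twice")

# Rules sit in the same store as facts, so ordinary queries can inspect them.
space = {double_rule, twice_rule}
defs_of_twice = [r for r in space
                 if r[0] == "=" and isinstance(r[1], tuple) and r[1][0] == "twice"]
print(twice_rule)
```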

Hypergraph Indexing Advantage

Hypergraph storage is demonstrably more RAM-efficient than ordinary graph storage (Neo4j, property graphs). The key insight is that in SQL/NoSQL databases, indexes are hidden and inaccessible, for the database's internal use only. In AtomSpace, incoming/outgoing sets are the indexes, user-visible and walkable. "When you use a graph DB, you get direct access to 'indexes' as user-visible and user-controllable objects." The Zipfian square-root profile of real datasets (genomics, Wikipedia) amplifies this advantage. (mailing-list-backed: Atomspace-RAM-and-CPU-usage, 2014)
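The idea that incoming sets are user-visible indexes can be sketched with a toy store that, on every insertion, updates a map from each atom to the links containing it. `ToyAtomSpace` and its fields are hypothetical. The classical AtomSpace maintains incoming sets internally in C++, but the access pattern (walking from an atom to everything that mentions it) is the same.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Link = Tuple[str, ...]  # (link_type, *outgoing_atoms)

class ToyAtomSpace:
    """Hypothetical store whose index (each atom's incoming set) is an
    ordinary, user-walkable structure rather than a hidden DB internal."""

    def __init__(self) -> None:
        self.links: Set[Link] = set()
        self.incoming: DefaultDict[str, Set[Link]] = defaultdict(set)

    def add_link(self, link_type: str, *outgoing: str) -> Link:
        link = (link_type, *outgoing)
        self.links.add(link)
        # Index only the outgoing atoms; the link type is not a member.
        for atom in outgoing:
            self.incoming[atom].add(link)
        return link

space = ToyAtomSpace()
space.add_link("Inheritance", "Socrates", "Human")
space.add_link("Inheritance", "Plato", "Human")
space.add_link("Inheritance", "Human", "Mortal")

# Walk the index directly: every link that mentions "Human".
for link in sorted(space.incoming["Human"]):
    print(link)
```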

Values and Space API

The Atom / Value Distinction

Atoms are the graph structure: immutable, globally unique, typed, and indexed. They represent relationships, categories, rules, and long-term stable knowledge. Atoms are heavyweight objects designed for structural queries and pattern matching. Think of them as the "plumbing." Creating an Atom costs on the order of tens of microseconds, because it must be indexed.

Values are mutable vectors attached to Atoms via a key-value store. They are not indexed, not globally unique, and designed to be small, fast, and fleeting. Values hold truth values, probabilities, streaming sensor data, attention weights, and any other rapidly-changing metadata. Think of them as the "fluid in the pipes." Values have no indexing overhead.

This separation is a deliberate performance decision: the graph structure changes slowly (adding a new concept or relationship), while valuations change rapidly (updating a confidence score after new evidence). Indexing only the structure keeps the pattern matcher fast even as values churn. For DNN integration, this means tensor data (activations, weights) should use custom Value classes (e.g., TensorFlowValue), while the Pattern Matcher accesses Values indirectly through predicates rather than direct search. The conceptual bridge: "conscious processes over Atoms, subconscious processes over Values." (mailing-list-backed: OpenCog-DNNs-PPLs-Atoms-vs-Values, 2018)
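The split can be sketched as an immutable, hashable atom type plus a separate, unindexed key/value table for mutable float vectors. Everything here is hypothetical: `Node`, `ValueStore`, and keying a value by another atom (here a `PredicateNode`) are illustrative assumptions. The point is only that rewriting a value never changes the atom or any structural index.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Node:
    """Immutable, hashable stand-in for an Atom: structure only."""
    atom_type: str
    name: str

class ValueStore:
    """Hypothetical per-atom key/value table of mutable float vectors.
    Nothing here is indexed, so updates never touch structural indexes."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[Node, Node], List[float]] = {}

    def set_value(self, atom: Node, key: Node, value: List[float]) -> None:
        self._values[(atom, key)] = list(value)

    def get_value(self, atom: Node, key: Node) -> List[float]:
        return self._values[(atom, key)]

cat = Node("ConceptNode", "cat")
tv_key = Node("PredicateNode", "truth")
store = ValueStore()
store.set_value(cat, tv_key, [0.8, 0.5])
store.set_value(cat, tv_key, [0.9, 0.6])  # fast churn; the atom is untouched
print(store.get_value(cat, tv_key))  # [0.9, 0.6]
```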

The TruthValue to FloatValue Transition

Original design: Every atom carried a SimpleTruthValue \(\langle s, c \rangle \in [0,1]^2\) (strength and confidence), central to PLN reasoning. This led to proliferating specialized types: FuzzyTruthValue, DistributionalTruthValue, IndefiniteTruthValue.

Problem identified: "Complex multiple inheritance relations" among the proliferating TV types. In addition, most AtomSpace calculations needed crisp boolean operations rather than probabilistic truth values. Mandatory TruthValues "hurt performance and cluttered the API." The transition was blocked for years by an unsolved serialization problem: without a serialize/deserialize proposal, PropertyMaps were "a non-starter." (mailing-list-backed: Replacing-TV-and-AV-objects-with-property-maps, 2015)

Resolution: TruthValues were generalized to FloatValue β€” generic vectors \(\mathbf{v} \in \mathbb{R}^n\) of arbitrary dimension. Update formulas moved out of C++ into Atomese arithmetic, making the value algebra programmable rather than hardcoded. In Hyperon, the PLN truth value algebra is implemented in MeTTa rather than baked into the storage layer.
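Under this generalization a truth value is just a short float vector, and the update rule becomes ordinary code supplied by the caller. The sketch below assumes a two-element `[strength, confidence]` layout. `weighted_merge` is a deliberately simple, hypothetical rule chosen for illustration; it is not the PLN revision formula. Python also stands in for Atomese arithmetic here.

```python
from typing import Callable, List

FloatValue = List[float]  # generic vector v in R^n

def stv(strength: float, confidence: float) -> FloatValue:
    """A SimpleTruthValue <s, c> is just the 2-element case of a FloatValue."""
    return [strength, confidence]

def weighted_merge(a: FloatValue, b: FloatValue) -> FloatValue:
    """Illustrative rule only (NOT PLN revision): confidence-weighted mean
    of the strengths, keeping the larger confidence."""
    (s1, c1), (s2, c2) = a, b
    total = c1 + c2
    s = (s1 * c1 + s2 * c2) / total if total > 0 else 0.0
    return [s, max(c1, c2)]

def update(value: FloatValue, evidence: FloatValue,
           rule: Callable[[FloatValue, FloatValue], FloatValue]) -> FloatValue:
    # Storage only sees vectors; the formula is supplied by the caller.
    return rule(value, evidence)

result = update(stv(0.8, 0.5), stv(0.2, 0.5), weighted_merge)
print(result)
```

Swapping `weighted_merge` for another function changes the algebra without touching the storage layer, which is the property the transition was after.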

The Space API

Hyperon abstracts AtomSpace behind the Space API, a universal interface that any backend can implement. Core operations:

  • match(pattern, space): Find all substitutions \(\sigma\) such that \(\sigma(\text{pattern})\) exists in the space.
  • add(atom, space) / remove(atom, space): Modify the space contents.
  • rewrite: Match a pattern and replace it with a template under the computed substitution; this is the fundamental MeTTa evaluation step.
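The three operations can be sketched over an in-memory set of nested tuples, where strings beginning with `$` are variables. This is a toy, one-way matcher written for illustration: pattern variables bind to space contents, not the other way round. `ToySpace`, `match_atom`, and `substitute` are hypothetical names, not the signatures of any Space implementation.

```python
from typing import Dict, Iterator, List, Optional, Set

Subst = Dict[str, object]  # variable name -> bound atom

def is_var(x: object) -> bool:
    return isinstance(x, str) and x.startswith("$")

def match_atom(pattern: object, atom: object, s: Subst) -> Optional[Subst]:
    """One-way match: extend s so that pattern under s equals atom."""
    if is_var(pattern):
        if pattern in s:  # a repeated variable must bind consistently
            return s if s[pattern] == atom else None
        return {**s, pattern: atom}
    if isinstance(pattern, tuple) and isinstance(atom, tuple):
        if len(pattern) != len(atom):
            return None
        for p, a in zip(pattern, atom):
            s = match_atom(p, a, s)
            if s is None:
                return None
        return s
    return s if pattern == atom else None

def substitute(template: object, s: Subst) -> object:
    """Apply a substitution to a template; unbound variables are kept."""
    if is_var(template):
        return s.get(template, template)
    if isinstance(template, tuple):
        return tuple(substitute(t, s) for t in template)
    return template

class ToySpace:
    def __init__(self) -> None:
        self.atoms: Set[object] = set()

    def add(self, atom: object) -> None:
        self.atoms.add(atom)

    def remove(self, atom: object) -> None:
        self.atoms.discard(atom)

    def match(self, pattern: object) -> Iterator[Subst]:
        """Yield every substitution sigma with sigma(pattern) in the space."""
        for atom in self.atoms:
            s = match_atom(pattern, atom, {})
            if s is not None:
                yield s

    def rewrite(self, pattern: object, template: object) -> List[object]:
        """Match, then instantiate the template under each substitution."""
        return [substitute(template, s) for s in self.match(pattern)]

space = ToySpace()
space.add(("Inheritance", "Socrates", "Human"))
space.add(("Inheritance", "Plato", "Human"))
space.add(("Inheritance", "Human", "Mortal"))
result = sorted(space.rewrite(("Inheritance", "$x", "Human"),
                              ("Inheritance", "$x", "Mortal")))
print(result)
```

Full MeTTa unification is two-way and handles variables on both sides. The one-way version keeps the sketch short while still showing how `rewrite` is built from `match` plus substitution.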

Multiple Space implementations exist or are planned (see Current Status below for which are operational):

  • MORK-backed: High-performance in-RAM triemap for local reasoning
  • DAS-backed: Distributed storage across Redis/MongoDB clusters
  • Neural Spaces: DNNs wrapped as queryable AtomSpaces; matching returns approximate nearest neighbors in embedding space
  • Rholang AtomSpace: Capability-secured execution on ASI Chain for decentralized cognitive processes
  • In-memory (reference): Simple implementation in Hyperon Experimental for development and testing

MeTTa code is largely Space-independent β€” the same program can target different backends by naming different Spaces.
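Space independence amounts to programming against an interface rather than against a backend. In the sketch below, two hypothetical backends with different internal layouts implement the same minimal `Space` interface, and one query function runs unchanged against both. That interface is an assumption, far narrower than the real API.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, subject, object)

class Space(ABC):
    """Hypothetical, deliberately tiny backend interface."""

    @abstractmethod
    def add(self, fact: Fact) -> None: ...

    @abstractmethod
    def facts(self) -> Iterator[Fact]: ...

class ListSpace(Space):
    """Backend 1: append-only list, standing in for a reference space."""

    def __init__(self) -> None:
        self._items: List[Fact] = []

    def add(self, fact: Fact) -> None:
        self._items.append(fact)

    def facts(self) -> Iterator[Fact]:
        return iter(self._items)

class IndexedSpace(Space):
    """Backend 2: facts bucketed by relation, standing in for an indexed store."""

    def __init__(self) -> None:
        self._by_relation: Dict[str, Set[Fact]] = {}

    def add(self, fact: Fact) -> None:
        self._by_relation.setdefault(fact[0], set()).add(fact)

    def facts(self) -> Iterator[Fact]:
        for bucket in self._by_relation.values():
            yield from bucket

def instances_of(space: Space, category: str) -> List[str]:
    """Backend-agnostic query: touches only the Space interface."""
    return sorted(subj for rel, subj, obj in space.facts()
                  if rel == "Inheritance" and obj == category)

results = {}
for backend in (ListSpace(), IndexedSpace()):
    backend.add(("Inheritance", "Socrates", "Human"))
    backend.add(("Inheritance", "Plato", "Human"))
    results[type(backend).__name__] = instances_of(backend, "Human")
print(results)
```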

Implementations

Four-Layer AtomSpace Taxonomy

"AtomSpace" is currently not a single coherent backend. The ecosystem spans four distinct implementation layers with different repos, runtime characteristics, and suitability for Decko integration. Readers who encounter "AtomSpace" without qualification should check which of these four layers is meant before drawing conclusions.

# | Layer | Repos / Evidence | Decko relevance
1 | Classical AtomSpace StorageNode | opencog/atomspace + atomspace-storage + atomspace-pgres + atomspace-rocks + atomspace-cog + atomspace-bridge | Best read-side SQL import ancestor; not Decko-write-ready
2 | Hyperon Space | trueagi-io/hyperon-experimental: lib_spaces.metta; GroundingSpace / SpaceMut / DynSpace | MeTTa-facing demos; not primary Decko backend
3 | DAS AtomDB + services | singnet/das: AtomDB + Query Engine + AttentionBroker + agents; MorkDB | Candidate later query/deployment layer; delete + server-pin caveats
4 | MORK native substrate | trueagi-io/MORK + mork_ffi; PathMap + + SDK + server branch | Performance substrate; requires adapter layer for Decko semantics

Source: a 2026-04-29 source-code review across the four layers; archive at scripts/archive/atomspace_pilot/ in the wiki repository. The subsections that follow give per-layer detail (Classical → Layer 1; Hyperon Experimental → Layer 2). Layer 3 (DAS) is detailed at DAS Full; Layer 4 (MORK native) is detailed at MORK Full.

Classical AtomSpace Architecture (C++)

The opencog/atomspace C++ implementation provides the mature, battle-tested realization:

  • Atom type hierarchy: ~150 predefined types (Node, Link, ConceptNode, PredicateNode, EvaluationLink, etc.) organized in a class hierarchy. New types can be defined at runtime.
  • Pattern matching engine: Two-phase execution. Compilation extracts variables and builds connectivity maps; execution runs via PatternMatchEngine, with callback mixins for custom match semantics. Type checking and pattern matching are theoretically identical operations: "there is absolutely zero theoretical difference" for first-order types, though higher-order types require additional logical deduction (Linas Vepstas, 2014). This equivalence directly influenced Hyperon's approach of unifying type checking and matching. (mailing-list-backed: Atomspace-type-checking, 2014)
  • PLN/PM semantic duality: PLN link types and Pattern Matcher link types form a Kripke semantics pair: "BindLink is the Kripke equivalent of ImplicationLink." PLN links operate on truth values (probabilistic), while PM links check for structural presence. (mailing-list-backed: IfElseLink, 2015)
  • Language bindings: Deep integration with GNU Guile (Scheme) and Python. Scheme is the primary scripting interface for legacy OpenCog; Python via the opencog.atomspace module.
  • Sheaf-theoretic foundations: The opencog/sheaf/ module provides formal mathematical foundations (sheaf axioms, tensor algebra, mereological structures). The intellectual genealogy traces to Link Grammar: Linas argued that LG's connector/section formalism replaces production rules, inference, and deduction with "assembly," a more general, symmetric operation that eliminates forced directionality. (mailing-list-backed: Link-Grammar-influence-on-AtomSpace-design, 2016) The formal bridge from metatrees to sheaves runs through the jigsaw puzzle piece metaphor (Vepstas 2023, §10.4): beta-reduction is jigsaw-puzzle assembly, connecting a slot variable to a tab value. Connectors have typed slots and tabs that must match types but have opposite "sexes" (directions). Partially assembled jigsaws obey the sheaf axioms. This is not merely an analogy: the ArrowLink (function type) is a special case of a general ConnectorSeq with typed directional connectors, and Link Grammar's connector formalism makes this explicit. The conclusion: "metatrees are naturally typed; those types are naturally reified; the reifications are recursive, and the level of recursion is limited by the imagination." (Provenance: publication, Vepstas, "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023; also TODS 2024 submission)
  • Persistence layer: Pluggable via the StorageNode/BackingStore API. Backends include atomspace-rocks (RocksDB), atomspace-cog (network), and atomspace-bridge (SQL bridge). The BackingStore abstraction is deliberately narrow: developers only implement 3 table structures and 4-5 query methods, estimated at 1-5 weeks of work. PostgreSQL was chosen over Neo4j (10× slower) and NoSQL databases ("absolutely terrible performance" for 50-100 byte atoms). (mailing-list-backed: Why-Postgresql-Used, ArangoDB-as-backend-for-atomspace, 2014)
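The jigsaw view of assembly can be sketched with Link Grammar-style connectors. In that notation `+` points right and `-` points left, and a link forms only between connectors of the same type and opposite direction. `Connector`, `Section`, `mates`, and `can_join` are hypothetical names, and the opencog/sheaf module's actual representation is richer than this.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Connector:
    """A typed jigsaw tab/slot; '+' points right, '-' points left."""
    ctype: str
    direction: str  # '+' or '-'

def mates(left: Connector, right: Connector) -> bool:
    """Connectors join iff the types agree and the 'sexes' are opposite:
    the left piece offers '+', the right piece accepts '-'."""
    return (left.ctype == right.ctype
            and left.direction == "+" and right.direction == "-")

@dataclass(frozen=True)
class Section:
    """A germ (e.g. a word) plus its ordered connector sequence."""
    germ: str
    connectors: Tuple[Connector, ...]

def can_join(a: Section, b: Section, ctype: str) -> bool:
    """Can piece `a` (placed on the left) link to piece `b` via `ctype`?"""
    return any(mates(x, y)
               for x in a.connectors if x.ctype == ctype
               for y in b.connectors)

# "cats run": the subject offers S+, the verb accepts S-.
cats = Section("cats", (Connector("S", "+"),))
run = Section("run", (Connector("S", "-"),))
print(can_join(cats, run, "S"), can_join(run, cats, "S"))  # True False
```

The asymmetric result illustrates the directionality carried by the connectors themselves, rather than imposed by a production rule.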

Hyperon Experimental (Rust Reimplementation)

The trueagi-io/hyperon-experimental Rust implementation is the reference for Hyperon's MeTTa:

  • Multi-crate workspace: lib (core MeTTa interpreter), c (C API for Python/foreign bindings), python (hyperon Python package)
  • Four atom variants: Symbol, Variable, Expression, Grounded; a cleaner, more minimal type system than the C++ hierarchy
  • Grounded atoms: Foreign objects (Python callables, tensors, file handles) wrapped as first-class atoms queryable through the Space API. This resolves the classical ExecutionLink limitation where do_execute could only return Handles, not TruthValues or arbitrary types. (mailing-list-backed: Semantics of ExecutionLink and GroundedSchemaNode, 2014)
  • Module system: Each module encapsulates a unique Space and Tokenizer, forming a hierarchical namespace. Three import modes and a catalog-based name resolution system manage dependencies. See MeTTa Full for details.
  • Prioritizes flexibility and semantic correctness over raw performance; PeTTa/MORK provides the high-performance path
  • First-class DAS support (default feature): lib/Cargo.toml declares metta-bus-client from singnet/das tag 1.0.2 with a default-enabled das feature; new-das! constructs a DistributedAtomSpace and returns a DynSpace (das.rs:156-199). DAS Layer 3 is wired in, not doc-only.
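The grounded-atom idea can be sketched by letting an opaque Python callable sit at the head of an expression and invoking it during bottom-up evaluation. The result can then be any Python value rather than a handle. `GroundedOp` and `evaluate` are hypothetical; hyperon's actual Python bindings define their own classes for grounded operations.

```python
from typing import Any, Callable

class GroundedOp:
    """Hypothetical wrapper placing a Python callable inside the tree."""

    def __init__(self, fn: Callable[..., Any], name: str) -> None:
        self.fn, self.name = fn, name

    def __repr__(self) -> str:
        return self.name

def evaluate(expr: Any) -> Any:
    """Evaluate children first; if the head is a grounded op, call it.
    Results may be any Python value, not just a handle to another atom."""
    if isinstance(expr, tuple) and expr:
        head, *args = [evaluate(e) for e in expr]
        if isinstance(head, GroundedOp):
            return head.fn(*args)
        return (head, *args)  # non-grounded head: expression stays as data
    return expr

plus = GroundedOp(lambda a, b: a + b, "+")
mean = GroundedOp(lambda xs: sum(xs) / len(xs), "mean")
print(evaluate((plus, 1, (plus, 2, 3))))  # 6
print(evaluate((mean, [0.25, 0.75])))     # 0.5
```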

System Interfaces

  • MeTTa: AtomSpace is the execution environment; MeTTa programs are graph transformations over AtomSpaces via the Space API.
  • MORK: High-performance backend implementing the Space API with PathMap triemaps, content addressing, and ZAM execution.
  • DAS: Distributed backend implementing the Space API with Redis/MongoDB sharding and Attention Broker.
  • Python: The hyperon Python library provides bidirectional MeTTa-Python interop. Grounded atoms can wrap Python objects and callable code.
  • All PRIMUS components: PLN, MOSES, ECAN, pattern mining, MetaMo; all of them operate over AtomSpace as the shared substrate.

Implementation Anchors

Status and Resources

Last verified: 2026-06-01

Current Status

  • Operational: Classical C++ AtomSpace (mature, production-grade); Hyperon Experimental Rust reference (active development, Python packages available); Space API abstraction with multiple backends
  • Under development: MORK as primary high-performance Space backend; MeTTa-4 semantic model alignment; Windows Python packages for Hyperon Experimental
  • Proposed: Neural Spaces (DNNs as queryable AtomSpaces); Rholang AtomSpace for decentralized execution; native MeTTa-to-machine-code compilation

Open Problems / Research Directions

  • Convergence of the classical C++ type hierarchy and Hyperon's minimal 4-variant system: which atom types should be primitive vs. user-defined?
  • Space API standardization across implementations (MORK, DAS, hyperon-experimental, JeTTa)
  • Neural Space design: defining meaningful match/bind/rewrite semantics for embedding-backed AtomSpaces
  • Formal metagraph theory: extending the sheaf-theoretic foundations to cover the full Hyperon Space API

Primary Sources

History and design lineage →


