AtomSpace Full

Approved by Ursula Addison on 2026-05-07


Responsible: Linas Vepstas (classical AtomSpace — historical architect; now independently maintains the opencog/* repos outside the Hyperon project), Vitaly Bogdanov, Alexey Potapov (Hyperon Experimental), Ben Goertzel (architecture)

Papers: Goertzel (2025), Hyperon Whitepaper §2.1–2.2; Goertzel et al. (2023), OpenCog Hyperon; Vepstas (2023), Graphs, Metagraphs, RAM, CPU v2.1.1 (also TODS 2024 submission); Vepstas, Sheaves series (sheaf docs)

Status: Current. The AtomSpace concept is implemented across a four-layer taxonomy (locked in by the AtomSpace Backend Integration cluster pilot, closed 2026-04-29): Layer 1 Classical AtomSpace (opencog/atomspace, mature C++/Scheme/Python, v5.0+) maintained independently by Linas Vepstas; Layer 2 Hyperon Space (trueagi-io/hyperon-experimental, Rust reimplementation with deep Python integration); Layer 3 DAS (distributed AtomSpace, see DAS Full); Layer 4 MORK (high-performance triemap kernel, see MORK Full). The abstract Space API enabling multiple backend implementations is operational. See the Implementations subcard below for the cluster-pilot lock-in section with per-layer reconciled findings.

This card provides technical depth beyond the concise AtomSpace index card. AtomSpace is Hyperon's universal knowledge substrate — a typed, content-addressed metagraph that co-locates symbols, tensors, truth values, motives, and edit operations, enabling distinct cognitive processes to interoperate over one shared memory and control plane. For high-performance storage internals see MORK Full, and for distributed deployment see DAS Full.

Related cards: MORK Full (high-performance backend), DAS Full (distributed backend), PathMap (foundational trie substrate), AtomSpace Backend Integration (synthesis / Phase-3-and-Phase-4 integration plan), PLN Full (reasoning over AtomSpace), MeTTa Full (execution language), OpenCog Legacy Full (historical evolution)

Core Concept and Data Model

The Metagraph

Formal definition:

\[\mathcal{M} = (\mathcal{A},\; \mathcal{T},\; \tau)\]

Variables: $\mathcal{A}$ = set of atoms, $\mathcal{T}$ = type lattice, $\tau : \mathcal{A} \to \mathcal{T}$ = typing function
Meaning: A metagraph generalizes hypergraphs by allowing edges to contain other edges recursively — any atom can appear inside any expression, enabling arbitrary nesting depth
Source: Goertzel (2025), Hyperon Whitepaper §2.1; Vepstas, AtomSpace Design Notes

Atoms come in four variants (in Hyperon Experimental's formalization):

Symbol: Named, globally unique identifiers (e.g., Human, Mortal, +)
Variable: Bindable placeholders prefixed with $ (e.g., $x, $result)
Expression: Ordered tuples of atoms $(a_1, a_2, \ldots, a_n)$ where each $a_i \in \mathcal{A}$ — this recursion is what makes it a metagraph rather than a flat graph
Grounded: Opaque handles wrapping external data or callable code (Python objects, neural model references, file handles)
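The four variants can be pictured as a small algebraic data type. The sketch below is illustrative Python, not the Hyperon Experimental source; the class names mirror the variant names above, while the field names and the depth helper are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Variable:
    name: str  # stored without the leading "$" (a choice made for this sketch)


@dataclass(frozen=True)
class Grounded:
    payload: Any  # opaque external object, e.g. a Python callable


@dataclass(frozen=True)
class Expression:
    children: Tuple["Atom", ...]  # ordered tuple; children may be Expressions


Atom = Union[Symbol, Variable, Grounded, Expression]


def depth(atom) -> int:
    """Nesting depth: 0 for leaves, 1 + deepest child for expressions."""
    if isinstance(atom, Expression):
        return 1 + max((depth(c) for c in atom.children), default=0)
    return 0


# (Implies (Human $x) (Mortal $x)): an expression containing expressions
rule = Expression((
    Symbol("Implies"),
    Expression((Symbol("Human"), Variable("x"))),
    Expression((Symbol("Mortal"), Variable("x"))),
))
```

Because Expression holds a tuple of atoms that may themselves be expressions, nesting is unbounded, which is the recursive property the definition above attributes to metagraphs.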

Foundational Design Principle: Total State Visibility

The AtomSpace was designed around a single organizing constraint: "all OpenCog state is in the Atomspace. There isn't any state that isn't in the AtomSpace; it can't be found under a rock, or tucked away in some object." This principle, that all state should be visible to all algorithms, extends distributed-computing discipline (where state must be locatable and transportable) to AI reasoning systems. Learning algorithms, logical inference engines, and data mining processes all access a unified, visible state container rather than maintaining hidden internal caches. The schema-free "anything goes" hypergraph structure contrasts with SQL's pre-declared tables, though schemas can optionally be declared via the type system. (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)

Content Addressing and Self-Normalization

Each atom is identified by what it contains — structurally identical atoms are the same atom. Vepstas (2023) demonstrates formally that this content-addressed s-expression representation is ~4× more compact than UUID-based in-RAM pointer representation (48 bytes vs. 184 bytes for a representative metatree). UUIDs are rejected as fundamentally flawed for distributed metagraph storage: they require either a centralized issuing authority (bottleneck) or cryptographic hashes (128–192 bits, expensive to compute), while s-expressions are self-identifying — "anyone can mint it at any time, at very low cost" with no centralized authority needed. Compressed with standard algorithms, s-expression files outperform UUID-based formats by a wide margin. A further formal result: metagraphs are self-normalizing — the normalization problem that consumes vast effort in relational database design "comes for free" with metatrees, because the hierarchical structure inherently avoids the duplication that SQL normalization addresses. (Provenance: publication, Vepstas — "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023)

Implementations realize content addressing differently. In MORK, it is realized via trie paths (hash-consing); in the classical AtomSpace, via a global atom table that resolves each atom by its type and contents (early versions also indexed atoms by UUID, the approach rejected above). Content addressing enables automatic deduplication: identical subexpressions are stored once and referenced many times.
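A minimal sketch of content-addressed deduplication, assuming atoms are modeled as Python strings (symbols) and tuples (expressions). The AtomTable class and its intern method are hypothetical names for this illustration and do not correspond to the MORK or classical AtomSpace code.

```python
class AtomTable:
    """Toy content-addressed store: an atom's identity is its structure."""

    def __init__(self):
        # structural key -> the single stored instance of that atom
        self._atoms = {}

    def intern(self, atom):
        # Intern children first so shared subexpressions are reused.
        if isinstance(atom, tuple):
            atom = tuple(self.intern(child) for child in atom)
        # setdefault returns the existing instance if one is already stored.
        return self._atoms.setdefault(atom, atom)

    def __len__(self):
        return len(self._atoms)


table = AtomTable()
a = table.intern(("Implies", ("Human", "Socrates"), ("Mortal", "Socrates")))
b = table.intern(("Likes", "Plato", ("Human", "Socrates")))

# The shared subexpression is one object referenced from both parents.
shared = a[1] is b[2]
```

Here the subexpression ("Human", "Socrates") is minted independently by two callers, yet the table holds one instance. No identifier had to be issued by a central authority, because equality of structure is the identity test.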

Code Is Data

Because MeTTa programs are themselves expressions in AtomSpace, there is no distinction between code and data. Atomese was explicitly designed to be "used by algorithms, not by humans" — like GIMPLE/GIL/LLVM IR but general purpose. "It's like a macro language that you can expand over and over" (Linas Vepstas). Rules and the KR language are the same language; rules can modify rules; the query language is itself a graph stored in the database. This design philosophy — algorithmic consumption over human readability — is the core reason MeTTa exists as a human-facing layer atop the graph substrate. (mailing-list-backed: Code-as-data-programs-universal-knowledge-base, 2016)

Hypergraph Indexing Advantage

Hypergraph storage is demonstrably more RAM-efficient than ordinary graph storage (Neo4j, property graphs). The key insight: in SQL/NoSQL databases, indexes are hidden and inaccessible — internal-use-only by the DB. In AtomSpace, incoming/outgoing sets are the indexes, user-visible and walkable. "When you use a graph DB, you get direct access to 'indexes' as user-visible and user-controllable objects." The Zipfian square-root profile of real datasets (genomics, Wikipedia) amplifies this advantage. (mailing-list-backed: Atomspace-RAM-and-CPU-usage, 2014)

Values and Space API

The Atom / Value Distinction

Atoms are the graph structure — immutable, globally unique, typed, and indexed. They represent relationships, categories, rules, and long-term stable knowledge. Atoms are heavy-weight objects designed for structural queries and pattern matching. Think of them as the "plumbing." Atom creation costs ~tens of microseconds (indexed).

Values are mutable vectors attached to Atoms via a key-value store. They are not indexed, not globally unique, and designed to be small, fast, and fleeting. Values hold truth values, probabilities, streaming sensor data, attention weights, and any other rapidly-changing metadata. Think of them as the "fluid in the pipes." Values have no indexing overhead.

This separation is a deliberate performance decision: the graph structure changes slowly (adding a new concept or relationship), while valuations change rapidly (updating a confidence score after new evidence). Indexing only the structure keeps the pattern matcher fast even as values churn. For DNN integration, this means tensor data (activations, weights) should use custom Value classes (e.g., TensorFlowValue), while the Pattern Matcher accesses Values indirectly through predicates rather than direct search. The conceptual bridge: "conscious processes over Atoms, subconscious processes over Values." (mailing-list-backed: OpenCog-DNNs-PPLs-Atoms-vs-Values, 2018)
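The separation can be sketched as two containers with different update behavior. This is an assumption-laden toy (the ToySpace class and its method names are invented for illustration), not the classical AtomSpace Value API.

```python
class ToySpace:
    """Structure is indexed and stable; values are unindexed and mutable."""

    def __init__(self):
        self.atoms = set()   # immutable atoms (strings / tuples), indexed by content
        self.values = {}     # (atom, key) -> list of floats, never searched by content

    def add(self, atom):
        self.atoms.add(atom)
        return atom

    def set_value(self, atom, key, vector):
        if atom not in self.atoms:
            raise KeyError("values attach only to atoms already in the space")
        self.values[(atom, key)] = list(vector)

    def get_value(self, atom, key):
        return self.values.get((atom, key))


space = ToySpace()
fact = space.add(("Inheritance", "cat", "animal"))
space.set_value(fact, "truth", [0.9, 0.5])
space.set_value(fact, "truth", [0.95, 0.7])  # overwrite; the atom set is untouched
```

Updating the value twice leaves the set of atoms, and therefore anything a pattern matcher would index, unchanged. That is the property the paragraph above relies on when values churn quickly.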

The TruthValue to FloatValue Transition

Original design: Every atom carried a SimpleTruthValue $\langle s, c \rangle \in [0,1]^2$ — strength and confidence — central to PLN reasoning. This led to proliferating specialized types: FuzzyTruthValue, DistributionalTruthValue, IndefiniteTruthValue.

Problem identified: "Complex multiple inheritance relations" among proliferating TV types, plus most AtomSpace calculations needing crisp boolean operations, not probabilistic truth values. Mandatory TruthValues "hurt performance and cluttered the API." The transition was blocked for years by unsolved serialization: without a serialize/deserialize proposal, PropertyMaps were "a non-starter." (mailing-list-backed: Replacing-TV-and-AV-objects-with-property-maps, 2015)

Resolution: TruthValues were generalized to FloatValue — generic vectors $\mathbf{v} \in \mathbb{R}^n$ of arbitrary dimension. Update formulas moved out of C++ into Atomese arithmetic, making the value algebra programmable rather than hardcoded. In Hyperon, the PLN truth value algebra is implemented in MeTTa rather than baked into the storage layer.
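To illustrate what "programmable rather than hardcoded" can mean, the sketch below keeps the store ignorant of any truth-value algebra and passes the update rule in as an ordinary function. The rule shown (a confidence-weighted mean) is a made-up example chosen for simplicity and is not claimed to be PLN's revision formula; all names are assumptions.

```python
from typing import Callable, Dict, List

FloatValue = List[float]  # stand-in for a generic vector in R^n


def weighted_merge(old: FloatValue, new: FloatValue) -> FloatValue:
    """Illustrative rule only: index 0 is strength, index 1 is confidence."""
    (s1, c1), (s2, c2) = old, new
    total = c1 + c2
    strength = (s1 * c1 + s2 * c2) / total if total else 0.0
    return [strength, min(1.0, total)]


def update(store: Dict[str, FloatValue], key: str, new: FloatValue,
           rule: Callable[[FloatValue, FloatValue], FloatValue]) -> None:
    """The store holds plain vectors; the algebra is supplied by the caller."""
    store[key] = rule(store[key], new) if key in store else list(new)


values: Dict[str, FloatValue] = {}
update(values, "cat-is-animal", [0.8, 0.4], weighted_merge)
update(values, "cat-is-animal", [1.0, 0.4], weighted_merge)
```

Swapping weighted_merge for a different function changes the algebra without touching the storage code, which is the design move the resolution describes.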

The Space API

Hyperon abstracts AtomSpace behind the Space API — a universal interface that any backend can implement. Core operations:

match(pattern, space): Find all substitutions $\sigma$ such that $\sigma(\text{pattern})$ exists in the space.
add(atom, space) / remove(atom, space): Modify the space contents.
rewrite: Match a pattern and replace with a template under the computed substitution — the fundamental MeTTa evaluation step.
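The three operations above can be sketched over a toy space. The following assumes atoms are strings and tuples, variables are strings starting with "$", and the space is a plain Python set holding only ground atoms. It performs one-way matching, which is simpler than the real interpreter, and the function bodies are illustrative rather than taken from any Hyperon implementation.

```python
def is_var(a):
    return isinstance(a, str) and a.startswith("$")


def unify(pattern, atom, bindings):
    """One-way match of pattern against a ground atom; bindings or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == atom else None
        return {**bindings, pattern: atom}
    if isinstance(pattern, tuple):
        if not isinstance(atom, tuple) or len(pattern) != len(atom):
            return None
        for p, a in zip(pattern, atom):
            bindings = unify(p, a, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == atom else None


def substitute(template, bindings):
    """Apply a substitution to a template."""
    if is_var(template):
        return bindings.get(template, template)
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return template


def add(atom, space):
    space.add(atom)


def remove(atom, space):
    space.discard(atom)


def match(pattern, space):
    """All substitutions sigma such that sigma(pattern) is in the space."""
    found = (unify(pattern, atom, {}) for atom in space)
    return [b for b in found if b is not None]


def rewrite(pattern, template, space):
    """Match, then instantiate the template under each substitution."""
    return [substitute(template, b) for b in match(pattern, space)]


space = set()
add(("Human", "Socrates"), space)
add(("Human", "Plato"), space)
add(("Dog", "Fido"), space)
mortals = rewrite(("Human", "$x"), ("Mortal", "$x"), space)
```

After this runs, mortals holds one ("Mortal", name) expression per Human fact, in whatever order the set yields them. A repeated variable such as ("Same", "$x", "$x") matches only atoms whose two arguments are equal, because unify checks existing bindings before extending them.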

Multiple Space implementations coexist:

MORK-backed: High-performance in-RAM triemap for local reasoning
DAS-backed: Distributed storage across Redis/MongoDB clusters
Neural Spaces: DNNs wrapped as queryable AtomSpaces — matching returns approximate nearest neighbors in embedding space
Rholang AtomSpace: Capability-secured execution on ASI Chain for decentralized cognitive processes
In-memory (reference): Simple implementation in Hyperon Experimental for development and testing

MeTTa code is largely Space-independent — the same program can target different backends by naming different Spaces.
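Backend independence can be illustrated with a deliberately tiny interface. The Space base class below is hypothetical (its two methods are not the real Space API signatures), and the two backends are toys; the point is only that the client function never names a backend.

```python
from abc import ABC, abstractmethod


class Space(ABC):
    """Hypothetical minimal interface, invented for this sketch."""

    @abstractmethod
    def add(self, atom):
        """Store an expression (a tuple whose first element is its head)."""

    @abstractmethod
    def query_head(self, head):
        """Return every stored expression whose first element is `head`."""


class ListSpace(Space):
    """Backend 1: unindexed list, linear scan on every query."""

    def __init__(self):
        self._atoms = []

    def add(self, atom):
        self._atoms.append(atom)

    def query_head(self, head):
        return [a for a in self._atoms if a[0] == head]


class IndexedSpace(Space):
    """Backend 2: dictionary keyed by head symbol."""

    def __init__(self):
        self._by_head = {}

    def add(self, atom):
        self._by_head.setdefault(atom[0], []).append(atom)

    def query_head(self, head):
        return list(self._by_head.get(head, []))


def humans(space):
    """Client code: depends only on the interface, not on a backend."""
    return sorted(a[1] for a in space.query_head("Human"))


def load(space):
    for atom in [("Human", "Socrates"), ("Dog", "Fido"), ("Human", "Plato")]:
        space.add(atom)
    return space


results = [humans(load(backend())) for backend in (ListSpace, IndexedSpace)]
```

Both backends return the same answer to the same client code while differing in how they store and look up atoms, which is the sense in which a program can be retargeted by naming a different Space.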

Implementations

Four-Layer AtomSpace Taxonomy (Cluster-Pilot Lock-In, 2026-04-29)

"AtomSpace" at this point in time is not a single coherent backend — the ecosystem spans four distinct implementation layers with different repos, runtime characteristics, and Decko-integration suitability. Future agents reading "AtomSpace" without qualification should resolve which layer is meant before drawing conclusions.

#	Layer	Repos / Evidence	Decko relevance
1	Classical AtomSpace StorageNode	opencog/atomspace + atomspace-storage + atomspace-pgres + atomspace-rocks + atomspace-cog + atomspace-bridge	Best read-side SQL import ancestor; not Decko-write-ready
2	Hyperon Space	trueagi-io/hyperon-experimental:lib_spaces.metta GroundingSpace / SpaceMut / DynSpace;	MeTTa-facing demos; not primary Decko backend
3	DAS AtomDB + services	singnet/das: AtomDB + Query Engine + AttentionBroker + agents; MorkDB	Candidate later query/deployment layer; delete + server-pin caveats
4	MORK native substrate	trueagi-io/MORK + mork_ffi PathMap + + SDK + server branch	Performance substrate; requires adapter layer for Decko semantics

Source: AtomSpace Backend Integration Cluster Pilot (2026-04-29) — R4.J1 lock-in across Sources 1-4; cluster archive at scripts/archive/atomspace_pilot/. The H4 sections that follow give per-layer detail (Classical → Layer 1; Hyperon Experimental → Layer 2). Layer 3 (DAS) is detailed at DAS Full; Layer 4 (MORK native) is detailed at MORK Full.

Classical AtomSpace Architecture (C++)

The opencog/atomspace C++ implementation provides the mature, battle-tested realization:

Atom type hierarchy: ~150 predefined types (Node, Link, ConceptNode, PredicateNode, EvaluationLink, etc.) organized in a class hierarchy. New types can be defined at runtime.
Pattern matching engine: Two-phase execution — compilation extracts variables and builds connectivity maps; execution via PatternMatchEngine with callback mixins for custom match semantics. Type checking and pattern matching are theoretically identical operations — "there is absolutely zero theoretical difference" for first-order types, though higher-order types require additional logical deduction (Linas Vepstas, 2014). This equivalence directly influenced Hyperon's approach of unifying type checking and matching. (mailing-list-backed: Atomspace-type-checking, 2014)
PLN/PM semantic duality: PLN link types and Pattern Matcher link types form a Kripke semantics pair: "BindLink is the Kripke equivalent of ImplicationLink." PLN links operate on truth values (probabilistic), while PM links check for structural presence. (mailing-list-backed: IfElseLink, 2015)
Language bindings: Deep integration with GNU Guile (Scheme) and Python. Scheme is the primary scripting interface for legacy OpenCog; Python via the opencog.atomspace module.
Sheaf-theoretic foundations: The opencog/sheaf/ module provides formal mathematical foundations (sheaf axioms, tensor algebra, mereological structures). The intellectual genealogy traces to Link Grammar: Linas argued that LG's connector/section formalism replaces production rules, inference, and deduction with "assembly" — a more general, symmetric operation that eliminates forced directionality. (mailing-list-backed: Link-Grammar-influence-on-AtomSpace-design, 2016) The formal bridge from metatrees to sheaves runs through the jigsaw puzzle piece metaphor (Vepstas 2023, §10.4): beta-reduction is jigsaw-puzzle assembly — connecting a slot variable to a tab value. Connectors have typed slots and tabs that must match types but have opposite "sexes" (directions). Partially assembled jigsaws obey the sheaf axioms. This is not merely an analogy: the ArrowLink (function type) is a special case of a general ConnectorSeq with typed directional connectors, and Link Grammar's connector formalism makes this explicit. The conclusion: "metatrees are naturally typed; those types are naturally reified; the reifications are recursive, and the level of recursion is limited by the imagination." (Provenance: publication, Vepstas — "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023; also TODS 2024 submission)
Persistence layer: Pluggable via StorageNode/BackingStore API. Backends include atomspace-rocks (RocksDB), atomspace-cog (network), and atomspace-bridge (SQL bridge). The BackingStore abstraction is deliberately narrow: developers only implement 3 table structures and 4-5 query methods, estimated at 1-5 weeks of work. PostgreSQL was chosen over Neo4j (10× slower) and NoSQL databases ("absolutely terrible performance" for 50-100 byte atoms). (mailing-list-backed: Why-Postgresql-Used, ArangoDB-as-backend-for-atomspace, 2014)

Hyperon Experimental (Rust Reimplementation)

The trueagi-io/hyperon-experimental Rust implementation is the reference for Hyperon's MeTTa:

Multi-crate workspace: lib (core MeTTa interpreter), c (C API for Python/foreign bindings), python (hyperon Python package)
Four atom variants: Symbol, Variable, Expression, Grounded — a cleaner, more minimal type system than the C++ hierarchy
Grounded atoms: Foreign objects (Python callables, tensors, file handles) wrapped as first-class atoms queryable through the Space API. This resolves the classical ExecutionLink limitation where do_execute could only return Handles, not TruthValues or arbitrary types. (mailing-list-backed: Semantics of ExecutionLink and GroundedSchemaNode, 2014)
Module system: Each module encapsulates a unique Space and Tokenizer, forming a hierarchical namespace. Three import modes and a catalog-based name resolution system manage dependencies. See MeTTa Full for details.
Prioritizes flexibility and semantic correctness over raw performance — PeTTa/MORK provides the high-performance path
First-class DAS support (default feature): lib/Cargo.toml declares metta-bus-client from singnet/das tag 1.0.2 with a default-enabled das feature; new-das! constructs a DistributedAtomSpace and returns a DynSpace (das.rs:156-199). DAS Layer 3 is wired in, not doc-only.

System Interfaces

MeTTa: AtomSpace is the execution environment — MeTTa programs are graph transformations over AtomSpaces via the Space API.
MORK: High-performance backend implementing the Space API with PathMap triemaps, content addressing, and ZAM execution.
DAS: Distributed backend implementing the Space API with Redis/MongoDB sharding and Attention Broker.
Python: The hyperon Python library provides bidirectional MeTTa-Python interop. Grounded atoms can wrap Python objects and callable code.
All PRIMUS components: PLN, MOSES, ECAN, pattern mining, MetaMo — everything operates over AtomSpace as the shared substrate.

Implementation Anchors

atomspace (Layer 1, classical C++) — Mature implementation with ~150 atom types, pattern matching, Scheme/Python bindings, sheaf-theoretic foundations, pluggable persistence.
hyperon-experimental (Layer 2, Rust reference) — Multi-crate workspace with minimal 4-variant atom system, deep Python integration, reference MeTTa interpreter, default DAS feature.
Storage backends (Layer 1 family): atomspace-rocks (RocksDB), atomspace-cog (network), atomspace-storage (base API), atomspace-bridge (read-only SQL→Atomese loader)
Visualization: atomspace-viz (HTML/JS), atomspace-typescript (React/TypeScript)

Design Evolution and Performance

Design Evolution: What Was Tried and Why (mailing-list-backed, opencog-ml 2014–2023)

The current AtomSpace design was shaped by a decade of experimentation with alternatives, each abandoned for specific technical reasons:

Atoms immutable by design (2014): "Easiest and best way to support multi-threading — making them mutable would require locks and crazy logic in all sorts of obscure places" (Linas Vepstas). Identical atoms deduplicate to single instance automatically. The formal argument (Vepstas 2023) is stronger than convenience: because a metatree may be shared as a subtree of many larger trees, editing any node requires deciding what happens to all containing trees — the only consistent solution is copy-on-write, making immutability necessary, not merely desirable. Immutable metatrees can be traversed lock-free even while other threads create or delete. The mutable form (the top-level master index over immutable subtrees) is the "database" — in OpenCog, this is the AtomSpace. (Provenance: publication, Vepstas — "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023)
IPFS backend abandoned (2019): Code-complete, 6/7 tests passing, but fundamentally unsuitable — centralized index, DHT queries taking minutes, only hundreds of atoms/sec vs. 100K+ in-RAM. IPFS is "surprisingly terrible" for this use case.
OpenDHT (Kademlia) abandoned (2020): Hashing atoms across the planet destroys locality of reference. Solution: use DHT for indexes only, serve actual atoms via "seeders" (BitTorrent-style).
UUID-based identity rejected (2021): Requires central authority (bottleneck), creates ~30% RAM overhead for lookup tables. Solution: use atom name directly (globally unique, easy to compute).
Serialization overhead is the primary bottleneck (2020): Postgres with ZeroMQ/protobuf: ~100 atoms/sec. Neo4j: 95% CPU spent serializing. ASCII file reader: ~100K atoms/sec. Raw in-RAM: 700K nodes/sec. "Converting 12-byte objects into other representations has just a huge overhead." Conclusion: "Placing atoms into a database is pointless and useless" for active reasoning.
Fractional indexing at O(1) (2020): AtomSpace maintains per-atom incoming/outgoing sets rather than global indexes. Adding one atom updates O(1) fractional indexes, vs. commercial DBs' O(N log K). Three index entries per binary link. Cost: ~632 bytes/atom in RAM (MOZI dataset: 7M atoms = 4.3 GB) vs. 55 bytes as s-expressions.
Natural chunking via recursive incoming sets (2020): "Given atom X, the natural chunk is the entire recursive incoming set of X." Hypergraphs have natural boundaries unlike regular graphs which snowball. This insight eventually informed MORK's ShardZipper partitioning.
Automatic alpha-conversion was contentious (2017): Silently renaming bound variables on scope-link insertion caused practical problems for URE and PLN developers. The eventual conclusion: the chainer should do alpha-conversion on the fly, not the AtomSpace on insertion. (Is-automatic-alpha-conversion-evil)
Contextual AtomSpaces proposed (2014): An AtomSpace could have an associated context atom, so all contents would implicitly be in that context. This foreshadowed Hyperon's multi-Space architecture but was not implemented in OpenCog Classic. (Contextual Atomspaces)
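Two of the items above, per-atom incoming sets and natural chunking, lend themselves to a short sketch. It assumes atoms are strings and tuples and treats every tuple element as a member of the link; the IndexedAtoms class and its chunk method are illustrative names, not the classical AtomSpace implementation.

```python
from collections import defaultdict


class IndexedAtoms:
    """Per-atom incoming sets instead of one global index."""

    def __init__(self):
        self.incoming = defaultdict(set)  # atom -> links that directly contain it

    def add(self, atom):
        """Adding a link touches only the incoming sets of its own members."""
        self.incoming[atom]  # make sure the atom has an (initially empty) entry
        if isinstance(atom, tuple):
            for member in atom:
                self.add(member)
                self.incoming[member].add(atom)
        return atom

    def chunk(self, atom):
        """Recursive incoming set: everything that transitively contains atom."""
        seen, stack = set(), [atom]
        while stack:
            current = stack.pop()
            for link in self.incoming[current]:
                if link not in seen:
                    seen.add(link)
                    stack.append(link)
        return seen


idx = IndexedAtoms()
inh = idx.add(("Inheritance", "cat", "animal"))
belief = idx.add(("Believes", "alice", inh))
other = idx.add(("Inheritance", "dog", "animal"))
cat_chunk = idx.chunk("cat")
```

In this toy, cat_chunk contains the inheritance link and the belief that wraps it, but not the unrelated dog link, so the chunk stops at a natural boundary. The cost of add depends on the arity of the new link rather than on the total number of stored atoms.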

What survived: immutable atoms, name-based identity, fractional indexes, pattern matcher as core query engine, s-expression serialization, no eventual consistency requirement. AtomSpace Frames (2022) added snapshot changesets for inference context, implemented atop RocksDB.

Production validation: The classical AtomSpace has "been used in production systems, pumping through tens of billions of Atoms in dozens of threads, with run-times extending into weeks, without crashing." (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)

Rejected serialization formats: RESTful APIs, ZeroMQ, Neo4j, Protocol Buffers, and JSON were all evaluated and rejected because "Atoms are tiny, and converting them from native Atomese to other formats is a giant waste of CPU time." RocksDB succeeded by storing bare s-expression strings directly: lossless compression achieves "a few dozen bytes" per atom, making 100M-atom databases only "a few GBytes." (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)

Threading scaling (classical): Thread safety relies on C++ std::shared_ptr<> with atomic reference counting, and scaling is constrained by CPU cache-line availability for hardware atomic locks. Observed scaling: AMD Opteron 12-core achieved only 3× speedup (4 hardware locks); AMD Ryzen 5 3400G achieved 8×; AMD Ryzen 9 3900X achieved only 7× on 24 threads, illustrating the diminishing returns of cache-line contention. (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)

Hyperon scalability targets: The current baseline is ~100 million atoms per live instance and ~1 billion atoms storable via StorageNode (~50 GB file), with half a dozen networked AtomSpaces via ProxyNode. The Hyperon redesign targets "going beyond these current limits" with static pattern matching using free variables in both queries and knowledge base entries, an approach "substantially different from the current query engine" and enabling "efficient distributed implementation." A key unresolved design question is whether to implement only distributed AtomSpace, only distributed episodic memory (via grounded atoms), or both as separate container types. (Provenance: official-site, wiki.opencog.org, Hyperon:Atomspace design notes)

Performance Architecture (mailing-list-backed)

Performance observations from the classical AtomSpace that informed Hyperon's design:

Atom add/remove throughput: ~500K atoms/second in-RAM. The callback-notification mechanism for atom changes was "in the critical performance path" capable of 10-100× slowdown if misused. (Atomspace-dynamics-visualisation, 2014)
Pattern Matcher O(N²) on numerical domains: PM works "very slowly" for NumberNode/GreaterThanLink queries because it enumerates all pairs for virtual links. Proposed solutions: SMT-style delegation to specialized solvers, SpaceServer integration for spatial queries, and cover trees for numerical indexing. (Pattern-Matching-performance, 2018)
No systematic benchmarks existed: An accidental 5-10× performance regression went unnoticed for months. Linas revealed "I accidentally slowed down the atomspace performance by maybe 5x or 10x and basically, no one noticed." (Performance-benchmarks, 2018)
Parallelization: most time in callbacks, not the matcher: "The time actually spent in the pattern matcher tends to be small, compared to the time spent in the callback." Running large independent queries in parallel is more effective than micro-parallelizing the matcher itself. This insight directly influenced MORK's parallel architecture. (Contributing-to-Parallelizing-Pattern-Matcher, 2018)
Pattern mining vs. pattern matching: Using the PM for mining (estimated 96,351 hours for 2-gram mining of a 2M-atom Wikipedia corpus) was a fundamental algorithmic mismatch — PM is designed for "find X such that P(X)" while mining requires "find P such that count(P) is large." (A-disappointing-evaluation, 2014)
Sparse graph memory bottleneck: A 5.3M node graph with Zipfian edge distribution requires one hash-table or btree access per float multiply during matrix operations. The 4KB paging MMU granularity is a fundamental bottleneck — the insight that motivates MORK's locality-optimized triemap layout. (Perfect-architecture-for-OpenCog, 2018)
TinkerPop/Gremlin comparison: A Gremlin traversal equals a single-clause pattern match; AtomSpace's multi-clause queries, typed links, and hypergraph support give it fundamental advantages. Marketing was identified as the core adoption barrier: Neo4j had 466K Google results vs. AtomSpace's 4K. (Graph-Traversal-Machine-Close-Encounters, 2018)
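The quadratic cost noted for numerical queries can be seen by contrasting pair enumeration with an ordered index. The sketch below counts pairs satisfying x > y both ways. It substitutes a sorted list with binary search for the cover trees mentioned above, purely because that is the simplest ordered index to show; the function names are invented for this illustration.

```python
from bisect import bisect_left


def count_greater_pairs_naive(xs, ys):
    """Enumerate every (x, y) pair and test x > y: len(xs) * len(ys) tests."""
    return sum(1 for x in xs for y in ys if x > y)


def count_greater_pairs_indexed(xs, ys):
    """Sort ys once, then binary-search per x instead of scanning all of ys."""
    ordered = sorted(ys)
    # bisect_left gives the number of elements in `ordered` strictly below x.
    return sum(bisect_left(ordered, x) for x in xs)


xs = [5, 1, 9, 3]
ys = [2, 4, 6, 8]
naive = count_greater_pairs_naive(xs, ys)
indexed = count_greater_pairs_indexed(xs, ys)
```

Both functions agree on the count, but the first performs a comparison for every pair, while the second does one sort plus a logarithmic lookup per element. That difference is the kind of saving a numerical index is meant to provide when a generic matcher would otherwise enumerate all pairs.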

Status and Resources

Current Status

Operational: Classical C++ AtomSpace (mature, production-grade); Hyperon Experimental Rust reference (active development, Python packages available); Space API abstraction with multiple backends
Under development: MORK as primary high-performance Space backend; MeTTa-4 semantic model alignment; Windows Python packages for Hyperon Experimental
Proposed: Neural Spaces (DNNs as queryable AtomSpaces); Rholang AtomSpace for decentralized execution; native MeTTa-to-machine-code compilation

Open Problems / Research Directions

Convergence of the classical C++ type hierarchy and Hyperon's minimal 4-variant system — which atom types should be primitive vs. user-defined?
Space API standardization across implementations (MORK, DAS, hyperon-experimental, JeTTa)
Neural Space design — defining meaningful match/bind/rewrite semantics for embedding-backed AtomSpaces
Formal metagraph theory — extending the sheaf-theoretic foundations to cover the full Hyperon Space API

Primary Sources

Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §2.1–2.2: AtomSpace, Space API.
Goertzel, B. et al. (2023). OpenCog Hyperon: A Framework for AGI at the Human Level and Beyond.
Vepstas, L. AtomSpace Design Notes (ram-cpu.pdf). opencog/atomspace/opencog/sheaf/docs/.