Responsible: Linas Vepstas (classical AtomSpace — historical architect; now independently maintains the opencog/* repos outside the Hyperon project), Vitaly Bogdanov, Alexey Potapov (Hyperon Experimental), Ben Goertzel (architecture)
Papers: Goertzel (2025), Hyperon Whitepaper §2.1–2.2; Goertzel et al. (2023), OpenCog Hyperon; Vepstas (2023), Graphs, Metagraphs, RAM, CPU v2.1.1 (also TODS 2024 submission); Vepstas, Sheaves series (sheaf docs)
Status: Current. The AtomSpace concept is implemented across a four-layer taxonomy (locked in by the AtomSpace Backend Integration cluster pilot, closed 2026-04-29): Layer 1 Classical AtomSpace (opencog/atomspace, mature C++/Scheme/Python, v5.0+) maintained independently by Linas Vepstas; Layer 2 Hyperon Space (trueagi-io/hyperon-experimental, Rust reimplementation with deep Python integration); Layer 3 DAS (distributed AtomSpace, see DAS Full); Layer 4 MORK (high-performance triemap kernel, see MORK Full). The abstract Space API enabling multiple backend implementations is operational. See the Implementations subcard below for the cluster-pilot lock-in section with per-layer reconciled findings.
This card provides technical depth beyond the concise AtomSpace index card. AtomSpace is Hyperon's universal knowledge substrate — a typed, content-addressed metagraph that co-locates symbols, tensors, truth values, motives, and edit operations, enabling distinct cognitive processes to interoperate over one shared memory and control plane. For high-performance storage internals see MORK Full, and for distributed deployment see DAS Full.
Related cards: MORK Full (high-performance backend), DAS Full (distributed backend), PathMap (foundational trie substrate), AtomSpace Backend Integration (synthesis / Phase-3-and-Phase-4 integration plan), PLN Full (reasoning over AtomSpace), MeTTa Full (execution language), OpenCog Legacy Full (historical evolution)
Formal definition:
\[\mathcal{M} = (\mathcal{A},\; \mathcal{T},\; \tau)\]

Atoms come in four variants (in Hyperon Experimental's formalization): Symbol (e.g., Human, Mortal, +), Variable (prefixed with $, e.g., $x, $result), Expression, and Grounded.

The AtomSpace was designed around a single organizing constraint: "all OpenCog state is in the Atomspace. There isn't any state that isn't in the AtomSpace; it can't be found under a rock, or tucked away in some object." This principle, that all state should be visible to all algorithms, extends distributed-computing discipline (where state must be locatable and transportable) to AI reasoning systems. Learning algorithms, logical inference engines, and data mining processes all access a unified, visible state container rather than maintaining hidden internal caches. The schema-free "anything goes" hypergraph structure contrasts with SQL's pre-declared tables, though schemas can optionally be declared via the type system. (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)
Each atom is identified by what it contains — structurally identical atoms are the same atom. Vepstas (2023) demonstrates formally that this content-addressed s-expression representation is ~4× more compact than UUID-based in-RAM pointer representation (48 bytes vs. 184 bytes for a representative metatree). UUIDs are rejected as fundamentally flawed for distributed metagraph storage: they require either a centralized issuing authority (bottleneck) or cryptographic hashes (128–192 bits, expensive to compute), while s-expressions are self-identifying — "anyone can mint it at any time, at very low cost" with no centralized authority needed. Compressed with standard algorithms, s-expression files outperform UUID-based formats by a wide margin. A further formal result: metagraphs are self-normalizing — the normalization problem that consumes vast effort in relational database design "comes for free" with metatrees, because the hierarchical structure inherently avoids the duplication that SQL normalization addresses. (Provenance: publication, Vepstas — "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023)
In MORK, content addressing is realized via trie paths (hash-consing); in the classical AtomSpace, via a global atom table with UUID indexing. Content addressing enables automatic deduplication: identical subexpressions are stored once and referenced many times.
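The deduplication described above can be illustrated with a small interning table. This is a minimal Python sketch under stated assumptions, not MORK's trie or the classical atom table: `Symbol`, `Expression`, and `InternTable` are hypothetical names introduced here, and only two of the atom variants are modelled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Expression:
    children: Tuple["Atom", ...]


Atom = Union[Symbol, Expression]


class InternTable:
    """Toy content-addressed store: one stored object per distinct structure."""

    def __init__(self) -> None:
        self._atoms: Dict[Atom, Atom] = {}

    def intern(self, atom: Atom) -> Atom:
        # Intern children first so shared subexpressions are stored once.
        if isinstance(atom, Expression):
            atom = Expression(tuple(self.intern(c) for c in atom.children))
        # setdefault returns the already-stored object when an equal atom exists.
        return self._atoms.setdefault(atom, atom)

    def __len__(self) -> int:
        return len(self._atoms)


table = InternTable()
a = table.intern(Expression((Symbol("Inheritance"), Symbol("Human"), Symbol("Mortal"))))
b = table.intern(Expression((Symbol("Inheritance"), Symbol("Human"), Symbol("Mortal"))))
c = table.intern(Expression((Symbol("Not"), a)))

print(a is b)              # structurally identical atoms resolve to one object
print(c.children[1] is a)  # the nested expression is referenced, not copied
print(len(table))          # distinct atoms stored
```

Identity here is derived entirely from content (frozen dataclasses compare and hash by value), so no identifier has to be issued by a central authority.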
Because MeTTa programs are themselves expressions in AtomSpace, there is no distinction between code and data. Atomese was explicitly designed to be "used by algorithms, not by humans" — like GIMPLE/GIL/LLVM IR but general purpose. "It's like a macro language that you can expand over and over" (Linas Vepstas). Rules and the KR language are the same language; rules can modify rules; the query language is itself a graph stored in the database. This design philosophy — algorithmic consumption over human readability — is the core reason MeTTa exists as a human-facing layer atop the graph substrate. (mailing-list-backed: Code-as-data-programs-universal-knowledge-base, 2016)
Hypergraph storage is demonstrably more RAM-efficient than ordinary graph storage (Neo4j, property graphs). The key insight: in SQL/NoSQL databases, indexes are hidden and inaccessible — internal-use-only by the DB. In AtomSpace, incoming/outgoing sets are the indexes, user-visible and walkable. "When you use a graph DB, you get direct access to 'indexes' as user-visible and user-controllable objects." The Zipfian square-root profile of real datasets (genomics, Wikipedia) amplifies this advantage. (mailing-list-backed: Atomspace-RAM-and-CPU-usage, 2014)
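The idea that incoming sets act as user-visible indexes can be sketched in a few lines. The following Python example is illustrative only and does not reproduce the AtomSpace API: `TinyGraph`, `add_link`, and `neighbours` are hypothetical names, and links are modelled as plain tuples.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Node = str
Link = Tuple[str, Tuple[str, ...]]  # (link type, outgoing set of node names)


class TinyGraph:
    def __init__(self) -> None:
        self.links: Set[Link] = set()
        # incoming[node] holds every link whose outgoing set mentions the node.
        self.incoming: DefaultDict[Node, Set[Link]] = defaultdict(set)

    def add_link(self, link_type: str, *outgoing: Node) -> Link:
        link = (link_type, tuple(outgoing))
        self.links.add(link)
        for node in outgoing:
            self.incoming[node].add(link)
        return link

    def neighbours(self, node: Node) -> List[Node]:
        # Walk the incoming set, then each link's outgoing set.
        links = self.incoming.get(node, set())
        return sorted({n for _, out in links for n in out if n != node})


g = TinyGraph()
g.add_link("Inheritance", "Socrates", "Human")
g.add_link("Inheritance", "Human", "Mortal")
g.add_link("Similarity", "Human", "Person")

print(g.neighbours("Human"))
```

The `incoming` mapping is an ordinary attribute that callers can read and traverse directly, which is the contrast with database indexes that stay internal to the engine.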
Atoms are the graph structure — immutable, globally unique, typed, and indexed. They represent relationships, categories, rules, and long-term stable knowledge. Atoms are heavy-weight objects designed for structural queries and pattern matching. Think of them as the "plumbing." Atom creation costs ~tens of microseconds (indexed).
Values are mutable vectors attached to Atoms via a key-value store. They are not indexed, not globally unique, and designed to be small, fast, and fleeting. Values hold truth values, probabilities, streaming sensor data, attention weights, and any other rapidly-changing metadata. Think of them as the "fluid in the pipes." Values have no indexing overhead.
This separation is a deliberate performance decision: the graph structure changes slowly (adding a new concept or relationship), while valuations change rapidly (updating a confidence score after new evidence). Indexing only the structure keeps the pattern matcher fast even as values churn. For DNN integration, this means tensor data (activations, weights) should use custom Value classes (e.g., TensorFlowValue), while the Pattern Matcher accesses Values indirectly through predicates rather than direct search. The conceptual bridge: "conscious processes over Atoms, subconscious processes over Values." (mailing-list-backed: OpenCog-DNNs-PPLs-Atoms-vs-Values, 2018)
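A minimal sketch of the Atom/Value split, assuming a toy in-memory store: atoms are immutable tuples held in a structural index, while values are mutable float vectors kept in a separate, unindexed key-value map. `AtomStore` and its method names are hypothetical and are not the classical AtomSpace or Hyperon API.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Atom = Tuple[str, ...]  # immutable structural atom, e.g. ("Concept", "cat")


class AtomStore:
    def __init__(self) -> None:
        self._index: Dict[Atom, Atom] = {}  # structural index (atoms only)
        self._values: Dict[Atom, Dict[Hashable, List[float]]] = {}  # not indexed

    def add(self, atom: Atom) -> Atom:
        # Adding an equal atom twice returns the stored one.
        return self._index.setdefault(atom, atom)

    def set_value(self, atom: Atom, key: Hashable, value: List[float]) -> None:
        # Overwrites in place; the structural index is untouched.
        self._values.setdefault(self.add(atom), {})[key] = list(value)

    def get_value(self, atom: Atom, key: Hashable) -> Optional[List[float]]:
        return self._values.get(atom, {}).get(key)

    def by_type(self, type_name: str) -> List[Atom]:
        # Structural query: consults atoms only, never values.
        return [a for a in self._index if a[0] == type_name]


store = AtomStore()
cat = store.add(("Concept", "cat"))
store.set_value(cat, "truth", [0.8, 0.5])
store.set_value(cat, "truth", [0.9, 0.6])  # value churn after new evidence

print(store.by_type("Concept"))
print(store.get_value(cat, "truth"))
```

Repeated `set_value` calls never touch `_index`, which mirrors the argument that indexing only the structure keeps structural queries cheap while values change quickly.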
Original design: Every atom carried a SimpleTruthValue \(\langle s, c \rangle \in [0,1]^2\) — strength and confidence — central to PLN reasoning. This led to proliferating specialized types: FuzzyTruthValue, DistributionalTruthValue, IndefiniteTruthValue.
Problem identified: "Complex multiple inheritance relations" among proliferating TV types, plus most AtomSpace calculations needing crisp boolean operations, not probabilistic truth values. Mandatory TruthValues "hurt performance and cluttered the API." The transition was blocked for years by unsolved serialization: without a serialize/deserialize proposal, PropertyMaps were "a non-starter." (mailing-list-backed: Replacing-TV-and-AV-objects-with-property-maps, 2015)
Resolution: TruthValues were generalized to FloatValue — generic vectors \(\mathbf{v} \in \mathbb{R}^n\) of arbitrary dimension. Update formulas moved out of C++ into Atomese arithmetic, making the value algebra programmable rather than hardcoded. In Hyperon, the PLN truth value algebra is implemented in MeTTa rather than baked into the storage layer.
Hyperon abstracts AtomSpace behind the Space API — a universal interface that any backend can implement. Core operations:
Multiple Space implementations coexist:
MeTTa code is largely Space-independent — the same program can target different backends by naming different Spaces.
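To illustrate backend independence, here is a small Python sketch of an abstract space interface with two interchangeable backends. It is not the Hyperon Space API: the operation names (`add`, `remove`, `atoms`, `query`), the flat-tuple atoms, and both backend classes are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]
Bindings = Dict[str, str]


def match(pattern: Atom, atom: Atom) -> Optional[Bindings]:
    """Match a flat pattern against an atom; "$name" items are variables."""
    if len(pattern) != len(atom):
        return None
    bindings: Bindings = {}
    for p, a in zip(pattern, atom):
        if p.startswith("$"):
            if bindings.setdefault(p, a) != a:  # repeated variable must agree
                return None
        elif p != a:
            return None
    return bindings


class Space(ABC):
    @abstractmethod
    def add(self, atom: Atom) -> None: ...

    @abstractmethod
    def remove(self, atom: Atom) -> bool: ...

    @abstractmethod
    def atoms(self) -> Iterator[Atom]: ...

    def query(self, pattern: Atom) -> List[Bindings]:
        found = (match(pattern, atom) for atom in self.atoms())
        return [b for b in found if b is not None]


class ListSpace(Space):
    """Insertion-ordered backend that keeps duplicates."""

    def __init__(self) -> None:
        self._atoms: List[Atom] = []

    def add(self, atom: Atom) -> None:
        self._atoms.append(atom)

    def remove(self, atom: Atom) -> bool:
        try:
            self._atoms.remove(atom)
            return True
        except ValueError:
            return False

    def atoms(self) -> Iterator[Atom]:
        return iter(list(self._atoms))


class SetSpace(Space):
    """Deduplicating backend."""

    def __init__(self) -> None:
        self._atoms: Set[Atom] = set()

    def add(self, atom: Atom) -> None:
        self._atoms.add(atom)

    def remove(self, atom: Atom) -> bool:
        if atom in self._atoms:
            self._atoms.remove(atom)
            return True
        return False

    def atoms(self) -> Iterator[Atom]:
        return iter(set(self._atoms))


def humans(space: Space) -> List[str]:
    # The same "program" runs against whichever backend it is handed.
    space.add(("Inheritance", "Socrates", "Human"))
    space.add(("Inheritance", "Plato", "Human"))
    space.add(("Inheritance", "Human", "Mortal"))
    return sorted(b["$who"] for b in space.query(("Inheritance", "$who", "Human")))


print(humans(ListSpace()))
print(humans(SetSpace()))
```

The `humans` function depends only on the abstract interface, so swapping the backend requires no change to the calling code.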
"AtomSpace" at this point in time is not a single coherent backend — the ecosystem spans four distinct implementation layers with different repos, runtime characteristics, and Decko-integration suitability. Future agents reading "AtomSpace" without qualification should resolve which layer is meant before drawing conclusions.
| # | Layer | Repos / Evidence | Decko relevance |
|---|---|---|---|
| 1 | Classical AtomSpace StorageNode | opencog/atomspace + atomspace-storage + atomspace-pgres + atomspace-rocks + atomspace-cog + atomspace-bridge | Best read-side SQL import ancestor; not Decko-write-ready |
| 2 | Hyperon Space | trueagi-io/hyperon-experimental:lib_spaces.metta GroundingSpace / SpaceMut / DynSpace; | MeTTa-facing demos; not primary Decko backend |
| 3 | DAS AtomDB + services | singnet/das: AtomDB + Query Engine + AttentionBroker + agents; MorkDB | Candidate later query/deployment layer; delete + server-pin caveats |
| 4 | MORK native substrate | trueagi-io/MORK+mork_ffi PathMap + + SDK + server branch | Performance substrate; requires adapter layer for Decko semantics |
Source: AtomSpace Backend Integration Cluster Pilot (2026-04-29) — R4.J1 lock-in across Sources 1-4; cluster archive at scripts/archive/atomspace_pilot/. The H4 sections that follow give per-layer detail (Classical → Layer 1; Hyperon Experimental → Layer 2). Layer 3 (DAS) is detailed at DAS Full; Layer 4 (MORK native) is detailed at MORK Full.
The opencog/atomspace C++ implementation provides the mature, battle-tested realization:
opencog.atomspace module.

The opencog/sheaf/ module provides formal mathematical foundations (sheaf axioms, tensor algebra, mereological structures). The intellectual genealogy traces to Link Grammar: Linas argued that LG's connector/section formalism replaces production rules, inference, and deduction with "assembly", a more general, symmetric operation that eliminates forced directionality. (mailing-list-backed: Link-Grammar-influence-on-AtomSpace-design, 2016)

The formal bridge from metatrees to sheaves runs through the jigsaw puzzle piece metaphor (Vepstas 2023, §10.4): beta-reduction is jigsaw-puzzle assembly, that is, connecting a slot variable to a tab value. Connectors have typed slots and tabs that must match types but have opposite "sexes" (directions). Partially assembled jigsaws obey the sheaf axioms. This is not merely an analogy: the ArrowLink (function type) is a special case of a general ConnectorSeq with typed directional connectors, and Link Grammar's connector formalism makes this explicit. The conclusion: "metatrees are naturally typed; those types are naturally reified; the reifications are recursive, and the level of recursion is limited by the imagination." (Provenance: publication, Vepstas, "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023; also TODS 2024 submission)
The trueagi-io/hyperon-experimental Rust implementation is the reference for Hyperon's MeTTa:
lib/Cargo.toml declares metta-bus-client from singnet/das tag 1.0.2 with a default-enabled das feature; new-das! constructs a DistributedAtomSpace and returns a DynSpace (das.rs:156-199). DAS Layer 3 is wired in, not doc-only.

The current AtomSpace design was shaped by a decade of experimentation with alternatives, each abandoned for specific technical reasons:
What survived: immutable atoms, name-based identity, fractional indexes, pattern matcher as core query engine, s-expression serialization, no eventual consistency requirement. AtomSpace Frames (2022) added snapshot changesets for inference context, implemented atop RocksDB.
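The notion of a snapshot changeset layered over a base can be sketched as an overlay that records only additions and removals relative to its parent. This Python example is a conceptual illustration and makes no claim about how AtomSpace Frames or its RocksDB storage are implemented; `Frame` and its methods are hypothetical names.

```python
from typing import Optional, Set, Tuple

Atom = Tuple[str, ...]


class Frame:
    """Overlay holding only the delta (added/removed atoms) against a parent."""

    def __init__(self, parent: Optional["Frame"] = None) -> None:
        self.parent = parent
        self.added: Set[Atom] = set()
        self.removed: Set[Atom] = set()

    def add(self, atom: Atom) -> None:
        self.removed.discard(atom)
        self.added.add(atom)

    def remove(self, atom: Atom) -> None:
        # Masks the atom in this frame without touching the parent.
        self.added.discard(atom)
        self.removed.add(atom)

    def atoms(self) -> Set[Atom]:
        base = self.parent.atoms() if self.parent is not None else set()
        return (base - self.removed) | self.added


fact = ("Inheritance", "Socrates", "Human")
assumption = ("Inheritance", "Socrates", "Immortal")

base = Frame()
base.add(fact)

hypothesis = Frame(parent=base)  # inference context layered over the base
hypothesis.add(assumption)
hypothesis.remove(fact)

print(sorted(base.atoms()))
print(sorted(hypothesis.atoms()))
```

Discarding `hypothesis` leaves `base` exactly as it was, which is the property that makes such overlays convenient for exploring an inference branch.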
Production validation: The classical AtomSpace has "been used in production systems, pumping through tens of billions of Atoms in dozens of threads, with run-times extending into weeks, without crashing." (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)
Rejected serialization formats: RESTful APIs, ZeroMQ, Neo4j, Protocol Buffers, and JSON were all evaluated and rejected because "Atoms are tiny, and converting them from native Atomese to other formats is a giant waste of CPU time." RocksDB succeeded by storing bare s-expression strings directly; lossless compression achieves "a few dozen bytes" per atom, making 100M-atom databases only "a few GBytes." (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)
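As a rough illustration of why bare s-expression strings are cheap to store, the sketch below serializes a nested atom to a string and parses it back with no intermediate format. It is a toy under stated assumptions (symbols contain no whitespace or parentheses, atoms are nested Python tuples) and is not the classical AtomSpace's serializer; `to_sexpr` and `from_sexpr` are hypothetical names.

```python
from typing import List, Tuple, Union

Atom = Union[str, Tuple["Atom", ...]]


def to_sexpr(atom: Atom) -> str:
    """Render a nested tuple as an s-expression string."""
    if isinstance(atom, str):
        return atom
    return "(" + " ".join(to_sexpr(child) for child in atom) + ")"


def from_sexpr(text: str) -> Atom:
    """Parse an s-expression string back into nested tuples."""
    tokens: List[str] = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse() -> Atom:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token
        items: List[Atom] = []
        while tokens[pos] != ")":
            items.append(parse())
        pos += 1  # consume the closing parenthesis
        return tuple(items)

    result = parse()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


atom = ("Evaluation", ("Predicate", "likes"), ("List", "Alice", "Bob"))
text = to_sexpr(atom)

print(text)
print(from_sexpr(text) == atom)
```

The string itself is both the storage format and the atom's identity, so a key-value store can hold it directly as a key.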
Threading scaling (classical): Thread-safety via C++ std::shared_ptr<> with atomic reference counting, constrained by CPU cache-line availability for hardware atomic locks. Observed scaling: AMD Opteron 12-core achieved only 3× speedup (4 hardware locks); AMD Ryzen 5 3400G achieved 8×; AMD Ryzen 9 3900X achieved only 7× on 24 threads, illustrating the diminishing returns of cache-line contention. (Provenance: official-site, wiki.opencog.org, AtomSpace design notes)
Hyperon scalability targets: Current baseline is ~100 million atoms per live instance and ~1 billion atoms storable via StorageNode (~50 GB file), with half-a-dozen networked AtomSpaces via ProxyNode. The Hyperon redesign targets "going beyond these current limits" with static pattern matching using free variables in both queries and knowledge base entries, an approach described as "substantially different from the current query engine" and enabling "efficient distributed implementation." A key unresolved design question: whether to implement only distributed AtomSpace, only distributed episodic memory (via grounded atoms), or both as separate container types. (Provenance: official-site, wiki.opencog.org, Hyperon:Atomspace design notes)
Performance observations from the classical AtomSpace that informed Hyperon's design: