Knowledge Substrates

Draft — This content has not been approved for publication.

Scope

All repositories implementing knowledge storage, retrieval, and distribution: the AtomSpace concept and its backends. For theoretical foundations, see AtomSpace Full, MORK Full, and DAS Full.

Active Repositories

| Repo | Language | Role | Maturity | Purpose |
|---|---|---|---|---|
| atomspace | C++ | Classical implementation | Operational | Production-grade in-RAM hypergraph with pattern matching, ~150 atom types, Scheme/Python bindings. Requires cogutil. |
| MORK | Rust (nightly) | High-performance kernel | Operational | Triemap-based engine with ZAM, bidirectional matching, MM2. 7-crate workspace. Requires sibling PathMap checkout. |
| das | C++ + Python | Distributed layer | Operational | Distributed AtomSpace with Redis/MongoDB backends, Attention Broker, and cognitive query agents. Bazel/Docker build. |
| atomspace-storage | C++ | Base storage API | Operational | StorageNode interface (v4.3.0). File, JSON, Prolog, MeTTa, CSV I/O. ProxyNodes for mirroring/caching. Required by all storage backends. |
| atomspace-rocks | C++ | Local persistence | Operational | RocksDB backend (v1.6.1). Single-user, single-host. Two modes: full DAG (RocksStorageNode) and simple (MonoStorageNode). |
| atomspace-cog | C++ | Network distribution | Operational | Network client (v1.2.0) for CogServer. Multi-threaded (4 sockets). Frame support incomplete. |
| atomspace-bridge | C++ | SQL bridge | Operational | Bidirectional PostgreSQL-to-AtomSpace bridge (v0.2.1). Motivated by FlyBase genome database use case. |
| das-metta-parser | C (Flex/Bison) | DAS ingestion | Operational | Parses MeTTa files into MongoDB/Redis for DAS. Docker-based build. |
| das-toolbox | Python | DAS CLI tooling | Operational | das-cli for infrastructure management (containers, OpenFaaS, MeTTa operations). |
| mork_ffi | Rust + C | MORK-Prolog bridge | Operational | ~150 lines exposing MORK to SWI-Prolog: add-atoms, remove-atoms, match, mm2-exec. Used by PeTTa. |
| CZ2 | Scala 3 | Triemap research | Experimental | Prefix-compressed triemap toolkit (v0.2.17). Cross-compiles to JVM/JS/Native. Inspired by Peyton Jones's paper. |
| MM2_Structuring_Code | Rust / MM2 | MM2 tutorial | Experimental | 30+ progressive examples for MORK's MM2 dataflow language. Requires MORK binary. |
| mork-rust-sdk | Rust | MORK API client | Experimental | Rust client SDK for MORK API. iCog Labs. |
| mork-ts-sdk | TypeScript | MORK API client | Operational | TypeScript client SDK for MORK HTTP API. iCog Labs. |
| faiss_ffi | Rust + C | Vector similarity bridge | Operational | FAISS vector similarity FFI for Prolog/MeTTa. Creates atom-indexed vector spaces for similarity-based retrieval. |
| generate | C++ / Guile | Graph generation | Experimental | Constraint-guided network synthesis using sheaf theory and jigsaw-puzzle connector semantics. Generates parse trees, deduction chains, pathways. Requires cogutil + atomspace. Independently maintained. |
| opencog-cycl | Python | KB ingestion | Experimental | CycL-to-Atomese translator mapping OpenCyc knowledge base entries into AtomSpace. Script-based pipeline. Early-stage research. |
| atomese-simd | C++ / OpenCL / CUDA | GPU compute bridge | Experimental | Bridges Atomese symbolic descriptions to GPU/SIMD hardware via sensory-motor agency model. Generates Atomese IDL for GPU kernel introspection. Built on the sensory system. Independently maintained. |
| cogserver | C++ | Network service layer | Independently maintained | Network server providing telnet/WebSocket/HTTP access to AtomSpace. Server half of atomspace-cog. Independently maintained by original OpenCog contributors outside the Hyperon project. |
| atomspace-pgres | C++ | SQL persistence (deprecated) | Legacy / Deprecated | PostgreSQL persistent backend for AtomSpace. Superseded by atomspace-rocks. Independently maintained. |

How They Fit Together

This family has a clear layered architecture with two parallel lineages:

Classical lineage (OpenCog C++): These repos are independently maintained by original OpenCog contributors, separate from the Hyperon project's active development.

cogutil → atomspace → atomspace-storage → atomspace-rocks (local)
                                        → atomspace-cog (network)
                                        → atomspace-bridge (SQL)

Build order matters: cogutil must be installed first, then atomspace, then atomspace-storage, then any specific backend. All use CMake with the same mkdir build && cd build && cmake .. && make -j pattern.
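The build order above can be sketched as a small shell helper. This is a minimal sketch, not something shipped in any of the repositories: the function names `classical_build_order` and `build_classical` and the `DRY_RUN` switch are hypothetical. It assumes the repositories are checked out side by side under the current directory and that every repository in the chain takes the same install step shown for atomspace in Quick Start. With `DRY_RUN=1` (the default here) it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print the classical-lineage dependency chain
# for one of the three storage backends named in the diagram above.
classical_build_order() {
    backend="$1"
    case "$backend" in
        atomspace-rocks|atomspace-cog|atomspace-bridge)
            echo "cogutil atomspace atomspace-storage $backend" ;;
        *)
            echo "unknown backend: $backend" >&2
            return 1 ;;
    esac
}

# Build each repository in dependency order, or just print the commands
# when DRY_RUN=1. Assumption: every repo uses the same CMake + install steps.
build_classical() {
    order=$(classical_build_order "$1") || return 1
    for repo in $order; do
        cmd="cd $repo && mkdir -p build && cd build && cmake .. && make -j && sudo make install && sudo ldconfig"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Run in a subshell so the cd does not leak into the next iteration.
            ( eval "$cmd" ) || return 1
        fi
    done
}
```

For example, `build_classical atomspace-rocks` prints four command lines, one per repository, in the order they would need to run; setting `DRY_RUN=0` executes them instead and stops at the first failure.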

Hyperon lineage (Rust/distributed):

MORK (+ sibling PathMap) ← mork_ffi → PeTTa (Prolog compiler)
                         ← CZ2 (Scala research prototype)
                         ← MM2_Structuring_Code (tutorial)

das ← das-metta-parser (ingestion)
    ← das-toolbox (CLI management)
    ← MORK (planned high-performance backend)

The two lineages are bridged by the Space API abstraction: MeTTa code can target either lineage via named Spaces.

Extensions and ingestion tools: generate extends AtomSpace with constraint-guided graph synthesis (sheaf theory). opencog-cycl converts external knowledge bases (CycL/OpenCyc) into Atomese. atomese-simd extends AtomSpace to GPU hardware via Atomese IDL, building on the sensory system. These are independently maintained research extensions rather than core infrastructure.

Quick Start

# Classical AtomSpace (requires cogutil installed first)
cd atomspace && mkdir build && cd build && cmake .. && make -j
sudo make install && sudo ldconfig

# MORK (requires nightly Rust + sibling PathMap checkout)
cd MORK && cargo +nightly build --workspace --release
cargo +nightly test --workspace

# DAS (Docker-based)
cd das && make build-all    # Builds all components
make test-all               # Requires running AtomDB services

# MORK FFI for PeTTa
cd mork_ffi && RUSTFLAGS="-C target-cpu=native" cargo build -p mork_ffi --release
./build.sh
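The two MORK prerequisites noted above (nightly Rust and a sibling PathMap checkout) can be checked before starting a build. The following is a hedged sketch rather than an official script: the function names are hypothetical, and it assumes "sibling" means a directory literally named `PathMap` next to the MORK checkout and that Rust is managed with rustup.

```shell
#!/bin/sh
# Hypothetical check: is there a PathMap directory beside the MORK checkout?
# Assumption: the sibling checkout is named exactly "PathMap".
has_pathmap_sibling() {
    mork_dir="${1:-MORK}"
    [ -d "$(dirname "$mork_dir")/PathMap" ]
}

# Hypothetical check: is a nightly toolchain installed via rustup?
# `rustup toolchain list` prints one installed toolchain per line.
has_nightly_rust() {
    command -v rustup >/dev/null 2>&1 &&
        rustup toolchain list 2>/dev/null | grep -q '^nightly'
}

# Report every missing prerequisite, then return non-zero if any was missing.
check_mork_prereqs() {
    status=0
    if ! has_pathmap_sibling "${1:-MORK}"; then
        echo "missing: PathMap checkout beside ${1:-MORK}" >&2
        status=1
    fi
    if ! has_nightly_rust; then
        echo "missing: nightly Rust toolchain" >&2
        status=1
    fi
    return "$status"
}
```

Running `check_mork_prereqs MORK && (cd MORK && cargo +nightly build --workspace --release)` from the parent directory would then skip the build and print what is missing instead of failing partway through compilation.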

Living Documentation

Active development decisions for MORK, DAS, and the broader substrate layer are discussed in weekly team calls. These transcripts capture the why behind implementation choices (design trade-offs, performance benchmarks, integration priorities) that the code and commit history alone do not preserve.

  • MORKification Weekly: Primary development log for MORK. Covers triemap architecture decisions, ZAM evolution, MM2 dataflow design, PathMap integration, and performance trade-offs.
  • Magi Weekly: Broader project coordination touching DAS integration, substrate boundary decisions, and infrastructure planning.
  • MeTTa Study Group: Language-level discussions that inform substrate API requirements (Space semantics, type system interactions, and grounding patterns).

For agents and future contributors: these transcripts are the best source for understanding why the current architecture looks the way it does, especially for decisions that predate the current codebase state.

Current State vs. Whitepaper

  • MORK as primary substrate (whitepaper §2.3): Operational for local in-RAM processing via PeTTa. The reported 500M+ atom scale was obtained on powerful development hardware using the PeTTa/MORK integration.
  • DAS + MORK integration (whitepaper §2.5): Under development. DAS handles distributed persistence; MORK handles hot compute. The boundary definition is an active research question.
  • Neural Spaces (whitepaper §2.2): Proposed; no implementation exists wrapping DNNs as queryable AtomSpaces.
  • ShardZipper (whitepaper): Proposed Merkle-based distributed state management for MORK. Not yet implemented.
  • ByteFlow GPU acceleration (whitepaper): Proposed adaptive block packing for dense tensors in MORK. atomese-simd represents an earlier, independent approach to GPU integration via Atomese descriptions, but differs from the ByteFlow vision.

Forks and Mirrors

  • MORK forks: trueagi-io/MORK is canonical. A local mirror tracks ngeiswei/MORK (experimental fork).
  • atomspace-pgres: Deprecated PostgreSQL backend, superseded by atomspace-rocks. Still exists in the reference collection.
  • atomspace-gpu: Experimental OpenCL/CUDA AtomSpace, an effort related to but distinct from atomese-simd. Neither is actively developed.
  • atomspace mirrors: A local mirror tracks a contributor fork of opencog/atomspace.

Explicitly excluded: Visualization tools (atomspace-viz, atomspace-typescript, atomspace-explorer) are developer debugging aids, not storage substrates. They may warrant a future "Developer Tools" family card.

Recommended Entry Points

  • Learning AtomSpace concepts: Start with the classical C++ atomspace README, which has the clearest explanation of the Atom/Value distinction.
  • High-performance MeTTa: Use PeTTa with mork_ffi for MORK-backed execution.
  • Distributed deployment: Use das with das-toolbox for infrastructure management.
  • Learning MM2: Work through MM2_Structuring_Code's roughly 30 progressive examples.
  • Triemap research: CZ2 provides a clean Scala 3 implementation without MORK's Rust nightly requirements.

Gaps and Consolidation Opportunities

  • No unified Space API test suite: The Space API is conceptual; no conformance tests verify that MORK, DAS, and hyperon-experimental implement the same interface.
  • atomspace-cog frame support incomplete: Network-distributed frames don't fully work yet.
  • MORK requires sibling PathMap checkout: This external dependency isn't documented in all places and can surprise new developers.
  • DAS Bazel build is Docker-only: No native build path documented for DAS outside Docker containers.
  • GPU integration fragmented: atomspace-gpu and atomese-simd represent two different approaches to GPU acceleration; neither is active, and neither aligns with the whitepaper's ByteFlow vision.


