About Hyperon+AtomSpace+AtomSpace Full+Design Evolution and Performance

Atoms immutable by design (2014): "Easiest and best way to support multi-threading — making them mutable would require locks and crazy logic in all sorts of obscure places" (Linas Vepstas). Identical atoms deduplicate to single instance automatically. The formal argument (Vepstas 2023) is stronger than convenience: because a metatree may be shared as a subtree of many larger trees, editing any node requires deciding what happens to all containing trees — the only consistent solution is copy-on-write, making immutability necessary, not merely desirable. Immutable metatrees can be traversed lock-free even while other threads create or delete. The mutable form (the top-level master index over immutable subtrees) is the "database" — in OpenCog, this is the AtomSpace. (Provenance: publication, Vepstas — "Graphs, Metagraphs, RAM, CPU" v2.1.1, 2023)
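The deduplication and copy-on-write behaviour described above can be illustrated with a minimal hash-consing sketch. This is not the AtomSpace implementation: the `Atom` and `AtomTable` classes and their methods are hypothetical names chosen for illustration. The only assumption is that atoms are immutable values and that a single mutable table indexes them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Atom:
    """Immutable atom (hypothetical): nodes carry a name, links carry an outgoing tuple."""
    type: str
    name: str = ""
    outgoing: Tuple["Atom", ...] = ()


class AtomTable:
    """Hypothetical master index: the only mutable object in this sketch."""

    def __init__(self) -> None:
        self._atoms: Dict[Atom, Atom] = {}

    def add(self, atom: Atom) -> Atom:
        # setdefault returns the stored instance if an equal atom already exists,
        # so identical atoms collapse to a single object.
        return self._atoms.setdefault(atom, atom)

    def node(self, type_: str, name: str) -> Atom:
        return self.add(Atom(type_, name))

    def link(self, type_: str, *outgoing: Atom) -> Atom:
        return self.add(Atom(type_, "", tuple(outgoing)))

    def __len__(self) -> int:
        return len(self._atoms)


space = AtomTable()
cat = space.node("Concept", "cat")
animal = space.node("Concept", "animal")
first = space.link("Inheritance", cat, animal)
# Re-creating the same link yields the same instance, not a copy.
second = space.link("Inheritance", space.node("Concept", "cat"), animal)
# "Editing" is copy-on-write: a new link is built and the old one is untouched.
edited = space.link("Inheritance", cat, space.node("Concept", "mammal"))
```

Because `Atom` is frozen, any attempt to assign to one of its fields raises an error, so readers holding `first` never see it change. Only the table itself would need synchronization in a multi-threaded setting.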

IPFS backend abandoned (2019): Code-complete, 6/7 tests passing, but fundamentally unsuitable — centralized index, DHT queries taking minutes, only hundreds of atoms/sec vs. 100K+ in-RAM. IPFS is "surprisingly terrible" for this use case.

OpenDHT (Kademlia) abandoned (2020): Hashing atoms across the planet destroys locality of reference. Solution: use DHT for indexes only, serve actual atoms via "seeders" (BitTorrent-style).

UUID-based identity rejected (2021): Requires central authority (bottleneck), creates ~30% RAM overhead for lookup tables. Solution: use atom name directly (globally unique, easy to compute).

Serialization overhead is the primary bottleneck (2020): Postgres with ZeroMQ/protobuf: ~100 atoms/sec. Neo4j: 95% CPU spent serializing. ASCII file reader: ~100K atoms/sec. Raw in-RAM: 700K nodes/sec. "Converting 12-byte objects into other representations has just a huge overhead." Conclusion: "Placing atoms into a database is pointless and useless" for active reasoning.

Fractional indexing at O(1) (2020): AtomSpace maintains per-atom incoming/outgoing sets rather than global indexes. Adding one atom updates O(1) fractional indexes, vs. commercial DBs' O(N log K). Three index entries per binary link. Cost: ~632 bytes/atom in RAM (MOZI dataset: 7M atoms = 4.3 GB) vs. 55 bytes as s-expressions.
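A rough sketch of per-atom incoming sets follows. It assumes atoms are plain tuples, and the `IndexedSpace` class and its `index_writes` counter are hypothetical, added only to make the bookkeeping visible. Counting the table entry plus one incoming-set entry per target is one plausible reading of "three index entries per binary link", not a confirmed description of the AtomSpace internals.

```python
from collections import defaultdict
from typing import Dict, Set


class IndexedSpace:
    """Hypothetical store that keeps an incoming set per atom instead of a global index."""

    def __init__(self) -> None:
        self.atoms: Set[tuple] = set()
        self.incoming: Dict[tuple, Set[tuple]] = defaultdict(set)
        self.index_writes = 0  # illustration only: counts index entries written

    def add_node(self, type_: str, name: str) -> tuple:
        atom = (type_, name)
        if atom not in self.atoms:
            self.atoms.add(atom)
            self.index_writes += 1
        return atom

    def add_link(self, type_: str, *outgoing: tuple) -> tuple:
        link = (type_, *outgoing)
        if link in self.atoms:
            return link  # already present: nothing to update
        self.atoms.add(link)
        self.index_writes += 1
        # Only the incoming sets of the link's own targets are touched,
        # so the work depends on the link's arity, not on the size of the store.
        for target in set(outgoing):
            self.incoming[target].add(link)
            self.index_writes += 1
        return link


space = IndexedSpace()
cat = space.add_node("Concept", "cat")
animal = space.add_node("Concept", "animal")
before = space.index_writes
inheritance = space.add_link("Inheritance", cat, animal)
writes_for_link = space.index_writes - before  # table entry + two incoming-set entries
```

The per-atom sets are also where the RAM cost comes from: every atom carries its own index structure, which is compact to update but far larger than the atom's textual form.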

Natural chunking via recursive incoming sets (2020): "Given atom X, the natural chunk is the entire recursive incoming set of X." Hypergraphs have natural chunk boundaries, unlike regular graphs, where expanding a neighborhood tends to snowball. This insight eventually informed MORK's ShardZipper partitioning.
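The recursive incoming set can be computed as a simple transitive closure. The sketch below assumes the incoming sets are available as a dictionary from atom to the set of links that contain it. The function name `natural_chunk` and the tuple encoding of atoms are illustrative choices, and including X itself in the result is a convenience assumption.

```python
from typing import Dict, Hashable, Set


def natural_chunk(atom: Hashable, incoming: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Return atom plus every link that contains it, directly or through other links."""
    chunk = {atom}
    stack = [atom]
    while stack:
        current = stack.pop()
        for holder in incoming.get(current, ()):
            if holder not in chunk:
                chunk.add(holder)
                stack.append(holder)
    return chunk


# Small example: (Evaluation likes (Inheritance cat animal))
cat = ("Concept", "cat")
animal = ("Concept", "animal")
likes = ("Predicate", "likes")
inh = ("Inheritance", cat, animal)
ev = ("Evaluation", likes, inh)

incoming = {cat: {inh}, animal: {inh}, likes: {ev}, inh: {ev}}
chunk = natural_chunk(cat, incoming)  # contains cat, inh and ev
```

The walk only ever moves upward into containing links, so it stops at the top-level links and never pulls in sibling atoms such as `animal` or `likes`. That upward-only direction is what gives the chunk a boundary.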

Automatic alpha-conversion was contentious (2017): Silently renaming bound variables on scope-link insertion caused practical problems for URE and PLN developers. The eventual conclusion: the chainer should do alpha-conversion on the fly, not the AtomSpace on insertion. (Is-automatic-alpha-conversion-evil)
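As an illustration of doing alpha-conversion on the fly, a consumer such as a chainer can compare scoped expressions through a canonical form while the stored atoms keep the variable names their authors wrote. The sketch below is not URE or AtomSpace code: the nested-tuple encoding, the `Lambda` and `Variable` tags, and the function names are assumptions made for the example.

```python
def alpha_normalize(term, renaming=None, depth=0):
    """Rename bound variables to positional names; free variables are left unchanged."""
    renaming = renaming or {}
    if not isinstance(term, tuple):
        return term  # plain strings such as node names
    head = term[0]
    if head == "Variable":
        return ("Variable", renaming.get(term[1], term[1]))
    if head == "Lambda":
        _, variables, body = term
        inner = dict(renaming)  # copy, so sibling scopes are unaffected
        fresh = []
        for offset, var in enumerate(variables):
            name = f"$bound{depth + offset}"
            inner[var[1]] = name
            fresh.append(("Variable", name))
        new_body = alpha_normalize(body, inner, depth + len(variables))
        return ("Lambda", tuple(fresh), new_body)
    return (head,) + tuple(alpha_normalize(child, renaming, depth) for child in term[1:])


def alpha_equivalent(left, right):
    """Compare two terms up to renaming of bound variables, without modifying either."""
    return alpha_normalize(left) == alpha_normalize(right)


def scope(var_name):
    # Helper for the example: (Lambda var (Inheritance var animal))
    var = ("Variable", var_name)
    return ("Lambda", (var,), ("Inheritance", var, ("Concept", "animal")))


same = alpha_equivalent(scope("$x"), scope("$y"))
```

The sketch assumes free variables never use the reserved `$bound` prefix; a real implementation would need to guard against that kind of capture.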

Contextual AtomSpaces proposed (2014): An AtomSpace could have an associated context atom, so all contents would implicitly be in that context. This foreshadowed Hyperon's multi-Space architecture but was not implemented in OpenCog Classic. (Contextual Atomspaces)