Neural Pattern Mining

Approved by Ursula Addison on 2026-05-15

Scope

This card covers Hyperon-era neural / GNN-based pattern miners, as distinct from the symbolic mining tradition. The Perception/Neural-Symbolic cluster pilot Source 4 (2026-04-30) identified them as a paradigm-distinct third tradition alongside the hyperon-miner trio (symbolic) and the Vepstas perception portfolio (perception/embodiment).

Active Repositories

Repo | Language | Upstream | HEAD | Maturity | Purpose
neural-subgraph-matcher-miner | Python (PyTorch + PyG) | rejuve-bio | cfc23f8 | Research / Operational | SPMiner-lineage GNN-geometric subgraph mining adapted for AtomSpace/MeTTa data. R-GCN backbones for relation-aware embedding.

Lineage and Paradigm

[SPMiner-LINEAGE] -- The repo descends, as a project line, from the Stanford SNAP neural-subgraph-learning-GNN codebase (the SPMiner paper of Ying et al. ICML 2020). The lineage is established by project link, not by paper citation: rejuve-bio does not cite the SPMiner paper in the README, but the code structure (subgraph_mining/ + subgraph_matching/ + common/ directory tree; train / decoder / order embedding module split) directly inherits the SPMiner code organization. The adaptations are on the AtomSpace data side, not the algorithm side: data loaders for .scm/.metta Atomese inputs replace SPMiner's molecular/social-network defaults, while the GNN model and order-embedding loss remain the SPMiner architecture.

[GNN-NEURAL-MINING] -- The mining is end-to-end GNN-geometric, not symbolic combinatorial. Pattern frequency is approximated via order-embedding scores in a learned latent space, not via direct subgraph-isomorphism enumeration. R-GCN backbones at common/models.py:266-294 handle relation-typed edges, which suits AtomSpace data, where edges carry types.
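To make the order-embedding idea concrete, the following is a minimal sketch, not code from the repo. It assumes the standard order-embedding penalty used in SPMiner-style models, where a pattern is treated as contained in a neighborhood when its embedding is coordinate-wise no larger. The function names, the margin parameter, and the toy vectors are all hypothetical; in the actual pipeline the embeddings would come from the trained GNN.

```python
from typing import Sequence


def order_violation(z_pattern: Sequence[float], z_target: Sequence[float]) -> float:
    """Squared order-embedding violation: ||max(0, z_pattern - z_target)||^2.

    A value of zero means z_pattern <= z_target in every coordinate, which the
    learned geometry is trained to associate with "pattern is a subgraph".
    """
    return sum(max(0.0, p - t) ** 2 for p, t in zip(z_pattern, z_target))


def approx_frequency(
    z_pattern: Sequence[float],
    neighborhood_embeddings: Sequence[Sequence[float]],
    margin: float = 0.0,  # hypothetical tolerance; a real model would tune this
) -> int:
    """Count neighborhoods whose embedding dominates the pattern embedding.

    This stands in for subgraph-isomorphism counting: no enumeration happens,
    only comparisons in the latent space.
    """
    return sum(
        1
        for z in neighborhood_embeddings
        if order_violation(z_pattern, z) <= margin
    )


# Toy embeddings (illustrative values, not model output).
pattern = [0.2, 0.1, 0.0]
neighborhoods = [
    [0.5, 0.4, 0.3],  # dominates the pattern in every coordinate
    [0.1, 0.9, 0.2],  # first coordinate is smaller, so it violates the order
    [0.2, 0.1, 0.0],  # equal to the pattern, which counts as containment
]

estimated_support = approx_frequency(pattern, neighborhoods)
```

Because the count depends on learned embeddings and a tolerance, it is an estimate of support rather than an exact frequency, which is the core difference from deterministic enumeration.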

[PARADIGM-DISTINCT-NEURAL] -- This is paradigm-distinct from the symbolic mining tradition of the hyperon-miner trio. There is zero algorithmic, code, or author overlap. The trio implements a MeTTa port of opencog/miner (Pattern-Match-Frequent-Subhypergraph algorithm; deterministic enumeration with pruning); rejuve-bio implements approximate GNN-embedding subgraph search with stochastic gradient training. The two traditions cannot share data structures, intermediate results, or pruning heuristics without a paradigm-bridge layer, and no such layer currently exists.

[STRICTLY-EMPIRICAL] -- The codebase has zero formalization layer: no PLN-style truth-value calculus on top of the GNN scores, no probabilistic-logic interpretation of the order-embedding distance, no proof-theoretic semantics for the learned mining decisions. Patterns are discovered empirically (training loss; validation accuracy on held-out subgraphs) and reported as raw GNN outputs. This is paradigm-aligned with the SPMiner project but contrasts sharply with the trio's MeTTa-runtime evaluator semantics.

Code Structure (verified at cfc23f8)

  • subgraph_mining/ -- search agents directory. Contains search/ subdirectory with agent strategies: base.py, beam.py, greedy.py, mcts.py. (Note: README directory tree at the repo root lists subgraph_mining/search_agents.py but that file does not exist; actual structure is the search/ subdirectory just listed. README directory tree is stale relative to HEAD.)
  • subgraph_matching/ -- order-embedding training pipeline. train.py for the learned embedding; test.py for evaluation; alignment with SPMiner's training scaffold.
  • common/ -- model definitions and shared utilities. models.py:266-294 defines the R-GCN backbone variants. Order-embedding losses also here.
  • data/ -- AtomSpace/MeTTa input loaders. The adaptation surface from SPMiner's defaults to Atomese.
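The search strategies listed under subgraph_mining/search/ can be illustrated with a generic pattern-growth loop. The sketch below is not the repo's beam.py or greedy.py; it only shows how greedy search is the width-1 case of beam search. The toy edge list, the relation weights, and `toy_score` are hypothetical stand-ins for AtomSpace data and for the learned order-embedding score.

```python
from typing import Callable, FrozenSet, List, Tuple

Edge = Tuple[str, str, str]  # (source, relation type, destination)
Pattern = FrozenSet[Edge]

# Hypothetical toy graph with typed edges.
EDGES: List[Edge] = [
    ("gene_a", "expresses", "protein_a"),
    ("protein_a", "interacts", "protein_b"),
    ("protein_b", "part_of", "pathway_x"),
    ("gene_a", "located_on", "chr1"),
]

# Hypothetical per-relation weights standing in for a model score.
TOY_WEIGHT = {"expresses": 3.0, "interacts": 2.0, "part_of": 1.0, "located_on": 0.5}


def expand(pattern: Pattern) -> List[Pattern]:
    """Grow a pattern by one edge that touches a node already in it."""
    nodes = {n for src, _, dst in pattern for n in (src, dst)}
    return [
        pattern | {e}
        for e in EDGES
        if e not in pattern and (e[0] in nodes or e[2] in nodes)
    ]


def toy_score(pattern: Pattern) -> float:
    """Placeholder score; a neural miner would use embedding-based support."""
    return sum(TOY_WEIGHT[rel] for _, rel, _ in pattern)


def beam_search(
    seed: Pattern,
    expand_fn: Callable[[Pattern], List[Pattern]],
    score_fn: Callable[[Pattern], float],
    beam_width: int,
    steps: int,
) -> List[Pattern]:
    """Keep the top `beam_width` patterns after each growth step."""
    beam = [seed]
    for _ in range(steps):
        candidates = {child for p in beam for child in expand_fn(p)}
        if not candidates:
            break
        # Tie-break on the sorted edge list so the ranking is deterministic.
        ranked = sorted(
            candidates, key=lambda p: (score_fn(p), sorted(p)), reverse=True
        )
        beam = ranked[:beam_width]
    return beam


seed = frozenset({EDGES[0]})
greedy_result = beam_search(seed, expand, toy_score, beam_width=1, steps=2)
beam_result = beam_search(seed, expand, toy_score, beam_width=2, steps=2)
```

With a width of 1 the loop commits to the single best extension at each step; a wider beam keeps runner-up patterns alive, trading compute for a lower chance of discarding a pattern that only scores well after further growth.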

Team Authorship

The repo is team-led within iCog/Rejuve, in contrast to the sole-authored Vepstas perception portfolio. Top contributors by commit count:

  • kedistkid -- 123 commits
  • Samrawit -- 65 commits
  • DagmawiKK -- 32 commits
  • AlexKalll -- 27 commits

(Plus smaller contributors.) This is the same broader iCog/Rejuve organizational sphere as iCog-Labs-Dev (which leads the symbolic miner trio) -- but the team-member overlap between the two efforts is small, consistent with the paradigm-distinct framing.

Stack Identity: MeTTa-runtime adjacent

The repo is Python-side and integrates with AtomSpace/MeTTa data via input loaders. It does not define MeTTa programs; it is a Python program with MeTTa-shaped inputs. Stack-wise, it sits in the Neural-Symbolic and LLM Integration family rather than the AtomSpace-Scheme family that the Vepstas portfolio occupies.

Bidirectional cross-grep across the rejuve-bio neural miner and the AtomSpace-Scheme perception portfolio returned ZERO references in either direction (Codex token list at S5: sensory, opencog/sensory, OllamaNode, neural-subgraph, etc.). The two traditions operate in code-isolation. [PARALLEL-NON-INTEGRATED] holds at the cluster level.
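A bidirectional cross-grep of this kind can be approximated with a short script. The sketch below is an assumption about the method, not the tooling the pilot actually used: it scans each checkout for tokens that identify the other side and reports isolation only when both directions return zero hits. The function names are hypothetical, and which tokens belong to which side is inferred from the token list above.

```python
from pathlib import Path
from typing import Dict, Iterable


def count_token_hits(root: str, tokens: Iterable[str]) -> Dict[str, int]:
    """Count files under `root` whose text contains each token."""
    hits = {token: 0 for token in tokens}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the sweep
        for token in hits:
            if token in text:
                hits[token] += 1
    return hits


def is_code_isolated(
    repo_a: str,
    tokens_identifying_b: Iterable[str],
    repo_b: str,
    tokens_identifying_a: Iterable[str],
) -> bool:
    """True when neither checkout mentions the other side's tokens."""
    a_mentions_b = count_token_hits(repo_a, tokens_identifying_b)
    b_mentions_a = count_token_hits(repo_b, tokens_identifying_a)
    return sum(a_mentions_b.values()) == 0 and sum(b_mentions_a.values()) == 0
```

A call might pass the neural miner checkout with tokens such as `sensory`, `opencog/sensory`, and `OllamaNode`, and the perception checkout with `neural-subgraph`. Plain substring matching is deliberately broad, so a zero result is stronger evidence of isolation than a nonzero result is of integration.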

Quick Start

# Clone (depends on PyTorch + PyTorch Geometric)
git clone https://github.com/rejuve-bio/neural-subgraph-matcher-miner.git
cd neural-subgraph-matcher-miner
pip install -r requirements.txt

# Train order embedding on AtomSpace data
python -m subgraph_matching.train --dataset atomspace_sample

# Mine frequent subgraphs with greedy search
python -m subgraph_mining.search.greedy --model_path checkpoints/order_embed.pt

Trilateral Tradition Map (Perception cluster pilot lock)

The Perception/Neural-Symbolic cluster pilot Source 5 (2026-05-01) locked CF5.1: Hyperon-era work on pattern mining and perception/neural-symbolic integration is TRILATERAL, split across three traditions with zero algorithmic, code, or author overlap:

  • Tradition 1.5 (Symbolic mining): hyperon-miner trio -- MeTTa port of opencog/miner; deterministic combinatorial enumeration; iCog-Labs-Dev / SingularityNET Ethiopia leadership; MeTTa-runtime stack.
  • Tradition 7 candidate (Neural mining): rejuve-bio/neural-subgraph-matcher-miner (this card) -- SPMiner-lineage GNN-geometric; approximate embedding-based search; iCog/Rejuve team leadership; Python+PyTorch.
  • Tradition 8 candidate (Perception/embodiment): Vepstas portfolio -- AtomSpace-Scheme stack; sole-authored; runtime-wired sensory atom types but not connected to either mining tradition.

The "candidate" qualifier on Traditions 7 and 8 marks them as cluster-pilot identifications pending broader cross-org sweeps (asi-alliance, fetchai, F1R3FLY-io, Xcceleran-do, gitlab.com/nunet -- all unaudited at this snapshot per CF5.5).

Open Questions

  • Cross-tradition bridge. The neural and symbolic traditions emit incompatible outputs (GNN scores vs deterministic frequency counts; latent-space embeddings vs explicit pattern lattices). Whether to design a bridge that translates between them, or to treat them as parallel research portfolios producing distinct artifacts, is an open Hyperon-ecosystem decision.
  • Formalization layer. The [STRICTLY-EMPIRICAL] classification reflects an absence, not a design choice -- there is no in-flight effort to add PLN-style truth-value semantics on top of GNN scores. Whether such a layer is desirable is itself an open research question.
  • Performance benchmarks. Neither the trio nor rejuve-bio publishes head-to-head benchmarks against the other on shared AtomSpace datasets, so no comparative evaluation can be drawn from published results at this snapshot.
  • R-GCN scaling on Hyperon AtomSpace. R-GCN parameter count and training cost grow with edge-type cardinality, since an undecomposed layer learns one weight matrix per relation type; AtomSpace Atomese has very high edge-type cardinality. Whether the rejuve-bio approach scales to production AtomSpace sizes (hundreds of millions of atoms; thousands of edge types) is an open empirical question.
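The R-GCN scaling concern can be made concrete with a back-of-envelope parameter count. The sketch below follows the standard R-GCN layer definition (one weight matrix per relation plus a self-loop matrix, with optional basis decomposition) and omits bias terms. The hidden size of 64 and the basis count of 30 are illustrative assumptions, not values read from common/models.py.

```python
from typing import Optional


def rgcn_layer_params(
    num_relations: int,
    d_in: int,
    d_out: int,
    num_bases: Optional[int] = None,
) -> int:
    """Weight count for one R-GCN layer, bias omitted.

    Without decomposition: one (d_in x d_out) matrix per relation type.
    With basis decomposition: `num_bases` shared matrices plus one mixing
    coefficient per (relation, basis) pair.
    Both cases add one self-loop matrix.
    """
    self_loop = d_in * d_out
    if num_bases is None:
        return num_relations * d_in * d_out + self_loop
    return num_bases * d_in * d_out + num_relations * num_bases + self_loop


# Illustrative sizes: 1000 edge types, 64-dimensional hidden layers.
full = rgcn_layer_params(num_relations=1000, d_in=64, d_out=64)
with_bases = rgcn_layer_params(num_relations=1000, d_in=64, d_out=64, num_bases=30)
```

Under these assumed sizes the undecomposed layer holds about 4.1 million weights and the basis-decomposed layer about 157 thousand. Parameter count is only one part of the cost: message passing over hundreds of millions of atoms is a separate scaling question that this arithmetic does not address.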
