Neural Pattern Mining

Approved by Ursula Addison on 2026-05-15

Contributors: kedistkid, Samrawit (neural-pattern-mining), DagmawiKK, AlexKalll

Scope

Hyperon-era neural / GNN-based pattern miners, distinct from the symbolic mining tradition. The Perception/Neural-Symbolic cluster pilot Source 4 (2026-04-30) identified these as a paradigm-distinct third tradition alongside the hyperon-miner trio (symbolic) and the Vepstas perception portfolio (perception/embodiment).

Active Repositories

Repo	Language	Upstream	HEAD	Maturity	Purpose
neural-subgraph-matcher-miner	Python (PyTorch + PyG)	rejuve-bio	`cfc23f8`	Research / Operational	SPMiner-lineage GNN-geometric subgraph mining adapted for AtomSpace/MeTTa data. R-GCN backbones for relation-aware embedding.

Lineage and Paradigm

[SPMiner-LINEAGE] -- The repo descends, as a project line, from the Stanford SNAP neural-subgraph-learning-GNN codebase (home of SPMiner, Ying et al., ICML 2020). The lineage is established by code structure, not by citation: the rejuve-bio README does not cite the SPMiner paper, but the directory tree (subgraph_mining/, subgraph_matching/, common/) and the train / decoder / order-embedding module split directly inherit the SPMiner code organization. The adaptations are on the AtomSpace data side, not the algorithm side: data loaders for .scm/.metta Atomese inputs replace SPMiner's molecular and social-network defaults, while the GNN model and order-embedding loss remain the SPMiner architecture.

[GNN-NEURAL-MINING] -- The mining is end-to-end GNN-geometric, not symbolic-combinatorial. Pattern frequency is approximated via order-embedding scores in a learned latent space, not via direct subgraph-isomorphism enumeration. R-GCN backbones at common/models.py:266-294 handle relation-typed edges, which suits AtomSpace data, where edges carry types.
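
The order-embedding idea can be illustrated with a minimal sketch. It assumes the standard order-embedding penalty used in the SPMiner line of work (a pattern embedding should be elementwise no larger than the embedding of any neighborhood that contains it). The function names, the toy vectors, and the zero margin are hypothetical and are not taken from the rejuve-bio code.

```python
from typing import Sequence


def order_violation(pattern_emb: Sequence[float], target_emb: Sequence[float]) -> float:
    """Squared order-embedding violation: sum over dims of max(0, p_i - t_i)^2.

    A value of 0 means the pattern embedding is dominated elementwise by the
    target embedding, which the model reads as "pattern is a subgraph of target".
    """
    return sum(max(0.0, p - t) ** 2 for p, t in zip(pattern_emb, target_emb))


def approx_frequency(
    pattern_emb: Sequence[float],
    neighborhood_embs: Sequence[Sequence[float]],
    margin: float = 0.0,  # hypothetical threshold, chosen for illustration
) -> int:
    """Count neighborhoods whose violation score is within the margin."""
    return sum(
        1 for t in neighborhood_embs if order_violation(pattern_emb, t) <= margin
    )


# Toy embeddings (made up for illustration, not model outputs).
pattern = [0.2, 0.1, 0.0]
neighborhoods = [
    [0.5, 0.4, 0.3],  # dominates the pattern in every dimension
    [0.1, 0.9, 0.2],  # smaller than the pattern in dimension 0
    [0.2, 0.1, 0.0],  # identical to the pattern
]

print(approx_frequency(pattern, neighborhoods))
```

The count is an estimate in latent space: no subgraph-isomorphism test is run, so its quality depends entirely on how well the trained embedding respects the ordering.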

[PARADIGM-DISTINCT-NEURAL] -- This is paradigm-distinct from the symbolic mining tradition of the hyperon-miner trio, with zero algorithmic, code, or author overlap. The trio implements a MeTTa port of opencog/miner (Pattern-Match-Frequent-Subhypergraph algorithm; deterministic enumeration with pruning); rejuve-bio implements approximate GNN-embedding subgraph search with stochastic gradient training. The two traditions cannot share data structures, intermediate results, or pruning heuristics without a paradigm-bridge layer, and no such layer currently exists.

[STRICTLY-EMPIRICAL] -- The codebase has zero formalization layer: no PLN-style truth-value calculus on top of the GNN scores, no probabilistic-logic interpretation of the order-embedding distance, no proof-theoretic semantics for the learned mining decisions. Patterns are discovered empirically (training loss; validation accuracy on held-out subgraphs) and reported as raw GNN outputs. This is paradigm-aligned with the SPMiner project but contrasts sharply with the trio's MeTTa-runtime evaluator semantics.

Code Structure (verified at `cfc23f8`)

subgraph_mining/ -- search agents directory. Contains search/ subdirectory with agent strategies: base.py, beam.py, greedy.py, mcts.py. (Note: README directory tree at the repo root lists subgraph_mining/search_agents.py but that file does not exist; actual structure is the search/ subdirectory just listed. README directory tree is stale relative to HEAD.)
subgraph_matching/ -- order-embedding training pipeline. train.py trains the learned embedding; test.py evaluates it. The layout follows SPMiner's training scaffold.
common/ -- model definitions and shared utilities. models.py:266-294 defines the R-GCN backbone variants. The order-embedding losses also live here.
data/ -- AtomSpace/MeTTa input loaders. The adaptation surface from SPMiner's defaults to Atomese.
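
To make the role of the search/ agents concrete, here is a minimal sketch of greedy pattern growth. It assumes the SPMiner-style scheme in which a pattern is grown one node at a time and each candidate is ranked by a score. The `greedy_grow` function, the toy graph, and the edge-count score (a stand-in for a learned frequency estimate) are all hypothetical and do not reproduce greedy.py.

```python
from typing import Callable, Dict, FrozenSet, Set

Graph = Dict[str, Set[str]]  # undirected adjacency lists


def greedy_grow(
    graph: Graph,
    seed: str,
    score: Callable[[FrozenSet[str]], float],
    max_size: int,
) -> FrozenSet[str]:
    """Grow a node set from `seed`, adding the best-scoring frontier node each step."""
    pattern = {seed}
    while len(pattern) < max_size:
        # Frontier: nodes adjacent to the pattern but not yet in it.
        frontier = {n for v in pattern for n in graph[v]} - pattern
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(frontier), key=lambda n: score(frozenset(pattern | {n})))
        pattern.add(best)
    return frozenset(pattern)


toy_graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def internal_edges(nodes: FrozenSet[str]) -> float:
    """Stand-in score: number of edges with both endpoints inside `nodes`."""
    return sum(1 for v in nodes for n in toy_graph[v] if n in nodes) / 2


result = greedy_grow(toy_graph, seed="a", score=internal_edges, max_size=3)
print(sorted(result))
```

Beam and MCTS strategies differ in how many candidates they keep alive per step, not in this basic grow-and-score loop; swapping the score function for a model-backed estimate is the only change the sketch would need to resemble a neural miner.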

Team Authorship

Distinctly iCog/Rejuve team-led, contrasting with the sole-authored Vepstas perception portfolio:

kedistkid -- 123 commits
Samrawit -- 65 commits
DagmawiKK -- 32 commits
AlexKalll -- 27 commits

(Plus smaller contributors.) This is the same broader iCog/Rejuve organizational sphere as iCog-Labs-Dev (which leads the symbolic miner trio) -- but the team-member overlap between the two efforts is small, consistent with the paradigm-distinct framing.

Stack Identity: MeTTa-runtime adjacent

The repo is Python-side and integrates with AtomSpace/MeTTa data via input loaders. It does not define MeTTa programs; it is a Python program that consumes MeTTa-shaped inputs. Stack-wise, it sits in the Neural-Symbolic and LLM Integration family rather than the AtomSpace-Scheme family that the Vepstas portfolio occupies.
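
What "MeTTa-shaped inputs" can mean in practice is sketched below: flat expressions of the form (Relation head tail) are turned into typed edges and integer relation ids, which is the input shape a relation-aware GNN needs. This is an illustrative assumption only. The `parse_flat_expressions` helper and the sample atoms are hypothetical, nested expressions are deliberately ignored, and the actual loaders under data/ may work differently.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Matches only flat three-symbol expressions such as "(Inheritance cat mammal)".
_FLAT_EXPR = re.compile(r"\(\s*([^\s()]+)\s+([^\s()]+)\s+([^\s()]+)\s*\)")


def parse_flat_expressions(text: str) -> List[Triple]:
    """Extract (head, relation, tail) triples from flat s-expressions."""
    triples: List[Triple] = []
    for line in text.splitlines():
        line = line.split(";", 1)[0]  # strip ';' line comments
        for rel, head, tail in _FLAT_EXPR.findall(line):
            triples.append((head, rel, tail))
    return triples


def relation_ids(triples: List[Triple]) -> Dict[str, int]:
    """Assign a stable integer id to each distinct relation name."""
    return {r: i for i, r in enumerate(sorted({rel for _, rel, _ in triples}))}


sample = """
(Inheritance cat mammal)
(Interacts geneA geneB) ; hypothetical atoms for illustration
"""

triples = parse_flat_expressions(sample)
print(triples)
print(relation_ids(triples))
```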

Bidirectional cross-grep across the rejuve-bio neural miner and the AtomSpace-Scheme perception portfolio returned ZERO references in either direction (Codex token list at S5: sensory, opencog/sensory, OllamaNode, neural-subgraph, etc.). The two traditions operate in code-isolation. [PARALLEL-NON-INTEGRATED] holds at the cluster level.

Quick Start

# Clone (depends on PyTorch + PyTorch Geometric)
git clone https://github.com/rejuve-bio/neural-subgraph-matcher-miner.git
cd neural-subgraph-matcher-miner
pip install -r requirements.txt

# Train order embedding on AtomSpace data
python -m subgraph_matching.train --dataset atomspace_sample

# Mine frequent subgraphs with greedy search
python -m subgraph_mining.search.greedy --model_path checkpoints/order_embed.pt

Trilateral Tradition Map (Perception cluster pilot lock)

The Perception/Neural-Symbolic cluster pilot Source 5 (2026-05-01) locked CF5.1: Hyperon-era pattern-mining and perception/neural-symbolic integration is TRILATERAL across three traditions with zero algorithmic, code, or author overlap:

Tradition 1.5 (Symbolic mining): hyperon-miner trio -- MeTTa port of opencog/miner; deterministic combinatorial enumeration; iCog-Labs-Dev / SingularityNET Ethiopia leadership; MeTTa-runtime stack.
Tradition 7 candidate (Neural mining): rejuve-bio/neural-subgraph-matcher-miner (this card) -- SPMiner-lineage GNN-geometric; approximate embedding-based search; iCog/Rejuve team leadership; Python+PyTorch.
Tradition 8 candidate (Perception/embodiment): Vepstas portfolio -- AtomSpace-Scheme stack; sole-authored; runtime-wired sensory atom types but not connected to either mining tradition.

The "candidate" qualifier on Traditions 7 and 8 marks them as cluster-pilot identifications pending broader cross-org sweeps (asi-alliance, fetchai, F1R3FLY-io, Xcceleran-do, gitlab.com/nunet -- all unaudited at this snapshot per CF5.5).

Open Questions

Cross-tradition bridge. The neural and symbolic traditions emit incompatible outputs (GNN scores vs deterministic frequency counts; latent-space embeddings vs explicit pattern lattices). Whether to design a bridge that translates between them, or to treat them as parallel research portfolios producing distinct artifacts, is an open Hyperon-ecosystem decision.
Formalization layer. The [STRICTLY-EMPIRICAL] classification reflects an absence, not a design choice -- there is no in-flight effort to add PLN-style truth-value semantics on top of GNN scores. Whether such a layer is desirable is itself an open research question.
Performance benchmarks. Neither the trio nor rejuve-bio publishes head-to-head benchmarks against the other on shared AtomSpace datasets, so no comparative evaluation can be drawn from published results at this snapshot.
R-GCN scaling on Hyperon AtomSpace. R-GCN parameter count, and with it training cost, grows linearly with the number of edge types unless weights are shared across relations (for example through basis decomposition); AtomSpace Atomese has very high edge-type cardinality. Whether the rejuve-bio approach scales to production AtomSpace sizes (hundreds of millions of atoms; thousands of edge types) is an open empirical question.
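
The scaling concern in the last item can be made concrete with back-of-envelope arithmetic. The sketch below counts weight parameters for one R-GCN layer with and without basis decomposition, following the formulation in the original R-GCN paper. The layer width of 64 and the 30 bases are illustrative assumptions, biases are ignored, and nothing here states which variant the rejuve-bio models actually use.

```python
from typing import Optional


def rgcn_layer_params(
    num_relations: int,
    d_in: int,
    d_out: int,
    num_bases: Optional[int] = None,
) -> int:
    """Weight count for one R-GCN layer (biases excluded).

    Without bases: one d_in x d_out matrix per relation, plus a self-loop matrix.
    With bases: num_bases shared matrices, plus num_bases mixing coefficients
    per relation, plus the self-loop matrix.
    """
    self_loop = d_in * d_out
    if num_bases is None:
        return num_relations * d_in * d_out + self_loop
    return num_bases * d_in * d_out + num_relations * num_bases + self_loop


# 1000 edge types, 64-dimensional layer (illustrative numbers).
full = rgcn_layer_params(1000, 64, 64)
shared = rgcn_layer_params(1000, 64, 64, num_bases=30)
print(full, shared)
```

With these numbers the per-relation variant needs about 4.1 million weights per layer and the basis-decomposed variant about 157 thousand, so the choice of weight sharing, rather than the R-GCN family as such, largely determines how edge-type cardinality translates into cost.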

References

Cluster-pilot extraction archive: scripts/archive/perception_pilot/source4_hyperon_neural_perception/.
SPMiner paper (algorithm origin): Ying et al., "Frequent Subgraph Mining by Walking in Order Embedding Space," ICML 2020.
Sister cards: Sensory (Tradition 8 candidate); ECAN Full → Development and Historical Context (Tradition 1.5 cluster-pilot context).
Project-line origin: snap-stanford/neural-subgraph-learning-GNN.
