Neural Pattern Mining
Scope
Hyperon-era neural / GNN-based pattern miners, distinct from the symbolic mining tradition. Identified by the Perception/Neural-Symbolic cluster pilot Source 4 (2026-04-30) as a paradigm-distinct third tradition alongside the hyperon-miner trio (symbolic) and the Vepstas perception portfolio (perception/embodiment).
Active Repositories
| Repo | Language | Upstream | HEAD | Maturity | Purpose |
|---|---|---|---|---|---|
| neural-subgraph-matcher-miner | Python (PyTorch + PyG) | rejuve-bio | cfc23f8 | Research / Operational | SPMiner-lineage GNN-geometric subgraph mining adapted for AtomSpace/MeTTa data. R-GCN backbones for relation-aware embedding. |
Lineage and Paradigm
[SPMiner-LINEAGE] -- The repo is project-line descended from the Stanford SNAP neural-subgraph-learning-GNN codebase (the SPMiner paper of Ying et al. ICML 2020). The lineage is project-link, not paper-citation: rejuve-bio does not cite the SPMiner paper in the README, but the code structure (subgraph_mining/ + subgraph_matching/ + common/ directory tree; train / decoder / order embedding module split) directly inherits the SPMiner code organization. Adaptations are AtomSpace-data-side, not algorithm-side: data loaders for .scm/.metta Atomese inputs replace SPMiner's molecular/social-network defaults; the GNN model and order-embedding loss are the SPMiner architecture.
[GNN-NEURAL-MINING] -- The mining is end-to-end GNN-geometric, not symbolic combinatorial. Pattern frequency is approximated via order-embedding scores in a learned latent space, not via direct subgraph-isomorphism enumeration. R-GCN backbones at common/models.py:266-294 handle relation-typed edges -- this is the architectural choice for AtomSpace where edges carry types.
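To make "order-embedding scores" concrete, the sketch below follows the formulation published for the NeuroMatch/SPMiner line of work: a pattern is predicted to be a subgraph of a neighborhood when its embedding is elementwise no larger than the neighborhood's embedding, and the violation is the squared positive part of the difference. This is an illustration under assumptions, not code from the repo: the function names, the threshold parameter, and the toy embeddings are all hypothetical, and plain Python lists stand in for the GNN outputs.

```python
from typing import Sequence


def order_violation(z_query: Sequence[float], z_target: Sequence[float]) -> float:
    """Squared order-embedding violation: sum_i max(0, q_i - t_i)^2.

    Zero means the query embedding lies elementwise below the target
    embedding, i.e. the query is predicted to be a subgraph of the target.
    """
    if len(z_query) != len(z_target):
        raise ValueError("embeddings must have the same dimension")
    return sum(max(0.0, q - t) ** 2 for q, t in zip(z_query, z_target))


def approx_frequency(z_pattern, z_neighborhoods, threshold: float = 0.0) -> int:
    """Count neighborhoods whose violation is within the threshold (hypothetical helper).

    This replaces explicit subgraph-isomorphism checks with a comparison
    in the learned latent space.
    """
    return sum(order_violation(z_pattern, z) <= threshold for z in z_neighborhoods)


# Toy embeddings (assumed values, not model outputs).
pattern = [0.2, 0.1]
neighborhoods = [[0.5, 0.3], [0.1, 0.4], [0.2, 0.1]]
estimated_support = approx_frequency(pattern, neighborhoods)
```

In this toy case the second neighborhood violates the order constraint on the first coordinate, so two of the three neighborhoods count toward the estimated support. A trained model would supply the embeddings, and a nonzero threshold would typically be tuned on validation data.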
[PARADIGM-DISTINCT-NEURAL] -- This is paradigm-distinct from the hyperon-miner trio symbolic mining tradition. Zero algorithmic, code, or author overlap. The trio implements a MeTTa port of opencog/miner (Pattern-Match-Frequent-Subhypergraph algorithm; deterministic enumeration with pruning); rejuve-bio implements approximate GNN-embedding subgraph search with stochastic gradient training. The two traditions cannot share data structures, intermediate results, or pruning heuristics without a paradigm-bridge layer that does not currently exist.
[STRICTLY-EMPIRICAL] -- The codebase has zero formalization layer: no PLN-style truth-value calculus on top of the GNN scores, no probabilistic-logic interpretation of the order-embedding distance, no proof-theoretic semantics for the learned mining decisions. Patterns are discovered empirically (training loss; validation accuracy on held-out subgraphs) and reported as raw GNN outputs. This is paradigm-aligned with the SPMiner project but contrasts sharply with the trio's MeTTa-runtime evaluator semantics.
Code Structure (verified at cfc23f8)
- subgraph_mining/ -- search agents directory. Contains a search/ subdirectory with agent strategies: base.py, beam.py, greedy.py, mcts.py. (Note: the README directory tree at the repo root lists subgraph_mining/search_agents.py, but that file does not exist; the actual structure is the search/ subdirectory just listed. The README directory tree is stale relative to HEAD.)
- subgraph_matching/ -- order-embedding training pipeline. train.py for the learned embedding; test.py for evaluation; aligned with SPMiner's training scaffold.
- common/ -- model definitions and shared utilities. models.py:266-294 defines the R-GCN backbone variants. Order-embedding losses are also here.
- data/ -- AtomSpace/MeTTa input loaders. The adaptation surface from SPMiner's defaults to Atomese.
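As a rough picture of what a greedy search strategy does in this setting, the sketch below grows a pattern one node at a time, always adding the frontier node that maximizes a scoring function. It does not reproduce the repo's greedy.py: the graph representation, the function names, and the toy edge-count score (standing in for a learned embedding score) are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Set

Graph = Dict[str, Set[str]]  # undirected adjacency: node -> set of neighbors


def greedy_grow(
    graph: Graph,
    seed: str,
    score: Callable[[FrozenSet[str]], float],
    max_size: int,
) -> FrozenSet[str]:
    """Grow a connected node set from `seed`, one best-scoring node per step."""
    pattern = {seed}
    while len(pattern) < max_size:
        # Frontier: nodes adjacent to the pattern but not yet inside it.
        frontier = {n for v in pattern for n in graph[v]} - pattern
        if not frontier:
            break
        # Sorting makes tie-breaking deterministic (first maximal node wins).
        best = max(sorted(frontier), key=lambda n: score(frozenset(pattern | {n})))
        pattern.add(best)
    return frozenset(pattern)


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def edge_count(nodes: FrozenSet[str]) -> float:
    """Stand-in score: number of edges with both endpoints inside `nodes`."""
    return sum(1 for u in nodes for v in toy_graph[u] if v in nodes and u < v)


grown = greedy_grow(toy_graph, seed="a", score=edge_count, max_size=3)
```

A beam strategy would differ by keeping several candidate patterns per step instead of one, and an MCTS strategy by sampling rollouts; both are beyond this sketch.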
Team Authorship
Authorship is team-led within iCog/Rejuve, in contrast to the sole-authored Vepstas perception portfolio:
- kedistkid -- 123 commits
- Samrawit -- 65 commits
- DagmawiKK -- 32 commits
- AlexKalll -- 27 commits
(Plus smaller contributors.) This is the same broader iCog/Rejuve organizational sphere as iCog-Labs-Dev (which leads the symbolic miner trio) -- but the team-member overlap between the two efforts is small, consistent with the paradigm-distinct framing.
Stack Identity: MeTTa-runtime adjacent
The repo is Python-side and integrates with AtomSpace/MeTTa data via input loaders. It is not a MeTTa-program-defining repo; it is a Python-program with MeTTa-shaped inputs. Stack-wise, it sits in the Neural-Symbolic and LLM Integration family rather than the AtomSpace-Scheme family that the Vepstas portfolio occupies.
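To illustrate what "a Python program with MeTTa-shaped inputs" can mean in practice, the sketch below turns flat three-symbol expressions such as (Inheritance cat animal) into relation-typed edge triples and assigns each relation an integer id, which is the kind of input a relation-aware GNN needs. It is a hypothetical loader written for this card, not the repo's data/ code: the accepted expression shape, the function names, and the sample text are assumptions, and expressions with nested arguments are simply skipped.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# A token is either a single parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse_binary_links(text: str) -> List[Triple]:
    """Extract triples from flat expressions of the form (Relation head tail)."""
    # Drop ';' line comments before tokenizing.
    stripped = "\n".join(line.split(";", 1)[0] for line in text.splitlines())
    tokens = TOKEN.findall(stripped)
    triples: List[Triple] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 5]
        is_flat = (
            len(window) == 5
            and window[0] == "("
            and window[4] == ")"
            and not any(t in "()" for t in window[1:4])
        )
        if is_flat:
            relation, head, tail = window[1:4]
            triples.append((head, relation, tail))
            i += 5
        else:
            i += 1
    return triples


def index_relations(triples: List[Triple]) -> Dict[str, int]:
    """Assign each distinct relation an integer id in order of first appearance."""
    ids: Dict[str, int] = {}
    for _, relation, _ in triples:
        ids.setdefault(relation, len(ids))
    return ids


sample = """
(Inheritance cat animal)
(Similarity cat dog) ; comments are ignored
(Inheritance dog animal)
"""
edges = parse_binary_links(sample)
relation_ids = index_relations(edges)
```

The number of distinct relation ids produced by a loader like this is the edge-type cardinality that a relation-aware backbone has to handle.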
Bidirectional cross-grep across the rejuve-bio neural miner and the AtomSpace-Scheme perception portfolio returned ZERO references in either direction (Codex token list at S5: sensory, opencog/sensory, OllamaNode, neural-subgraph, etc.). The two traditions operate in code-isolation. [PARALLEL-NON-INTEGRATED] holds at the cluster level.
Quick Start
```shell
# Clone (depends on PyTorch + PyTorch Geometric)
git clone https://github.com/rejuve-bio/neural-subgraph-matcher-miner.git
cd neural-subgraph-matcher-miner
pip install -r requirements.txt

# Train order embedding on AtomSpace data
python -m subgraph_matching.train --dataset atomspace_sample

# Mine frequent subgraphs with greedy search
python -m subgraph_mining.search.greedy --model_path checkpoints/order_embed.pt
```
Trilateral Tradition Map (Perception cluster pilot lock)
The Perception/Neural-Symbolic cluster pilot Source 5 (2026-05-01) locked CF5.1: Hyperon-era pattern-mining and perception/neural-symbolic integration is TRILATERAL across three traditions with zero algorithmic, code, or author overlap:
- Tradition 1.5 (Symbolic mining): hyperon-miner trio -- MeTTa port of opencog/miner; deterministic combinatorial enumeration; iCog-Labs-Dev / SingularityNET Ethiopia leadership; MeTTa-runtime stack.
- Tradition 7 candidate (Neural mining): rejuve-bio/neural-subgraph-matcher-miner (this card) -- SPMiner-lineage GNN-geometric; approximate embedding-based search; iCog/Rejuve team leadership; Python+PyTorch.
- Tradition 8 candidate (Perception/embodiment): Vepstas portfolio -- AtomSpace-Scheme stack; sole-authored; runtime-wired sensory atom types but not connected to either mining tradition.
The "candidate" qualifier on Traditions 7 and 8 marks them as cluster-pilot identifications pending broader cross-org sweeps (asi-alliance, fetchai, F1R3FLY-io, Xcceleran-do, gitlab.com/nunet -- all unaudited at this snapshot per CF5.5).
Open Questions
- Cross-tradition bridge. The neural and symbolic traditions emit incompatible outputs (GNN scores vs deterministic frequency counts; latent-space embeddings vs explicit pattern lattices). Whether to design a bridge that translates between them, or to treat them as parallel research portfolios producing distinct artifacts, is an open Hyperon-ecosystem decision.
- Formalization layer. The [STRICTLY-EMPIRICAL] classification reflects an absence, not a design choice -- there is no in-flight effort to add PLN-style truth-value semantics on top of GNN scores. Whether such a layer is desirable is itself an open research question.
- Performance benchmarks. Neither the trio nor rejuve-bio publishes head-to-head benchmarks against each other on shared AtomSpace datasets. Comparative evaluation is impossible at this snapshot.
- R-GCN scaling on Hyperon AtomSpace. R-GCN training cost scales with edge-type cardinality; AtomSpace Atomese has very high edge-type cardinality. Whether the rejuve-bio approach scales to production AtomSpace sizes (hundreds of millions of atoms; thousands of edge types) is an open empirical question.
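One concrete face of the R-GCN scaling question is parameter count. In the standard R-GCN formulation each relation gets its own weight matrix, so per-layer weights grow linearly with the number of edge types; basis decomposition instead shares a small set of basis matrices across relations. The worked arithmetic below is a sketch under assumptions: the layer sizes and relation counts are illustrative, biases are omitted, and the source does not say whether the repo uses basis decomposition.

```python
from typing import Optional


def rgcn_layer_params(
    num_relations: int,
    in_dim: int,
    out_dim: int,
    num_bases: Optional[int] = None,
) -> int:
    """Weight count for one R-GCN layer (biases omitted).

    Without bases: one in_dim x out_dim matrix per relation, plus a self-loop matrix.
    With bases:    num_bases shared matrices, plus num_bases mixing
                   coefficients per relation, plus a self-loop matrix.
    """
    self_loop = in_dim * out_dim
    if num_bases is None:
        return num_relations * in_dim * out_dim + self_loop
    return num_bases * in_dim * out_dim + num_relations * num_bases + self_loop


# Illustrative sizes (assumed): 1,000 edge types, 64-dimensional hidden layers.
full = rgcn_layer_params(num_relations=1000, in_dim=64, out_dim=64)
shared = rgcn_layer_params(num_relations=1000, in_dim=64, out_dim=64, num_bases=30)
```

With these assumed sizes the per-relation variant needs 4,100,096 weights per layer, against 156,976 with 30 shared bases. Parameter count is only one cost; message passing over typed edges also grows with the number of edges, which this arithmetic does not cover.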
References
- Cluster-pilot extraction archive: scripts/archive/perception_pilot/source4_hyperon_neural_perception/.
- SPMiner paper (algorithm origin): Ying et al., "Frequent Subgraph Mining by Walking in Order Embedding Space," ICML 2020.
- Sister cards: Sensory (Tradition 8 candidate); ECAN Full → Development and Historical Context (Tradition 1.5 cluster-pilot context).
- Project-line origin: snap-stanford/neural-subgraph-learning-GNN.