Draft — This content has not been approved for publication.

Responsible: Ben Goertzel (architecture); iCog Labs (data pipelines)

Papers: Hyperon for AGI⇒ASI Whitepaper (2025), §9.3

GitHub: agi-bio (genomics/proteomics), biochatter-metta (NL-to-MeTTa queries), pubchem2metta (PubChem conversion), bio-semantic-parser (biological data parsing), bio-data-semantic-parsing (bio data semantic parsing pipeline)

Status: Active pilot. Data ingestion pipelines operational (agi-bio, biochatter-metta, pubchem2metta). End-to-end hypothesis generation on longevity datasets is a near-term milestone.

Much of biology is naturally represented as graphs: genes connect to proteins, proteins form pathways, pathways influence phenotypes, and drugs modulate these relationships. Hyperon's graph-native architecture is well suited to combining noisy biological graphs, mining meaningful motifs, chaining inferences under uncertainty, and proposing ranked hypotheses with clear rationales.

Data Pipeline

Data flows into AtomSpace through BioSpace adapters, which transform omics matrices into node attributes, protein-protein interactions and pathways into edges, literature triples into assertions with provenance, and clinical outcomes into noisy links with confidence scores. Each ingested item receives a CID (content identifier), which makes merges auditable and reproducible. Existing tools include:

  • agi-bio — OpenCog-era genomic and proteomic data exploration (C++/Scheme/Python), extended by MOZI.AI as SingularityNET services
  • biochatter-metta — Uses LLMs with a BioCypher schema to convert natural-language biomedical questions into MeTTa queries against the Human BioAtomspace knowledge graph
  • pubchem2metta — Converts PubChem RDF chemical data into MeTTa format via BioCypher adapters
  • bio-semantic-parser — iCog Labs biological data parsing tool for extracting structured representations from biological datasets
  • bio-data-semantic-parsing — iCog Labs pipeline for semantic parsing of biological data into knowledge graph-compatible formats
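
The adapter idea described above can be illustrated with a minimal Python sketch. It is not the BioSpace adapter API: the function names (content_id, ppi_edge, literature_assertion, clinical_link, merge), the dictionary layout, and the use of a SHA-256 hash of canonical JSON as a stand-in for a CID are all assumptions made for illustration, and the records are toy data.

```python
import hashlib
import json


def content_id(atom):
    """Hypothetical stand-in for a CID: SHA-256 over the atom's canonical JSON."""
    canonical = json.dumps(atom, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ppi_edge(protein_a, protein_b, source):
    """Undirected interaction edge; endpoints are sorted so A-B equals B-A."""
    a, b = sorted([protein_a, protein_b])
    return {"type": "interacts_with", "args": [a, b], "source": source}


def literature_assertion(subject, predicate, obj, reference):
    """Literature triple that carries its provenance with it."""
    return {"type": predicate, "args": [subject, obj], "source": reference}


def clinical_link(exposure, outcome, confidence):
    """Noisy clinical association with a confidence score in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return {"type": "associated_with", "args": [exposure, outcome],
            "confidence": confidence}


def merge(*batches):
    """Merge batches keyed by content ID; identical atoms collapse to one entry."""
    store = {}
    for batch in batches:
        for atom in batch:
            store[content_id(atom)] = atom
    return store


# Toy batches: the same interaction arrives twice with its endpoints swapped.
batch_1 = [ppi_edge("TP53", "MDM2", "toy-db")]
batch_2 = [
    ppi_edge("MDM2", "TP53", "toy-db"),
    literature_assertion("rapamycin", "inhibits", "MTOR", "toy-reference"),
]
store = merge(batch_1, batch_2)
print(len(store))  # 2
```

Because the identifier depends only on content, re-running the merge on the same inputs yields the same keys, which is the property that makes a merge auditable in this sketch.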

Proposed Hypothesis Generation Pipeline

The whitepaper describes a pipeline in which Pattern Miner identifies motifs (e.g., "gene A ↔ pathway P ↔ phenotype Y with drug D evidence") and ranks them by I-surprisingness, WILLIAM promotes frequent subgraphs to reusable templates, PLN factor graphs propagate graded truth values over ontologies and experimental results, and MOSES/GEO-EVO evolves predictive programs. TransWeave would transfer mechanism components across cohorts or omics platforms where the matches are strong.
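
To give a feel for ranking patterns by surprisingness, the following Python sketch scores co-occurring feature pairs by how far their observed frequency departs from what independence would predict. This is a simplified stand-in, not the Pattern Miner's I-surprisingness definition; the function name rank_pairs_by_surprise and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rank_pairs_by_surprise(records):
    """Rank feature pairs by |observed - expected| co-occurrence frequency.

    'expected' assumes the two features occur independently, so a large gap
    marks a pair that co-occurs more (or less) often than chance would suggest.
    """
    n = len(records)
    single = Counter()
    pair = Counter()
    for record in records:
        items = sorted(set(record))
        single.update(items)
        pair.update(combinations(items, 2))

    scored = []
    for (a, b), count in pair.items():
        observed = count / n
        expected = (single[a] / n) * (single[b] / n)
        scored.append(((a, b), observed - expected))
    return sorted(scored, key=lambda entry: abs(entry[1]), reverse=True)


# Toy cohort records with placeholder feature names.
records = [
    {"gene:A", "pathway:P", "phenotype:Y"},
    {"gene:A", "pathway:P", "phenotype:Y"},
    {"gene:A", "pathway:P"},
    {"gene:B", "phenotype:Y"},
    {"gene:B"},
    {"gene:B", "pathway:Q"},
]
ranked = rank_pairs_by_surprise(records)
print(ranked[0])  # (('gene:A', 'pathway:P'), 0.25)
```

A multi-element motif such as gene, pathway, and phenotype could be scored the same way by comparing its observed frequency with the product of its parts' frequencies, though that extension is not shown here.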

The proposed output would be ranked hypothesis packs — auditable CID bundles containing mechanism graphs, predictors, expected biomarkers, and counter-evidence — with experiment selection guided by geodesic f·g control.
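
One possible shape for such a pack is sketched below in Python. The whitepaper names the contents of a pack but this page does not specify a schema, so the class HypothesisPack, its field names, the score field, and the SHA-256 bundle identifier (standing in for a CID) are all assumptions for illustration, and the entries are placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class HypothesisPack:
    """Hypothetical container for one ranked hypothesis."""
    mechanism: tuple            # (source, relation, target) edges
    predictor: str              # name of the predictive program
    expected_biomarkers: tuple
    counter_evidence: tuple
    score: float                # ranking score, higher is better

    def bundle_id(self):
        """Content hash of the whole pack, used here in place of a CID."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def rank_packs(packs):
    """Sort by descending score; ties are broken by bundle ID for determinism."""
    return sorted(packs, key=lambda pack: (-pack.score, pack.bundle_id()))


# Placeholder packs; the names do not refer to real findings.
packs = [
    HypothesisPack(
        mechanism=(("gene:B", "regulates", "pathway:Q"),),
        predictor="predictor-2",
        expected_biomarkers=("marker:M2",),
        counter_evidence=("study:S2 reports no effect",),
        score=0.4,
    ),
    HypothesisPack(
        mechanism=(("gene:A", "participates_in", "pathway:P"),
                   ("pathway:P", "influences", "phenotype:Y")),
        predictor="predictor-1",
        expected_biomarkers=("marker:M1",),
        counter_evidence=(),
        score=0.8,
    ),
]
for pack in rank_packs(packs):
    print(pack.score, pack.bundle_id()[:12])
```

Keeping counter-evidence inside the hashed payload means that a pack's identifier changes whenever contradicting results are added, so two reviewers can confirm they are looking at the same version of a hypothesis.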

Key References

  • Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §9.3: Bioinformatics


