Bio-AI and Cheminformatics

Draft — This content has not been approved for publication.

Scope

Repositories that apply Hyperon/OpenCog to biological data analysis, molecular chemistry, and biomedical knowledge graph construction. They show how the ecosystem handles real-world scientific data by converting genomic, proteomic, and chemical datasets into AtomSpace/MeTTa representations for reasoning.

Active Repositories

Hyperon-Era Pipelines (MeTTa / Python)

Repo | Language | Upstream | Maturity | Purpose
biochatter-metta | Python | iCog-Labs-Dev | Operational | NL-to-MeTTa query converter for Human BioAtomspace knowledge graph via OpenAI LLMs + BioCypher. Linux only.
biochatter-metta-client | Vue.js / TypeScript | iCog-Labs-Dev | Operational | Web frontend for BioChatter MeTTa chat application.
biochatter-metta-server | Python (Django) | iCog-Labs-Dev | Operational | Django REST backend handling NL-to-MeTTa query conversion and chat sessions. Requires OPENAI_API_KEY.
bio-semantic-parser | Python (FastAPI) + React | iCog-Labs-Dev | Operational | Full-stack pipeline converting GEO/PubMed data into structured MeTTa and FOL for AtomSpace. Docker-first with pytest tests.
bio-data-semantic-parsing | Python (Jupyter) | iCog-Labs-Dev | Experimental | LLM experiments parsing biological datasets (DrugAge, GEO) into FOL, PLN predicates, and MeTTa. Notebook-driven research workspace.
pubchem2metta | Python | iCog-Labs-Dev | Operational | PubChem RDF Turtle → MeTTa converter via BioCypher adapters. Outputs nodes/edges to metta_out/.

Legacy OpenCog (C++ / Scheme)

Repo | Language | Upstream | Maturity | Purpose
agi-bio | C++ / Scheme / Python | opencog | Legacy | Genomic and proteomic research using OpenCog (MOSES, PLN, pattern mining) for bioinformatics. Requires full classic OpenCog stack.
cheminformatics | C++ / Scheme / Cython | opencog | Legacy | Molecular chemistry in AtomSpace with compiled atom types and Scheme workflows. Minimal content.

How They Fit Together

The Hyperon-era Bio-AI repos cover three connected stages, from data ingestion through querying to research on new parsing strategies:

  1. Data ingestion: pubchem2metta converts chemical data (PubChem RDF) into MeTTa; bio-semantic-parser converts genomic/literature data (GEO, PubMed) into MeTTa and FOL
  2. Query interface: biochatter-metta + client + server provide an NL chat interface that converts user questions into MeTTa queries against the resulting knowledge graphs
  3. Research: bio-data-semantic-parsing provides notebook-based experimentation for new data sources and parsing strategies
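The ingestion and query stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual code or output format of pubchem2metta, bio-semantic-parser, or biochatter-metta: the function names (node_to_metta, edge_to_metta, match), the expression layout, and the placeholder identifiers (compound_a, gene_x, interacts_with) are all hypothetical. The match helper is a plain-Python stand-in for a MeTTa pattern query, not a call into Hyperon.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str, Dict[str, str]]  # (label, id, properties)
Edge = Tuple[str, str, str]             # (predicate, source id, target id)


def node_to_metta(node: Node) -> List[str]:
    """Render a node as one declaration line plus one line per property.

    The layout is an assumed, MeTTa-style s-expression format.
    """
    label, node_id, props = node
    lines = [f"({label} {node_id})"]
    for key, value in sorted(props.items()):
        lines.append(f'({key} ({label} {node_id}) "{value}")')
    return lines


def edge_to_metta(edge: Edge) -> str:
    """Render an edge as a single (predicate source target) expression."""
    predicate, source, target = edge
    return f"({predicate} {source} {target})"


def match(edges: List[Edge], pattern: Tuple[str, str, str]) -> List[Dict[str, str]]:
    """Return variable bindings for every edge that fits the pattern.

    Terms starting with '$' are variables; all other terms must match exactly.
    """
    results = []
    for edge in edges:
        bindings: Dict[str, str] = {}
        for term, value in zip(pattern, edge):
            if term.startswith("$"):
                # A repeated variable must bind to the same value each time.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(bindings)
    return results


if __name__ == "__main__":
    # Placeholder data, not taken from PubChem or GEO.
    nodes = [
        ("compound", "compound_a", {"name": "example compound"}),
        ("gene", "gene_x", {"name": "example gene"}),
    ]
    edges = [
        ("interacts_with", "compound_a", "gene_x"),
        ("interacts_with", "compound_b", "gene_x"),
    ]
    for node in nodes:
        print("\n".join(node_to_metta(node)))
    for edge in edges:
        print(edge_to_metta(edge))
    # Query stage: which compounds interact with gene_x?
    print(match(edges, ("interacts_with", "$compound", "gene_x")))
```

In the real pipeline the query stage is driven by an LLM that turns a natural-language question into a MeTTa query; the hand-written pattern here only shows the kind of structure such a query targets.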

The legacy repos (agi-bio, cheminformatics) operated on the same principle but used the classical C++ AtomSpace + MOSES/PLN stack. The Hyperon-era repos use Python + LLMs + BioCypher for data conversion, which is faster to develop but less formally grounded.

All Hyperon-era repos in this family are from iCog-Labs-Dev.

Quick Start

# biochatter-metta (NL-to-MeTTa query, requires OPENAI_API_KEY)
cd biochatter-metta && pip install -r requirements.txt && python3 main.py

# bio-semantic-parser (full-stack, Docker)
cd bio-semantic-parser/Backend && docker compose build && docker compose up

# pubchem2metta (PubChem → MeTTa)
cd pubchem2metta && poetry install && poetry run python create_knowledge_graph.py

# Legacy agi-bio (requires full OpenCog stack)
cd agi-bio && mkdir build && cd build && cmake .. && make -j4

Excluded from This Family

  • mcp-xp: Galaxy bioinformatics chatbot with MCP integration — placed in Neural-Symbolic and LLM Integration as it primarily demonstrates MCP/LLM patterns rather than bio-specific data processing.

Gaps and Consolidation Opportunities

  • biochatter-metta is split across three repos: The core, client, and server could be merged into a monorepo for simpler deployment.
  • No PLN reasoning over bio data: The ingestion pipelines produce MeTTa knowledge graphs, but no demonstration connects them to PLN for actual biological inference.
  • BioCypher dependency: Multiple repos depend on BioCypher for schema mapping — changes upstream could affect the whole family.
  • Legacy repos require deep OpenCog stack: agi-bio needs cogutil → atomspace → ure → MOSES, making casual exploration difficult.
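The build chain in the last bullet can be expressed as a dependency graph and sorted into an install order. The sketch below uses Python's standard graphlib module. It assumes a strictly linear chain that mirrors the arrow order given above; the actual CMake dependencies of each repo may differ and should be checked against its README. The DEPENDS_ON mapping and build_order helper are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the packages in its value set.
# Assumption: a linear chain copied from the arrow order in the text above.
DEPENDS_ON = {
    "atomspace": {"cogutil"},
    "ure": {"atomspace"},
    "moses": {"ure"},
    "agi-bio": {"moses"},
}


def build_order(deps):
    """Return package names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, package in enumerate(build_order(DEPENDS_ON), start=1):
        print(f"{step}. build and install {package}")
```

Each package in the resulting order would need its own configure, build, and install step before agi-bio itself can be compiled, which is the practical cost the bullet describes.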


