Bio-AI and Cheminformatics
Draft — This content has not been approved for publication.
Scope
Repositories applying Hyperon/OpenCog to biological data analysis, molecular chemistry, and biomedical knowledge graph construction. These repos show the ecosystem processing real-world scientific data: they convert genomic, proteomic, and chemical datasets into AtomSpace/MeTTa representations that can then be used for reasoning.
Active Repositories
Hyperon-Era Pipelines (MeTTa / Python)
| Repo | Language | Upstream | Maturity | Purpose |
|---|---|---|---|---|
| biochatter-metta | Python | iCog-Labs-Dev | Operational | Converts natural-language (NL) questions into MeTTa queries against the Human BioAtomspace knowledge graph, using OpenAI LLMs and BioCypher. Linux only. |
| biochatter-metta-client | Vue.js / TypeScript | iCog-Labs-Dev | Operational | Web frontend for BioChatter MeTTa chat application. |
| biochatter-metta-server | Python (Django) | iCog-Labs-Dev | Operational | Django REST backend handling NL-to-MeTTa query conversion and chat sessions. Requires OPENAI_API_KEY. |
| bio-semantic-parser | Python (FastAPI) + React | iCog-Labs-Dev | Operational | Full-stack pipeline converting GEO/PubMed data into structured MeTTa and first-order logic (FOL) for AtomSpace. Docker-first; includes pytest tests. |
| bio-data-semantic-parsing | Python (Jupyter) | iCog-Labs-Dev | Experimental | LLM experiments parsing biological datasets (DrugAge, GEO) into FOL, PLN predicates, and MeTTa. Notebook-driven research workspace. |
| pubchem2metta | Python | iCog-Labs-Dev | Operational | PubChem RDF Turtle → MeTTa converter via BioCypher adapters. Outputs nodes/edges to metta_out/. |
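
The pubchem2metta row above describes turning RDF triples into MeTTa node and edge atoms. The sketch below illustrates that shape in plain Python, under stated assumptions: it does not use BioCypher or pubchem2metta's real adapters, it skips Turtle parsing entirely (the input is already a list of typed triples), and the `triples_to_metta` helper, the `(type id)` node form, and the `has_role` edge label are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical shapes, not pubchem2metta's real data model.
Node = Tuple[str, str]          # (node type, identifier), e.g. ("compound", "CID2244")
Edge = Tuple[str, Node, Node]   # (edge label, source node, target node)


def node_expr(node: Node) -> str:
    """Render a node as a MeTTa expression such as (compound CID2244)."""
    node_type, node_id = node
    return f"({node_type} {node_id})"


def triples_to_metta(edges: Iterable[Edge]) -> Tuple[List[str], List[str]]:
    """Split typed triples into de-duplicated node atoms and edge atoms."""
    seen: Set[Node] = set()
    node_atoms: List[str] = []
    edge_atoms: List[str] = []
    for label, src, dst in edges:
        for node in (src, dst):
            if node not in seen:  # emit each node only once
                seen.add(node)
                node_atoms.append(node_expr(node))
        # An edge is a labelled expression over its two node expressions.
        edge_atoms.append(f"({label} {node_expr(src)} {node_expr(dst)})")
    return node_atoms, edge_atoms


if __name__ == "__main__":
    # Illustrative triples only; the edge label is an assumption.
    sample = [
        ("has_role", ("compound", "CID2244"), ("role", "analgesic")),
        ("has_role", ("compound", "CID2244"), ("role", "antipyretic")),
    ]
    nodes, edges = triples_to_metta(sample)
    print("\n".join(nodes))
    print("\n".join(edges))
```

Keeping node atoms and edge atoms in separate lists loosely mirrors the separate nodes/edges output that pubchem2metta writes to metta_out/; this sketch returns strings instead of writing files.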
Legacy OpenCog (C++ / Scheme)
| Repo | Language | Upstream | Maturity | Purpose |
|---|---|---|---|---|
| agi-bio | C++ / Scheme / Python | opencog | Legacy | Genomic and proteomic research using OpenCog (MOSES, PLN, pattern mining) for bioinformatics. Requires full classic OpenCog stack. |
| cheminformatics | C++ / Scheme / Cython | opencog | Legacy | Molecular chemistry in AtomSpace with compiled atom types and Scheme workflows. Minimal content. |
How They Fit Together
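The Hyperon-era Bio-AI repos fit together as a data ingestion → query → reasoning pipeline, although no repo yet demonstrates the reasoning stage (see Gaps and Consolidation Opportunities):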
- Data ingestion: pubchem2metta converts chemical data (PubChem RDF) into MeTTa; bio-semantic-parser converts genomic/literature data (GEO, PubMed) into MeTTa and FOL
- Query interface: biochatter-metta, together with its client and server repos, provides an NL chat interface that converts user questions into MeTTa queries against the resulting knowledge graphs
- Research: bio-data-semantic-parsing provides notebook-based experimentation for new data sources and parsing strategies
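
On the query side, biochatter-metta relies on an LLM to produce a MeTTa query, which is then evaluated against the knowledge graph. To show what evaluating such a pattern means, here is a hypothetical, stdlib-only matcher: it parses s-expressions into nested tuples and binds `$`-prefixed variables, loosely analogous to MeTTa's `(match &self <pattern> <template>)`. It is not the Hyperon interpreter or any repo's code; `parse`, `unify`, and `query` are illustrative names, and the sample atoms use an assumed `(type id)` node form.

```python
from typing import Dict, List, Optional, Union

# An atom is either a symbol/variable string or a tuple of atoms.
Atom = Union[str, tuple]
Bindings = Dict[str, Atom]


def parse(text: str) -> Atom:
    """Parse one s-expression, e.g. '(has_role (compound CID2244) $r)', into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Atom:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return tuple(items)
        return tok

    atom = read()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing tokens in {text!r}")
    return atom


def unify(pattern: Atom, atom: Atom, bindings: Bindings) -> Optional[Bindings]:
    """One-way match: '$'-variables may appear only in the pattern."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            # A variable seen before must match the same value again.
            return bindings if bindings[pattern] == atom else None
        return {**bindings, pattern: atom}
    if isinstance(pattern, tuple) and isinstance(atom, tuple):
        if len(pattern) != len(atom):
            return None
        current: Optional[Bindings] = bindings
        for p, a in zip(pattern, atom):
            current = unify(p, a, current)
            if current is None:
                return None
        return current
    # Plain symbols (or a symbol vs. an expression) must be identical.
    return bindings if pattern == atom else None


def query(space: List[Atom], pattern: str) -> List[Bindings]:
    """Return the variable bindings for every atom in `space` that matches `pattern`."""
    pat = parse(pattern)
    results: List[Bindings] = []
    for atom in space:
        found = unify(pat, atom, {})
        if found is not None:
            results.append(found)
    return results


if __name__ == "__main__":
    # Illustrative atoms only; node and edge names are assumptions.
    space = [parse(s) for s in (
        "(has_role (compound CID2244) (role analgesic))",
        "(has_role (compound CID2244) (role antipyretic))",
    )]
    for b in query(space, "(has_role (compound CID2244) $r)"):
        print(b["$r"])
```

Matching here is one-way (variables appear only in the pattern) and scans every atom linearly, so treat it as an explanation of the matching semantics, not of how a real AtomSpace performs queries.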
The legacy repos (agi-bio, cheminformatics) operated on the same principle but used the classical C++ AtomSpace + MOSES/PLN stack. The Hyperon-era repos use Python + LLMs + BioCypher for data conversion, which is faster to develop but less formally grounded.
All Hyperon-era repos in this family are from iCog-Labs-Dev.
Quick Start
```bash
# biochatter-metta (NL-to-MeTTa query; requires OPENAI_API_KEY)
cd biochatter-metta && pip install -r requirements.txt && python3 main.py

# bio-semantic-parser (full stack, Docker)
cd bio-semantic-parser/Backend && docker compose build && docker compose up

# pubchem2metta (PubChem → MeTTa); `poetry run` executes inside the project's virtualenv
cd pubchem2metta && poetry install && poetry run python create_knowledge_graph.py

# Legacy agi-bio (requires the full classic OpenCog stack)
cd agi-bio && mkdir build && cd build && cmake .. && make -j4
```
Excluded from This Family
- mcp-xp: Galaxy bioinformatics chatbot with MCP integration — placed in Neural-Symbolic and LLM Integration as it primarily demonstrates MCP/LLM patterns rather than bio-specific data processing.
Gaps and Consolidation Opportunities
- biochatter-metta is split across three repos: the core, client, and server could be merged into a monorepo for simpler deployment.
- No PLN reasoning over bio data: The ingestion pipelines produce MeTTa knowledge graphs, but no demonstration connects them to PLN for actual biological inference.
- BioCypher dependency: Multiple repos depend on BioCypher for schema mapping — changes upstream could affect the whole family.
- Legacy repos require the deep OpenCog stack: agi-bio needs cogutil → atomspace → ure → MOSES, which makes casual exploration difficult.