Status and Resources
System Interfaces
- PLN: Semantic parses feed directly into PLN as ground-level atoms for uncertain reasoning, so the quality of NL-to-MeTTa conversion directly limits PLN's ability to reason about natural-language content.
- Pattern Mining: Subgraph patterns mined from parsed text serve as template keys for Symbolic Heads. A feedback loop is planned: the pattern miner proposes candidates, PLN evaluates them, and those that pass guide the next round of mining. (mailing-list-backed: A-very-small-LG-relex-bug, 2014)
- WILLIAM: Compression-driven mining identifies the most reusable symbolic templates from parsed corpora.
- MeTTa-Motto: LLM integration library providing the neural NL understanding that complements symbolic parsing.
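The planned mine, evaluate, re-seed loop from the Pattern Mining entry can be illustrated with a toy. This is a sketch under loud assumptions, not the actual pattern miner or PLN: the hypothetical `mine` function only extends surviving token n-grams by one word, and the hypothetical `evaluate` function borrows just PLN's count-to-confidence mapping c = n/(n + k), with `k` and `min_conf` chosen arbitrarily.

```python
from collections import Counter

def mine(corpus, seeds):
    """Hypothetical toy miner: count (n+1)-grams that extend a surviving
    n-gram seed by one token to the right."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent)):
            for seed in seeds:
                n = len(seed)
                if i + n < len(sent) and tuple(sent[i:i + n]) == seed:
                    counts[seed + (sent[i + n],)] += 1
    return counts

def evaluate(count, k=2.0):
    """Stand-in for PLN evaluation: only the count-to-confidence map c = n / (n + k)."""
    return count / (count + k)

def feedback_loop(corpus, rounds=2, min_conf=0.5, k=2.0):
    """Mine candidates, evaluate them, and re-seed the next round with the survivors."""
    seeds = {(tok,) for sent in corpus for tok in sent}  # round 0: single tokens
    accepted = {}
    for _ in range(rounds):
        candidates = mine(corpus, seeds)
        scored = {p: evaluate(c, k) for p, c in candidates.items()}
        passed = {p: s for p, s in scored.items() if s >= min_conf}
        accepted.update(passed)
        seeds = set(passed)  # only patterns that passed guide the next round
        if not seeds:
            break
    return accepted
```

On the corpus [["the", "dog", "runs"], ["the", "dog", "sleeps"], ["a", "cat", "runs"]], only ("the", "dog") passes the first round (c = 2/4), and neither of its one-word extensions passes the second, so the loop returns that single pattern.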
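The compression criterion WILLIAM actually uses is not specified here. As a hypothetical stand-in, the sketch below ranks contiguous token templates by a simple description-length gain: replacing each of `count` occurrences of an n-token template with one symbol saves `count * (n - 1)` tokens, while storing the template once costs `n` tokens. The names `ngram_counts`, `compression_gain`, and `rank_templates` are illustrative only.

```python
from collections import Counter

def ngram_counts(corpus, n):
    """Count contiguous n-token templates (overlapping occurrences are all counted)."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts

def compression_gain(template, count):
    """Tokens saved by replacing every occurrence with one symbol, minus the
    cost of storing the template once: count * (n - 1) - n."""
    n = len(template)
    return count * (n - 1) - n

def rank_templates(corpus, sizes=(2, 3)):
    """Return (gain, template) pairs, highest gain first (ties broken by template)."""
    scored = []
    for n in sizes:
        for template, count in ngram_counts(corpus, n).items():
            scored.append((compression_gain(template, count), template))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored
```

Templates with negative gain cost more to store than they save and would not be selected. A real compressor would also need to resolve overlapping occurrences, which this count ignores.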
Implementation Anchors
- link-grammar — C parser, v5.13.0, multi-language, actively maintained. LGPL.
- lg-atomese — C++ bridge to AtomSpace, v2.0, production use.
- learn — Guile Scheme structure learning, active V2 development.
- matrix — Guile Scheme sparse vector/matrix library for AtomSpace.
- generate — C++ network generation via sheaf theory.
- nl2pln_demo — LLM-assisted NL-to-PLN/MeTTa conversion demo.
- metta-nl-corpus — Dagster pipeline for NL-to-MeTTa annotation with validation.
- bio-semantic-parser — Full-stack NL-to-MeTTa pipeline for biological data.
- Legacy: relex (Java, unmaintained since ~2016)
Current Status
- Operational: Link Grammar parser (v5.13.0); lg-atomese bridge (v2.0); learn/matrix structure learning; nl2pln_demo; bio-semantic-parser
- Active development: metta-nl-corpus (SNLI-based annotation pipeline); learn Version Two
- Proposed: SENF canonical forms; Symbolic Transformer Heads; dependent-type NL semantics; full integration of grammar induction with MORK-native pattern mining
Historical Design Rationale (mailing-list-backed, opencog-ml 2014–2022)
- Link Grammar in C (not Python/Scheme): LG v1.0 dates to 1991 (Sleator & Temperley, Carnegie Mellon). C was retained because the core data structure requires direct CPU cache-line tuning for deeply recursive parsing. The current parser is ~100× faster than the original, and a Python implementation would be 10-20× slower. (Linas Vepstas, Oct 2021)
- 5-example grammar learning threshold: Unsupervised grammar induction can discover a word's correct grammatical form from as few as 5 observations. "From first principles, 5 is minimum for beating random chance on MST parses." (Linas Vepstas, Apr 2019)
- Structure preservation property: When fed LG-English parses, the LG→disjuncts pipeline recovers the input grammar at high F1, establishing that the pipeline is "structure-preserving." (Linas Vepstas, Jun 2019)
- Convergence hypothesis ("Linas Claim"): Given sufficient sampling, non-lexical input converges to the same lexical output as MI-weighted input. Visible convergence requires 10-100× larger training sets. (Linas Vepstas, Jun 2019)
- Morpho-syntax unity: The LG formalism can learn both morphology and syntax "in one gulp," as demonstrated for Tagalog, Hebrew, and Amharic. It is more powerful than FSTs for non-concatenative morphology such as that of Semitic languages. (GSoC 2014, Linas Vepstas)
- Anaphora as downstream reasoning: The Hobbs algorithm operates on AtomSpace output (post-RelEx), not inside the parser. This separation enables integration with PLN for selection-restriction filtering. (Hujie Wang, May 2014)
- Word Grammar algebraic formalization: Ben Goertzel's algebraic formalization of WG maps directly to SHIQ description logic. Richard Hudson (WG creator) engaged directly, publishing a rethink of WG word order rules in the Journal of Linguistics. (Algebraic-view-of-word-grammar, 2014)
- Instance/lemma representation tension: R2L must use word instances ("Mike@111", "eats@222") to distinguish different events, but SuReal needs lemmas with tense/number features for morphological output. This tension is what SENF normalization is designed to resolve. (sureal-and-normalization, 2016)
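Two ideas from the structure-preservation entry can be made concrete: what a disjunct is, and how an F1 score over parses might be computed. The sketch below assumes a parse is a hand-written list of `(left_index, right_index, label)` links and reads off each word's disjunct as its connectors (`label-` toward the left, `label+` toward the right), listed in the left-to-right order of the linked words. That ordering, the unlabeled link-level F1, and the function names `disjuncts` and `link_f1` are assumptions for illustration; the scoring used in the 2019 experiments is not stated here.

```python
from collections import defaultdict

def disjuncts(words, links):
    """Map each word position to (word, disjunct string).

    A link (i, j, label) with i < j gives word i the connector 'label+'
    and word j the connector 'label-'. Connectors are listed in the
    left-to-right order of the words they attach to (a sketch convention).
    """
    conns = defaultdict(list)
    for a, b, label in links:
        conns[a].append((b, label + "+"))
        conns[b].append((a, label + "-"))
    return {
        i: (word, " & ".join(c for _, c in sorted(conns[i])))
        for i, word in enumerate(words)
    }

def link_f1(predicted, reference):
    """Precision, recall, and F1 over unlabeled (left, right) link pairs."""
    pred = {(a, b) for a, b, _ in predicted}
    ref = {(a, b) for a, b, _ in reference}
    tp = len(pred & ref)
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hand-written example parse; D, S, O follow common LG English connector names.
words = ["the", "dog", "chased", "a", "cat"]
reference = [(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D")]
```

With this parse the verb's disjunct comes out as `S- & O+`, and a predicted parse that replaces the link (2, 4) with (2, 3) scores precision = recall = F1 = 0.75.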
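The MI-weighted MST parses mentioned in the threshold and convergence entries can be sketched under simplifying assumptions: word-pair counts come from a fixed window (the `window` value is arbitrary), pair scores are pointwise MI computed from the left and right marginals of those counts, and each sentence's tree is built with Prim's maximum spanning tree algorithm. Unlike a production parser, this sketch ignores any no-crossing-links constraint, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pair_counts(corpus, window=3):
    """Count ordered word pairs (left, right) that co-occur within `window` positions."""
    pairs, left, right = Counter(), Counter(), Counter()
    for sent in corpus:
        for i, j in combinations(range(len(sent)), 2):
            if j - i <= window:
                pairs[(sent[i], sent[j])] += 1
    for (a, b), n in pairs.items():
        left[a] += n   # N(a, *)
        right[b] += n  # N(*, b)
    return pairs, left, right

def pmi(pairs, left, right):
    """Pointwise MI: log2( N(a,b) * N(*,*) / (N(a,*) * N(*,b)) )."""
    total = sum(pairs.values())
    return {(a, b): math.log2(n * total / (left[a] * right[b]))
            for (a, b), n in pairs.items()}

def mst_parse(sent, mi):
    """Prim's maximum spanning tree over word positions; edge (a < b) scores mi[(w_a, w_b)].
    Unseen pairs score -inf. Returns sorted (left, right) links."""
    n = len(sent)
    in_tree, links = {0}, []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                a, b = min(i, j), max(i, j)
                score = mi.get((sent[a], sent[b]), float("-inf"))
                if best is None or score > best[0]:
                    best = (score, a, b, j)
        _, a, b, j = best
        links.append((a, b))
        in_tree.add(j)
    return sorted(links)
```

Because pairs never seen within the window score negative infinity, the tree still connects every word but falls back to arbitrary links for unseen pairs.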
Open Problems / Research Directions
- Defining SENF formally — what rewrite rules normalize NL variations to canonical MeTTa form?
- Scaling Symbolic Heads from proposed design to demonstrated system with mined AtomSpace templates
- Validating NL-to-MeTTa conversion quality at scale — the metta-nl-corpus pipeline addresses this but needs larger gold-standard datasets
- Bridging Link Grammar's typed links to MeTTa's type system — enabling the legacy parser to feed directly into Hyperon reasoning
- Dependent-type representations for NL semantics — formalizing the Curry-Howard grounding approach
- Unified parsing-reasoning convergence — can PLN-based Word Grammar parsing become practical with MORK-level performance?
Primary Sources
- Goertzel, B., Suarez-Madrigal, A., Yu, S. (2020). Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities. arXiv:2005.12533.
- Sleator, D. and Temperley, D. (1991). Parsing English with a Link Grammar. Technical Report CMU-CS-91-196, Carnegie Mellon University.
- Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §7.2: Symbolic Heads.