Status and Resources

System Interfaces

  • PLN: Semantic parses feed directly into PLN as ground-level atoms for uncertain reasoning. NL-to-MeTTa quality determines PLN's ability to reason about natural language content.
  • Pattern Mining: Subgraph patterns mined from parsed text serve as template keys for Symbolic Heads. A feedback loop is planned: the pattern miner finds candidates, PLN evaluates them, and those that pass guide the next round of mining. (mailing-list-backed: A-very-small-LG-relex-bug, 2014)
  • WILLIAM: Compression-driven mining identifies the most reusable symbolic templates from parsed corpora.
  • MeTTa-Motto: LLM integration library providing the neural NL understanding that complements symbolic parsing.
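
The planned mine, evaluate, re-mine loop described under Pattern Mining can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the OpenCog pattern miner or PLN: patterns here are contiguous word n-grams rather than AtomSpace subgraphs, "evaluation" is a plain frequency threshold rather than uncertain inference, and every function name (count_ngrams, mine_candidates, evaluate, feedback_loop) is hypothetical.

```python
from collections import Counter


def count_ngrams(corpus, n):
    """Count contiguous n-grams (as tuples) over a tokenized corpus."""
    counts = Counter()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            counts[tuple(sentence[i:i + n])] += 1
    return counts


def mine_candidates(corpus, accepted, n):
    """Toy miner: propose n-grams whose (n-1)-token prefix passed last round."""
    counts = count_ngrams(corpus, n)
    if n == 1:
        return counts  # first round: nothing accepted yet, propose everything
    return Counter({p: c for p, c in counts.items() if p[:-1] in accepted})


def evaluate(candidates, min_count):
    """Stand-in for PLN evaluation: keep patterns seen at least min_count times."""
    return {p for p, c in candidates.items() if c >= min_count}


def feedback_loop(corpus, rounds=3, min_count=2):
    """Alternate mining and evaluation; accepted patterns guide the next round."""
    history = []
    accepted = set()
    for n in range(1, rounds + 1):
        candidates = mine_candidates(corpus, accepted, n)
        accepted = evaluate(candidates, min_count)
        if not accepted:
            break
        history.append(accepted)
    return history


corpus = [
    "the cat eats fish".split(),
    "the cat eats mice".split(),
    "the dog eats fish".split(),
]
result = feedback_loop(corpus)
for round_no, patterns in enumerate(result, start=1):
    print(round_no, sorted(patterns))
```

In this sketch the evaluator prunes the search: a bigram such as ("dog", "eats") is never proposed because its prefix failed the first round, which is the sense in which accepted patterns guide later mining.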

Implementation Anchors

  • link-grammar — C parser, v5.13.0, multi-language, actively maintained. LGPL.
  • lg-atomese — C++ bridge to AtomSpace, v2.0, production use.
  • learn — Guile Scheme structure learning, active V2 development.
  • matrix — Guile Scheme sparse vector/matrix library for AtomSpace.
  • generate — C++ network generation via sheaf theory.
  • nl2pln_demo — LLM-assisted NL-to-PLN/MeTTa conversion demo.
  • metta-nl-corpus — Dagster pipeline for NL-to-MeTTa annotation with validation.
  • bio-semantic-parser — Full-stack NL-to-MeTTa pipeline for biological data.
  • Legacy: relex (Java, unmaintained since ~2016)

Current Status

  • Operational: Link Grammar parser (v5.13.0); lg-atomese bridge (v2.0); learn/matrix structure learning; nl2pln_demo; bio-semantic-parser
  • Active development: metta-nl-corpus (SNLI-based annotation pipeline); learn Version Two
  • Proposed: SENF canonical forms; Symbolic Transformer Heads; dependent-type NL semantics; full integration of grammar induction with MORK-native pattern mining

Historical Design Rationale (mailing-list-backed, opencog-ml 2014–2022)

  • Link Grammar in C (not Python/Scheme): LG v1.0 dates to 1991 (Temperley & Sleator, Carnegie Mellon). C retained because the core data structure requires direct CPU cache-line tuning for deeply recursive parsing. Current parser ~100× faster than original; Python would be 10-20× slower. (Linas Vepstas, Oct 2021)
  • 5-example grammar learning threshold: Unsupervised grammar induction recovers the correct grammatical form for a word once that word has been observed at least 5 times. "From first principles, 5 is minimum for beating random chance on MST parses." (Linas Vepstas, Apr 2019)
  • Structure preservation property: The LG→disjuncts pipeline recovers the input grammar at high F1 when fed LG-English parses — establishing that it's "structure-preserving." (Linas Vepstas, Jun 2019)
  • Convergence hypothesis ("Linas Claim"): Non-lexical input converges to the same lexical output as MI-weighted input given sufficient sampling. Requires 10-100× larger training sets for visible convergence. (Linas Vepstas, Jun 2019)
  • Morpho-syntax unity: LG formalism can learn both morphology and syntax "in one gulp." Demonstrated for Tagalog, Hebrew, Amharic. More powerful than FSTs for languages with non-concatenative morphology (e.g., Semitic). (GSoC 2014, Linas Vepstas)
  • Anaphora as downstream reasoning: Hobbs algorithm operates on AtomSpace output (post-RelEx), not in the parser. Separation enables integration with PLN for selection restriction filtering. (Hujie Wang, May 2014)
  • Word Grammar algebraic formalization: Ben Goertzel's algebraic formalization of WG maps directly to SHIQ description logic. Richard Hudson (WG creator) engaged directly, publishing a rethink of WG word order rules in the Journal of Linguistics. (Algebraic-view-of-word-grammar, 2014)
  • Instance/lemma representation tension: R2L must use word instances ("Mike@111", "eats@222") to distinguish different events, but SuReal needs lemmas with tense/number features for morphological output. This tension is what SENF normalization is designed to resolve. (sureal-and-normalization, 2016)
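
The instance/lemma tension in the last item can be made concrete with a small sketch. It assumes only the "word@id" token shape quoted above; the WordInstance class, the LEXICON table, and the lemma_view helper are hypothetical illustrations, not R2L, SuReal, or SENF code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordInstance:
    """A word occurrence such as "eats@222": surface form plus instance ID."""
    surface: str
    instance_id: str

    @classmethod
    def parse(cls, token):
        # Split on the last "@" so the surface form may itself contain "@".
        surface, sep, instance_id = token.rpartition("@")
        if not sep or not surface or not instance_id:
            raise ValueError(f"expected 'word@id', got {token!r}")
        return cls(surface, instance_id)


# Hypothetical toy lexicon: surface form -> (lemma, morphological features).
LEXICON = {
    "eats": ("eat", {"tense": "present", "number": "singular"}),
    "ate": ("eat", {"tense": "past"}),
    "Mike": ("Mike", {}),
}


def lemma_view(instance):
    """Project an instance onto the lemma-plus-features view a generator needs."""
    lemma, features = LEXICON[instance.surface]
    return lemma, dict(features)


first = WordInstance.parse("eats@222")
second = WordInstance.parse("eats@333")

# Instance view: two distinct events, so the atoms must differ.
print(first != second)
# Lemma view: both occurrences share one lemma and one feature bundle.
print(lemma_view(first) == lemma_view(second))
```

The instance view keeps the two eating events apart, while the lemma view collapses them onto the same entry. A normalization layer has to carry both views without losing either.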

Open Problems / Research Directions

  • Defining SENF formally — what rewrite rules normalize NL variations to canonical MeTTa form?
  • Scaling Symbolic Heads from proposed design to demonstrated system with mined AtomSpace templates
  • Validating NL-to-MeTTa conversion quality at scale — the metta-nl-corpus pipeline addresses this but needs larger gold-standard datasets
  • Bridging Link Grammar's typed links to MeTTa's type system — enabling the legacy parser to feed directly into Hyperon reasoning
  • Dependent-type representations for NL semantics — formalizing the Curry-Howard grounding approach
  • Unified parsing-reasoning convergence — can PLN-based Word Grammar parsing become practical with MORK-level performance?

Primary Sources

  • Goertzel, B., Suarez-Madrigal, A., Yu, S. (2020). Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities. arXiv:2005.12533.
  • Sleator, D., Temperley, D. (1991). Parsing English with a Link Grammar. Carnegie Mellon University Technical Report CMU-CS-91-196.
  • Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §7.2: Symbolic Heads.