Human Approved — by Ursula Addison on 2026-05-21

Responsible: Ben Goertzel

Papers: Hyperon for AGI⇒ASI Whitepaper (2025), §5.9; Goertzel (2025), SubRep: Certified Subgoal Representation (draft/deck)

Status: Proposed. Framework for certified subgoal management described in the 2025 whitepaper. Theoretical design; not yet implemented.

SubRep (Subgoal Representation) proposes transforming subgoal and option learning into a disciplined, certifiable practice. Its core mechanisms are formal admission rules, a co-learned decomposition network, and algebraic residuation — together determining when a subgoal genuinely serves a larger purpose and when component solutions can be safely composed.

The Problem

Standard reinforcement learning discovers subgoals by shaping a single scalar reward. This approach is brittle: skills overfit to specific reward signals, transfer negatively to related tasks, and cannot be reliably composed. PRIMUS needs subgoals that work across multiple motives, compose safely, and serve both neural controllers and symbolic reasoners.

Core Mechanisms

SubRep's design centers on four elements described in the whitepaper and supporting glossary:

  • CDS (Cone-Dominant Subgoals): An option is admitted if it is robust across a family of motives: not just optimal for one reward, but beneficial across a "cone" of related objectives in motive space. Formally, the CDS margin for candidate \(o\) is \(m_o(x) = \inf_{w \in W}\bigl(B_o(x;w) - B_{\text{base}}(x;w)\bigr)\), where \(B_o(x;w)\) is the option's backed-up value at state \(x\) under motive weight \(w\) and \(B_{\text{base}}(x;w)\) is the corresponding baseline value; admit if \(m_o(x) \geq 0\) across update states. This prevents brittle specialization.
  • PDS (Pareto-Dominant Subgoals): When genuine trade-offs exist between motives, an option is admitted if it is Pareto-good on a small covering set \(W_{\text{ref}} = \{w^{(1)}, \ldots, w^{(K)}\}\), meaning no other available option dominates it on all objectives simultaneously. PDS allows complementary skills (e.g., one safety-focused, one speed-focused).
  • MDN (Motive Decomposition Network): A co-learned network that decomposes high-level motives into achievable subgoals, mapping motive space into option space. The MDN learns which skills serve which purposes and identifies gaps. A key monotonicity property holds: tightening the cone \(W\) or refining \(W_{\text{ref}}\) never invalidates prior admissions.
  • Residuation (\(A^* = S/B\)): SubRep uses algebraic residuation from the weakness framework to compute the "weakest-sufficient" missing piece of a plan — the minimal additional capability needed to bridge the gap between current skills and a target goal. This prevents over-specialized subgoals that fail to transfer.
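
The two admission rules can be sketched in a few lines of Python. This is only an illustrative sketch, not the whitepaper's implementation: it assumes each option is summarized at one state by a hypothetical pair (r_hat, n_hat), that the baseline has the same form, and that \(W\) is the convex hull of a finite list of generator weights, so the infimum of the affine gap is attained at a generator. The dominance test uses the strict reading of "dominates on all objectives". All names and numbers are made up for illustration.

```python
def backed_up_value(r_hat, n_hat, w):
    """B(x; w) = r_hat + w . n_hat for one option at one fixed state x."""
    return r_hat + sum(wi * ni for wi, ni in zip(w, n_hat))

def cds_margin(option, baseline, generators):
    """Smallest gap B_o(x; w) - B_base(x; w) over the generator weights.

    Assumption: W is the convex hull of `generators`; the gap is affine in w,
    so its infimum over W is attained at one of the generators.
    """
    return min(
        backed_up_value(*option, w) - backed_up_value(*baseline, w)
        for w in generators
    )

def pds_admitted(candidate, library, w_ref):
    """True unless some library option is strictly better on every weight."""
    cand = [backed_up_value(*candidate, w) for w in w_ref]
    for other in library:
        vals = [backed_up_value(*other, w) for w in w_ref]
        if all(o > c for o, c in zip(vals, cand)):
            return False
    return True

# Hypothetical numbers: each option is (r_hat, n_hat) at one state x, d = 2.
weights = [[1.0, 0.0], [0.0, 1.0]]
baseline = (0.0, [1.0, 1.0])
balanced = (0.5, [1.0, 1.5])    # helps under both motives
specialist = (0.0, [3.0, 0.5])  # helps one motive, hurts the other

balanced_margin = cds_margin(balanced, baseline, weights)
specialist_margin = cds_margin(specialist, baseline, weights)
specialist_pds = pds_admitted(specialist, [balanced, baseline], weights)
print(balanced_margin, specialist_margin, specialist_pds)
```

In this toy setup the balanced option passes CDS, while the specialist fails CDS but is still admitted under PDS because nothing in the library beats it on every weight, which is the kind of complementary skill PDS is meant to keep.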

Decision Transformers and the Planning Algebra

Each admitted option \(o\) is modeled as a decision transformer \(T_o\). Here "transformer" means a mathematical operator on value functions, not the neural-network Transformer architecture. \(T_o\) maps a downstream value function into an upstream one by inserting the option's expectation model:

\[[T_o \, v](x) = \hat{r}(x, o) + v\bigl(\hat{n}(x, o)\bigr)\]

where \(\hat{r}(x,o)\) is the expected cumulative payoff while \(o\) runs, and \(\hat{n}(x,o) \in \mathbb{R}^d\) is the expected discounted successor-feature vector at termination. On a linear value slice \(v_w(x) = w^\top x\), this reduces to the backed-up value \(B_o(x;w) = \hat{r}(x,o) + w^\top \hat{n}(x,o)\).
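
As a minimal sketch of the operator above, \(T_o\) can be written as a higher-order function that takes a value function and returns a new one. The expectation model used here (a constant payoff and a halving of every feature) is a made-up stand-in, and the helper names are hypothetical; the point is only to check that on a linear slice the operator agrees with the closed form \(B_o(x;w)\).

```python
def make_transformer(r_hat, n_hat):
    """Build T_o from an option's expectation model (r_hat, n_hat)."""
    def transform(v):
        # Upstream value: payoff while the option runs, plus the downstream
        # value evaluated at the predicted successor features.
        return lambda x: r_hat(x) + v(n_hat(x))
    return transform

def linear_value(w):
    """The linear slice v_w(x) = w . x."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical expectation model for a single option o.
r_hat = lambda x: 1.0
n_hat = lambda x: [0.5 * xi for xi in x]

T_o = make_transformer(r_hat, n_hat)
w = [2.0, -1.0]
x = [1.0, 3.0]

upstream = T_o(linear_value(w))(x)
closed_form = r_hat(x) + sum(wi * ni for wi, ni in zip(w, n_hat(x)))
print(upstream, closed_form)
```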

The planner forms a join (pointwise max) over all admitted transformers: \([B_G \, v](x) = \max_{g \in G} [T_g \, v](x)\). These transformers, together with join and sequential composition, form a residuated quantale, the algebraic structure that makes residuation possible.
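
The join and sequential composition can be sketched with the same functional encoding. The three options below are hypothetical and act on a one-dimensional state for brevity; the final check illustrates, for this toy case only, that composing an option with a join gives the same result as joining the compositions, which is the kind of distributivity a quantale requires.

```python
def make_transformer(r_hat, n_hat):
    """T_o as a function from value functions to value functions."""
    return lambda v: (lambda x: r_hat(x) + v(n_hat(x)))

def join(transformers):
    """Pointwise max over a collection of transformers."""
    return lambda v: (lambda x: max(T(v)(x) for T in transformers))

def then(first, second):
    """Sequential composition: run `first`, then `second`."""
    return lambda v: first(second(v))

# Hypothetical options on a scalar state.
T_a = make_transformer(lambda x: 1.0, lambda x: x / 2)
T_b = make_transformer(lambda x: 0.0, lambda x: x + 1)
T_c = make_transformer(lambda x: 2.0, lambda x: 0.0)

v = lambda x: x   # downstream value function
x0 = 4.0

best = join([T_a, T_b, T_c])(v)(x0)
lhs = then(T_a, join([T_b, T_c]))(v)(x0)
rhs = join([then(T_a, T_b), then(T_a, T_c)])(v)(x0)
print(best, lhs, rhs)
```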

Residuation: Computing Weakest-Sufficient Plan Fragments

In the residuated quantale \((K, \otimes, \bigvee, \leq)\) of decision transformers:

\[S / T = \bigvee \{ R \in K : R \otimes T \leq S \}\]

This gives the weakest (least committed) prefix \(R\) such that running \(R\) then \(T\) still achieves the target \(S\). Dually, \(T \backslash S\) gives the weakest sufficient suffix.

Worked example (robot arm). Suppose the target \(S\) says: "from any pose in region \(\Omega\), guarantee sufficient value to place an object in the bin within a bounded number of steps." Fix the suffix \(B\) to be a grasp-and-lift macro, which is already reliable once the grasping position is reached. Then:

\[A^* = S / B\]

says exactly what is minimally needed before calling \(B\): "reach a pre-grasp pose with grasp margin \(\geq 0\) under the motive slice." The planner then searches the admitted library for options approximating \(A^*\) — e.g., a learned curved-approach trajectory or a logical macro "if occlusion is high, rotate wrist to clear line of sight."

The key property: residuation yields the weakest such fragment. A stronger prefix (one that achieves more than necessary) would also work but would be over-specialized and harder to transfer. By computing the weakest-sufficient piece, SubRep keeps solutions simple and maximally reusable.
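
The robot-arm example can be mimicked in a much simpler residuated quantale: binary relations on a finite state set, ordered by inclusion, with union as join and relational composition as \(\otimes\). This is a toy stand-in, not the decision-transformer quantale itself, and the state names are invented. In this setting \(S/T\) is the largest (most permissive) prefix relation whose composition with \(T\) stays inside \(S\).

```python
from itertools import product

def compose(R, T):
    """Relational composition: run R, then T."""
    return {(a, c) for (a, b) in R for (b2, c) in T if b == b2}

def right_residual(S, T, states):
    """S / T: the largest R with compose(R, T) contained in S."""
    return {
        (a, b)
        for a, b in product(states, repeat=2)
        # (a, b) is allowed iff every outcome of T from b is acceptable for a.
        if all((a, c) in S for (b2, c) in T if b2 == b)
    }

states = ["far", "pregrasp", "held", "dropped"]

# Suffix: grasp-and-lift succeeds from pregrasp, fails from far.
T = {("pregrasp", "held"), ("far", "dropped"),
     ("held", "held"), ("dropped", "dropped")}

# Target: from any start state, the object ends up held.
S = {(s, "held") for s in states}

prefix = right_residual(S, T, states)
allowed_endpoints = sorted({b for (_, b) in prefix})
print(allowed_endpoints)
```

The residual permits any prefix that ends in a pre-grasp pose or with the object already held, and nothing more specific than that, which mirrors the "weakest-sufficient" reading in the text.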

Multi-objective case. With a finite weight cover \(W_{\text{ref}}\), planning lives in a product quantale and residuals compute componentwise: \(S/B = \bigoplus_k S_k / B_k\). This gives the weakest prefix per motive weight, stacked together.
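
One way to see why the residual splits per weight, assuming the product quantale carries the componentwise order, join, and composition (the standard product construction; the source does not spell this out, and \(K_k\) below denotes the hypothetical \(k\)-th component quantale): the constraint \(R \otimes B \leq S\) holds exactly when it holds in every component, so the join defining the residual can be taken one component at a time.

\[R \otimes B \leq S \iff R_k \otimes B_k \leq S_k \ \text{for all } k, \qquad\text{hence}\qquad (S/B)_k = \bigvee \{ R_k \in K_k : R_k \otimes B_k \leq S_k \} = S_k / B_k\]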

Certificates and Safety Guarantees

Certificates are expressed as backed-up values over AtomSpace features, incorporating model-uncertainty slacks:

\[m_o^{\text{rob}}(x) = \inf_{w \in W}\bigl(\hat{r}(x,o) + w^\top \hat{n}(x,o) - B_{\text{base}}(x;w) - \varepsilon_r - \|w\|_1 \varepsilon_n\bigr)\]

where \(\varepsilon_r\) and \(\varepsilon_n\) bound model errors in the cumulant and successor-feature predictions. This ensures only options that help even under uncertainty are admitted. A join-safety theorem guarantees that adding a certified option can never decrease the planner's backed-up target on the motive slice.
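
A minimal numerical sketch of the robust margin and the join-safety property follows. It makes several assumptions that are not in the source: \(W\) is replaced by a finite list of weights, the baseline has the same (r_hat, n_hat) form as an option, and the slack \(\|w\|_1 \varepsilon_n\) is evaluated per weight inside the minimum because it depends on \(w\). All numbers are invented.

```python
def dot(w, n):
    return sum(wi * ni for wi, ni in zip(w, n))

def robust_margin(option, baseline, weights, eps_r, eps_n):
    """Worst-case gap to baseline over `weights`, minus model-error slack."""
    r_o, n_o = option
    r_b, n_b = baseline

    def gap(w):
        l1 = sum(abs(wi) for wi in w)
        return (r_o + dot(w, n_o)) - (r_b + dot(w, n_b)) - eps_r - l1 * eps_n

    return min(gap(w) for w in weights)

def planner_target(options, w):
    """Join over admitted options on the slice w: max of backed-up values."""
    return max(r + dot(w, n) for r, n in options)

# Hypothetical data: finite stand-in for W, one candidate, one baseline.
weights = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
baseline = (0.0, [1.0, 1.0])
candidate = (0.5, [1.0, 1.5])

tight = robust_margin(candidate, baseline, weights, eps_r=0.1, eps_n=0.2)
loose = robust_margin(candidate, baseline, weights, eps_r=0.1, eps_n=0.5)
print(tight >= 0, loose >= 0)  # admitted only when model error is small

# Join safety in this toy setting: adding an option never lowers the max.
library = [baseline]
never_worse = all(
    planner_target(library + [candidate], w) >= planner_target(library, w)
    for w in weights
)
print(never_worse)
```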

Neurosymbolic Design

Unlike standard RL option frameworks, SubRep is modality-neutral. Neural controllers, PLN-derived logic macros, and MOSES/GEO-EVO evolved programs all export the same interface \((\hat{r}, \hat{n})\) and are screened by the same CDS/PDS admission rules. This lets the planner compose skills from different paradigms under one certified framework.
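
The shared interface can be pictured as a small structural type that any option kind satisfies. The sketch below is hypothetical throughout: the `OptionModel` protocol, the two stand-in classes, and their parameters are illustrative names, not part of Hyperon or of any published SubRep API.

```python
from typing import Protocol, Sequence

Vector = Sequence[float]

class OptionModel(Protocol):
    """Hypothetical shared interface: every option exports (r_hat, n_hat)."""
    def r_hat(self, x: Vector) -> float: ...
    def n_hat(self, x: Vector) -> Vector: ...

class ScaledController:
    """Stand-in for a neural controller: scales every feature by `gain`."""
    def __init__(self, gain: float, payoff: float):
        self.gain, self.payoff = gain, payoff
    def r_hat(self, x):
        return self.payoff
    def n_hat(self, x):
        return [self.gain * xi for xi in x]

class RuleMacro:
    """Stand-in for a logic macro: if feature 0 exceeds a threshold, zero it."""
    def __init__(self, threshold: float):
        self.threshold = threshold
    def r_hat(self, x):
        return -1.0 if x[0] > self.threshold else 0.0
    def n_hat(self, x):
        return [0.0, *x[1:]] if x[0] > self.threshold else list(x)

def backed_up_value(option: OptionModel, x: Vector, w: Vector) -> float:
    """Same screening quantity for every option kind: r_hat + w . n_hat."""
    return option.r_hat(x) + sum(wi * ni for wi, ni in zip(w, option.n_hat(x)))

x = [2.0, 1.0]
w = [-1.0, 1.0]
controller_value = backed_up_value(ScaledController(gain=0.5, payoff=0.25), x, w)
macro_value = backed_up_value(RuleMacro(threshold=1.0), x, w)
print(controller_value, macro_value)
```

Because the screening function only touches `r_hat` and `n_hat`, the same admission test can in principle be run on either kind of option without knowing how it was produced.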

Relationship to MetaMo and TransWeave

SubRep is designed as a complement to MetaMo: MetaMo defines what the system cares about as motive geometries; SubRep validates which skills serve those motives with formal certificates. When TransWeave transfers skills to new tasks, SubRep certificates travel with them — the residuation algebra ensures that "weakest-sufficient" transfers remain valid under the receiving task's motive geometry.

Key References

  • Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §5.9: SubRep
  • Goertzel, B. (2025). SubRep: Certified Subgoal Representation (draft paper and explanatory deck)


