Responsible: Ben Goertzel
Papers: Hyperon for AGI⇒ASI Whitepaper (2025), §5.9; Goertzel (2025), SubRep: Certified Subgoal Representation (draft/deck)
Status: Proposed. Framework for certified subgoal management described in the 2025 whitepaper. Theoretical design; not yet implemented.
SubRep (Subgoal Representation) proposes transforming subgoal and option learning into a disciplined, certifiable practice. Its core mechanisms are formal admission rules, a co-learned decomposition network, and algebraic residuation — together determining when a subgoal genuinely serves a larger purpose and when component solutions can be safely composed.
The Problem
Standard reinforcement learning discovers subgoals by shaping a single scalar reward. This approach is brittle: skills overfit to specific reward signals, transfer negatively to related tasks, and cannot be reliably composed. PRIMUS needs subgoals that work across multiple motives, compose safely, and serve both neural controllers and symbolic reasoners.
Core Mechanisms
SubRep's design centers on four elements described in the whitepaper and supporting glossary:
CDS (Cone-Dominant Subgoals): An option is admitted if robust across a family of motives — not just optimal for one reward, but beneficial across a "cone" of related objectives in motive space. Formally, the CDS margin for candidate \(o\) is \(m_o(x) = \inf_{w \in W}\bigl(B_o(x;w) - B_{\text{base}}(x;w)\bigr)\); admit if \(m_o(x) \geq 0\) across update states. This prevents brittle specialization.
PDS (Pareto-Dominant Subgoals): When genuine trade-offs exist between motives, an option is admitted if Pareto-good on a small covering set \(W_{\text{ref}} = \{w^{(1)}, \ldots, w^{(K)}\}\) — meaning no other available option dominates it on all objectives simultaneously. PDS allows complementary skills (e.g., one safety-focused, one speed-focused).
MDN (Motive Decomposition Network): A co-learned network that decomposes high-level motives into achievable subgoals, mapping motive space into option space. The MDN learns which skills serve which purposes and identifies gaps. A key monotonicity property holds: tightening the cone \(W\) or refining \(W_{\text{ref}}\) never invalidates prior admissions.
Residuation (\(A^* = S/B\)): SubRep uses algebraic residuation from the weakness framework to compute the "weakest-sufficient" missing piece of a plan: the minimal additional capability needed to bridge the gap between current skills and a target goal. This prevents over-specialized subgoals that fail to transfer.
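The CDS and PDS admission tests can be sketched numerically. This is a minimal illustration, not the whitepaper's implementation: it assumes the cone \(W\) and the cover \(W_{\text{ref}}\) are approximated by a finite sample of weights, that backed-up values at one state are already available as arrays (one entry per weight), and that "dominates" means at least as good on every weight and strictly better on one. The function names and numbers are hypothetical.

```python
import numpy as np

def cds_margin(b_option, b_base):
    """Sampled stand-in for m_o(x) = inf_w (B_o(x;w) - B_base(x;w)).

    Both arguments hold backed-up values at one state x,
    one entry per sampled weight w in the cone W.
    """
    return float(np.min(np.asarray(b_option) - np.asarray(b_base)))

def cds_admit(b_option, b_base):
    """CDS rule: admit when the margin is non-negative."""
    return cds_margin(b_option, b_base) >= 0.0

def pds_admit(b_candidate, b_library):
    """PDS rule: admit when no library option Pareto-dominates the candidate."""
    cand = np.asarray(b_candidate)
    for other in b_library:
        other = np.asarray(other)
        # Assumed dominance test: never worse, strictly better somewhere.
        if np.all(other >= cand) and np.any(other > cand):
            return False
    return True

# Hypothetical backed-up values on three sampled weights.
b_base = [1.0, 2.0, 0.5]
b_safe = [1.5, 2.0, 0.75]  # never worse than the baseline
b_fast = [3.0, 1.0, 0.5]   # better on one weight, worse on another

print(cds_margin(b_safe, b_base))   # 0.0
print(cds_admit(b_fast, b_base))    # False
print(pds_admit(b_fast, [b_safe]))  # True
```

In this toy case `b_fast` fails CDS because it loses to the baseline on one weight, yet it passes PDS next to `b_safe` because neither dominates the other. That is the complementary-skills situation PDS is meant to allow.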
Decision Transformers and the Planning Algebra
Each admitted option \(o\) is modeled as a decision transformer \(T_o\) (a mathematical operator, not the neural-net "Transformer" architecture). \(T_o\) maps a downstream value function into an upstream one by inserting the option's expectation model:
\[[T_o \, v](x) = \hat{r}(x, o) + v\bigl(\hat{n}(x, o)\bigr)\]
where \(\hat{r}(x,o)\) is the expected cumulative payoff while \(o\) runs, and \(\hat{n}(x,o) \in \mathbb{R}^d\) is the expected discounted successor-feature vector at termination. On a linear value slice \(v_w(x) = w^\top x\), this reduces to the backed-up value \(B_o(x;w) = \hat{r}(x,o) + w^\top \hat{n}(x,o)\).
The planner forms a join (pointwise max) over all admitted transformers: \((B_G \, v)(x) = \max_{g \in G} [T_g \, v](x)\). These transformers, together with join and sequential composition, form a residuated quantale, the algebraic structure that makes residuation possible.
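The operator \(T_o\), the join, and sequential composition can be written as higher-order functions. The sketch below is only an illustration under stated assumptions: states are plain feature vectors, and the expectation models `r_hat` and `n_hat` are hypothetical closed-form functions rather than learned models. The helper names `make_transformer`, `join`, and `compose` are invented for this example.

```python
import numpy as np

def make_transformer(r_hat, n_hat):
    """Build T_o, with [T_o v](x) = r_hat(x) + v(n_hat(x))."""
    def T(v):
        return lambda x: r_hat(x) + v(n_hat(x))
    return T

def join(transformers):
    """Pointwise max: (B_G v)(x) = max over g of [T_g v](x)."""
    def B(v):
        return lambda x: max(T(v)(x) for T in transformers)
    return B

def compose(T_first, T_second):
    """Sequential composition: run T_first's option, then T_second's."""
    return lambda v: T_first(T_second(v))

# Linear value slice v_w(x) = w^T x with a hypothetical weight vector.
w = np.array([1.0, 2.0])
v = lambda x: float(w @ x)

# Two hypothetical options with hand-written expectation models.
T_a = make_transformer(lambda x: 1.0, lambda x: 0.5 * x)
T_b = make_transformer(lambda x: 0.0, lambda x: x + np.array([1.0, 0.0]))

x = np.array([2.0, 1.0])
print(T_a(v)(x))                # 3.0
print(T_b(v)(x))                # 5.0
print(join([T_a, T_b])(v)(x))   # 5.0
print(compose(T_a, T_b)(v)(x))  # 4.0
```

Composition nests the backups: running `T_a` then `T_b` evaluates `T_b`'s backed-up value at the features where `T_a` is expected to terminate.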
Residuation: Computing Weakest-Sufficient Plan Fragments
In the residuated quantale \((K, \otimes, \bigvee, \leq)\) of decision transformers:
\[S / T = \bigvee \{ R \in K : R \otimes T \leq S \}\]
This gives the weakest (least committed) prefix \(R\) such that running \(R\) then \(T\) still achieves the target \(S\). Dually, \(T \backslash S\) gives the weakest sufficient suffix.
Worked example (robot arm). Suppose the target \(S\) says: "from any pose in region \(\Omega\), guarantee sufficient value to place an object in the bin within a bounded number of steps." Fix a suffix \(B\) = grasp-and-lift macro (already reliable once grasping position is reached). Then:
\[A^* = S / B\]
says exactly what is minimally needed before calling \(B\): "reach a pre-grasp pose with grasp margin \(\geq 0\) under the motive slice." The planner then searches the admitted library for options approximating \(A^*\), e.g., a learned curved-approach trajectory or a logical macro "if occlusion is high, rotate wrist to clear line of sight."
The key property: residuation yields the weakest such fragment. A stronger prefix (one that achieves more than necessary) would also work but would be over-specialized and harder to transfer. By computing the weakest-sufficient piece, SubRep keeps solutions simple and maximally reusable.
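The residual \(S / T\) can be computed by brute force in a small quantale. The sketch below deliberately swaps in a simpler structure than the quantale of decision transformers: binary relations over a finite state set, with relational composition as \(\otimes\), union as join, and inclusion as the order. The state names and the relations `S` and `T` are hypothetical, loosely echoing the robot-arm example; only the defining property of the residual is being shown.

```python
from itertools import product

def compose(R, T):
    """R (x) T: run R, then T. Pairs (a, c) linked through a shared midpoint."""
    return {(a, c) for (a, b) in R for (b2, c) in T if b == b2}

def right_residual(S, T, states):
    """S / T: the largest relation R with compose(R, T) contained in S."""
    return {
        (a, b)
        for a, b in product(states, repeat=2)
        if all((a, c) in S for (b2, c) in T if b2 == b)
    }

states = ["far", "pregrasp", "binned"]

# Suffix T: a grasp-and-lift stand-in that succeeds only from "pregrasp".
T = {("far", "far"), ("pregrasp", "binned"), ("binned", "binned")}

# Target S: starting from "far" or "pregrasp", end with the object binned.
S = {("far", "binned"), ("pregrasp", "binned")}

A_star = right_residual(S, T, states)
print(sorted(A_star))
# [('far', 'binned'), ('far', 'pregrasp'),
#  ('pregrasp', 'binned'), ('pregrasp', 'pregrasp')]

# Sufficient: running A_star and then T stays inside the target.
print(compose(A_star, T) <= S)  # True
```

Every pair outside `A_star` breaks the inclusion when added, which is the sense in which the residual is the weakest (least committed) sufficient prefix in this toy setting.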
Multi-objective case. With a finite weight cover \(W_{\text{ref}}\), planning lives in a product quantale and residuals compute componentwise: \(S/B = \bigoplus_k S_k / B_k\). This gives the weakest prefix per motive weight, stacked together.
Certificates and Safety Guarantees
Certificates are expressed as backed-up values over AtomSpace features, incorporating model-uncertainty slacks:
\[m_o^{\text{rob}}(x) = \inf_{w \in W}\bigl(\hat{r} + w^\top \hat{n} - B_{\text{base}}\bigr) - \varepsilon_r - \|w\|_1 \varepsilon_n\]
where \(\varepsilon_r\) and \(\varepsilon_n\) bound model errors in the cumulant and successor-feature predictions. This ensures only options that help even under uncertainty are admitted. A join-safety theorem guarantees that adding a certified option can never decrease the planner's backed-up target on the motive slice.
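The effect of the slack terms can be illustrated with a small calculation. This sketch rests on two assumptions that are not in the source: the cone \(W\) is replaced by a finite sample of weights, and the \(\|w\|_1 \varepsilon_n\) slack is applied per weight before taking the minimum. The function name and all numbers are hypothetical.

```python
import numpy as np

def robust_margin(r_hat, n_hat, b_base, weights, eps_r, eps_n):
    """Sampled robust margin at one state.

    r_hat   : expected cumulative payoff of the option (scalar)
    n_hat   : expected successor-feature vector, shape (d,)
    b_base  : baseline backed-up value per sampled weight, shape (K,)
    weights : sampled weights from the cone W, shape (K, d)
    """
    W = np.asarray(weights, dtype=float)
    gain = r_hat + W @ np.asarray(n_hat, dtype=float) - np.asarray(b_base, dtype=float)
    # Assumption: the L1-norm slack is charged per weight, inside the min.
    slack = eps_r + np.abs(W).sum(axis=1) * eps_n
    return float(np.min(gain - slack))

weights = [[1.0, 0.0], [0.0, 1.0]]
r_hat, n_hat = 1.0, [2.0, 0.0]
b_base = [2.0, 1.0]

nominal = robust_margin(r_hat, n_hat, b_base, weights, eps_r=0.0, eps_n=0.0)
robust = robust_margin(r_hat, n_hat, b_base, weights, eps_r=0.25, eps_n=0.25)
print(nominal)  # 0.0
print(robust)   # -0.5
```

With zero slack the option sits exactly on the admission boundary. Once model error is accounted for the margin turns negative, so an option that only marginally helps would be rejected.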
Neurosymbolic Design
Unlike standard RL option frameworks, SubRep is modality-neutral. Neural controllers, PLN-derived logic macros, and MOSES/GEO-EVO evolved programs all export the same interface \((\hat{r}, \hat{n})\) and are screened by the same CDS/PDS admission rules. This lets the planner compose skills from different paradigms under one certified framework.
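One way to picture the shared \((\hat{r}, \hat{n})\) interface is as a structural type that any skill can satisfy. The sketch below is purely illustrative: `OptionModel`, `LinearController`, and `RuleMacro` are hypothetical names, and the two classes are hand-written stand-ins, not actual neural or PLN components.

```python
from typing import Protocol, Sequence

class OptionModel(Protocol):
    """Hypothetical interface: every skill exports (r_hat, n_hat)."""
    def r_hat(self, x: Sequence[float]) -> float: ...
    def n_hat(self, x: Sequence[float]) -> Sequence[float]: ...

class LinearController:
    """Stand-in for a neural controller's expectation model."""
    def __init__(self, gain: float):
        self.gain = gain
    def r_hat(self, x):
        return self.gain * sum(x)
    def n_hat(self, x):
        return [0.5 * xi for xi in x]

class RuleMacro:
    """Stand-in for a logic macro that pays off only when its condition holds."""
    def r_hat(self, x):
        return 1.0 if x[0] > 0 else 0.0
    def n_hat(self, x):
        return list(x)

def backed_up(option: OptionModel, x, w) -> float:
    """B_o(x; w) = r_hat(x, o) + w^T n_hat(x, o), identical for every modality."""
    n = option.n_hat(x)
    return option.r_hat(x) + sum(wi * ni for wi, ni in zip(w, n))

x, w = [2.0, 4.0], [1.0, 0.5]
for skill in (LinearController(0.5), RuleMacro()):
    print(type(skill).__name__, backed_up(skill, x, w))
```

Because `backed_up` only touches the two exported methods, the same admission and planning code can run over skills regardless of how they were produced.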
Relationship to MetaMo and TransWeave
SubRep is designed as a complement to MetaMo: MetaMo defines what the system cares about as motive geometries; SubRep validates which skills serve those motives with formal certificates. When TransWeave transfers skills to new tasks, SubRep certificates would travel with them: the residuation algebra ensures that "weakest-sufficient" transfers remain valid under the receiving task's motive geometry.
Key References
Goertzel, B. (2025). Hyperon for AGI⇒ASI Whitepaper, §5.9: SubRep
Goertzel, B. (2025). SubRep: Certified Subgoal Representation (draft paper and explanatory deck)