Responsible: Ben Goertzel
Papers: Hyperon for AGI⇒ASI Whitepaper (2025), §5.9; Goertzel (2025), SubRep: Certified Subgoal Representation (draft/deck)
Status: Proposed. Framework for certified subgoal management described in the 2025 whitepaper. Theoretical design; not yet implemented.
SubRep (Subgoal Representation) proposes transforming subgoal and option learning into a disciplined, certifiable practice. Its core mechanisms are formal admission rules, a co-learned decomposition network, and algebraic residuation — together determining when a subgoal genuinely serves a larger purpose and when component solutions can be safely composed.
Standard reinforcement learning discovers subgoals by shaping a single scalar reward. This approach is brittle: skills overfit to specific reward signals, transfer negatively to related tasks, and cannot be reliably composed. PRIMUS needs subgoals that work across multiple motives, compose safely, and serve both neural controllers and symbolic reasoners.
SubRep's design centers on four elements described in the whitepaper and supporting glossary:
Each admitted option \(o\) is modeled as a decision transformer \(T_o\) (not the neural-net "Transformer" architecture — a mathematical operator). \(T_o\) maps a downstream value function into an upstream one by inserting the option's expectation model:
\[[T_o \, v](x) = \hat{r}(x, o) + v\bigl(\hat{n}(x, o)\bigr)\] where \(\hat{r}(x,o)\) is the expected cumulative payoff while \(o\) runs, and \(\hat{n}(x,o) \in \mathbb{R}^d\) is the expected discounted successor-feature vector at termination. On a linear value slice \(v_w(x) = w^\top x\), this reduces to the backed-up value \(B_o(x;w) = \hat{r}(x,o) + w^\top \hat{n}(x,o)\).
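The operator above can be sketched in a few lines of Python. This is a minimal illustration, not the whitepaper's implementation: the names `OptionModel`, `transform`, and `backed_up_value` are hypothetical, states are assumed to be plain feature vectors already, and the toy option's numbers are invented.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]  # assumption: a state x is already a feature vector in R^d


def dot(w: Vector, z: Vector) -> float:
    """Plain inner product, to keep the sketch dependency-free."""
    return sum(wi * zi for wi, zi in zip(w, z))


@dataclass
class OptionModel:
    """Expectation model exported by one option o (hypothetical container)."""
    r_hat: Callable[[Vector], float]   # expected cumulative payoff while o runs
    n_hat: Callable[[Vector], Vector]  # expected discounted successor features at termination


def transform(option: OptionModel, v: Callable[[Vector], float]) -> Callable[[Vector], float]:
    """Return T_o v, i.e. the function x -> r_hat(x, o) + v(n_hat(x, o))."""
    return lambda x: option.r_hat(x) + v(option.n_hat(x))


def backed_up_value(option: OptionModel, x: Vector, w: Vector) -> float:
    """B_o(x; w) = r_hat(x, o) + w^T n_hat(x, o), the linear-slice special case."""
    return option.r_hat(x) + dot(w, option.n_hat(x))


# Toy option (invented numbers): costs 1 unit of payoff and halves every feature.
toy = OptionModel(r_hat=lambda x: -1.0, n_hat=lambda x: [0.5 * xi for xi in x])
w = [1.0, 2.0]
x = [2.0, 1.0]

general = transform(toy, lambda z: dot(w, z))(x)  # T_o applied to v_w
linear = backed_up_value(toy, x, w)               # closed form on the linear slice
print(general, linear)
```

Both calls evaluate the same quantity, which is the point of the reduction: on a linear slice the planner only needs \(\hat{r}\) and \(\hat{n}\), not the full downstream value function.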
The planner forms a join (pointwise max) over all admitted transformers: \((B_G \, v)(x) = \max_{g \in G} [T_g \, v](x)\). These transformers, together with join and sequential composition, form a residuated quantale — the algebraic structure that makes residuation possible.
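As a rough sketch of the two operations named above, the snippet below builds a pointwise-max join and a sequential composition over toy transformers. The helper names (`make_transformer`, `join`, `then`), the one-dimensional state, and the numbers are all assumptions made for illustration; the sketch does not verify the quantale axioms.

```python
from typing import Callable, Sequence

Value = Callable[[float], float]        # a value function over a 1-D toy state
Transformer = Callable[[Value], Value]  # maps a downstream value to an upstream one


def make_transformer(r_hat: Callable[[float], float],
                     n_hat: Callable[[float], float]) -> Transformer:
    """Build T_o from an option's expectation model (r_hat, n_hat)."""
    def T(v: Value) -> Value:
        return lambda x: r_hat(x) + v(n_hat(x))
    return T


def join(transformers: Sequence[Transformer]) -> Transformer:
    """Pointwise max over a set of transformers: (B_G v)(x) = max_g [T_g v](x)."""
    def J(v: Value) -> Value:
        return lambda x: max(T(v)(x) for T in transformers)
    return J


def then(first: Transformer, second: Transformer) -> Transformer:
    """Run `first`, then `second`; value is backed up through `second` first."""
    return lambda v: first(second(v))


v = lambda x: 2.0 * x  # downstream value (invented)
small = make_transformer(lambda x: -1.0, lambda x: x + 1.0)  # cheap, short step
big = make_transformer(lambda x: -3.0, lambda x: x + 3.0)    # costly, long step

print(join([small])(v)(0.0))       # only the small step is available
print(join([small, big])(v)(0.0))  # enlarging the library cannot lower the max
print(then(small, big)(v)(0.0))    # small step followed by big step
```

Because the join is a maximum, enlarging the set of transformers can only keep or raise the backed-up value at each state. That is the elementary fact behind join-style safety arguments, although the sketch itself proves nothing about certified options.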
In the residuated quantale \((K, \otimes, \bigvee, \leq)\) of decision transformers:
\[S / T = \bigvee \{ R \in K : R \otimes T \leq S \}\] Because the join collects every \(R\) that satisfies the constraint, \(S / T\) is the largest such element, that is, the weakest (least committed) prefix \(R\) such that running \(R\) then \(T\) still achieves the target \(S\). Dually, \(T \backslash S\) gives the weakest sufficient suffix.
Worked example (robot arm). Suppose the target \(S\) says: "from any pose in region \(\Omega\), guarantee sufficient value to place an object in the bin within a bounded number of steps." Fix the suffix \(B\) to be a grasp-and-lift macro that is already reliable once the grasping position is reached. Then:
\[A^* = S / B\] says exactly what is minimally needed before calling \(B\): "reach a pre-grasp pose with grasp margin \(\geq 0\) under the motive slice." The planner then searches the admitted library for options approximating \(A^*\), for example a learned curved-approach trajectory or a logical macro "if occlusion is high, rotate wrist to clear line of sight."
The key property: residuation yields the weakest such fragment. A stronger prefix (one that achieves more than necessary) would also work but would be over-specialized and harder to transfer. By computing the weakest-sufficient piece, SubRep keeps solutions simple and maximally reusable.
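The robot-arm example can be mimicked in a much simpler residuated quantale: binary relations on a finite state set, with relational composition as \(\otimes\), union as join, and inclusion as the order. This is a stand-in chosen for illustration, not the decision-transformer quantale itself; the three states, the relation `B`, and the target `S` are invented, and "achieves the target" is read here as plain reachability rather than a value guarantee.

```python
from itertools import product

START, PRE_GRASP, IN_BIN = 0, 1, 2  # hypothetical abstract states
STATES = (START, PRE_GRASP, IN_BIN)
ALL_PAIRS = tuple(product(STATES, repeat=2))


def compose(R, T):
    """R then T: (a, c) whenever R takes a to some b and T takes b to c."""
    return frozenset((a, c) for (a, b) in R for (b2, c) in T if b == b2)


def residual(S, T):
    """S / T: pairs (a, b) such that every T-step from b ends where S allows from a."""
    return frozenset(
        (a, b) for (a, b) in ALL_PAIRS
        if all((a, c) in S for (b2, c) in T if b2 == b)
    )


def all_relations():
    """Enumerate all 2^9 relations on STATES (used only for the brute-force check)."""
    for mask in range(2 ** len(ALL_PAIRS)):
        yield frozenset(p for i, p in enumerate(ALL_PAIRS) if (mask >> i) & 1)


# Suffix B: grasp-and-lift works from PRE_GRASP, is a no-op elsewhere.
B = frozenset({(START, START), (PRE_GRASP, IN_BIN), (IN_BIN, IN_BIN)})
# Target S: from any state, end with the object in the bin.
S = frozenset((a, IN_BIN) for a in STATES)

A_star = residual(S, B)
sufficient = [R for R in all_relations() if compose(R, B) <= S]

print(sorted(A_star))                            # prefixes may end in PRE_GRASP or IN_BIN
print(compose(A_star, B) <= S)                   # A_star followed by B meets the target
print(all(R <= A_star for R in sufficient))      # every sufficient prefix sits below A_star
```

In this toy setting the residual says "from anywhere, end in the pre-grasp pose or directly in the bin", and the brute-force check confirms that every other sufficient prefix is contained in it. Containment is what "weakest" means here: the residual commits to nothing beyond what the suffix needs.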
Multi-objective case. With a finite weight cover \(W_{\text{ref}}\), planning lives in a product quantale and residuals compute componentwise: \(S/B = \bigoplus_k S_k / B_k\). This gives the weakest prefix per motive weight, stacked together.
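A minimal sketch of the componentwise rule, assuming each component lives in a toy scalar quantale (finite reals with \(+\) as \(\otimes\) and the usual order, where the largest \(r\) with \(r + b \leq s\) is \(s - b\)). The function names and the numbers are hypothetical; the real components would be decision-transformer quantales, one per weight in \(W_{\text{ref}}\).

```python
from typing import Callable, Sequence, Tuple


def scalar_residual(s: float, b: float) -> float:
    """Toy per-component residual: the largest r with r + b <= s is s - b."""
    return s - b


def product_residual(S: Sequence[float], B: Sequence[float],
                     residual_k: Callable[[float, float], float]) -> Tuple[float, ...]:
    """Residual in the product quantale: apply the component residual slot by slot."""
    if len(S) != len(B):
        raise ValueError("S and B need one component per reference weight")
    return tuple(residual_k(s_k, b_k) for s_k, b_k in zip(S, B))


# One component per reference weight (invented numbers).
S = (5.0, 3.0, 4.0)  # target, per weight
B = (2.0, 4.0, 4.0)  # what the fixed suffix contributes, per weight

A = product_residual(S, B, scalar_residual)
print(A)
# Each component is computed without looking at the others.
print(all(a_k + b_k <= s_k for a_k, b_k, s_k in zip(A, B, S)))
```

The design point is that `product_residual` never mixes components, so the per-weight residuals can be computed independently and then stacked.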
Certificates are expressed as backed-up values over AtomSpace features, incorporating model-uncertainty slacks:
\[m_o^{\text{rob}}(x) = \inf_{w \in W}\bigl(\hat{r}(x,o) + w^\top \hat{n}(x,o) - B_{\text{base}} - \varepsilon_r - \|w\|_1 \varepsilon_n\bigr)\] where \(\varepsilon_r\) and \(\varepsilon_n\) bound model errors in the cumulant and successor-feature predictions, and \(B_{\text{base}}\) denotes the baseline backed-up value the option is compared against. The slack terms sit inside the infimum because \(\|w\|_1\) depends on the weight vector being minimized over. This ensures only options that help even under uncertainty are admitted. A join-safety theorem guarantees that adding a certified option can never decrease the planner's backed-up target on the motive slice.
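The margin can be evaluated directly when \(W\) is replaced by a finite set of reference weights. The sketch below makes several assumptions that are not in the source: the slack is applied per weight vector (inside the infimum), the baseline is supplied as one number per weight, an option is admitted when its margin is non-negative, and all names and numbers are hypothetical.

```python
from typing import List, Sequence


def robust_margin(r_hat: float, n_hat: Sequence[float],
                  weights: Sequence[Sequence[float]],
                  base_values: Sequence[float],
                  eps_r: float, eps_n: float) -> float:
    """Worst-case margin over a finite weight set, after subtracting model-error slack."""
    margins: List[float] = []
    for w, b_base in zip(weights, base_values):
        backed_up = r_hat + sum(wi * ni for wi, ni in zip(w, n_hat))
        slack = eps_r + sum(abs(wi) for wi in w) * eps_n  # eps_r + ||w||_1 * eps_n
        margins.append(backed_up - b_base - slack)
    return min(margins)  # the infimum becomes a min on a finite set


def admit(margin: float) -> bool:
    """Assumed admission rule: accept only a non-negative robust margin."""
    return margin >= 0.0


W_ref = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # finite weight cover (invented)
base = [2.0, 0.75, 1.5]                       # baseline backed-up value per weight (invented)

good = robust_margin(1.0, [2.0, 0.5], W_ref, base, eps_r=0.25, eps_n=0.25)
weak = robust_margin(0.5, [2.0, 0.5], W_ref, base, eps_r=0.25, eps_n=0.25)
print(good, admit(good))
print(weak, admit(weak))
```

The second option differs from the first only by a lower predicted payoff, yet it fails on the weights where the baseline is hardest to beat. A worst-case certificate rejects an option that helps on some motives but not all.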
Unlike standard RL option frameworks, SubRep is modality-neutral. Neural controllers, PLN-derived logic macros, and MOSES/GEO-EVO evolved programs all export the same interface \((\hat{r}, \hat{n})\) and are screened by the same CDS/PDS admission rules. This lets the planner compose skills from different paradigms under one certified framework.
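One way to picture the shared interface is a structural type that any skill can satisfy regardless of how it was produced. The sketch below is an assumption-laden illustration: the `ExpectationModel` protocol, the two stand-in skill classes, and their numbers are hypothetical, and the CDS/PDS admission rules themselves are not implemented here.

```python
from typing import List, Protocol


class ExpectationModel(Protocol):
    """Hypothetical shared interface: every skill exports (r_hat, n_hat)."""
    def r_hat(self, x: List[float]) -> float: ...
    def n_hat(self, x: List[float]) -> List[float]: ...


class NeuralControllerStub:
    """Stand-in for a learned controller; predictions are hard-coded."""
    def r_hat(self, x: List[float]) -> float:
        return -0.5

    def n_hat(self, x: List[float]) -> List[float]:
        return [0.5 * xi for xi in x]


class LogicMacroStub:
    """Stand-in for a rule such as 'if occlusion is high, clear it'."""
    def __init__(self, occlusion_index: int) -> None:
        self.i = occlusion_index

    def r_hat(self, x: List[float]) -> float:
        return -1.0 if x[self.i] > 0.5 else 0.0  # acting costs payoff only when triggered

    def n_hat(self, x: List[float]) -> List[float]:
        out = list(x)
        if x[self.i] > 0.5:
            out[self.i] = 0.0  # occlusion feature cleared
        return out


def backed_up(option: ExpectationModel, x: List[float], w: List[float]) -> float:
    """Same scoring rule for every modality: r_hat + w^T n_hat."""
    return option.r_hat(x) + sum(wi * ni for wi, ni in zip(w, option.n_hat(x)))


x = [2.0, 1.0]   # second feature plays the role of occlusion (invented)
w = [1.0, -2.0]  # occlusion is penalized under this motive weight
scores = {type(o).__name__: backed_up(o, x, w)
          for o in (NeuralControllerStub(), LogicMacroStub(occlusion_index=1))}
print(scores)
print(max(scores, key=scores.get))
```

The planner-side function `backed_up` never inspects what kind of object it was handed, which is the property the paragraph describes: screening and composition depend only on the exported pair.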
SubRep is designed as a complement to MetaMo: MetaMo defines what the system cares about as motive geometries; SubRep validates which skills serve those motives with formal certificates. When TransWeave transfers skills to new tasks, SubRep certificates travel with them — the residuation algebra ensures that "weakest-sufficient" transfers remain valid under the receiving task's motive geometry.