Learning in Infants using Intrinsically Motivated Goal Conditioned Reinforcement Learning

Darsan, T I
2026-06-11
2026-06-17
38p.
http://hdl.handle.net/10263/7717
This dissertation has been completed under the supervision of Prof. Jochen Triesch & Dr. Malay Bhattacharyya

Traditional artificial intelligence models learn by passively digesting large datasets. In contrast, human infants discover skills by actively interacting with their bodies and environments without explicit external rewards. This thesis introduces the Composer architecture, a machine learning framework designed to mimic this autonomous, open-ended development. The Composer architecture operates in a multi-stage loop. First, a latent model uses Contrastive Learning Through Time (CLTT) to compress high-dimensional raw data from visual, proprioceptive, and touch sensors into a low-dimensional space. To preserve data relationships and prevent topological collapse, a Softmax activation forces these latent representations to lie smoothly on a probability simplex. A goal-conditioned reinforcement learning policy then trains on this space by targeting randomly sampled one-hot goals. We evaluated the architecture on the MIMo platform, a highly realistic simulation of an 18-month-old child embedded in the MuJoCo physics engine. Testing progressed from primitive shapes to complex multi-joint control channels on the robot. On a single-finger setup, the architecture mapped latent extrema directly to full extension and flexion, while five-finger trials isolated whole-hand opening and closing configurations. To scale control, a hierarchical extension called Hi-Composer coordinates complex limb motions by routing high-level commands as sub-goals to localized lower-level finger policies. Finally, full-body exploration benchmarks on a rollover task validated the "thin pancake" hypothesis. Compared to white-noise walks, the Composer architecture expands the agent’s spatial reach by 134 percent while restricting local exploration to an organized, lower-dimensional submanifold.
This demonstrates that self-generated latent goals effectively guide open-ended exploration into structured movements.

en
Reinforcement Learning
Infant Learning
Cognitive Neuroscience
Thesis
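
The abstract's two central mechanisms, a Softmax that places latent codes on a probability simplex and goals drawn as random one-hot vectors, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the latent dimension of 4, and in particular the distance-based reward are assumptions made here for illustration, since the record does not state how goal achievement is scored.

```python
import numpy as np


def softmax(z):
    """Map an unconstrained latent vector onto the probability simplex."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def sample_one_hot_goal(dim, rng):
    """Draw a goal at a uniformly chosen vertex of the simplex."""
    goal = np.zeros(dim)
    goal[rng.integers(dim)] = 1.0
    return goal


def goal_reward(latent, goal):
    """Hypothetical reward: negative Euclidean distance to the goal.

    The source does not specify the reward; this choice is an assumption.
    """
    return -float(np.linalg.norm(np.asarray(latent) - np.asarray(goal)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = softmax(rng.normal(size=4))  # stand-in for an encoder output
    goal = sample_one_hot_goal(4, rng)
    print("latent:", latent, "sum:", latent.sum())
    print("goal:", goal)
    print("reward:", goal_reward(latent, goal))
```

Because every Softmax output is non-negative and sums to one, the one-hot goals are exactly the corners of the space the latent code can occupy, which is consistent with the abstract's report that latent extrema corresponded to full finger extension and flexion.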