Workshop on World Models and Predictive Coding in Cognitive Robotics

Invited talks

Tetsuya Ogata

(Waseda University, Japan)

Predictive Coding-inspired Robotics: Advancing Adaptability through Deep Predictive Learning

Deep learning is remarkably powerful in tasks such as image and natural language processing. Yet practical deployment, especially in robotics, faces substantial obstacles. A natural use case for deep learning in robotics is image recognition, such as identifying object grasp points. However, labeling demands are high, and learning from images alone does not ensure successful grasping, because physical attributes such as friction and center of gravity also matter. Researchers are increasingly drawn to deep reinforcement learning, but real-world trial and error with robots is prohibitively expensive. The Sim2Real approach offers a partial solution, though challenges persist. Our "deep predictive learning," inspired by the concept of predictive coding, builds on the premise that understanding of the real world is inherently incomplete. It adapts the model's internal state and generates real-world motions to dynamically reduce prediction errors. This approach is vital for self-reliant, responsive agents. In 2016, our deep predictive learning enabled a humanoid robot to fold towels, which led to collaborations with industry partners and to commercial outcomes.

A pivotal focus is our moonshot project, backed by Japan's Cabinet Office: the "AIREC (AI-driven Robot for Embrace and Care)" smart robot, which combines AI and care. Existing robots require dedicated hardware for specific tasks, which limits their adaptability. We expect intelligent technologies such as deep learning to greatly extend robots' generalization capacities, much as smartphones consolidate many functions into one device: while a smartphone's individual functions may not match those of dedicated devices, combining them creates new value. Our strategy is to equip the multipurpose AIREC with our deep predictive learning framework so that it can handle a wide variety of tasks.

Pablo Lanillos

(Donders Institute for Brain, Cognition and Behaviour, Netherlands)

Neuroscience-inspired generative models of action
(on-line)

Despite the peak performance of generative artificial intelligence in producing images, sound, and text, action generation remains an elusive challenge. In particular, how AI systems and robots should transform intentions or abstract goals into meaningful and safe control actions is still an open problem. In this talk, in contrast to other ML approaches that directly connect large language models to robots, we take inspiration from neuroscience to build a foundation model that may allow generative AI for control from first principles. To this end, I will first review relevant unconscious action-generation strategies that humans use to improve adaptation, and how to model them in robots through active inference. Second, I will describe how structured representation learning can connect grounded object representations to action generation and, more importantly, enable action generation from the agent's preferences (soft neurosymbolic goals). Finally, I will sketch recent research directions from my lab in active inference for robotics and other neuroscience-inspired AI technologies related to the Spikeference and Metatool EU projects.

Giulio Sandini
(Italian Institute of Technology, Italy)

The Role of Imagination in Social Interaction

The use of mathematical models to describe human perceptual and motor functions has a very long and successful history, while the design and implementation of embodied artificial systems to investigate human sensorimotor and cognitive abilities is a relatively recent endeavour, struggling, to some extent, to go beyond a superficial, technology-driven, biomimetic approach. Besides its intrinsic scientific and engineering value, the view emerging is that of a fragmented collection of individual functions, missing the opportunity to exploit the origin and timeframe of human adaptive abilities, including the role and complementary contribution of evolutionary, epigenetic, developmental and learning processes.

Starting from these considerations and from the need for a more convergent approach based on a reference cognitive architecture, I will focus my presentation on how to use robots to advance our knowledge of the mechanisms underlying human-human interaction, and in particular our ability to anticipate our own actions and those of others. I will argue that robots, as experimental tools to investigate embodied intelligence and the cognitive aspects of social interaction, can be exploited not only as physical models of biological systems but, more interestingly, as experimental platforms to investigate aspects of social interaction such as the kinematic and dynamic signatures of motor contagion, turn taking, vitality forms and emotions.

Takamitsu Matsubara
(NAIST, Japan)

Disturbance-injected Robust Imitation Learning with Structured Policies for Complex Tasks

DART (Disturbances for Augmenting Robot Trajectories, Laskey et al., 2017) is a robust imitation learning method in which an agent injects disturbances into expert demonstrations to mitigate the difference between the distribution of expert demonstrations and the distribution induced by learned policies. The concept is clear; however, existing methods are restricted to simple policy models, since the disturbance design depends on the policy model. Therefore, it is important to explore an extended framework that can employ more structured policy models while still designing appropriate disturbances according to their structure, in order to solve complex tasks in real environments. In this talk, we present two frameworks that we have recently proposed. The first is Bayesian Disturbance Injection (BDI), which employs stochastic multimodal policies for complex assembly tasks. The second is Disturbance Injection under Partial Automation (DIPA), which employs deterministic hierarchical policies with semi-autonomous control modes for long-horizon excavation tasks. Experimental validation results and open issues are discussed.
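As a rough illustration of the basic disturbance-injection idea described above (not of BDI or DIPA, and not the speaker's implementation), the following minimal Python sketch rolls out a hypothetical expert on a one-dimensional toy system. The executed action is disturbed with Gaussian noise, while the undisturbed expert action is stored as the training label. The toy dynamics, the proportional expert, the fixed noise level, and all function names are assumptions made for this example; DART itself chooses the noise level rather than fixing it by hand.

```python
import random


def expert_action(x, gain=0.8):
    """Hypothetical expert: a proportional controller that drives x toward 0."""
    return -gain * x


def collect_demo(noise_std, steps=50, x0=1.0, seed=0):
    """Roll out the expert; execute a disturbed action but record the clean label."""
    rng = random.Random(seed)
    x, data = x0, []
    for _ in range(steps):
        u = expert_action(x)
        data.append((x, u))                    # label: undisturbed expert action
        x = x + u + rng.gauss(0.0, noise_std)  # executed action is disturbed
    return data


def fit_linear_policy(data):
    """Behavior cloning with a one-parameter policy u = w * x (least squares)."""
    sxx = sum(x * x for x, _ in data)
    sxu = sum(x * u for x, u in data)
    return sxu / sxx


def state_spread(data, start=25):
    """Variance of the states visited from step `start` onward."""
    xs = [x for x, _ in data[start:]]
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


clean = collect_demo(noise_std=0.0)   # plain demonstrations
noisy = collect_demo(noise_std=0.2)   # disturbance-injected demonstrations
w = fit_linear_policy(noisy)

print("learned gain:", w)
print("late-rollout state variance, clean vs. noisy:",
      state_spread(clean), state_spread(noisy))
```

In this toy setting the disturbed rollout keeps visiting states away from the expert's nominal trajectory, so the dataset contains expert labels for the kinds of states a slightly imperfect learned policy might reach, while the labels themselves stay noise-free.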

Masahiro Suzuki
(The University of Tokyo, Japan)

Perspectives on World Models and Predictive Coding in Cognitive Robotics

In this talk, I delve into the role of key concepts in world models and predictive coding in facilitating the emergence of autonomous robots capable of continuous learning through active interaction with their environment, as identified in our comprehensive survey, "World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges." World models, which describe how the internal state of the world evolves given an agent's actions and sensory inputs, have received much attention in the deep learning community in recent years. Predictive coding is a neuroscience-oriented paradigm related to the more general concepts known as the free energy principle and active inference, which deal with perception, control, and learning in a unified manner by minimizing variational free energy over the past, present, and future. I will outline these concepts and then briefly contrast these paradigms and their role in fostering cognitive development that contributes to lifelong learning in robots and humans. I will then describe recent advances in deep generative models and discuss their importance in implementing these paradigms. Throughout this talk, I will argue that deep generative models are important in steering us toward a future in which they contribute significantly to the cognitive and developmental capabilities of autonomous systems.
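For readers unfamiliar with the term, variational free energy is commonly written in the form below. The notation is an assumption chosen for illustration and is not taken from the talk or the survey: o denotes observations, s hidden states, p the generative model, and q an approximate posterior over hidden states.

```latex
\[
F[q] \;=\; \mathbb{E}_{q(s)}\bigl[\ln q(s) - \ln p(o, s)\bigr]
     \;=\; D_{\mathrm{KL}}\bigl[\,q(s)\,\|\,p(s \mid o)\,\bigr] \;-\; \ln p(o)
\]
```

Because the KL term is non-negative, F is an upper bound on the negative log evidence, so minimizing it with respect to q approximates Bayesian inference over hidden states.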
