Swiss researchers build robots that learn complex tasks by watching humans, and the implications run deeper than coffee

Scientists at EPFL in Switzerland have developed robots capable of learning complex multi-step tasks by observing humans, a breakthrough that moves embodied AI meaningfully closer to practical household and workplace deployment.

Sthithpragya Gupta wants his robot to make him a coffee. That sounds like a modest ambition until you consider what making a coffee actually requires: recognizing objects, understanding sequence, adapting to a workspace that was not pre-configured for you, and executing fine motor tasks with the kind of contextual judgment that robotics has historically struggled to replicate outside of tightly controlled industrial settings. Gupta and his colleagues at École Polytechnique Fédérale de Lausanne (EPFL) have been working on exactly that problem, and their recent progress suggests the gap between demonstration and deployment is closing faster than the field expected.

The core advance is a self-modeling capability that allows robots to observe human actions and translate those observations into executable behavior, without requiring explicit programming for each new task. The robot builds and maintains an internal model of itself in relation to its environment, which is what the researchers mean by "self-aware" in this context. It is not consciousness. It is spatial and functional self-knowledge: the ability to understand where its own body is, what it can reach, and how its movements map onto the actions it has just watched a human perform.

Teaching a robot by demonstration rather than explicit code has been a goal in robotics research for decades, and the fact that meaningful progress has taken this long reflects the genuine difficulty of the problem. A human watching another human make coffee draws on a vast store of background knowledge: an understanding of object permanence, gravity, cause and effect, and the physical properties of containers and liquids. None of that is given to a robot by default. It has to be built up through training, and the EPFL team's approach to building that foundation through self-modeling is what distinguishes their work from earlier imitation-learning systems that required far more controlled conditions to function.

The practical test cases the team has been working with go beyond coffee. The research targets complex instruction-following, meaning tasks that involve multiple steps, conditional decisions, and physical manipulation in environments that were not pre-staged for the robot's convenience. That is a significantly higher bar than most robotic demonstrations set, and clearing it under real lab conditions rather than idealized simulations is what makes the EPFL results worth paying attention to.

The Capability and the Question It Raises

The research team itself acknowledges the dual nature of what they are building. A robot that can learn complex tasks by watching humans and then act on that learning with increasing autonomy is, almost by definition, a system whose behavior becomes harder to fully predict as its capabilities expand. Gupta and his colleagues have been direct about the fact that this kind of technology raises questions not just about usefulness but about the boundary between helpful and harmful action. That is not a reason to stop the research. It is a reason to take the safety architecture as seriously as the capability architecture from the very beginning, and that is a conversation the field needs to have before deployment rather than after.

The commercial trajectory here is straightforward to map even if the timeline remains uncertain. Industrial robotics has long operated in controlled environments with pre-programmed tasks and physical safety separation from human workers. The EPFL approach points toward a different category: robots that operate in unstructured environments alongside people, adapting to new tasks through observation rather than requiring factory-floor-style configuration. The home, the hospital, the restaurant kitchen, the small warehouse with constantly changing inventory. These are the settings where robotic assistance would deliver the most value and where rigid pre-programming has consistently failed to translate from lab to real world.

Boston Dynamics, Figure, and a growing roster of well-funded humanoid robotics startups are all chasing variations of this same capability set. What academic research like the EPFL work contributes is the underlying science that commercial teams will eventually build products on. The self-modeling approach to learning from demonstration, if it generalizes robustly across task types, gives the entire humanoid robotics sector a more reliable path to the flexible, instruction-following behavior that investors have been funding in anticipation of its arrival. The coffee machine in the EPFL lab is a small symbol of something considerably larger.
