NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Cheng G PhD, editor. Humanoid Robotics and Neuroscience: Science, Engineering and Society. Boca Raton (FL): CRC Press/Taylor & Francis; 2015.


Chapter 3. Hands, Dexterity, and the Brain



Our hands are centrally involved in many of our daily activities. Reaching for objects and grasping and manipulating them usually is an almost effortless activity. Whatever our hands do, it always appears very simple to us. Yet, as children we need many years to learn how to use our hands in increasingly sophisticated ways to feel, explore, grasp, and manipulate objects. Later on, we learn to use a large variety of tools to extend our manual capabilities even further and to connect them with various cognitive skills such as writing or the playing of musical instruments. Our hands are also important mediators of social contact: from early childhood, they are crucial to get into touch with and to feel others, to signify affection, and to enrich our communication with gestural expression.

This all is made possible by the seamless integration of our hands into our cognitive system, making our manual skills an important part of our interaction with the environment and of our capacities for feeling, exploring, acting, planning, and learning.

A deeper understanding of these skills might start from finding out the processes that enable touch and vision and how these modalities are then combined to achieve hand–eye coordination for grasping and manipulating objects. The sheer number of objects that we can grasp and handle makes the analysis of the involved processes a very daunting task, probably not any simpler than an understanding of language (the number of familiar words and of familiar objects perhaps being of the same order of magnitude). Moreover, handling of many objects is not just a matter of physics alone: when we open a bottle, we anticipate that we will access its contents by a very familiar sequence of further manual actions, possibly involving additional objects such as cups or spoons. Not only can we anticipate these actions, we also can imagine how the bottle, the cup, and the spoon will feel in our hands, and any deviation from our expectations triggers a rich repertoire of corrective actions so that we can finish most of our daily activities very safely, despite the absence of really precise information about the geometry of our food items, their friction constants, or their elasticity coefficients. This skill gives us the capacity to shape our environment in planned and coordinated ways to an extent not seen in any other species, exerting in turn a strong driving force for the evolution of a capability to envisage our actions before we actually carry them out. This may have prepared the final dissociation of physical and mental action, giving us the ability of “manipulating” imagined objects, of goal-directed planning, and, ultimately, of conscious thinking.

The latest step so far in this chain of developments seems to be communication, which is based on goal-directed rearrangements of the thoughts of our conspecifics, appearing as an extension of the reach of our “mental hands” beyond our own thoughts.

It becomes obvious that any deep understanding of human dexterity will almost inevitably lead us into elucidating much of the essence of cognitive interaction, from the “physical” sensorimotor level straight up to the highest levels of thinking, language, social, and even emotional interaction. And it becomes clear that any single “theory” can only contribute a highly partial view: the richness of human dexterity, manual action, and its embedding in cognition poses, as perhaps its foremost challenge, the task of understanding an architecture of interwoven processes that together can bring about this astonishing phenomenon.

This chapter shows how robotics and the quest for a deeper understanding of human dexterity share many research questions, making it natural and productive to look for bridges between the disciplines in order to arrive at a comprehensive picture of what it takes to make hands versatile and central tools for a cognitive system.

In doing so, we do not discuss any of the robots that are in use on today’s factory floors, where they insert windows into cars or perform similar nontrivial assembly steps in narrow domains and in highly repetitive manners. Our interest is in anthropomorphic robots whose body shape enables them to carry out movements that can be very similar to our own. In addition to their obvious potential as useful future assistants that are better matched to comply with our home environments, whose devices, furniture, and architectural features are all tailored to our human body structure, these robots offer a novel type of research tool to test and develop ideas of how embodied cognitive interaction can work and can be created.

The neurosciences, cognitive psychology, movement science, linguistics, and the social sciences all provide us with different perspectives and different levels of description of the processes that contribute to manual dexterity and its role in cognitive interaction. Although simulation can be a powerful first step to test and integrate some of these insights, simulation easily “falls victim” to idealizations and a necessarily simplified modeling of reality.

In contrast, physical robot platforms offer real-world tests. They can offer stringent proofs of the workability of ideas about how dexterity can arise and how it can be functionally interconnected with our remaining cognitive facilities. Thereby, robots can provide us with strong “idea filters,” helping us very early to identify what we need to implement a particular skill in an interactive system that has to work with real sensors, in real-time and under the limited accuracy of an embodied physical system. For instance, they make us highly aware when a grasping algorithm only works with a precise geometry model of the to-be-grasped object, or when it requires accurate data about the friction between the fingers and the object. They confront us with properties that we often tend to idealize away in our models, such as elasticity or very “non-Cartesian” object shapes of food items. And they highlight what it takes to come closer to the superb “technology” of nature, with hands covered densely with tactile sensors, so far unmatched weight–force ratios, and the real-time control of fast and sophisticated movements with “wetware” that is exceedingly slow from an engineering perspective.

Moreover, endowing such systems with increasing manual competences can suggest new experiments and open new windows into the processes underlying dextrous manipulation. For instance, creating better tactile sensing allows us to observe the touch patterns accompanying human hand actions. Designing different anthropomorphic hand shapes provides us with insights into the role of geometry for hand dexterity. Enabling robots to cope with a multitude of objects and actions challenges our understanding of how such skills need to be represented and how representations in different modalities can get coordinated. Finally, bringing such robots to the point of cooperating with humans can help to connect insights from the neurosciences about imitation learning with robotics research into rapid learning mechanisms and findings from the social sciences about human–robot cooperation.

In the following, we first consider the goal of creating multifingered manipulators whose capabilities can approximate human dexterity to some degree. We highlight some of the major issues involved and describe some representative state-of-the-art hand systems and their properties. Next, we move our focus to the perhaps most fundamental task for a hand: the grasping of objects. We compare different computational and neuroscientific accounts of the necessary processes, ranging from hand–eye coordination to frameworks for grasp characterization, hand preshape selection, and different approaches for realizing grasps. As a next step toward higher-level skills, we discuss the challenges that are associated with the controlled manipulation of a grasped object. This will lead us into a discussion of how objects and actions can become represented for manual action, and how manipulation of objects gives rise to further questions, such as how to handle deformable objects.

We then focus on a different aspect, namely the use of the hand as a perceptual device. This also gives us the opportunity to contrast properties of the human hand with current technological approaches.

As a last major aspect, we briefly summarize some major findings and ideas about the role of the hand and brain for communication. Finally we discuss the significance of a better understanding of “manual intelligence” for a deeper understanding of cognition as a whole.

We are aware that our discussion inevitably is far from exhaustive. Given the limited amount of space, we had to leave out many important aspects of the field and to focus on characteristic examples to illustrate major ideas and approaches. Many of our choices are undoubtedly affected by a strong bias from robotics in general, and from our own research interests in particular, but we have attempted to bring out at least some of the numerous connections among robotics, neuroscience, and cognitive psychology that make up much of the fascination of the field.


Since ancient times, the loss or injury of limbs in warfare has been a major motivation for attempts to replicate arms and hands in order to provide useful or at least cosmetic prostheses. For centuries, these constructions were rather crude, such as the famous hands of the German mercenary Götz von Berlichingen in the sixteenth century, which had to be actuated by an arrangement of catches and springs.

The sophisticated function of the human hand with its more than 20 independent degrees-of-freedom (DOF), actuated by more than 30 muscles and aided by numerous proprioceptive and tactile sensors, remained for a long time far beyond the reach of any human technology. This began to change only in the late twentieth century, when advances in materials, actuators, sensors, and control electronics offered possibilities for a more realistic approximation of our most dextrous extremity. Digital computers and robots created additional interest in versatile robot manipulators and the obvious challenge to realize anthropomorphic robot hands in order to bring dexterity to robots. A timeline is shown in Figure 3.1.

FIGURE 3.1. Bringing dexterity to robots and elucidating its underpinnings. Coarse timeline of major research topics and developments since the availability of the first articulated robot hands.

The Belgrade hand and the Utah–MIT hand were among the first hands designed with that goal in mind. They had an anthropomorphic structure with an opposable thumb, and their design anticipated features that were later perfected, such as coupled joints or the use of tendons to realize the control of a large number of densely arranged joints. These systems were influential in the sense that they contributed to an early appreciation of the factors that need to be considered for realizing hands that can approximate human dexterity.

A first and very fundamental factor is the arrangement, size, and moveability of the fingers. Analyzing the evolution of hands and hand use in primates [72] has revealed that a seemingly simple change, the specialization of one finger as a highly movable “thumb” acting in opposition to the remaining fingers, brought a dramatic step for the evolution of human dexterity, whereas the precise length ratios of the fingers seemed to be much less critical, as indicated by the significant variability of human finger-length ratios.

Simulation packages, such as the GraspIt system [42] or the more recent OpenGRASP simulator [8], have become very effective tools for studying the impact of hand kinematics on the attainable dexterity. These systems allow us to simulate how the given hand kinematics constrains the attainable contact patterns between the hand and arbitrary rigid objects. In this way, they allow accurate predictions about the grasps of which a particular hand design will be capable. Yet, kinematics is only one among many factors contributing to a versatile hand.
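In the spirit of such simulators, the most basic kinematic question (can a fingertip reach a candidate contact point at all?) can be sketched in a few lines. The planar two-link finger below, with illustrative link lengths and a joint limit, stands in for a full hand model; all numbers are assumptions for illustration only, not parameters of any real hand.

```python
import math

# Toy version of what a grasp simulator checks: whether a fingertip,
# given the hand kinematics, can reach a candidate contact point at all.
# Link lengths (meters) and the joint limit are illustrative assumptions.

def reachable(px, py, l1=0.05, l2=0.04, q2_max=math.pi / 2):
    r = math.hypot(px, py)
    if r > l1 + l2:                       # beyond full extension
        return False
    # law of cosines gives the required second-joint angle
    c2 = (r * r - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if c2 < -1 or c2 > 1:
        return False                      # closer than full flexion allows
    return math.acos(c2) <= q2_max        # respect the joint limit

print(reachable(0.07, 0.02), reachable(0.12, 0.0))
```

A full simulator repeats such tests for every finger and additionally evaluates the resulting contact normals and forces; this sketch only shows the kinematic core of the idea.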

Actuators pose an entirely different set of constraints. Human grip forces can reach 400 N and more [58]. Even current technology cannot provide sufficiently strong miniaturized actuators that would fit into the finger phalanges or at least into the hand palm, a constraint apparently shared with nature and necessitating “extrinsic” actuators, placed in the forearm and using tendons to transmit forces to the finger joints.

This actuating principle had already been adopted in the above-mentioned “historic” hand systems and recurred in many of their successors. A modern example is the Robonaut2 [1] hand, which has been designed for space operation and whose kinematics has been extensively optimized in simulation. A major innovation of this hand was the development of a novel tendon material ensuring very high break forces, low tendon friction, and high durability against abrasion. The tendons are actuated with DC motors positioned in an integrated forearm. The fully extended fingers can exert a tip force of more than 20 N and reach a tip speed of 20 cm/s. The hand has 12 independently controllable DOF and is only moderately larger than a human hand. Like many other designs, this hand is “underactuated”, that is, it possesses more joints than controllable degrees-of-freedom. These “surplus” joints are controlled by coupling them in a fixed pattern to the movements of the remaining, controlled joints. Although the most frequently adopted pattern is a fixed coupling of the movements in the two outermost finger joints, more sophisticated adaptive schemes may enable parsimonious designs for robot hands that can be very dextrous while requiring only a modest number of controllable degrees-of-freedom [24].
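The coupling of “surplus” joints can be illustrated with a minimal sketch: a fixed linear map expands the controlled degrees-of-freedom into the full joint vector. The 1:1 coupling ratio between the two outermost joints is a common textbook choice, not the value of any specific hand.

```python
# Sketch of underactuated joint coupling: a fixed map expands the
# controllable DOFs of one finger into its full set of joint angles.
# The coupling ratio is an illustrative assumption.

def expand_joints(controlled, coupling=1.0):
    """controlled = [metacarpal, proximal] angles (radians);
    the distal joint is slaved to the proximal one."""
    metacarpal, proximal = controlled
    distal = coupling * proximal          # fixed coupling pattern
    return [metacarpal, proximal, distal]

print(expand_joints([0.2, 0.6]))  # three joints driven by two actuators
```

Adaptive underactuation schemes such as those in [24] replace the fixed ratio with a mechanism that redistributes motion once a phalanx makes contact; the fixed map above is only the simplest instance.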

Recent breakthroughs in the miniaturization of electric motors have also created the possibility of hand designs with “intrinsic” actuators, albeit at the expense of somewhat reduced finger force levels. A major milestone for such an extremely integrated hand has been the DLR-II hand with four fingers (one opposed as a “thumb”) with integrated brushless DC motors [4]. Each finger has three DOFs, and position and torque sensors integrated in all joints enable the control of “programmable stiffness” for all degrees-of-freedom. The motors available at the time of the construction of this hand enforced an overall size significantly larger than that of a human hand; however, further advances in motor miniaturization, along with additional ideas and refinements of the design concepts, have enabled the construction of the successor model DLR-II-HIT [41] (see Figure 3.2) with 15 DOF and a size that is only moderately larger than a human hand, while still capable of finger forces of up to 10 N.
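The notion of “programmable stiffness” enabled by integrated position and torque sensing can be illustrated by an impedance-style control law in which the stiffness is simply a software parameter. The gains and setpoints below are illustrative assumptions, not DLR parameters.

```python
# Minimal sketch of programmable-stiffness joint control: the commanded
# torque follows a virtual spring-damper law whose stiffness k is set in
# software. All gains are illustrative assumptions.

def stiffness_control(q_des, q, dq, k, d):
    # impedance-style law: virtual spring k plus damping d
    return k * (q_des - q) - d * dq

soft = stiffness_control(1.0, 0.8, 0.0, k=2.0, d=0.1)
stiff = stiffness_control(1.0, 0.8, 0.0, k=20.0, d=0.1)
print(soft, stiff)  # same position error, very different restoring torque
```

Because k is a run-time parameter, the same hand can yield compliantly during contact and hold rigidly during precision work, which is the practical point of the sensor integration described above.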

FIGURE 3.2. Two modern anthropomorphic robot hands. (Top) DLR HIT hand with integrated electric motors and 15 DOFs. (Bottom) Tendon-driven Shadow Dextrous Hand with 24 DOFs (for details, cf. text).

If the primary use of a hand is as a prosthesis, weight becomes of paramount importance. Some recent prosthetic hand designs demonstrate that reducing the design to a small number of very carefully selected degrees-of-freedom to save actuator weight, and using lightweight plastic materials instead of metal, leads to light hands that can still carry out a useful number of different grasp patterns. Such hands typically have only a single DOF per finger and may even delegate the operation of some DOFs to the other (healthy) hand of the wearer, for example, for switching between the sideways and opposition positions of the thumb [60].

Hydraulically or pneumatically driven actuators offer an alternative to electric motors that can deliver high forces with low weight. The FRH-4 hand developed at KIT [7] is a modern hand that realizes a high power-to-weight ratio through lightweight fluidic actuators directly integrated into the finger joints. The hand has 11 joints, grouped into eight independently controllable DOFs. A pressurized pneumatic medium inflates the actuator and thereby causes a rotary motion of the associated joint. Pressures are controlled using 16 digital valves (controlling in- and outflow of the medium for each DOF). The actuators can work either with a hydraulic medium or with pressurized air; the latter yields higher compliance at the expense of a more nonlinear behavior. This poses additional challenges for the realization of suitable force-position control schemes, but simplifies the hand design because air can simply be released into the environment. The actuator torques always act in the same direction; an antagonistic rubber band provides the required retraction force. Joint positions are measured in 12-bit resolution through magnetic Hall-effect rotary sensors. Additional air pressure sensors allow improvement of the control scheme. Joint angle control is achieved through a cascaded control scheme for pressure and joint angle, with pressure in the inner loop.
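The cascaded scheme described above can be sketched as two nested proportional loops: an outer joint-angle loop that computes a pressure setpoint, and an inner loop that drives the valve command toward that pressure. All gains and units are illustrative assumptions, not FRH-4 parameters.

```python
# Sketch of cascaded joint-angle/pressure control: the outer loop turns
# an angle error into a pressure setpoint, the inner loop turns a
# pressure error into a valve command. Gains are illustrative.

def outer_angle_loop(angle_target, angle_measured, kp_angle=5.0):
    # outer loop: angle error (rad) -> desired pressure (arbitrary units)
    return kp_angle * (angle_target - angle_measured)

def inner_pressure_loop(pressure_target, pressure_measured, kp_press=0.8):
    # inner loop: pressure error -> valve command (+ inflow, - outflow)
    return kp_press * (pressure_target - pressure_measured)

p_des = outer_angle_loop(1.0, 0.6)        # joint lags its target
valve = inner_pressure_loop(p_des, 1.5)   # inflate toward the setpoint
print(p_des, valve)
```

Placing pressure in the fast inner loop lets the controller partially linearize the actuator before the slower angle loop acts on it, which is the usual motivation for such cascades.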

Speed offers another challenge dimension. A significant part of our dexterity relies on the ability to make very fast finger movements. This allows us to catch thrown objects, to flip something between our fingers, to type rapidly, or to play musical instruments. The objective for the high-speed hand reported in Reference [49] has been to create a research platform for these dynamic aspects of manual action. This requires high speeds and accelerations and, therefore, low weight of the movable parts. To fulfill these requirements and realize a high degree of dexterity at the same time, the hand has three identical fingers, each with two joints. The two outer fingers have an additional joint allowing their movement in opposition to the middle finger. Each of the total of eight joints is actuated by an integrated high-speed motor specially designed to tolerate short, very high bursts of input current. As a result, each joint can accelerate within 10 ms to its maximal speed of 1,800°/s. Using this hand in conjunction with a specially developed high-speed vision system, the authors have been able to demonstrate active catching of free-falling objects, or the very impressive dynamic regrasping of a bricklike object, using a strategy of brief throwing and recatching.

A high number of suitably arranged degrees-of-freedom for finger movements is perhaps the most crucial prerequisite for achieving good dexterity. One of the leading designs in this regard is the Shadow Dextrous Hand [13], which is also one of the very few highly dextrous robot hands that are commercially available (see Figure 3.3). It is human-sized with 24 degrees-of-freedom, 20 of which are independently controllable. In addition to two independent DOFs for bending, each finger can independently be turned in the lateral direction. Together with a 5-DOF thumb this provides the necessary prerequisites for in-hand object manipulation. An extra degree-of-freedom in the palm aids the execution of power grasps. The hand exists in two versions, using for each joint either a McKibben pneumatic “muscle” actuator that contracts under the inflow of pressurized air, or an electric DC motor to pull the tendon. The motor version is the more recent one and also contains design improvements in the tendon routing and pulley attachments, leading to considerably smoother movements as required for dextrous action.

FIGURE 3.3. Bimanual research system with a pair of anthropomorphic robot hands (Shadow Dextrous Hand) mounted on robot arms (PA-10) for positioning. Each hand has 20 and each arm has 7 independently controllable degrees-of-freedom (DOFs), resulting in a 54-DOF platform.

Accurate finger control requires sufficient kinesthetic feedback to compensate for model inaccuracies. Different sensing devices, such as potentiometers, Hall-effect or optical sensors, have been developed to provide accurate information about finger joint angles and joint forces. Even more demanding is the realization of good “cutaneous” sensing through tactile sensors on the finger and hand surface. If each finger has only a single contact at a known position, the contact force is computable from the measured joint torques. In all other cases, only a more-or-less extensive coverage of the finger segments and the hand palm with tactile sensing elements can provide detailed information about the forces exerted by a grasp or during manipulation. A wide range of sensing principles has been explored to realize different approximations to the so-far unattained tactile sensing of the human hand (see Section 3.5 for a discussion of the hand as a perceptual device and references therein).
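The single-contact case mentioned above can be made concrete for a planar two-joint finger: with the contact at the fingertip, the relation tau = J^T f between joint torques and contact force can be inverted for f. Link lengths, joint angles, and torque values below are illustrative assumptions.

```python
import math

# Sketch of recovering a single fingertip contact force from measured
# joint torques via tau = J^T f for a planar two-joint finger.
# All numerical values are illustrative assumptions.

def jacobian_2link(l1, l2, q1, q2):
    # standard planar two-link Jacobian of the fingertip position
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [ l1 * c1 + l2 * c12,  l2 * c12]]

def contact_force(tau, J):
    # solve J^T f = tau for f (2x2 system, Cramer's rule)
    m00, m01 = J[0][0], J[1][0]
    m10, m11 = J[0][1], J[1][1]
    det = m00 * m11 - m01 * m10
    fx = (tau[0] * m11 - m01 * tau[1]) / det
    fy = (m00 * tau[1] - tau[0] * m10) / det
    return [fx, fy]

J = jacobian_2link(0.04, 0.03, 0.3, 0.5)
tau = [0.02, 0.01]                # measured joint torques (N*m)
print(contact_force(tau, J))      # inferred fingertip force (N)
```

With two or more contacts per finger the system becomes underdetermined, which is precisely why the extensive tactile coverage discussed in the text becomes necessary.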

Finally, the contact properties of the fingertips and the palm are major factors for the attainable grasps. Human fingertips are soft and offer good friction on a wide variety of object surfaces. Analyzing the physics of such finger contacts is a complex problem [45] and most analyses and optimizations of robot grasps are based on contact models that are simpler to deal with, but are less favorable for good grasps.

Most of our daily actions are bimanual. To explore and synthesize such skills for robots requires sophisticated research platforms that allow us to bring two robot hands into close opposition, creating a workspace with a geometry similar to that of human manual action. In addition to dextrous hands this requires highly movable robot arms, preferably with more than 6 DOF per arm to provide the system with redundant degrees-of-freedom for flexible positioning of its hands in a variety of interaction situations. A typical platform is depicted in Figure 3.3. It features two Shadow Dextrous Hands mounted on Mitsubishi PA-10 arms, each of which provides 7 DOF to facilitate avoidance of singularities in the workspace. The entire arm–hand system comprises a total of 54 independent degrees-of-freedom. There are 24 Hall sensors per hand to provide accurate joint angle feedback to control the 80 miniature solenoid on–off valves that adjust air in- and outflow into the pneumatically driven “muscle”-like actuators transmitting their forces via tendons to the fingers. The system is complemented with a Kinect camera for 3D object segmentation and monitoring of the workspace [70].

Despite still being far away from the capabilities of human hands, platforms such as these begin to cross the critical threshold beyond which one can begin to study issues of advanced manual action in a robotics setting.


Being able to grasp objects allows us to adapt our environment instead of adapting ourselves. Grasping is also the major activity that “brings us into touch” with the world around us. Grasping allows us to feel what we just have seen, thereby constantly connecting our visual and our tactile modality. And, very importantly, grasping leads us from passive perception to active control.

Considering the seemingly simple act of grasping an apple, we notice that it begins with a shift of our visual attention that prepares a coordinated action of motor commands to the hand and to our eyes when reaching for the object. This action is accompanied by sensorimotor and visual feedback to control the hand such that it safely approaches the object in proper orientation and preshape. It proceeds with a coordinated closing of the fingers until their rich sensors signal a familiar haptic pattern that finally confirms that we have made physical contact with the expected object and in the expected manner for the apple now to be fully at our disposal. And any significant deviation from the expected progression swiftly initiates corrective actions toward ensuring the intended outcome.

When we grasp the apple, all of this complexity is hidden from our conscious perception, making us entirely unaware that our brain has just performed a highly amazing act. An ultimate understanding of how this became possible might be sought in the underlying neural processes, scattered across the brain areas that were involved in the action. The combination of modern imaging methods and numerous painstaking single-cell studies has revealed a substantial number of brain areas connected into what has been termed a “grasping network” [17]. The major inputs to these circuits stem from the visual and the somatosensory systems, whose operation is at best understood at their lower levels, but much less with regard to the higher processed outputs that are sent to the grasping circuits. This makes it very difficult to elucidate how the different parts of the grasping network work together. In addition, most studies are limited to nonhuman primates, whose dexterity, although considerable, still falls far short of human dexterity [9]. Therefore, complementing neuroscience experiments with computational modeling, the use of computer simulations, and experiments with real robot hands can be an invaluable source of additional information for analyzing different hypotheses about grasp action control.

Such a wider approach also brings into view additional layers of description and understanding: in addition to a “microscopic theory” at the level of the constitutive neural processes, we may strive for “coarser grained” theories focusing at the physical and control level, at the level of behavior, or even at a cognitive level, concerned with internal goals and meaning. Some of these levels may be more easily accessible than others, thus helping to find entry points from which our understanding then might work forward to those levels that are very hard to access directly.

The best observable layer of dextrous grasping is perhaps the hand–arm kinematics itself. Focusing on the parameter of grip aperture, Jeannerod in his now classic work [32] observed a highly stereotypical pattern of gradually increasing grip aperture to a maximum value that is highly correlated with object size and is reached near the last third of the movement, after which it shrinks again until the fingers make contact with the object. This pattern has been confirmed in many subsequent studies that extended the investigation to dependencies on object properties beyond size and on further kinematics parameters, such as object distance. (For a review, see Reference [3].)
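Jeannerod's aperture profile can be caricatured by a simple piecewise model: the aperture opens to a maximum proportional to object size at roughly two-thirds of normalized movement time, then closes onto the object. The overshoot factor and peak time below are modeling assumptions, not fitted data.

```python
# Illustrative caricature of Jeannerod's grip-aperture profile: open to
# a size-correlated maximum, then close onto the object. The overshoot
# factor and peak time are assumptions, not values from the literature.

def grip_aperture(t, object_size, overshoot=1.6, t_peak=0.7):
    """t in [0, 1] is normalized movement time; sizes in meters."""
    peak = overshoot * object_size
    if t <= t_peak:                                  # opening phase
        return peak * t / t_peak
    # closing phase: shrink from the peak down onto the object
    return peak - (peak - object_size) * (t - t_peak) / (1.0 - t_peak)

apertures = [round(grip_aperture(t / 10, 0.05), 4) for t in range(11)]
print(apertures)  # rises to a peak near t = 0.7, ends at object size
```

The point of such toy profiles is not realism but that they expose the two regularities the studies report: peak aperture scales with object size, and the peak occurs late in the reach.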

Grip force is another important parameter. The proper assignment of grip forces constitutes a difficult task, inasmuch as it depends not only on the shape of the object (which can be seen), but also on parameters such as weight and friction properties, which can only be inferred indirectly and with considerable uncertainty. Moreover, grip forces may need to be rapidly adjusted to prevent slippage of the object as a result of disturbances, or when the object is to be accelerated, such as during lifting. Studies of grip force adjustment [36] have revealed that the brain manages to assign the required grip forces in a very parsimonious manner that keeps only a small safety margin against slippage while coping with a wide range of different situations. An important element in this strategy is a rapid grip-tightening reflex that is triggered by cutaneous sensors that react to the vibrations accompanying slippage.
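The parsimonious safety-margin strategy can be illustrated for the simplest case, a two-finger pinch holding a load: the tangential load must stay inside both friction cones, giving a minimal grip force of load/(2*mu), to which a small margin is added. The 10% margin and all numbers are illustrative assumptions.

```python
# Illustrative minimal grip force for a two-finger pinch: each finger
# contributes a friction force of mu * f_grip against the load, so
# f_grip >= load / (2 * mu). The safety margin mimics the small
# slip reserve described in the text; all values are assumptions.

def min_grip_force(load_n, mu, margin=0.1):
    return (1.0 + margin) * load_n / (2.0 * mu)

print(min_grip_force(4.0, 0.5))  # grip force for a 4 N load, mu = 0.5
```

The formula also makes visible why friction must be inferred before contact: halving the assumed mu doubles the required grip force, so a wrong estimate either drops the object or crushes it.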

But grasps are also affected by factors that do not exist at the level of physics and geometry alone but are rooted in the anticipation of effects that the grasp will have in the future. A well-studied example is the optimization of “end-state comfort” [57]: when an action, such as putting a mug into a dishwasher, requires releasing the object in a “reversed” (upside down) orientation, we tend to choose the “awkward” reversed orientation of the hand for the starting grasp of the object, so that releasing the object in reversed orientation makes the arm and hand end up in their comfortable, normal state. Interestingly, this ability is shared with primates, but it is absent in very young children, who only develop it after several years of grasping practice. Practice also lets us anticipate numerous even more subtle constraints for choosing a grasp, for example, when grasping food items, objects with dangerous or fragile parts, or during handover from another person [73]. In all these cases, our grasp choices are informed by rich background knowledge about the object and a nontrivial understanding of the purpose of the grasp.

The most ambitious approaches aim at replicating the architecture of the “neural grasping network” on robot systems [19,37,50]. Given that our knowledge about these networks is still very limited, these attempts have to fill many gaps with tentative assumptions. On the other hand, this allows us to compare the impact of different assumptions and may generate helpful feedback to refine the focus of neuroscientific studies. One guiding idea is to conceptualize grasping as essentially a multistage mapping problem: the visual system extracts an initial representation, which is then further transformed along separate mapping pathways into object location and into grasp-relevant object features, such as shape, size, and orientation. Next, these features are associated with potential grasp types that are suitable for the object and task. From this set of candidates, a suitable grasp type is finally chosen and mapped to suitable finger and arm parameters. This requires combining object location with object shape and orientation information, because suitable arm–hand configurations are affected by each of these factors.
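The multistage mapping view can be caricatured as a short pipeline of functions; the feature values, grasp type names, and selection rule below are purely illustrative stand-ins for the learned mappings the text describes.

```python
# Toy rendering of the multistage mapping view of grasping: visual
# features -> candidate grasp types -> task-filtered choice. All feature
# thresholds, grasp names, and rules are illustrative assumptions.

def extract_features(obj):
    # stand-in for the visual pathway extracting grasp-relevant features
    return {"size": obj["size"], "shape": obj["shape"]}

def candidate_grasps(features):
    # stand-in for associating features with suitable grasp types
    if features["shape"] == "cylinder" and features["size"] > 0.06:
        return ["power", "precision"]
    return ["precision"]

def select_grasp(candidates, task):
    # context-dependent selection from the candidate set
    if task == "inspect" and "precision" in candidates:
        return "precision"
    return candidates[0]

obj = {"size": 0.08, "shape": "cylinder"}
print(select_grasp(candidate_grasps(extract_features(obj)), "lift"))
```

Even this caricature exhibits the key architectural property the text emphasizes: the same object yields different grasps under different task contexts, so context must enter at the selection stage rather than at feature extraction.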

An early concretization of this general idea is the FARS model [19]. It suggests that the first mapping step into grasp-relevant object features occurs within area AIP in the macaque brain, where one finds neurons selectively tuned to grasp-relevant object features such as size, shape, and orientation. The second stage, the association with potential grasp configurations and their context-dependent selection, is assumed to be carried out in an area denoted as F5. This area receives inputs from other brain areas, making it a good candidate to satisfy task- and context-dependent constraints when selecting a grasp, which is then executed by passing suitable information from F5 to the motor area M1. According to the model, M1 represents the last step in the mapping sequence, creating the finger and arm motion commands for a coordinated arm–hand movement.

Although the FARS model leaves out many details (such as how grasps and contextual constraints are represented, and how these representations interact), its major merit is to provide an at least coarse computational framework for the overall task that paved the way for subsequent refinements, such as the addition of learning. One approach uses the selected grasps to derive feedback corrections to the output delivered by the previous mapping stages [50]. This model shares with the FARS model its formulation at a very high level of abstraction, leaving many options—and challenges—of how to implement the required steps in more concrete and realistic situations. An exemplary recent attempt in this regard is Reference [51], where the authors have implemented the mapping cascade from the visual input to the hand shape output very concretely. It has a four-layered visual system of alternating “simple” and “complex” cell layers that extracts a set of visual features sufficiently correlated with invariant object properties, followed by a mixture model that maps the visual features into a probability density on the space of grasp shapes. A final selection stage uses a heuristic scheme to select the most promising “probability peak”, which identifies the grasp posture finally applied to the object. This model goes far toward filling in plausible neural models for the required transformation steps; however, it is still restricted to a simulation study.
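The mixture-based selection stage can be sketched in one dimension: visual features induce a mixture density over a grasp parameter, and a heuristic picks the component mean with the highest density value. The mixture weights, means, and widths below are invented for illustration and are not taken from Reference [51].

```python
import math

# One-dimensional sketch of peak selection over a grasp-shape density:
# a Gaussian mixture stands in for the feature-conditioned density, and
# the heuristic picks the component mean where the density is highest.
# All mixture parameters are illustrative assumptions.

def mixture_density(x, components):
    # components: list of (weight, mean, std)
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in components)

def select_peak(components):
    # heuristic: evaluate the density at each component mean, take the best
    return max((m for _, m, _ in components),
               key=lambda m: mixture_density(m, components))

grasp_posture = select_peak([(0.3, 0.2, 0.05), (0.7, 0.8, 0.1)])
print(grasp_posture)
```

A density over grasp shapes, rather than a single best answer, lets the system keep several incompatible grasp hypotheses alive until the selection stage commits, which is the design rationale highlighted in the text.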

A complementary line of modeling sets aside the question of how the required mapping steps might be realized in neural structures and focuses on the computational analysis of essential building blocks for the overall process.

An obvious and conceptually appealing starting point is the physics of the grasping situation, shown in Figure 3.4. In the simplest idealization, all contacts are modeled as point contacts with friction, imparting forces and torques on the object. The resulting situation can then be analyzed with respect to the net forces and torques and, more importantly, with regard to the stability of the grasp, characterized, for example, by the ability to resist external forces and torques. A first characterization is through the concept of "force closure," which requires a grasp to be able to resist arbitrarily directed, but small, forces and torques on the object by making small modifications to the grasp forces. An extension of force closure [21] considers more realistic finite disturbances and characterizes the "volume" (the "grasp polytope" in the 6D product space of forces and torques) of disturbances that can be resisted by the grasp under given bounds on the available contact forces. Summarizing the 6D volume by the radius of the largest inscribable sphere leads to a compact grasp quality measure and, using ideas from semidefinite programming, has enabled the formulation of systematic optimization algorithms for grasps w.r.t. this or similar grasp quality measures [29,30]. This can be computationally costly, but Borst, Fischer, and Hirzinger [12] show that simple randomized generate-and-test approaches can provide good approximations to theoretically optimal grasps, even for hands of the complexity of the DLR-II hand. These optimizations, however, presuppose precisely known geometry and friction models for the involved object, the manipulator, and their mutual contacts.
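The largest-inscribable-sphere quality measure can be illustrated in the plane, where the wrench space is only three-dimensional (two force components plus one torque). The following sketch, under the idealized point-contact-with-friction assumptions above, approximates each friction cone by its two edges and measures the radius of the largest origin-centered ball inside the convex hull of the attainable unit contact wrenches; the function name and the example grasp are illustrative, not from the cited works:

```python
import numpy as np
from scipy.spatial import ConvexHull

def grasp_quality_2d(contacts, mu=0.5):
    """Epsilon-quality of a planar grasp: radius of the largest
    origin-centered ball inside the convex hull of the contact
    wrenches (fx, fy, tau)."""
    wrenches = []
    for p, n in contacts:                     # contact point, inward unit normal
        t = np.array([-n[1], n[0]])           # tangent direction at the contact
        for edge in (n + mu * t, n - mu * t): # two friction cone edges
            f = edge / np.linalg.norm(edge)   # unit contact force along the edge
            tau = p[0] * f[1] - p[1] * f[0]   # planar torque about the origin
            wrenches.append([f[0], f[1], tau])
    hull = ConvexHull(np.array(wrenches))
    # hull.equations rows are [normal, offset] with normal . x + offset <= 0
    # for interior points, so -offset is each facet's distance from the origin.
    return min(-hull.equations[:, -1])        # negative => no force closure

# Antipodal two-finger grasp of a unit disk: contacts on opposite sides,
# normals pointing inward.
contacts = [(np.array([1.0, 0.0]), np.array([-1.0, 0.0])),
            (np.array([-1.0, 0.0]), np.array([1.0, 0.0]))]
print(grasp_quality_2d(contacts))  # positive: the grasp has force closure
```

The same construction extends to the 6D wrench space of spatial grasps; the planar case is used here only to keep the convex hull small and inspectable.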

FIGURE 3.4. (Left) Idealized grasping. A detailed geometrical model is used to compute optimized grasp points and forces based on explicit contact models, such as point contacts with friction. (Right) Real-world grasping. Hand preshape and topological "encaging".

In real-world situations, such information may be unavailable or very difficult to obtain. This has motivated approaches that de-emphasize the need for a detailed, physics-based modeling of the interaction situation at the contact points. Instead, they take a more "topological" stance, either trying to identify suitable contact point patterns from visual features, or abandoning the idea of specifying contact points entirely and instead considering a grasp as a dynamic process that seeks its own stable contact points as a result of an "attractor dynamics" arising from a suitably prescribed finger motion started from suitable initial conditions (the "pregrasp").

Examples from the first category of works are found in References [5,62]. Instead of obtaining grasp point candidates from a mathematical optimization scheme, the researchers use a trainable classifier to assign grasp point locations based on visual input of images of objects. Starting from a collection of training objects with human-marked grasp point locations, they create a large dataset of computer-rendered images containing these training objects in different poses and contexts to achieve generalization. Subsequently, they use this enlarged training set to train their classifier in a supervised fashion to generate grasp point candidates for novel objects that are similar to the training instances. Thereby, they demonstrate how human grasp point selection knowledge can become "compiled" into a vision front end that can suggest grasp point candidates for a robot hand, using visual scene (stereo) images as its only input. An encouraging result is that the trained system can generalize rather well both with regard to the positions of the training objects as well as with regard to novel, similar real-world objects, although thus far the method has been demonstrated only for grasps with "opposing" grip point locations that have been realized by simple two- or three-finger grippers.
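A minimal stand-in for such a trainable grasp-point classifier can be sketched with logistic regression on synthetic feature vectors. The features, labels, and the "graspability" rule below are invented for illustration and bear no relation to the actual image features used in [5,62]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each candidate image location is summarized by a feature vector (here:
# synthetic edge/depth-like statistics), labeled 1 if a human would mark
# it as a good grasp point. The labeling rule is a made-up placeholder.
def make_data(n):
    x = rng.normal(size=(n, 3))
    y = (x @ np.array([1.5, -1.0, 0.5]) + rng.normal(0.3, 0.5, n) > 0).astype(float)
    return x, y

def train_logreg(x, y, lr=0.1, steps=500):
    """Plain gradient ascent on the logistic log-likelihood."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))  # predicted grasp-point probability
        w += lr * x.T @ (y - p) / len(y)
    return w

x_train, y_train = make_data(2000)
w = train_logreg(x_train, y_train)
x_test, y_test = make_data(500)
acc = np.mean(((x_test @ w) > 0) == y_test)
print(f"held-out accuracy: {acc:.2f}")  # well above chance on this toy task
```

The point of the sketch is only the pipeline shape: human-labeled examples, supervised training, and generalization to held-out candidates.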

The second category of approaches is motivated from observations of human hand–eye coordination that have revealed that many human grasps pass through a “preshape” phase [32], in which the hand is already close to but not yet in contact with the object and shaped such that the fingers begin to surround the object in a cage-like manner. In the final phase the fingers close, thereby shrinking the cage and bringing the object into a stable position within the hand (see Figure 3.5). This appears to be a very attractive “holistic” strategy that requires only a minimum of detailed information, because many details of the final grasp configuration “emerge” as a result of the interaction between fingers and object during the shrinking phase of the cage.

FIGURE 3.5. Hand preshapes for enabling robust grasps with the Shadow Dextrous Hand.



The choice of a proper pregrasp can take inspiration from prior work on human grasp taxonomies [16]. This work has revealed that the grasps humans use in different situations can be organized into a taxonomy tree with only a small number of major branches: power grasps, in which the hand wraps around the object to create a large contact surface able to impart large forces; two- and three-finger ("tripod") precision grasps, in which object contact is restricted to the fingertips to maximize controllability of the object; and intermediate grasps between the precision and power types, using the thumb in opposition to all other fingers to hold the object. (For a more recent review, see also [58].)

These pregrasps can be directly mapped to anthropomorphic robot manipulators. Using these grasps, Röthling et al. [56] have implemented the "preshape" approach for different robot hands: a robot hand with only three fingers to form the cage, and the Shadow Hand with five fingers and 20 independently controllable DOFs (see Figure 3.6). The five-fingered hand produced superior results; however, the cage-based strategy is implementable even for a gripper with only three fingers, although the range of graspable object shapes is then somewhat reduced.

FIGURE 3.6. Preshape-based grasping of daily objects with Shadow Hand. (From Röthling et al. 2007. With permission.) See color insert.

The main grasp-related features required for this method are the preshape of the finger cage and its positioning relative to the target object. Experiments with the Shadow Hand revealed that four different pregrasp shapes are sufficient to grasp the majority of a set of 20 typical household objects. Therefore, for many situations the task of determining precise grasp point locations can be reduced to a discrete choice among a small number of hand preshapes, combined with a suitable finger-closing process whose dynamics generates the details of the grasp. As a result, the visual mapping task can be significantly simplified to a categorization of the object into a small set of preshape categories, plus a rejection class for those object shapes that may require a more sophisticated grasp.
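The discrete selection step described above can be sketched as a simple lookup with a rejection class. The category names and joint-space values below are purely illustrative placeholders, not the preshapes actually used with the Shadow Hand:

```python
# Minimal sketch of discrete preshape selection: vision only has to
# categorize the object into one of a few pregrasp classes (plus a
# rejection class); the finger-closing dynamics supplies the rest.
# All names and numbers here are hypothetical.

PRESHAPES = {
    "power":     {"spread": 0.2, "curl": 0.1},  # wrap for large objects
    "precision": {"spread": 0.4, "curl": 0.3},  # fingertip opposition
    "tripod":    {"spread": 0.5, "curl": 0.2},  # thumb plus two fingers
    "pinch":     {"spread": 0.1, "curl": 0.4},  # flat or small objects
}

def select_preshape(category):
    """Return the pregrasp hand shape for an object category, or None
    as the rejection class for shapes needing a more elaborate grasp."""
    return PRESHAPES.get(category)

assert select_preshape("power")["spread"] == 0.2
assert select_preshape("articulated-tool") is None  # rejection class
```

In a real system the categories would come from the visual front end and the values would be joint-space targets for the cage.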

An interesting element of this way of grasping is that an essential part of the computation becomes "moved into the interaction physics" between the hand and the object. This demonstrates how embodiment can simplify the control of actions. Grasping appears to have a very strong embodiment component, with substantial benefits arising from the softness of the finger and hand surfaces and their excellent friction properties. These factors can significantly facilitate contact formation and grasping. Although algorithms that attempt to model and exploit these factors in detail are very difficult to formulate, it may be much easier to formulate computationally lightweight strategies that exploit these effects without modeling them explicitly, in such a manner that the remaining computations are simple and robust. Hence, an overarching lesson might be that we should be wary of constructing "fragile algorithmic clockworks" that need information about many details which are hard to know in real-world scenarios (such as friction, elasticity, precise shape, and mass distribution), and should look instead for "holistic solutions" in which a robust dynamics develops attractors toward desired states without requiring the underlying model parameters to be made explicit.

An interesting further line of development combines elements from the previous two categories of approaches [11,43]. Given the availability of 3D geometric models for an increasing number of everyday objects, these authors take a data-driven approach: for a number of objects, associated optimized grasps are determined by some method (such as algorithmic computation or prespecification by the human). These grasps include both a specification of contact points and a specification of a suitable hand orientation and shape w.r.t. the grasped object. These associations, together with robustly determinable visual object features, are stored in a large database, and the task of the vision system is to extract image features that are suited to index reliably into this database. At first sight, this appears to be a brute-force method, but it allows for several optimizations, such as shape decomposition techniques that enhance the generalization from stored to novel object shapes. Associating grasps with salient object parts instead of entire objects can reduce the required storage to a much smaller database of "predictive" object parts and allows obtaining good grasps for large sets of daily objects [55] when accurate geometry data are available, for example, from 3D vision.


Once an object has been grasped, our fingers enable us to manipulate it in many different ways. This is very different from most simple grippers, whose constrained degrees-of-freedom severely restrict the local motions that can be imparted to the grasped object. At the same time, manipulation planning and control for anthropomorphic hands pose sensorimotor challenges that are even greater than those associated with grasping alone and that may have acted as a major driver of the evolution of higher cognitive abilities, such as tool use and language.

A thorough understanding of multifingered object manipulation has also become of significant interest in robotics, because the advent of advanced anthropomorphic hand designs not only allows us to test computational ideas about multifingered manipulation beyond simulation, but also makes any results in this field of practical interest for future robots.

From a computational perspective, manipulation has some resemblance to walking: in both cases, the system must coordinate a sequence of contact patterns for achieving desired forces between the agent and an external object. This has given rise to the concept of “finger gaits,” and initial theoretical analyses of manipulation have focused on the precise characterization of the conditions under which such gait sequences exist [34]. This, however, does not yet solve the problem of how such movements can be planned in a robust manner.

Although there exist numerous planning and optimization algorithms for movements that are “smooth,” typical manipulation sequences are of a hybrid nature, connecting smooth state changes during which the hand configuration changes continuously (while maintaining its current contact pattern with the object) with discontinuous transitions that occur when the contact pattern changes. This happens whenever one or more fingers are lifted or set down to create a new contact.

This has given rise to the concept of "stratified state spaces" whose "strata" consist of subsets of hand–object configurations that share the same contact pattern [22]. Different strata are connected by discontinuous transitions between their associated contact patterns. Using very elegant concepts from Lie theory, the authors have developed a general method for planning movement sequences across strata in such spaces.

Finite state automata (FSA) offer an alternative for planning in stratified state spaces. The idea is to consider each stratum as a separate node of a graph in which strata with possible transitions between them are linked by arcs. The resulting graph (and the corresponding nodes of the FSA) then reflects the "coarse-grained" structure of the manipulation space when only contact patterns are distinguished. Each FSA node then is responsible for handling the trajectory piece within its stratum [67]. Such a scheme is very much in line with emerging ideas about how the brain may use tactile feedback and vision to enable dextrous manipulation sequences by properly sequencing "action phase controllers" in response to tactile events [20,33]. Moreover, the scheme is easily extensible to accommodate more levels, thereby offering a computational framework that can be related to recent ideas about the larger-scale architecture of manual action. This may offer another example of how insights from neuroscience and psychology on the one side and robotics research on the other can mutually stimulate each other.

A major limitation of the work in Reference [67] (and of similar approaches) is, however, its reliance on accurate geometry models of the objects. So far, this has restricted its use to simulation approaches.

In the following we describe a variant of this scheme proposed recently [Li+MHRB: 2012] that, although thus far also limited to simulation studies, can lift the need for an accurate object model by combining FSA-guided manipulation with a local feedback scheme for finger control and by merging finger repositioning with an online exploration of neighboring object points.

Using four fingers and considering only states with three or four finger contacts, the possible transitions between state-space strata lead to an FSA with the structure depicted in Figure 3.7. Transitions between the center node and the four surrounding nodes in this FSA correspond to changing the role of one finger from "support" to "local exploration" or vice versa. In the center node S1, a desired movement of the object is effected by suitable finger motions, assuming point contacts of all four fingers without slippage or rolling. Because these assumptions hold only approximately, deviations from the expected motions arise; these are detected and corrected by sensing the true object motion through (simulated) visual feedback. Each of the remaining periphery nodes S2–S5 corresponds to one finger being lifted for local exploration. In these states, the object can be rotated through the remaining three fingers that are still in contact. Such actions do not change the contact pattern and are indicated by the "self-transition" arcs. The local movement of the exploratory finger is controlled under the influence of a measure that attempts to optimize a compromise between good object manipulability and good grasp stability when the finger regains its status as a support finger. A physics-based simulation is used to verify that the necessary steps can be carried out with information that is locally available at the fingers, plus accurate feedback about the resulting global object motion, as long as the object geometry is not "too intricate."

FIGURE 3.7. (Top) Finite state automaton (FSA) distinguishing five contact states S1–S5 connected by finger actions A1–A12. Each finger action corresponds to the activation of a specialized, low-level finger controller. (Bottom) Physics-based simulation.
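The transition structure of such an FSA can be captured in a few lines. The state and action names below follow Figure 3.7, but the transition table itself is a reconstruction from the text rather than the original controller:

```python
# S1 is the four-contact manipulation state; S2-S5 each have one finger
# lifted for local exploration. "rotate" keeps the contact pattern
# (the "self-transition" arcs). Action names are hypothetical.

TRANSITIONS = {
    # (state, action) -> next state
    ("S1", "lift_f1"): "S2", ("S2", "place_f1"): "S1",
    ("S1", "lift_f2"): "S3", ("S3", "place_f2"): "S1",
    ("S1", "lift_f3"): "S4", ("S4", "place_f3"): "S1",
    ("S1", "lift_f4"): "S5", ("S5", "place_f4"): "S1",
    ("S1", "rotate"): "S1", ("S2", "rotate"): "S2",
    ("S3", "rotate"): "S3", ("S4", "rotate"): "S4",
    ("S5", "rotate"): "S5",
}

def run(state, actions):
    """Sequence low-level controllers; invalid actions (e.g. lifting a
    second finger while one is already up) are rejected."""
    for a in actions:
        if (state, a) not in TRANSITIONS:
            raise ValueError(f"action {a!r} not allowed in state {state}")
        state = TRANSITIONS[(state, a)]
    return state

# lift finger 2, rotate the object on three contacts, replace finger 2
assert run("S1", ["lift_f2", "rotate", "place_f2"]) == "S1"
```

Each (state, action) pair would, in the full scheme, dispatch to a dedicated low-level finger controller; the table only encodes which dispatches are legal.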

Some of the difficulties of taking the step from such simulations to real systems are connected with control challenges arising from the presence of kinematically closed actuator chains. This is a typical situation for manipulation, and when the actuators are rigid, small positional deviations can give rise to huge forces. For stiff actuators, this then requires very short control cycles of the order of 1 ms or below.

Biological systems cannot regulate motions at such timescales and use compliant structures whose inherent elasticity allows their control at much slower timescales. This has inspired robotics likewise to integrate elastic elements in actuator design, even if this may make models of such actuators more difficult to calibrate and analyze. On the positive side, the inherent compliance of elastic structures can be seen as a low-level control law that automatically provides corrective forces when the actual finger positions begin to deviate from the target configuration. This makes the system tolerant against small errors, for example, when carrying out an otherwise rigidly guided movement, such as turning a handle or unscrewing the lid from a jar. It also facilitates synthesizing manipulation sequences from approximate trajectory information, as obtained, for example, through observation of human manipulation trajectories.

Even then it is important to reduce the a priori very high state-space dimensionality of anthropomorphic manipulators. Modeling the hand as an only 12-dimensional manipulator (the available number of DOFs is actually significantly larger) and resolving only 3 different positions for each joint would already lead to a state space of 3¹² ≈ 500,000 different configurations; with the 20 DOFs of a hand such as the Shadow Hand, the count grows to 3²⁰ ≈ 3.5 billion, impossible to visit in the duration of a lifetime (≈1.2 billion seconds). Therefore, actually occurring hand configurations must be highly correlated and "cluster" strongly in lower-dimensional "manipulation manifolds." For instance, Santello, Flanders, and Soechting [64] found that a linear subspace of only two dimensions can already capture 80% of the variability of a large number of natural hand postures that were recorded with a dataglove. This has encouraged studies that use techniques familiar from principal component analysis to project hand postures into lower-dimensional spaces spanned by high-variance directions termed "eigengrasps" [10]. These linear methods can be generalized to employ manifolds, which, by virtue of their ability to "curve" in a nonlinear way, can capture even more variability with the same low number of dimensions.
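The eigengrasp idea can be illustrated with synthetic posture data: if 20-DOF hand postures are generated from only two latent "synergy" variables (a made-up stand-in for real dataglove recordings), PCA recovers a two-dimensional subspace that captures almost all of the variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 20-DOF hand postures driven by 2 latent synergy variables
# plus small per-joint noise. The synergy matrix is random; real
# postures would come from a dataglove as in Santello et al.
n_dof, n_latent, n_samples = 20, 2, 1000
synergies = rng.normal(size=(n_latent, n_dof))   # hypothetical synergies
latent = rng.normal(size=(n_samples, n_latent))
postures = latent @ synergies + 0.1 * rng.normal(size=(n_samples, n_dof))

centered = postures - postures.mean(axis=0)
# Principal components via SVD; squared singular values give the
# variance along each principal direction.
_, s, _ = np.linalg.svd(centered, full_matrices=False)
var = s**2 / np.sum(s**2)
print(f"variance captured by 2 components: {var[:2].sum():.0%}")
```

With real dataglove data the captured share is of course lower (≈80% in [64]), because natural postures are not generated by an exactly two-dimensional linear process.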

Following this idea, Steffen, Haschke, and Ritter [35] generalized methods based on self-organized feature maps to create, from raw human motion capture data, low-dimensional manipulation manifolds in which a highly structured manipulation sequence, such as unscrewing a cap from a bottle or jar, is represented as a motion of only a few major control variables [63].

In that case, the manipulation manifold can be as low as two-dimensional (one dimension for the cap radius, and one dimension as “progression time”), allowing the representation of segments of the cap-turning manipulation movement by smooth curves in this manifold. Using an inherently compliant manipulator system, in this case the 20-DOFs Shadow hand, allowed the demonstration of the opening of a marmalade jar as part of a complex, bimanual manipulation action [63].

Although the focus of this demonstration was the finger action sequence for opening the jar, it also provides an example of an integration of several of the previously discussed aspects of manual interaction in a real-time operating robot system: a vision system recognizes the object location and posture to prepare target reaching of the first arm and proper hand–eye coordination for the initial grasp. A regrasp by the second hand then creates the required configuration from which the unscrewing motion can start. Finally, the action is concluded by lifting the cap once enough turns have occurred.

The actual implementation of apparently "simple" tasks such as this makes explicit that they are based on the coordinated activity of a substantial number of more elemental constituent processes, each of which can itself already be of substantial complexity. Without doubt, our robot solutions are still simple compared to the sophistication of the grasping networks that we are beginning to recognize in the brain. We may speculate that the detailed complexity of the biological circuits will forever remain unreached by our models, and system diagrams with a small number of "black boxes" may be too coarse to capture much of the essence of the underlying mechanisms. Actual robot implementations, therefore, may offer the chance of a useful "middle" level of abstraction, allowing us to "sketch" suitable processing architectures for manual action in a way that is sufficiently detailed to be validated with regard to computational feasibility and that may help to chart possible functional structures, thereby informing neurobiological modeling and interpretation.

Defining four major levels of abstraction—sensorimotor control, sensorimotor representations, mental representations, and mental control—and assigning to them tentative computational structures that have been found useful for implementing manual skills in robots, as described here and in the previous sections, Maycock et al. [31,69] present a still very tentative attempt to connect ideas and concepts from robotics and psychology into a general framework for manual action that bypasses the difficulties of matching processing abstractions with neural structures. Instead, the proposal focuses on a different axis: aspects that can be characterized functionally (Figure 3.8). Works such as this also provide examples of the interdisciplinary research that is needed to make progress on the challenges of cognitive interaction.

FIGURE 3.8. Four-layer framework for manual action and possible implementation in robot systems: control layers at the sensorimotor and at an abstract “mental” level are connected through two representation layers that bridge the gap between these complementary levels of control.



In addition to such overarching contributions, there remains a strong need for focused advances toward a better understanding of generic skills in manual interaction. In this regard, a particularly interesting challenge appears to be the handling of objects with "biological characteristics." These are objects that are soft and often deformable, such as plants, food, or fur. They tend to defy our usual representations, which are preoccupied with simple Cartesian shape primitives such as cylinders, spheres, or polyhedra, and instead force us to elaborate control models capable of reaching beyond the case of rigid or otherwise fixed structures. Manipulating such objects may also pave a way that connects insights about dextrous manipulation with a deeper understanding of the higher cognition on which the handling of such objects depends.

To gain insights into the associated challenges, we have begun to explore the handling of paper with robot hands. Paper occupies an intermediate position between fully rigid and fully deformable objects, and it offers an interesting scope of nontrivial interactions. Already picking up a flat-lying piece of paper can pose an interesting challenge even for human hands, and operations such as bending, folding, tearing, or crumpling paper are building blocks for higher-order capabilities that we invoke when we put something into an envelope or a bag, or when we use paper to construct entirely new objects manually.

Due to the deformability of the paper, the robot requires real-time visual feedback to be able to adapt its finger motions to the changing paper shape, for example, to "bulge" the paper suitably with one hand so that the second hand can pick it up with a precision grip (see Figure 3.9). The work in Elbrechter, Haschke, and Ritter [14] shows how such feedback can be obtained, utilizing a physics-based modeling of the behavior of the paper. For operations such as folding, visual feedback has to be combined with force sensing to ensure proper task execution. This can be achieved by customized low-level controllers that are suitably sequenced with the help of a finite-state automaton properly designed for the task [15]. Examples such as these show how some of the complex multimodal coordination patterns that are typical of most of our daily manual actions can be achieved with current robot hands.

FIGURE 3.9. Bimanual folding of a piece of paper, using visual and tactile feedback. The paper is printed with fiducial markers to simplify visual perception. The inscribed colored grid depicts the robot's current model of the perceived paper shape.

Human hand motions integrate a substantial variety of similar, and many far harder, manipulation primitives. The challenge is to replicate a representative "vocabulary" of such manual actions and to develop a deeper understanding of how to blend them into the sophisticated actions that make our hands such special tools of our cognition.


When our hands get into touch with an object, they not only signify to us mechanical contact, but simultaneously provide us with information about the object’s texture and material properties such as friction and thermal conductivity. Manipulating the object briefly between our fingers, we readily get access to further information, such as the object’s weight, firmness, and shape properties.

The sensory basis of these capabilities is a rich sensory equipment of the epidermis. It has been estimated that the hand surface is covered by about 17,000 sensors. Four major sensor types have been identified that differ along the axes of spatial and temporal resolution [33]. Analogous to the visual fovea, there is also a highly nonuniform sensor density on the hand. It is highest at the fingertips, leading to a spatial resolution down to 0.5 mm, whereas the resolution falls to about 5 mm at the back of the hand.

The sensor responses are also affected by their embedding in an elastic skin, whose friction and deformation properties during object contact have a significant effect on the responses of the above-mentioned sensor systems. A striking example [59] is the important role played by the shape of the fingerprints, which could be shown to act as a sensitivity enhancer for the discrimination of fine surface texture during finger movements across a surface.

In contrast to vision, tactile sensing is highly dependent on the active shaping of the contact between hand and object. Haptics is the associated interplay of cutaneous and kinesthetic sensing, involving numerous different processes that contribute to the formation of a haptic percept [LedKla2009]. Correspondingly, numerous neural areas participate, of which the somatosensory and motor areas, positioned adjacently and densely interconnected, are closest to the sensorimotor periphery. In both areas there exist topographic maps of the hand surface, in which neighboring sensors are connected in a topographic fashion to neighboring cortical cells. These maps are adaptive and have been found to reorganize (e.g., to rededicate unused regions after the loss of a finger) while maintaining their topographic structure. Considerable adaptation can also result from extensive training and is reflected, for instance, in the significantly enhanced spatial acuity of Braille readers in their "reading fingertip".

A replication of similar capabilities in robot hands is still a rather elusive goal. Although there is progress toward the development of more and more capable “skin” sensors and their calibration [44], there is still a large gap regarding the sensing capabilities and spatial resolution of the human hand [65].

From a computational perspective, some aspects of early tactile sensing may be amenable to processing tactile “images” by similar algorithms as used for feature extraction and classification in early computer vision. This approach is also encouraged by apparent similarities in the representation of shape in the somatosensory and visual pathways [75], along with similarities in the characterization of receptive field properties, such as Gabor-like response profiles. These findings encourage the use of vision methods, such as the decomposition of tactile images into principal components, to enable haptic pattern discrimination and recognition for robot manipulators [28].
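A toy version of this vision-style processing of tactile "images" might look as follows; the 8×8 taxel array, the two contact patterns, and the noise level are entirely synthetic assumptions, used only to show the principal-component pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two contact patterns (a pressure ridge and a corner patch) on an
# 8x8 taxel array, projected onto principal components and classified
# by nearest class mean. Real tactile data are far noisier.
def sample(shape, n):
    imgs = np.zeros((n, 8, 8))
    if shape == "bar":
        imgs[:, 3:5, :] = 1.0      # horizontal pressure ridge
    else:
        imgs[:, :4, :4] = 1.0      # corner contact patch
    return (imgs + 0.3 * rng.normal(size=imgs.shape)).reshape(n, -1)

x = np.vstack([sample("bar", 100), sample("corner", 100)])
y = np.array([0] * 100 + [1] * 100)

mean = x.mean(axis=0)
_, _, vt = np.linalg.svd(x - mean, full_matrices=False)
proj = (x - mean) @ vt[:3].T       # keep 3 principal components

centroids = np.array([proj[y == c].mean(axis=0) for c in (0, 1)])
test = (sample("bar", 20) - mean) @ vt[:3].T
pred = np.argmin(np.linalg.norm(test[:, None] - centroids, axis=2), axis=1)
print(f"bar test patterns classified correctly: {np.mean(pred == 0):.0%}")
```

The sketch deliberately ignores the complication discussed next: on a real hand the sensor geometry deforms with every grasp, so tactile "pixels" do not line up across samples the way these synthetic images do.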

However, a major complication as compared to vision arises from the considerably increased complexity of the geometry of the sensor: whereas the visual fovea has a fixed spherical shape, the shape of the hand surface during tactile exploration of an object is highly variable, and the tactile “images” in the different touch sensor channels are accompanied by the proprioceptive kinesthetic information about the hand shape during manipulation.

How these processes interact is still only poorly understood. (For a review, see [33].) Available computational models usually focus on a single channel, such as the discrimination of object shapes through kinesthetic (hand shape) information when the hand is closed around the object, the discrimination of objects through their surface textures when moving a tactile sensor ("finger") across the surface, or the classification of spatiotemporal sequences of sensor images in simplified sensor geometries, such as when moving a planar tactile sensor matrix actively around objects [28].

However, robotics experiments can help to elucidate which features may be particularly informative for object discrimination, and how useful information is spread across different feature sets. Investigating this question by comparing over 100 heuristically chosen tactile features as a basis for tactile object classification, Schöpfer et al. [66] present evidence for a rather distributed representation: for a typical scenario in which tactile information is gathered with an actively moved tactile sensor, it proved infeasible to "concentrate" a significant share of the information in a small set of principal feature dimensions by simple linear methods, such as PCA.

Another area of research is the construction of object representations from haptic interaction. Klatzky et al. pioneered the idea of “exploratory procedures” [40], which aim to reveal specific properties of an object, such as its friction, texture, or stiffness through suitably tailored actions, such as scratching, squeezing, or poking. There is no universally agreed definition of specific exploratory procedures; however, examples of how this concept can be implemented in robotics have been presented by several authors [71,76].

One challenge associated with haptic representations is the integration of geometry information with information about stiffness and the sensing of movable degrees-of-freedom. For instance, we are easily capable of identifying the rotatory axis of a door handle through haptic exploration. de Schutter et al. [18] show how one can solve this task from a computational perspective and suggest an adaptive controller using Kalman filter techniques to replicate a similar ability for a robotic manipulator. The even more demanding task of identifying the movable part structure of a composite object through active exploration has been considered by Katz and Brock [38]. In their approach, the authors substitute vision for tactile information, creating an object model from a set of feature point clouds generated from a sequence of exploratory "pushes" applied to an articulated chain of segments connected by rotatory joints. A different approach, building a 3D shape representation of rigid objects from a number of tactile "images" resulting from self-generated grasps by a three-fingered gripper with tactile matrix sensors in its finger pads, is shown in Meier et al. [46]. The authors also demonstrate that the resulting 3D shape representations can be used to recognize an object and discriminate it from a number of competitors.
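The door-handle example can be illustrated with a batch least-squares stand-in for the Kalman-filter approach of [18]: contact points traced along the handle's arc determine the hinge position via a linear circle fit (the Kasa method). All numbers below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# The hand's contact point traces an arc around the unknown hinge.
# Hypothetical setup: hinge at (0.4, 0.1) m, handle radius 8 cm,
# about 115 degrees of travel, 1 mm sensing noise.
center_true = np.array([0.4, 0.1])
radius_true = 0.08
angles = np.linspace(0.0, 2.0, 40)
pts = center_true + radius_true * np.c_[np.cos(angles), np.sin(angles)]
pts += 0.001 * rng.normal(size=pts.shape)

# Kasa fit: the circle equation (x-cx)^2 + (y-cy)^2 = r^2 rearranges to
# the linear model x^2 + y^2 = 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2).
A = np.c_[2 * pts, np.ones(len(pts))]
b = (pts**2).sum(axis=1)
cx, cy, c = np.linalg.lstsq(A, b, rcond=None)[0]
center = np.array([cx, cy])
radius = np.sqrt(c + cx**2 + cy**2)
print(center, radius)  # close to the true hinge position and radius
```

A Kalman filter would refine the same estimate recursively during exploration, which is what makes it suitable for online control; the batch fit only shows that the geometric information is recoverable from the contact trace.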

Because the miniaturization of touch sensors for highly articulated, anthropomorphic manipulators currently still poses a difficult technical challenge, a complementary approach is to instrument the to-be-grasped objects themselves with tactile sensors, creating new windows into haptic interaction during manual actions. Recent work in the authors' group has led to the construction of an "iObject" [39] that can sense and wirelessly transmit such haptic patterns and thereby allows us to investigate haptic control strategies during a variety of manipulation tasks.


Studies of cognitive development in children have provided numerous observations pointing to a close linkage between manual gesture, language development, and communication [25]. Manual actions for showing and giving develop before pointing, and pointing has been found to predict the acquisition of single words. Later on, manual gestures become combined with word use, and a further differentiation of manual gestures into deictic, metaphoric, iconic, and "beat" gestures develops and becomes intertwined with more elaborate language structuring [2].

These observations have led to the view that manual action shares with language an extensive sensorimotor basis, in which the mouth region also plays a major role (exemplified, e.g., by the Babkin reflex, which causes babies to open their mouth in response to pressure on their palm). This shared basis leads to a highly correlated development of both modalities or, in exceptional circumstances, to the ability to develop sign language if the vocal modality cannot be used. Moreover, the coupling of gesture and language in situated communication has begun to connect social robotics and computer graphics, where natural expressivity is added to agents by endowing them with the ability to enhance language with natural-looking deictic, iconic, or metaphoric gestures [6,47].

The tight coupling between manual action and language [52] is also reflected in the close proximity of the involved brain regions, such as the hand and mouth regions in sensorimotor and motor cortex, and in remarkable cross-effects, for example, changes in pregrasp aperture when reaching for the same object while listening to words for large or small objects [61].

Most of our knowledge about the brain's control of hand actions stems, however, from studies with monkeys, such as macaques, which share with us the capacity for fine manipulation. A vast body of studies has led to the discovery of a complex network of brain areas involved in the recognition, preparation, execution, and learning of manual actions. This network is now rather well characterized in macaques [17], and additional findings in humans point to the existence of homologous networks in the human brain [68].

The elucidation of these networks is closely connected with the discovery of the so-called mirror neurons [54]. Activity in these neurons is correlated with specific actions, irrespective of whether the action is carried out by the animal or is only observed by it. Thus, these neurons seem to represent, or "mirror," specific actions per se, which has given rise to their name and to the concept of a mirror system: a representation of actions that can be shared by perception and behavior control alike.

The discovery of such a system supplied a major missing link to explain how neural structures might subserve the ability of primates and humans to learn by imitating others. This in turn has led to ideas about how this link between observer and actor might provide a "bridge" from "doing" to "communicating" [53], thereby setting the basis for the evolution of language [23] and connecting it with a capacity for manual skills. These ideas have received further support from recent neuroscience findings, such as evidence for even more generalized "mirror-type" neurons that may be involved in the creation of the more abstract concepts required for higher-level action semantics, such as the representation of action goals [74]. Such abilities would appear crucial for creating linguistic representations and for enabling what has been termed the social brain, that is, the ability to infer goals and intentions from the observed actions of others.

This social dimension leads back to another important role of our hands: using touch and haptics to enrich communication with an important emotional dimension. Although this role of hands is clear from our everyday experience, systematic research into the subject is still in its infancy, for example, investigating the role of touch in social communication [48] or asking how haptic interfaces could enrich social media [26].


We have only been able to cover some of the major capabilities of our hands: grasping, manipulating objects, using our hands as perceptual devices, and, finally, some aspects of their role in communication.

A shared element of these capabilities is interaction. There exists an abundance of work aiming to explain intelligent behavior as arising from capabilities of perception, category formation, and decision making; the study of manual dexterity and its replication in robots, however, forces us to confront issues of intelligent control.

In robotics, control is a very familiar concept. However, it is usually encountered within highly prestructured settings, with predefined assignments of state spaces and control variables, leading to well-formatted optimization problems, such as finding feedback laws that minimize a tracking error under a given set of conditions.
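
A minimal sketch of such a prestructured setting (all gains and dynamics chosen purely for illustration): a scalar first-order plant, a fixed interaction loop, and a proportional feedback law that drives the tracking error toward zero.

```python
# Illustrative example of a fully prestructured control problem:
# plant dynamics x' = u, proportional feedback u = kp * (target - x),
# simulated with simple Euler steps.
def track(target, kp=5.0, dt=0.01, steps=500):
    x = 0.0
    for _ in range(steps):
        error = target - x        # the tracking error to be minimized
        x += dt * kp * error      # fixed state, input, and feedback structure
    return x

final = track(1.0)
print(round(final, 4))
```

Here the state variable, the control input, and the structure of the loop are all fixed in advance; the only design freedom is the gain `kp`. It is precisely this prestructuring that, as argued below, manual action does not grant us for free.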

We believe that manual action requires a broader understanding of intelligent control. As we have seen in the preceding sections, the relationships between hands and objects can be extremely multifaceted and do not fit well into a picture in which a controller with predefined state, input, and output variables is embedded in a fixed interaction loop of known structure. Instead, the flexible shaping of these relationships, and thereby of the structure of the interaction loop itself, is an essential characteristic of the task. Coping with this challenge goes well beyond what current control approaches can deliver: whereas much of classical control theory focuses on characterizing the controllability of given situations and on deriving suitable control laws, an important part of manual action requires solving the question of how controllability can be created in the first place, through suitable "attachments" of the fingers and their sensors (along with the eyes) to the object and its environment.

This challenging question cannot be solved by a narrow focus on "low-level" aspects alone: controllability needs to be achieved on many levels, ranging from small local movements of an object, through more global actions such as unscrewing a jar and the achievement of high-level goals such as filling a glass of juice, up to the highest level, which is encountered in communication: the structured interaction with thoughts.

As we have seen in the preceding sections, manual actions are crucially involved at all these levels. Therefore, the quest for a better understanding of manual action, along with its engineering facet of how to synthesize dextrous manual action for robots, is highly likely to catalyze deeper insights into what is required for an intelligent agent to become able to shape and influence its environment in increasingly sophisticated and abstract ways, starting with control of the physics in its immediate surroundings, advancing to the mastery of a variety of mechanical devices and tools, and ultimately paving the way for grasping mental objects that are present only in thinking and communication.

Thus, research on dexterity and manual action, and on how they are realized by the brain, seems to be in a pivotal position for deciphering the principles of cognitive interaction. It has therefore been argued that it might play a role similar to that of the Rosetta Stone in the deciphering of ancient writing systems [27]. At the same time, it provides a fascinating unifying topic for elucidating an important and rich part of our cognition: the manual intelligence that becomes evident in what our hands can do in their daily actions.


This work was supported through DFG Grant CoE 277: Cognitive Interaction Technology (CITEC).


Bridgwater L.B. et al. The Robonaut 2 hand - designed to do work with tools. IEEE International Conference on Robotics and Automation (ICRA). 2012:3425–3430.
Bates E, Dick F. Language, gesture and the developing brain. Develop. Psychobiol. 2002;40:293–310. [PubMed: 11891640]
Smeets J.B, Brenner E. A new view on grasping. Motor Contr. 1999;3:237–271. [PubMed: 10409797]
Borst C, Fischer M, Haidacher S, Liu H, Hirzinger G. DLR Hand II: Experiments and experiences with an anthropomorphic hand. IEEE International Conference on Robotics and Automation (ICRA). 2003:702–707.
Bohg J, Kragic D. Learning grasping points with shape context. Robot. Autom. 2010;4:362–377.
Hartmann B, Mancini M, Pelachaud C. Implementing expressive gesture synthesis for embodied conversational agents. Gesture in Human-Computer Interaction and Simulation. 2006:188–199.
Bierbaum A, Schill J, Asfour T, Dillmann R. Force position control for a pneumatic anthropomorphic hand. Proceedings of the 9th IEEE-RAS International Conference on Humanoid Robotics. 2009:21–27.
Leon B, Ulbrich S, Diankov R, Puche G, Przybylski M, Morales A. OpenGRASP: A toolkit for robot grasping simulation. Simulation. 2010;6472:109–120.
Castiello U. The neuroscience of grasping. Nature Rev. Neurosci. 2005;6:726–736. [PubMed: 16100518]
Ciocarlie M, Goldfeder C, Allen P. Dimensionality reduction for hand-independent dexterous robotic grasping. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego. 2007:3270–3275.
Goldfeder C, Allen P.K. Data-driven grasping. Auton. Robots. 2011;31(1):1–20.
Borst C, Fischer M, Hirzinger G. Grasping the dice by dicing the grasp. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2003;4:3692–3697.
Shadow Robot Company; 2012. Website: www
Elbrechter C, Haschke R, Ritter H. Bi-manual robotic paper manipulation based on real-time marker tracking and physical modelling. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2011:1427–1432.
Elbrechter C, Haschke R, Ritter H. Folding paper with anthropomorphic robot hands using real-time physics-based modeling. Proceedings of the IEEE International Conference on Humanoid Robots (Humanoids). 2012:210–215.
Cutkosky M.R. On grasp choice, grasp models, and the design of hands for manufacturing tasks. IEEE Trans. Robot. Autom. 1989;5:269–279.
Davare M, Kraskov A, Rothwell J.C, Lemon R.N. Interactions between areas of the cortical grasping network. Curr. Opin. Neurobiol. 2011;21:564–570. [PMC free article: PMC3437559] [PubMed: 21696944]
de Schutter J, Bruyninckx H, Dutré S, de Geeter J, Katupitiya J, Demey S, Lefebvre T. Estimating first-order geometric parameters and monitoring contact transitions during force-controlled compliant motion. Int. J. Robot. Res. 1999;18:1161–1184.
Fagg A.H, Arbib M.A. Modeling parietal-premotor interactions in primate control of grasping. Neural Netw. 1998;11:1277–1303. [PubMed: 12662750]
Flanagan J.R, Bowman M.C, Johansson R.S. Control strategies in object manipulation tasks. Curr. Opin. Neurobiol. 2006;16:650–659. [PubMed: 17084619]
Ferrari C, Canny J. Planning optimal grasps. IEEE International Conference on Robotics and Automation. 1992:2290–2295.
Goodwine B, Burdick J.W. Motion planning for kinematic stratified systems with application to quasi-static legged locomotion and finger gaiting. IEEE Trans. Robot. Autom. 2002;18:209–222.
Gentilucci M, Corballis M.C. From manual gesture to speech: A gradual transition. Neurosci. Biobehav. Rev. 2006;30:949–960. [PubMed: 16620983]
Grioli G, Catalano M, Silvestro E, Tono S, Bicchi A. Adaptive synergies: An approach to the design of under-actuated robotic hands. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2012:1251–1256.
Goldin-Meadow S. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA: Harvard University Press; 2003.
Haans A. Mediated social touch: A review of current research and future directions. Virtual Real. 2006;9(2):149–159.
Ritter H, Haschke R, Röthling F, Steil J. Manual intelligence as a Rosetta Stone for robot cognition. Robot. Res. 2011;66:135–146.
Heidemann G, Schöpfer M. Dynamic tactile sensing for object identification. IEEE International Conference on Robotics and Automation (ICRA). 2004:813–818.
Haschke R, Steil J, Steuwer I, Ritter H. Task-oriented quality measures for dextrous grasping. IEEE CIRA Conference Proceedings. 2005:689–694.
Han L, Trinkle J.C, Li Z.X. Grasp analysis as linear matrix inequality problems. IEEE Trans. Robot. Autom. 2000;16:663–674.
Maycock J, Dornbusch D, Elbrechter C, Haschke R, Schack T, Ritter H. Approaching manual intelligence. KI-Künstliche Intelligenz. 2010;24(4):287–294.
Jeannerod M. The timing of natural prehension movements. J. Mot. Behav. 1984;16:235–254. [PubMed: 15151851]
Johansson R.S, Flanagan J.R. Coding and use of tactile signals from the fingertips in object manipulation tasks. Nature Rev. Neurosci. 2009;10:345–359. [PubMed: 19352402]
Hong J, Lafferriere G, Mishra B, Tan X. Fine manipulation with multifinger hands. IEEE International Conference on Robotics and Automation. 1990:1568–1573.
Steffen J, Haschke R, Ritter H. Towards dextrous manipulation using manipulation manifolds. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2008:2738–2743.
Johansson R.S, Westling G. Roles of glabrous skin receptors and sensorimotor memory in automatic control of precision grip when lifting rougher or more slippery objects. Exp. Brain Res. 1984;56:550–564. [PubMed: 6499981]
Kawato M. From understanding the brain by creating the brain towards manipulative neuroscience. Phil. Trans. R. Soc. B. 2008;363:2201–2214. [PMC free article: PMC2610191] [PubMed: 18375374]
Katz D, Brock O. Extracting planar kinematic models using interactive perception. In: Kragic D, Kyrki V, editors. Unifying Perspectives in Computational and Robot Vision. Springer US; 2008. pp. 11–23.
Koiva R, Haschke R, Ritter H. Development of an intelligent object for grasp and manipulation research. Proceedings of Advanced Robotics (ICAR). 2011:204–210.
Klatzky R, Lederman S. Stages of manual exploration in haptic object identification. Percep. Psychophys. 1992;6:661–670. [PubMed: 1287570]
Liu H. et al. Multisensory five-finger dexterous hand: The DLR/HIT Hand II. IEEE/RSJ International Conference on Intelligent Robots and Systems. 2008:3692–3697.
Miller A.T, Allen P.K. GraspIt!: A versatile simulator for robotic grasping. IEEE Robot. Autom. Mag. 2004;12:110–122.
Morales A, Asfour T, Azad P, Knoop S, Dillmann R. Integrated grasp planning and visual object localization for a humanoid robot with five-fingered hands. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2006:5663–5668.
Mittendorfer P, Cheng G. Open-loop self-calibration of articulated robots with artificial skins. IEEE International Conference on Robotics and Automation (ICRA). 2012:4539–4545.
Ciocarlie M, Lackner C, Allen P. Soft finger model with adaptive contact geometry for grasping and manipulation tasks. IEEE 2nd Joint EuroHaptics Conference. 2007:219–224.
Meier M, Schöpfer M, Haschke R, Ritter H. A probabilistic approach to tactile shape reconstruction. IEEE Trans. Robot. 2011;27:630–635.
Salem M, Kopp S, Wachsmuth I, Rohlfing K, Joublin F. Generation and evaluation of communicative robot gesture. Int. J. Social Robot. 2012;4:1–17.
Lee K.M, Jung Y, Kim J, Kim S.R. Are physically embodied social agents better than disembodied social agents?: The effects of physical embodiment, tactile interaction, and people's loneliness in human-robot interaction. Int. J. Hum. Comput. Stud. 2006;64(10):962–973.
Namiki A, Imai Y, Ishikawa M, Kaneko M. Development of a high-speed multifingered hand system and its application to catching. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2003:2666–2671.
Oztop E, Kawato M, Arbib M. Mirror neurons and imitation: A computationally guided review. Neural Netw. 2006;19:254–271. [PubMed: 16595172]
Prevete R, Tessitore G, Catanzariti E, Tamburrini G. Perceiving affordances: A computational investigation of grasping affordances. Cog. Syst. Res. 2011;12:122–133.
Pulvermüller F. Brain mechanisms linking language and action. Nat. Rev. Neurosci. 2005;6:576–582. [PubMed: 15959465]
Rizzolatti G, Arbib M. Language within our grasp. Trends Neurosci. 1998;21:188–194. [PubMed: 9610880]
Rizzolatti G, Craighero L. The mirror-neuron system. Annu. Rev. Neurosci. 2004;27:169–192. [PubMed: 15217330]
Detry R, Ek C.H, Madry M, Piater J, Kragic D. Generalizing grasps across partly similar objects. IEEE International Conference on Robotics and Automation (ICRA). 2012:3791–3797.
Röthling F, Haschke R, Steil J, Ritter H. Platform portable anthropomorphic grasping with the Bielefeld 20-DOF Shadow and 9-DOF TUM hand. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2007:2951–2956.
Rosenbaum D.A, Jorgensen M.J. Planning macroscopic aspects of manual control. Human Move. Sci. 1992;11:61–69.
Wells R, Greig M. Characterizing human hand prehensile strength by force and moment wrench. Ergonomics. 2001;44:1392–1402. [PubMed: 11936830]
Scheibert J. et al. The role of fingerprints in the coding of tactile information probed with a biomimetic sensor. Science. 2009;323:1503–1506. [PubMed: 19179493]
Schulz S. First experiences with the Vincent hand. Proceedings of MyoElectric Controls/Powered Prosthetics Symposium. New Brunswick (Canada); 2011.
Glover S, Rosenbaum D.A, Graham J, Dixon P. Grasping the meaning of words. Exp. Brain Res. 2004;154:103–108. [PubMed: 14578997]
Saxena A, Driemeyer J, Ng A.Y. Robotic grasping of novel objects using vision. Int. J. Robot. Res. 2008;27:157–173.
Steffen J, Elbrechter C, Haschke R, Ritter H. Bio-inspired motion strategies for a bimanual manipulation task. IEEE-RAS International Conference on Humanoid Robotics. 2010:625–630.
Santello M, Flanders M, Soechting J.F. Postural hand synergies for tool use. J. Neurosci. 1998;18:10105–10115. [PMC free article: PMC6793309] [PubMed: 9822764]
Dahiya R.S, Metta G, Valle M, Sandini G. Tactile sensing - from humans to humanoids. IEEE Trans. Robot. 2010;26(1):1–20.
Schöpfer M, Pardowitz M, Haschke R, Ritter H. Identifying relevant tactile features for object identification. In: Prassler E, editor. Towards Service Robots for Everyday Environments. Vol. 76. Berlin, Heidelberg: Springer; 2012. pp. 417–430.
Saut J-P, Sahbani A, El-Khoury S, Perdereau V. Dexterous manipulation planning using probabilistic roadmaps in continuous grasp subspaces. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2007:2907–2912.
Tunik E, Grafton S.T. Beyond grasping: Representation of action in human anterior intraparietal sulcus. NeuroImage. 2007;36:77–86. [PMC free article: PMC1978063] [PubMed: 17499173]
Schack T, Ritter H. The cognitive nature of action - Functional links between cognitive psychology, movement science, and robotics. Prog. Brain Res. 2009;174:231–250. [PubMed: 19477343]
Ückermann A, Elbrechter C, Haschke R, Ritter H. 3D scene segmentation for autonomous robot grasping. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2012:1734–1740.
Martinez-Hernandez U, Lepora N, Barron-Gonzalez H, Dodd T, Prescott T. Towards contour following exploration based on tactile sensing with the iCub fingertip. Adv. Auton. Robot. 2012;7429:459–460.
van Duinen H, Gandevia S.S. Constraints for control of the human hand. J. Physiol. 2011;589:5583–5593. [PMC free article: PMC3249034] [PubMed: 21986205]
Chan W.P, Parker C.A.C, Vander Loos H.F.M. et al. Grip forces and load forces in handovers: Implications for designing human-robot handover controllers. Proceedings of the 7th ACM/IEEE International Conference on Human-Robot Interaction. 2012:9–16.
Yamazaki Y, Yokochi H, Tanaka M, Okanoya K, Iriki A. Potential role of monkey inferior parietal neurons coding action semantic equivalences as precursors of parts of speech. Social Neurosci. 2010;5:105–117. [PMC free article: PMC2826156] [PubMed: 20119879]
Yau J.M, Pasupathy A, Fitzgerald P.J, Hsiao S.S, Connor C.E. Analogous intermediate shape coding in vision and touch. Proc. Nat. Acad. Sci. 2009;106:16457–16462. [PMC free article: PMC2738619] [PubMed: 19805320]
Su Z, Fishel J.A, Yamamoto T, Loeb G.E. Use of tactile feedback to control exploratory movements to characterize object compliance. Frontiers Neurorobot. 2012;6. [PMC free article: PMC3405524] [PubMed: 22855676]
© 2015 by Taylor & Francis Group, LLC.
Bookshelf ID: NBK299038PMID: 26065077

