Chapter 14. Modeling Multisensory Processes in Saccadic Responses

Time-Window-of-Integration Model

Diederich A, Colonius H.



Multisensory research within experimental psychology has led to the emergence of a number of lawful relations between response speed and various empirical conditions of the experimental setup (spatiotemporal stimulus configuration, intensity, number of modalities involved, type of instruction, and so forth). This chapter presents a conceptual framework to account for the effects of cross-modal stimulation on response speed. Although our framework applies to measures of cross-modal response speed in general, here we focus on modeling saccadic reaction time as a measure of orientation performance toward cross-modal stimuli.

The central postulate is the existence of a critical “time-window-of-integration” (TWIN) controlling the combination of information from different modalities. It is demonstrated that a few basic assumptions about this timing mechanism imply a remarkable number of empirically testable predictions. After introducing a general version of the TWIN model framework, we present various specifications and extensions of the original model that are geared toward more specific experimental paradigms. Our emphasis will be on predictions and empirical testability of these model versions, but for experimental data, we refer the reader to the original literature.


For more than 150 years, response time (RT) has been used in experimental psychology as a ubiquitous measure to investigate hypotheses about the mental and motor processes involved in simple cognitive tasks (Van Zandt 2002). Interpreting RT data, in the context of some specific experimental paradigm, is subtle and requires a high level of technical skill. Fortunately, over the years, many sophisticated mathematical and statistical methods for response time analysis and corresponding processing models have been developed (Luce 1986; Schweickert et al., in press). One reason for the sustained popularity of RT as a measure of mental processes may be the simple fact that these processes always have to unfold over time. A similar rationale, of course, is valid for other methods developed to investigate mental processes, such as electrophysiological and related brain-imaging techniques, and it may be one reason why we are currently witnessing some transfer of concepts and techniques from RT analysis into these domains (e.g., Sternberg 2001). Here, we focus on the early, dynamic aspects of simultaneously processing cross-modal stimuli—combinations of vision, audition, and touch—as they are revealed by a quantitative stochastic analysis of response times.

One of the first psychological studies on cross-modal interaction using RT to measure the effect of combining stimuli from different modalities and of varying their intensities is the classic article by Todd (1912). A central finding, supported by subsequent research, is that the occurrence of cross-modal effects critically depends on the temporal arrangement of the stimulus configuration. For example, the speedup of response time to a visual stimulus resulting from presenting an accessory auditory stimulus typically becomes most pronounced when the visual stimulus precedes the auditory by an interval that equals the difference in RT between response to the visual alone and the auditory alone (Hershenson 1962). The rising interest in multisensory research in experimental psychology over the past 20 years has led to the emergence of a number of lawful relations between response speed, on the one hand, and properties of the experimental setting, such as (1) spatiotemporal stimulus configuration, (2) stimulus intensity levels, (3) number of modalities involved, (4) type of instruction, and (5) semantic congruity, on the other. In the following, rather than reviewing the abundance of empirical results, we present a modeling framework within which a number of specific quantitative models have been developed and tested. Although such models can certainly not reflect the full complexity of the underlying multisensory processes, their predictions are sufficiently specific to be rigorously tested through experiments.

For a long time, the ubiquitous mode of assessing response speed has been to measure the time it takes to press, or release, a button with a finger or foot. With the advance of modern eye movement registration techniques, the measurement of gaze shifts has become an important additional means of assessing multisensory effects. In particular, saccadic reaction time, i.e., the time from the presentation of a target stimulus to the beginning of the eye movement, is ideally suited for studying both the temporal and spatial rules of multisensory integration. Although participants can be asked to move their eyes to visual, auditory, or somatosensory targets, the saccadic RT characteristics will be specific to each modality because the ocular system is geared to the visual system. For example, it is well known that saccades to visual targets are more accurate than those to auditory or somatosensory stimuli. Note also that, because the superior colliculus is an important site of oculomotor control (e.g., Munoz and Wurtz 1995), measuring saccadic responses is an obvious choice for studying the behavioral consequences of multisensory integration.


We introduce a conceptual framework to account for the effects of cross-modal stimulation as measured by changes in response speed. The central postulate is the existence of a critical TWIN controlling the integration of information from different modalities. The starting idea is simply that a visual and an auditory stimulus must not be presented too far apart in time for bimodal integration to occur. As we will show, this seemingly innocuous assumption has a number of nontrivial consequences that any multisensory integration model of response speed has to satisfy. Most prominently, it imposes a process consisting of—at least—two serial stages: an early stage, before the outcome of the time window check is known, and a later one, in which the outcome of the check may affect further processing.

Although the TWIN framework applies to measures of cross-modal response speed in general, the focus is on modeling saccadic reaction time. First, a general version of the TWIN model and its predictions, introduced by Colonius and Diederich (2004), will be described. Subsequently, we present various extensions of the original model that are geared toward more specific experimental paradigms. Our emphasis will again be on the predictions and empirical testability of these model versions but because of space limitations, no experimental data will be presented here.

14.3.1. Basic Assumptions

A classic explanation for a speedup of responses to cross-modal stimuli is that subjects are merely responding to the first stimulus detected. Taking these detection times to be random variables and glossing over some technical details, observed reaction time would then become the minimum of the reaction times to the visual, auditory, or tactile signal leading to a purely statistical facilitation effect (also known as probability summation) in response speed (Raab 1962). Over time, numerous studies have shown that this race model was not sufficient to explain the observed speedup in saccadic reaction time (Harrington and Peck 1998; Hughes et al. 1994, 1998; Corneil and Munoz 1996; Arndt and Colonius 2003). Using Miller’s inequality as a benchmark test (cf. Colonius and Diederich 2006; Miller 1982), saccadic responses to bimodal stimuli have been found to be faster than predicted by statistical facilitation, in particular, when the stimuli were spatially aligned. Moreover, in the race model, there is no natural explanation for the decrease in facilitation observed with variations in many cross-modal stimulus properties, e.g., increasing spatial disparity between the stimuli.
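The race-model (probability summation) account, and Miller's inequality as its benchmark, can be sketched in a short simulation. The detection-time distributions and parameter values below are illustrative assumptions, not estimates from the literature.

```python
import random
import statistics

random.seed(1)
N = 20000

# Hypothetical unimodal detection-time distributions (ms); purely illustrative.
vis = [random.gauss(160, 25) for _ in range(N)]
aud = [random.gauss(140, 25) for _ in range(N)]

# Race model: the response is triggered by whichever signal is detected first,
# so the bimodal RT is the minimum of the two unimodal detection times.
race = [min(v, a) for v, a in zip(vis, aud)]

print(round(statistics.mean(vis)), round(statistics.mean(aud)),
      round(statistics.mean(race)))  # statistical facilitation: race < both

# Miller's inequality: every race model (independent or not) satisfies
# P(RT_VA <= t) <= P(RT_V <= t) + P(RT_A <= t) for all t.
def ecdf(sample, t):
    return sum(x <= t for x in sample) / len(sample)

for t in range(80, 220, 20):
    assert ecdf(race, t) <= ecdf(vis, t) + ecdf(aud, t)
```

Observed bimodal saccadic RT distributions that violate this inequality at some t rule out the entire class of race models, which is what motivates restricting the race to the first, peripheral stage in the TWIN framework.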

Nevertheless, the initial anatomic separation of the afferent pathways for different sensory modalities suggests that an early stage of peripheral processing exists, during which no intermodal interaction may occur. For example, a study by Whitchurch and Takahashi (2006) collecting (head) saccadic reaction times in the barn owl lends support to the notion of a race between early visual and auditory processes depending on the relative intensity levels of the stimuli. In particular, their data suggest that the faster modality initiates the saccade, whereas the slower modality remains available to refine saccade trajectory. Thus, there are good reasons for retaining the construct of an—albeit very peripheral—race mechanism.

Even under invariant experimental conditions, observed responses typically vary from one trial to the next, presumably because of an inherent variability of the underlying neural processes in both ascending and descending pathways. In analogy to the classic race model, this is taken into account in the TWIN framework by assuming any processing duration to be a random variable. In particular, the peripheral processing times for visual, auditory, and somatosensory stimuli are assumed to be stochastically independent random variables. This leads to the first postulate of the TWIN model:

(B1) First Stage Assumption: The first stage consists in a (stochastically independent) race among the peripheral processes in the visual, auditory, and/or somatosensory pathways triggered by a cross-modal stimulus complex.

The existence of a critical “spatiotemporal window” for multisensory integration to occur has been suggested by several authors, based on both neurophysiological and behavioral findings in humans, monkey, and cat (e.g., Bell et al. 2005; Meredith 2002; Corneil et al. 2002; Meredith et al. 1987; see Navarra et al. 2005 for a recent behavioral study). This integration may manifest itself in the form of an increased firing rate of a multisensory neuron (relative to unimodal stimulation), an acceleration of saccadic reaction time (Frens et al. 1995; Diederich et al. 2003), an effective audiovisual speech integration (Van Wassenhove et al. 2007), or in an improved or degraded judgment of temporal order of bimodal stimulus pairs (cf. Spence and Squire 2003).

One of the basic tenets of the TWIN framework, however, is the priority of temporal proximity over any other type of proximity: rather than assuming a joint spatiotemporal window of integration permitting interaction to occur only for both spatially and temporally neighboring stimuli, the TWIN model allows for cross-modal interaction to occur, for example, even for spatially rather distant stimuli of different modalities as long as they fall within the time window.

(B2) TWIN Assumption: Multisensory integration occurs only if the peripheral processes of the first stage all terminate within a given temporal interval, the TWIN.

In other words, even if a visual and an auditory stimulus occur at the same spatial location, or the lip movements of a speaker are perfectly consistent with the utterance, no intersensory interaction effect will be possible if the data from the two sensory channels are registered too far apart in time. Thus, the window acts like a filter determining whether afferent information delivered from different sensory organs is registered close enough in time to allow for multisensory integration. Note that passing the filter is a necessary, but not sufficient, condition for multisensory integration to occur. The reason is that the amount of multisensory integration also depends on other aspects of the stimulus set, such as the spatial configuration of the stimuli. For example, response depression may occur with nearly simultaneous but distant stimuli, making it easier for the organism to focus attention on the more important event. In other cases, multisensory integration may fail to occur—despite near-simultaneity of the unisensory events—because the a priori probability for a cross-modal event is very small (e.g., Körding et al. 2007).

Although the priority of temporal proximity seems to afford more flexibility for an organism in a complex environment, the next assumption delimits the role of temporal proximity to the first processing stage:

(B3) Assumption of Temporal Separability: The amount of interaction manifesting itself in an increase or decrease of second stage processing time is a function of cross-modal stimulus features, but it does not depend on the presentation asynchrony (stimulus onset asynchrony, SOA) of the stimuli.

This assumption is based on a distinction between intra- and cross-modal stimulus properties, where the properties may refer to both subjective and physical properties. Cross-modal properties are defined when stimuli of more than one modality are present, such as spatial distance of target to nontarget, or subjective similarity between stimuli of different modalities. Intramodal properties, on the other hand, refer to properties definable for a single stimulus, regardless of whether this property is definable in all modalities (such as intensity) or in only one modality (such as wavelength for color or frequency for pitch). Intramodal properties can affect the outcome of the race in the first stage and, thereby, the probability of an interaction. Cross-modal properties may affect the amount of cross-modal interaction occurring in the second stage. Note that cross-modal features cannot influence first stage processing time because the stimuli are still being processed in separate pathways.

(B4) Second Stage Assumption: The second stage comprises all processes after the first stage including preparation and execution of a response.

The assumption of only two stages is certainly an oversimplification. Note, however, that the second stage is defined here by default: it includes all subsequent, possibly overlapping, processes that are not part of the peripheral processes in the first stage (for a similar approach, see Van Opstal and Munoz 2004). Thus, the TWIN model retains the classic notion of a race mechanism as an explanation for cross-modal interaction but restricts it to the very first stage of stimulus processing.

14.3.2. Quantifying Multisensory Integration in the TWIN Model

To derive empirically testable predictions from the TWIN framework, its assumptions must be put into more precise form. According to the two-stage assumption, total saccadic reaction time in the cross-modal condition can be written as a sum of two nonnegative random variables defined on a common probability space:

\[ RT_{VA} = S_1 + S_2, \tag{14.1} \]

where S1 and S2 refer to first and second stage processing time, respectively (a base time would also be subsumed under S2). Let I denote the event that multisensory integration occurs, having probability P(I). The expected reaction time in the cross-modal condition then follows:

\[ E[RT_{VA}] = E[S_1] + P(I)\,E[S_2 \mid I] + \bigl(1 - P(I)\bigr)\,E[S_2 \mid I^c], \]

where E[S2 | I] and E[S2 | Ic] denote the expected second stage processing time conditioned on interaction occurring (I) or not occurring (Ic), respectively. Putting Δ ≡ E[S2 | Ic] − E[S2 | I], this becomes

\[ E[RT_{VA}] = E[S_1] + E[S_2 \mid I^c] - P(I)\,\Delta. \tag{14.2} \]

That is, mean RT to cross-modal stimuli equals mean first stage processing time, plus mean second stage processing time when no interaction occurs, minus the term P(I) · Δ, which measures the expected amount of intersensory interaction in the second stage, with positive Δ values corresponding to facilitation and negative values to inhibition.

This factorization of expected intersensory interaction into the probability of interaction P(I) and the amount and sign of interaction (Δ) is an important feature of the TWIN model. According to Assumptions B1 to B4, the first factor, P(I), depends on the temporal configuration of the stimuli (SOA), whereas the second factor, Δ, depends on nontemporal aspects, in particular their spatial configuration. Note that this separation of temporal and nontemporal factors is in accordance with the definition of the window of integration: the incidence of multisensory integration hinges on the stimuli to occur in temporal proximity, whereas the amount and sign of interaction (Δ) is modulated by nontemporal aspects, such as semantic congruity or spatial proximity reaching, in the latter case, from enhancement for neighboring stimuli to possible inhibition for distant stimuli (cf. Diederich and Colonius 2007b).
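The factorization in Equation 14.2 can be checked with a small Monte Carlo sketch. The exponential second stage distributions and all parameter values below are illustrative assumptions, chosen only so that the conditional means match the terms of the equation.

```python
import random
import statistics

random.seed(2)
N = 100_000

p_I = 0.4            # probability that integration occurs (assumed)
mean_S1 = 60.0       # mean first-stage time, E[S1] (ms)
mean_S2_noI = 110.0  # E[S2 | Ic] (ms)
delta = 30.0         # Delta = E[S2 | Ic] - E[S2 | I] > 0: facilitation

def trial():
    s1 = random.expovariate(1.0 / mean_S1)
    if random.random() < p_I:    # integration occurred on this trial
        s2 = random.expovariate(1.0 / (mean_S2_noI - delta))
    else:
        s2 = random.expovariate(1.0 / mean_S2_noI)
    return s1 + s2

sim_mean = statistics.mean(trial() for _ in range(N))
predicted = mean_S1 + mean_S2_noI - p_I * delta  # Equation 14.2: 158.0 ms
print(round(sim_mean, 1), predicted)
```

The simulated mean agrees with E[S1] + E[S2 | Ic] − P(I)·Δ up to sampling error, illustrating how the unobservable quantities P(I) and Δ enter the observable mean only through their product.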

14.3.3. Some General Predictions of TWIN

In the next section, more specific assumptions on first stage processing time, S1, and probability of interaction P(I) will be introduced to derive detailed quantitative predictions for specific experimental cross-modal paradigms. Nonetheless, even at the general level of the framework introduced thus far, a number of qualitative empirical predictions of TWIN are possible.

SOA effects. The amount of cross-modal interaction should depend on the SOA between the stimuli because the probability of integration, P(I), changes with SOA. Let us assume that two stimuli from different modalities differ considerably in their peripheral processing times. If the faster stimulus is delayed (in terms of SOA) so that the arrival times of both stimuli have a high probability of falling into the window of integration, then the amount of cross-modal interaction should be largest for that value of SOA (see, e.g., Frens et al. 1995; Colonius and Arndt 2001).

Intensity effects. Stimuli of high intensity have relatively fast peripheral processing times. Therefore, for example, if a stimulus from one modality has a high intensity compared to a stimulus from the other modality, the chance that both peripheral processes terminate within the time window will be small, assuming simultaneous stimulus presentations. The resulting low value of P(I) is in line with the empirical observation that a very strong signal will effectively rule out any further reduction of saccadic RT by adding a stimulus from another modality (e.g., Corneil et al. 2002).

Cross-modal effects. The amount of multisensory integration (Δ) and its sign (facilitation or inhibition) occurring in the second stage depend on cross-modal features of the stimulus set, for example, spatial disparity and laterality (laterality here refers to whether all stimuli appear in the same hemifield). Cross-modal features cannot influence first stage processing time because the modalities are still being processed in separate pathways. Conversely, because parameter Δ does not depend on SOA, it cannot change its sign as a function of SOA; the model therefore cannot simultaneously predict facilitation for some SOA values and inhibition for others. Some empirical evidence against this prediction has been observed (Diederich and Colonius 2008).

In the classic race model, the addition of a stimulus from a modality not yet present will increase (or, at least, not decrease) the amount of response facilitation. This follows from the fact that—even without assuming stochastic independence—the probability of the fastest of several processes terminating before time t can only increase with the number of “racers” (e.g., Colonius and Vorberg 1994). In the case of TWIN, both facilitation and inhibition are possible under certain conditions as follows:

Number of modalities effect. The addition of a stimulus from a modality not yet present will increase (or, at least, not decrease) the expected amount of interaction if the added stimulus is not “too fast” and the time window is not “too small.” The latter restrictions are meant to guarantee that the added stimulus will fall into the time window, thereby increasing the probability of interaction to occur.
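The monotonicity behind the classic race-model prediction—adding a racer can only shift the minimum earlier—is easy to confirm numerically. The distributions below are again illustrative assumptions:

```python
import random
import statistics

random.seed(3)
N = 50_000

# Illustrative unimodal detection times (ms) for three modalities.
vis = [random.gauss(160, 25) for _ in range(N)]
aud = [random.gauss(150, 25) for _ in range(N)]
tac = [random.gauss(155, 25) for _ in range(N)]

two = [min(v, a) for v, a in zip(vis, aud)]
three = [min(v, a, s) for v, a, s in zip(vis, aud, tac)]

# Adding a racer can never slow the winner on any trial ...
assert all(x3 <= x2 for x2, x3 in zip(two, three))
# ... so the mean can only decrease (or stay the same).
print(round(statistics.mean(two), 1), round(statistics.mean(three), 1))
```

In TWIN, by contrast, an added stimulus helps only insofar as it raises the probability of opening the time window—hence the restrictions ("not too fast", window "not too small") in the statement above.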


In a cross-modal experimental paradigm, the individual modalities may either be treated as being on an equal footing, or one modality may be singled out as a target modality, whereas stimuli from the remaining modalities may be ignored by the participant as nontargets. Cross-modal effects are assessed in different ways, depending on task instruction. As shown below, the TWIN model can take these different paradigms into account simply by modifying the conditions that lead to an opening of the time window.

14.4.1. Measuring Cross-Modal Effects in Focused Attention and Redundant Target Paradigms

In the redundant target paradigm (RTP; also known as the divided attention paradigm), stimuli from different modalities are presented simultaneously or with a certain SOA, and the participant is instructed to respond to whichever stimulus is detected first. Typically, the time to respond in the cross-modal condition is faster than in either of the unimodal conditions. In the focused attention paradigm (FAP), cross-modal stimulus sets are presented in the same manner, but now participants are instructed to respond only to the onset of a stimulus from a specifically defined target modality, such as the visual, and to ignore the remaining nontarget stimulus (the tactile or the auditory). In the latter setting, when a stimulus of a nontarget modality, for example, a tone, appears before the visual target at some spatial disparity, there is no overt response to the tone if the participant is following the task instructions. Nevertheless, the nontarget stimulus has been shown to modulate the saccadic response to the target: depending on the exact spatiotemporal configuration of target and nontarget, the effect can be a speedup or an inhibition of saccadic RT (see, e.g., Amlôt et al. 2003; Diederich and Colonius 2007b), and the saccadic trajectory can be affected as well (Doyle and Walker 2002).

Some striking similarities to human data have been found in a detection task utilizing both paradigms. Stein et al. (1988) trained cats to orient to visual or auditory stimuli, or both. In one paradigm, the target was a visual stimulus (a dimly illuminated LED) and the animal learned that although an auditory stimulus (a brief, low-intensity broadband noise) would be presented periodically, responses to it would never be rewarded, and the cats learned to “ignore” it (FAP). Visual–auditory stimuli were always presented spatially coincident, but their location varied from trial to trial. The weak visual stimulus was difficult to detect and the cats’ performance was <50% correct detection. However, combining the visual stimulus with the neutral auditory stimulus markedly enhanced performance, regardless of their position. A similar result was obtained when animals learned that both stimuli were potential targets (RTP). In a separate experiment in which the visual and the (neutral) auditory stimuli were spatially disparate, however, performance was significantly worse than when the visual stimulus was presented alone (cf. Stein et al. 2004).

A common method to assess the amount of cross-modal interaction is to use a measure that relates mean RT in cross-modal conditions to that in the unimodal condition. The following definitions quantify the percentage of RT enhancement in analogy to a measure proposed for measuring multisensory enhancement in neural responses (cf. Meredith and Stein 1986; Anastasio et al. 2000; Colonius and Diederich 2002; Diederich and Colonius 2004a, 2004b). For visual, auditory, and visual–auditory stimuli with observed mean (saccadic or manual) reaction times RT_V, RT_A, and RT_{AV}, respectively, and SOA = τ, the multisensory response enhancement (MRE) for the redundant target task is defined as

\[ MRE_{RTP} = \frac{\min(RT_V,\ RT_A + \tau) - RT_{AV,\tau}}{\min(RT_V,\ RT_A + \tau)} \times 100, \tag{14.3} \]

where RT_{AV,τ} refers to observed mean RT to the bimodal stimulus with SOA = τ. For the focused attention task, MRE is defined as

\[ MRE_{FAP} = \frac{RT_V - RT_{AV,\tau}}{RT_V} \times 100, \tag{14.4} \]

assuming vision as target modality.
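As a minimal sketch, the two enhancement measures can be wrapped in small helper functions; the mean RTs passed in below are made-up values for illustration only.

```python
def mre_rtp(rt_v, rt_a, rt_va, tau):
    """MRE (%) for the redundant target paradigm (Equation 14.3):
    baseline is the faster unimodal mean, the auditory one shifted by SOA tau."""
    base = min(rt_v, rt_a + tau)
    return 100.0 * (base - rt_va) / base

def mre_fap(rt_v, rt_va):
    """MRE (%) for focused attention (Equation 14.4), vision as target."""
    return 100.0 * (rt_v - rt_va) / rt_v

# Hypothetical means (ms): visual 150, auditory 130, bimodal 120 at tau = 0.
print(round(mre_rtp(150, 130, 120, 0), 1))  # 7.7
print(round(mre_fap(150, 120), 1))          # 20.0
```

Note that the two measures use different baselines: the RTP measure is computed against the faster unimodal condition (race-corrected for SOA), whereas the FAP measure is computed against the target modality alone.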

14.4.2. TWIN Model for the FAP

TWIN is adapted to the focused attention task by replacing the original TWIN Assumption B2 with

(B2-FAP) TWIN Assumption: In the FAP, cross-modal interaction occurs only if (1) a nontarget stimulus wins the race in the first stage, opening the TWIN such that (2) the termination of the target peripheral process falls in the window. The duration of the time window is a constant.

The idea here is that the winning nontarget will keep the saccadic system in a state of heightened reactivity such that the upcoming target stimulus, if it falls into the time window, will trigger cross-modal interaction. At the neural level, this may correspond to a gradual inhibition of fixation neurons (in the superior colliculus) and/or omnipause neurons (in the midline pontine brain stem). In the case of the target being the winner, no discernible effect on saccadic RT is predicted, such as in the unimodal situation.

The race in the first stage of the model is made explicit by assigning statistically independent, nonnegative random variables V and A to the peripheral processing times, for example, for a visual target and an auditory nontarget stimulus, respectively. With τ as SOA value and ω as integration window width parameter, Assumption B2-FAP amounts to the event that multisensory integration occurs, I FAP , being

\[ I_{FAP} = \{\, A + \tau < V < A + \tau + \omega \,\}. \]

Thus, the probability of integration to occur, P(IFAP), is a function of both τ and ω, and it can be determined numerically once the distribution functions of A and V have been specified.
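Assuming, for illustration, exponentially distributed peripheral processing times (as in the parametric version discussed later in this chapter), P(I_FAP) = P(A + τ < V < A + τ + ω) can be estimated by simulation. All parameter values here are illustrative:

```python
import random

random.seed(4)

def p_int(mean_v, mean_a, tau, omega, n=200_000):
    """Monte Carlo estimate of P(A + tau < V < A + tau + omega):
    the nontarget (A) wins the race and the target (V) terminates
    within the window of width omega. Exponential times, means in ms."""
    hits = 0
    for _ in range(n):
        v = random.expovariate(1.0 / mean_v)
        a = random.expovariate(1.0 / mean_a)
        hits += (a + tau < v < a + tau + omega)
    return hits / n

# Illustrative: 1/lambda_V = 50 ms, 1/lambda_A = 30 ms, omega = 200 ms.
probs = {tau: p_int(50, 30, tau, 200) for tau in (-200, -100, 0, 100)}
for tau, p in probs.items():
    print(tau, round(p, 3))
```

The estimates peak at a midrange negative SOA (nontarget leading) and fall off on both sides, which anticipates the SOA predictions discussed in the next section.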

Expected reaction time in the bimodal condition then is (cf. Equation 14.2)

\[ E[RT_{AV,\tau}] = E[V] + E[S_2 \mid I^c] - P(I_{FAP})\,\Delta. \tag{14.5} \]

No interaction is possible in the unimodal condition. Thus, the expected reaction time for the visual (target) stimulus condition is

\[ E[RT_V] = E[V] + E[S_2 \mid I^c]. \tag{14.6} \]

Note that in the focused attention task, the first stage duration is defined as the time it takes to process the (visual) target stimulus, E[V]. Cross-modal interaction (CI) is defined as the difference between mean RT to the unimodal and cross-modal stimuli, i.e.,

\[ CI = E[RT_V] - E[RT_{AV,\tau}] = P(I_{FAP})\,\Delta. \tag{14.7} \]

Thus, the separation of temporal and nontemporal factors expressed in the above equation for the observable CI is directly inherited from Assumptions B4 and B2-FAP.

TWIN Predictions for the FAP

The integration Assumption B2-FAP permits further specification of TWIN’s general predictions of Section 14.3.3. From a model testing point of view, it is a clear strength of the TWIN framework that it allows for numerous qualitative predictions without having to specify the probability distributions for the random processing times. Thus, a violation of any one of these predictions cannot be attributed to an inappropriate choice of the distributions but may point to a more fundamental inadequacy of one or, possibly, several model assumptions. For a quantitative fit to an observed set of data, however, some distributional assumptions are required. In the parametric version of TWIN, all peripheral processing times are assumed to be exponentially distributed (cf. Colonius and Diederich 2004b). This choice is made mainly for computational simplicity: calculating the probability of integration, P(IFAP), is straightforward, and the exponential distribution is characterized by a single quantity, the intensity parameter λ (see Appendix A). As long as predictions are limited to the level of means, no specific assumptions about the distribution of processing times in the second stage are necessary (but see Section 14.6).

Next, we demonstrate how the focused attention context leads to more specific empirically testable predictions of TWIN. Predictions relying on the parametric TWIN version are postponed to the final part of this section. If not specifically mentioned otherwise, we always assume nonnegative Δ values in the following elaborations.

SOA effects. When the nontarget is presented very late relative to the target (large positive SOA), its chance of winning the race against the target, and thus of opening the window of integration, becomes very small. When it is presented rather early (large negative SOA), it is likely to win the race and open the window, but the window may have closed by the time the target arrives. Again, the probability of integration, P(IFAP), is small. Therefore, the largest probability of integration is expected for some midrange SOA values. Although P(IFAP) is unobservable, it should leave its mark on a well-known observable measure, i.e., MRE. In fact, MREFAP, defined in Equation 14.4 as a function of SOA, should have the same form as P(IFAP), scaled only by some constant:

\[ MRE_{FAP}(\tau) = \frac{P(I_{FAP})\,\Delta}{E[RT_V]} \times 100. \tag{14.8} \]

Intensity effects. Increasing the intensity of the visual stimulus will speed up visual peripheral processing (down to some minimum processing time), thereby increasing the chance for the visual target to win the race. Thus, the probability that the window of integration opens decreases, predicting less multisensory integration. Increasing the intensity of the nontarget auditory stimulus, on the other hand, leads to the opposite prediction: the auditory stimulus will have a better chance to win the race and to open the window of integration, hence predicting more multisensory integration to occur on average. Two further distinctions can be made. For large negative SOA, i.e., when the auditory nontarget arrives very early, further increasing the auditory intensity makes it more likely for the TWIN to close before the target arrives and therefore results in a lower P(IFAP) value. For smaller negative SOA, however, i.e., when the nontarget is presented shortly before the target, increasing the auditory intensity improves its chances to win against the target and to open the window. Given the complexity of these intensity effects, however, more specific quantitative predictions will require some distributional assumptions for the first stage processing times (see below). Alternatively, it may be feasible to adapt the “double factorial paradigm” developed by Townsend and Nozawa (1995) to analyze predictions when both targets and nontargets are presented at two different intensity levels.
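These opposing intensity effects can be made concrete with the same kind of Monte Carlo sketch (exponential peripheral times; all means and SOAs below are illustrative choices):

```python
import random

random.seed(5)

def p_int(mean_v, mean_a, tau, omega=200.0, n=200_000):
    # P(A + tau < V < A + tau + omega), exponential peripheral times (ms).
    hits = 0
    for _ in range(n):
        v = random.expovariate(1.0 / mean_v)
        a = random.expovariate(1.0 / mean_a)
        hits += (a + tau < v < a + tau + omega)
    return hits / n

# A more intense (faster) target wins the race more often,
# so the window opens less often:
p_target_weak, p_target_strong = p_int(50, 30, 0), p_int(25, 30, 0)

# Nontarget intensity cuts both ways. Shortly before the target
# (tau = -50 ms), a faster nontarget is more likely to open the window:
p_nt_weak_near, p_nt_strong_near = p_int(50, 70, -50), p_int(50, 10, -50)

# But with a long lead (tau = -400 ms, window 200 ms), a faster nontarget
# tends to open -- and close -- the window before the target arrives:
p_nt_weak_far, p_nt_strong_far = p_int(50, 70, -400), p_int(50, 10, -400)

print(p_target_weak > p_target_strong,
      p_nt_strong_near > p_nt_weak_near,
      p_nt_weak_far > p_nt_strong_far)  # True True True
```

All three orderings follow directly from the window-opening logic of Assumption B2-FAP, without any commitment to the particular parameter values used here.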

Cross-modal effects. If target and nontarget are presented in two distinct cross-modal conditions, one would expect parameter Δ to take on two different values. For example, for two spatial conditions, ipsilateral and contralateral, the values could be Δi and Δc, respectively. Subtracting the corresponding cross-modal interaction terms then gives (cf. Equation 14.7)

\[ CI_i - CI_c = P(I_{FAP})\,(\Delta_i - \Delta_c), \tag{14.9} \]

an expression that should again yield the same qualitative behavior, as a function of SOA, as P(IFAP). In a similar vein, one can capitalize on the factorization of expected cross-modal interaction if some additional experimental factor affecting Δ, but not P(IFAP), is available. In Colonius et al. (2009), an auditory background masker stimulus, presented at increasing intensity levels, was hypothesized to simultaneously increase Δc and decrease Δi. The ratio of CIs in both configurations,

\[ \frac{CI_c}{CI_i} = \frac{P(I_{FAP})\,\Delta_c}{P(I_{FAP})\,\Delta_i} = \frac{\Delta_c}{\Delta_i}, \tag{14.10} \]

should then remain invariant across SOA values, with a separate value for each level of the masker.

Number of nontargets effects. For cross-modal interaction to occur in the focused attention task, it is necessary that the nontarget process wins the race in the first stage. With two or more nontargets entering the race, the probability of one of them winning against the target process increases and, therefore, the probability of opening the window of integration increases with the number of nontargets present. In this case, there are even two different ways of utilizing the factorization of CI, both requiring the existence of two cross-modal conditions with two different Δ parameters (spatial or other). The first test is analogous to the previous one. Because the number of nontargets affects P(IFAP) only, the ratio in Equation 14.10 should be the same whether it is computed from conditions with one or two nontargets. The second test results from taking the ratio of CI based on one nontarget, CI1, over CI based on two nontargets, CI2. Because Δ should not be affected by the number of nontargets, the ratio

\[ \frac{CI_1}{CI_2} = \frac{P_1\,\Delta}{P_2\,\Delta} = \frac{P_1}{P_2}, \tag{14.11} \]

where P1 and P2 refer to the probability of opening the window under one or two nontargets, respectively, should be the same, no matter from which one of the two cross-modal conditions it was computed. In the study of Diederich and Colonius (2007a), neither of these tests revealed evidence against these TWIN predictions.

SOA and intensity effects predicted by a parametric TWIN version. Assuming exponential distributions for the peripheral processing times, the intensity parameter for the visual modality is set to 1/λV = 50 (ms) and to 1/λA = 10, 30, 70, or 90 (ms) for the auditory nontarget. Quantitative predictions of TWIN for focused attention are shown in the left panels of Figure 14.1. Panels 1 and 2 show mean RT and P(IFAP) as a function of SOA for the various intensities of the auditory nontarget. Note that two of the auditory peripheral processing times are faster, and two are slower, than that of the visual target. Here, the parameter for second stage processing time when no integration occurs, μ, was set to 100 ms. The TWIN was set to 200 ms. The parameter for multisensory integration was set to Δi = 20 ms for bimodal stimuli presented ipsilaterally, implying a facilitation effect. Note that neither λV nor μ is directly observable, but the sum of the peripheral and central processing time for the visual target stimulus constitutes a prediction for unimodal mean saccadic RT:

FIGURE 14.1. TWIN predictions for FAP (left panels) and RTP (right panels). Parameters in both paradigms were chosen to be identical. Mean RT for visual stimulus is 150 ms (1/λV = 50, μ = 100). Peripheral processing times for auditory stimuli are 1/λA = 10, 30, 70, or 90 ms.

E[RTV] = 1/λV + μ,

which, for the present example, is 50 ms + 100 ms = 150 ms. The dashed line and the dotted line show the bimodal RT predictions for the auditory nontargets with the highest and lowest intensity, respectively.
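Under these parameter values, the FAP prediction can be reproduced with a few lines of Python (a sketch of our own; P(IFAP) is estimated by Monte Carlo rather than from closed-form expressions):

```python
import random

random.seed(2)
N = 200_000
lam_v, mu, omega, delta = 1 / 50, 100.0, 200.0, 20.0   # parameter values from the text

def mean_rt_bimodal(lam_a, tau):
    # E[RT] = 1/lam_v + mu - P(I_FAP) * delta
    hits = 0
    for _ in range(N):
        v = random.expovariate(lam_v)
        a = random.expovariate(lam_a) + tau
        if a < v < a + omega:              # nontarget wins and target falls in window
            hits += 1
    return 1 / lam_v + mu - (hits / N) * delta

rt_uni = 1 / lam_v + mu                    # unimodal prediction: 150 ms
rt_bi = mean_rt_bimodal(1 / 10, 0.0)       # high-intensity auditory nontarget, SOA 0
print(rt_uni, round(rt_bi, 1))
```

With Δi = 20 ms the bimodal mean falls below the 150-ms unimodal baseline; decreasing the nontarget intensity (larger 1/λA) shrinks P(IFAP) at positive SOAs and moves the prediction back toward the baseline.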

No fits to empirical data sets are presented here, but good support of TWIN has been found thus far (see, e.g., Diederich and Colonius 2007a, 2007b, 2008; Diederich et al. 2008). Close correspondence between data and model prediction, however, is not the only aspect to consider. Importantly, the pattern of parameter values estimated for a given experimental setting should suggest a meaningful interpretation. For example, increasing stimulus intensities are reflected in a decrease of the corresponding λ parameters, assuming higher intensities to lead to faster peripheral processing times (at least, within certain limits). Furthermore, in the study with an auditory background masker (Colonius et al. 2009), the cross-modal interaction parameter (Δ) was a decreasing or increasing function of masker level for the contralateral or ipsilateral condition, respectively, as predicted.

14.4.3. TWIN Model for RTP

TWIN is adapted to the redundant target task by replacing the original TWIN Assumption B2 by

(B2-RTP) TWIN Assumption: In the RTP, (1) the window of integration is opened by whichever stimulus wins the race in the first stage and (2) cross-modal interaction occurs if the termination of the peripheral process of a stimulus of another modality falls within the window. The duration of the time window is a constant.

Obviously, if stimuli from more than two modalities are presented, the question of a possible additional effect on cross-modal interaction arises. There is both behavioral and neurophysiological evidence for trimodal interaction (e.g., Diederich and Colonius 2004b; Stein and Meredith 1993), but data from saccadic eye movement recordings do not yet seem to be conclusive enough to justify further elaboration of Assumption B2-RTP.

To compute the probability of interaction in the RTP, P(IRTP), we assume that a visual and an auditory stimulus are presented with an SOA equal to τ. Then either the visual stimulus wins, V < A + τ, or the auditory stimulus wins, A + τ < V; so, in either case, min(V, A + τ) < max(V, A + τ) and, by Assumption B2-RTP,

P(IRTP) = P(max(V, A + τ) < min(V, A + τ) + ω).

Thus, the probability of integration to occur is a function of both τ and ω, as before. Expected reaction time in the cross-modal condition is computed as (see Equation 14.2)

E[RTVA,τ] = E[min(V, A + τ)] + μ − P(IRTP) · Δ.   (14.12)

In the RTP, first stage duration is determined by the termination time of the winner. This is an important difference to the focused attention situation in which first stage duration is defined by the time it takes to process the (visual) target stimulus. Even for a zero probability of interaction, expected reaction time in the bimodal condition is smaller than, or equal to, either of the unimodal stimulus conditions. These are

E[RTV] = E[V] + μ   (14.13)


E[RTA] = E[A] + μ,   (14.14)

because in the redundant target version of TWIN, the race in the first stage produces a statistical facilitation effect equivalent to the one in the classic race model. Thus, a possible cross-modal enhancement observed in a redundant target task may be because of multisensory integration or statistical facilitation, or both. Moreover, a possible cross-modal inhibition effect may be weakened by the simultaneous presence of statistical facilitation in the first stage. Predictions for the redundant target case are less straightforward than for focused attention because the factorization of cross-modal interaction (CI) in the latter is no longer valid. Nevertheless, some general predictions can be made assuming, as before, a multisensory facilitation effect, i.e., Δ > 0.

TWIN Predictions for RTP

In this paradigm, both stimuli are on an equal footing and, therefore, negative SOA values need not be introduced. Each SOA value now indicates the time between the stimulus presented first and the one presented second, regardless of modality.

SOA effects. The probability of cross-modal interaction decreases with increasing SOA: the later the second stimulus is presented, the less likely it is to win the race and open the window of integration; alternatively, if the window has already been opened by the first stimulus, the less likely the second stimulus's peripheral process is to terminate within that window. For large enough SOA values, mean saccadic RT in the cross-modal condition approaches the mean for the stimulus presented first.

To fix ideas, we now assume, without loss of generality, that a visual stimulus of constant intensity is presented first and that an auditory stimulus of varying intensity is presented second, or simultaneously with the visual stimulus. Predictions then depend on the relative intensity difference between the two stimuli. Note that the unimodal means constitute upper bounds for bimodal mean RT.

Intensity effects. For a visual stimulus presented first, increasing the intensity of the auditory stimulus (presented second) increases the amount of facilitation.

SOA and intensity effects predicted by a parametric TWIN version. Figure 14.1 (right panels) shows the quantitative predictions of TWIN for SOA and intensity variations under exponential distributions for the peripheral processing times. Parameters are the same as for the FAP predictions (left panels). Panels 1 and 2 show mean RT and P(I) as a function of SOA for various intensity levels (λ parameters) of the auditory stimulus. Both panels exhibit the predicted monotonicity in SOA and intensity. The third panel, depicting MRE, reveals some nonmonotonic behavior in both SOA and intensity.

Without going into numerical details, this nonmonotonicity of MRE can be seen to be because of a subtle interaction between two mechanisms, both being involved in the generation of MRE: (1) statistical facilitation occurring in the first stage and (2) opening of the time window. The former is maximal if presentation of the stimulus processed faster is delayed by an SOA equal to the difference in mean RT in the unimodal stimulus conditions, that is when peripheral processing times are in physiological synchrony; for example, if mean RT to an auditory stimulus is 110 ms and mean RT to a visual stimulus is 150 ms, the maximal amount of statistical facilitation is expected when the auditory stimulus is presented 150 ms − 110 ms = 40 ms after the visual stimulus. The SOA value being “optimal” for statistical facilitation, however, need not be the one producing the highest probability of opening the time window that was shown to be decreasing with SOA. Moreover, the nonmonotonicity in intensity becomes plausible if one realizes that variation in intensity results in a change in mean processing time analogous to an SOA effect: for example, lowering auditory stimulus intensity has an effect on statistical facilitation and the probability of opening the time window that is comparable to increasing SOA.
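The statistical-facilitation component can be isolated with the closed-form mean of the first-stage winner (a sketch under the exponential assumptions; the closed form is a standard result for the minimum of an exponential and a shifted exponential, and the "gain" measure below is our own illustrative definition):

```python
import math, random

lam_v, lam_a = 1 / 50, 1 / 10      # mean peripheral times 50 ms and 10 ms

def e_min(tau):
    # E[min(V, A + tau)] for exponential V, A and tau >= 0
    return (1 - math.exp(-lam_v * tau)) / lam_v \
        + math.exp(-lam_v * tau) / (lam_v + lam_a)

def gain(tau):
    # facilitation of the racing minimum relative to the faster of the
    # two expected single-channel finishing times (illustrative measure)
    return min(1 / lam_v, tau + 1 / lam_a) - e_min(tau)

random.seed(4)
N = 200_000
mc = sum(min(random.expovariate(lam_v),
             random.expovariate(lam_a) + 40.0) for _ in range(N)) / N
print(round(mc, 1), round(e_min(40.0), 1))       # Monte Carlo vs closed form
print([round(gain(t), 1) for t in (0.0, 40.0, 120.0)])
```

The gain peaks near τ = 1/λV − 1/λA = 40 ms, the SOA that puts the two peripheral processes in physiological synchrony, and falls off on both sides, mirroring the nonmonotonicity discussed above.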

14.4.4. Focused Attention versus RTP

Top-down versus bottom-up. The distinction between RTP and FAP is not only an interesting experimental variation in its own right; it may also provide important theoretical insight. In fact, because physically identical stimuli can be presented under the same spatiotemporal configuration in both paradigms, any differences observed in the corresponding reaction times must be because of the differing instructions, thereby pointing to a possible separation of top-down from bottom-up processes in the underlying multisensory integration mechanism.

Probability of integration. Moreover, comparing both paradigms yields some additional insight into the mechanics of TWIN. Note that under equivalent stimulus conditions, IFAP ⊆ IRTP; this relation follows from the observation that

{A + τ < V < A + τ + ω} ⊆ {max(V, A + τ) < min(V, A + τ) + ω}.

It means that any realization of the peripheral processing times that leads to an opening of the time window under the focused attention instruction also leads to the same event under the redundant target instruction. Thus, the probability of integration under redundant target instructions cannot be smaller than that under focused attention instruction: P(IFAP) ≤ P(IRTP), given identical stimulus conditions (see also Figure 14.1).
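The event inclusion can be checked trial by trial in a short simulation (Python sketch; parameter values illustrative):

```python
import random

random.seed(3)
N = 100_000
lam_v, lam_a, omega, tau = 1 / 50, 1 / 30, 200.0, 50.0

n_fap = n_rtp = 0
for _ in range(N):
    v = random.expovariate(lam_v)
    a = random.expovariate(lam_a) + tau
    fap = a < v < a + omega                  # nontarget must win (FAP)
    rtp = max(v, a) < min(v, a) + omega      # either winner opens the window (RTP)
    assert not fap or rtp                    # I_FAP implies I_RTP on every trial
    n_fap += fap
    n_rtp += rtp
print(n_fap / N, n_rtp / N)                  # P(I_FAP) <= P(I_RTP)
```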

Inverse effectiveness. It is instructive to consider the effect of varying stimulus intensity in both paradigms when both stimuli are presented simultaneously (SOA = 0) and at intensity levels producing the same mean peripheral speed, i.e., with the same intensity parameters, λV = λA. Assuming exponential distributions, Figure 14.2 depicts the probability of integration (upper panels) and MRE (lower panels) as a function of time window width (ω) for both paradigms, with each curve representing a specific intensity level. The probability of integration increases monotonically from zero (for ω = 0) toward 0.5 for the focused attention paradigm, and toward 1 for the RTP. For the former, the probability of integration cannot surpass 0.5 because, for any given window width, the target process has the same chance of winning as the nontarget process under the given λ parameters. For both paradigms, P(I), as a function of ω, is ordered with respect to intensity level: it increases monotonically with the mean processing time of both stimuli* (upper panels of Figure 14.2). The same ordering is found for MRE in the FAP; somewhat surprisingly, however, the ordering is reversed for MRE in the RTP: increasing intensity implies less enhancement, i.e., it exhibits the “inverse effectiveness” property often reported in empirical studies (Stein and Meredith 1993; Rowland and Stein 2008). Similar to the above discussion of intensity effects for RTP, this is because of an interaction generated by increasing intensity: it weakens statistical facilitation in first stage processing but simultaneously increases the probability of integration.
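For SOA = 0 and equal rates λV = λA = λ, both probabilities have simple closed forms under the exponential assumptions, P(IFAP) = (1 − exp(−λω))/2 and P(IRTP) = 1 − exp(−λω) (the difference of two i.i.d. exponentials is Laplace distributed), which a quick sketch confirms:

```python
import math, random

lam = 1 / 50                      # equal visual and auditory rates

def p_fap(omega):
    # P(A < V < A + omega) for i.i.d. Exp(lam): target and nontarget
    # are equally likely to win, hence the factor 1/2
    return (1 - math.exp(-lam * omega)) / 2

def p_rtp(omega):
    # P(|V - A| < omega): V - A is Laplace(lam) distributed
    return 1 - math.exp(-lam * omega)

random.seed(5)
N = 200_000
hits = sum(1 for _ in range(N)
           if abs(random.expovariate(lam) - random.expovariate(lam)) < 100.0)
print(round(hits / N, 3), round(p_rtp(100.0), 3))   # Monte Carlo check
print(round(p_fap(1e6), 3), round(p_rtp(1e6), 3))   # limits: 0.5 and 1.0
```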

FIGURE 14.2. TWIN predictions for FAP (left panels) and RTP (right panels) as a function of time window width (ω) at SOA = 0. Upper panels depict probability of integration P(I), whereas lower panels show MRE. Each curve corresponds to a specific intensity level.


Although estimates for the TWIN vary somewhat across subjects and task specifics, a 200-ms width showed up in several studies (e.g., Eimer 2001; Sinclair and Hammond 2009). In a focused attention task, when the nontarget occurs at an early point in time (i.e., 200 ms or more before the target), a substantial decrease of RT compared to the unimodal condition has been observed by Diederich and Colonius (2007a). This decrease, however, no longer depended on whether target and nontarget appeared at ipsilateral or contralateral positions, thus supporting the hypothesis that the nontarget plays the role of a spatially unspecific alerting cue, or warning signal, for the upcoming target whenever the SOA is large enough.

The hypothesis of increased cross-modal processing triggered by an alerting cue had already been advanced by Nickerson (1973), who called it “preparation enhancement.” In the eye movement literature, the effects of a warning signal have been studied primarily in the context of explaining the “gap effect,” i.e., the latency to initiate a saccade to an eccentric target is reduced by extinguishing the fixation stimulus approximately 200 ms before target onset (Reuter-Lorenz et al. 1991; Klein and Kingstone 1993). An early study on the effect of auditory or visual warning signals on saccade latency, but without considering multisensory integration effects, was conducted by Ross and Ross (1981).

Here, the dual role of the nontarget—inducing multisensory integration that is governed by the above-mentioned spatiotemporal rules, on the one hand, and acting as a spatially unspecific cross-modal warning cue, on the other—will be taken into account by an extension of TWIN that yields an estimate of the relative contribution of either mechanism for any specific SOA value.

(W) Assumption on warning mechanism: If the nontarget wins the processing race in the first stage by a margin wide enough for the TWIN to be closed again before the arrival of the target, then subsequent processing will be facilitated or inhibited (“warning effect”) without dependence on the spatial configuration of the stimuli.*

The time margin by which the nontarget may win against the target will be called the head start, denoted γ. The assumption stipulates that the head start must be at least as large as the width of the time window for a warning effect to occur. That is, the warning mechanism of the nontarget is triggered whenever the nontarget wins the race by a head start of at least γ, with γ ≥ ω ≥ 0. Taking, for concreteness, the auditory as nontarget modality, occurrence of a warning effect corresponds to the event:

W = {A + τ + γ < V}.

The probability of warning to occur, P(W), is a function of both τ and γ. Because γ ≥ ω ≥ 0, this precludes the simultaneous occurrence of both warning and multisensory interaction within one and the same trial; therefore, P(I ∩ W) = 0 (because no confusion can arise, we write I for IFAP throughout this section). The actual value of the head start criterion is a parameter to be estimated in fitting the model under Assumption W.
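Under the exponential assumptions, P(W) has the closed form (λA/(λA + λV)) · exp(−λV(τ + γ)) for τ + γ ≥ 0, and the mutual exclusiveness of warning and integration holds trial by trial. A short sketch (parameter values illustrative):

```python
import math, random

lam_v, lam_a = 1 / 50, 1 / 10
omega = 200.0
gamma = 200.0                      # head start criterion, gamma >= omega

def p_warn(tau):
    # P(W) = P(A + tau + gamma < V), closed form for tau + gamma >= 0
    return lam_a / (lam_a + lam_v) * math.exp(-lam_v * (tau + gamma))

random.seed(6)
for _ in range(100_000):
    v = random.expovariate(lam_v)
    a = random.expovariate(lam_a)                  # tau = 0 for simplicity
    warning = a + gamma < v
    integration = a < v < a + omega
    assert not (warning and integration)           # gamma >= omega: P(I and W) = 0

pw = [round(p_warn(t), 4) for t in (0.0, 50.0, 100.0)]
print(pw)                                          # decreasing in SOA
```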

The expected saccadic reaction time in the cross-modal condition in the TWIN model with warning assumption can then be shown to be

E[RTVA,τ] = E[V] + E[S2|I] · P(I) + E[S2|W] · P(W) + E[S2|IcWc] · P(IcWc),

where E[S2|I], E[S2|W], and E[S2|IcWc] denote the expected second stage processing time conditioned on interaction occurring (I), warning occurring (W), or neither of them occurring (IcWc), respectively (Ic, Wc stand for the complement of events I, W). Setting

Δ = E[S2|IcWc] − E[S2|I]   and   κ = E[S2|IcWc] − E[S2|W],

where κ denotes the amount of the warning effect (in milliseconds), this becomes

E[RTVA,τ] = E[V] + μ − P(I) · Δ − P(W) · κ.   (14.15)

In the unimodal condition, neither integration nor warning are possible. Thus,

E[RTV] = E[V] + μ,   (14.16)

and we arrive at a simple expression for the combined effect of multisensory integration and warning, the cross-modal interaction (CI):

CI(τ) = E[RTV] − E[RTVA,τ] = P(I) · Δ + P(W) · κ.   (14.17)

Recall that the basic assumptions of TWIN imply that for a given spatial configuration and nontarget modality, there are no sign reversals or changes in magnitude of Δ across all SOA values. The same holds for κ. Note, however, that Δ and κ can separately take on positive or negative values (or zero) depending on whether multisensory integration and warning have a facilitative or inhibitory effect. Furthermore, like the probability of integration P(I), the probability of warning P(W) does change with SOA.
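The decomposition of CI into P(I) · Δ + P(W) · κ can be verified by simulating the full two-stage model (a Python sketch with parameter values used elsewhere in the chapter; the sampling scheme is our own):

```python
import random

random.seed(7)
N = 200_000
lam_v, lam_a = 1 / 50, 1 / 10
mu, omega, gamma = 100.0, 200.0, 200.0    # second stage mean, window, head start
delta, kappa = 20.0, 5.0                  # integration and warning effects (ms)
tau = 0.0

rt_uni = rt_bi = 0.0
n_int = n_warn = 0
for _ in range(N):
    v = random.expovariate(lam_v)
    rt_uni += v + mu                      # unimodal: no integration, no warning
    a = random.expovariate(lam_a) + tau
    if a < v < a + omega:                 # integration shortens the second stage
        rt_bi += v + mu - delta
        n_int += 1
    elif a + gamma < v:                   # warning: nonspatial speedup
        rt_bi += v + mu - kappa
        n_warn += 1
    else:
        rt_bi += v + mu

ci = (rt_uni - rt_bi) / N
predicted = (n_int * delta + n_warn * kappa) / N   # P(I)*Delta + P(W)*kappa
print(round(ci, 3), round(predicted, 3))
```

The two printed numbers agree: within the model, the entire cross-modal interaction in mean RT is carried by the integration and warning events.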

14.5.1. TWIN Predictions for FAP with Warning

The occurrence of a warning effect depends on intramodal characteristics of the target and the nontarget, such as modality or intensity. Assuming that increasing stimulus intensity goes along with decreased reaction time (for auditory stimuli, see, e.g., Frens et al. 1995; Arndt and Colonius 2003; for tactile stimuli, see Diederich and Colonius 2004b), TWIN makes specific predictions regarding the effect of nontarget intensity variation.

Intensity effects. An intense (auditory) nontarget may have a higher chance to win the race with a head start compared to a weak nontarget. In general, increasing the intensity of the nontarget (1) increases the probability of it functioning as a warning signal, and (2) makes it more likely for the nontarget to win the peripheral race against the target process.

SOA effects. The probability of warning P(W) decreases monotonically with SOA: the later the nontarget is presented, the smaller its chances to win the race against the target with some head start γ. This differs from the nonmonotonic relationship predicted between P(IFAP) and SOA (see above). It is interesting to note that the difference in how P(I) and P(W) should depend on SOA is, in principle, empirically testable without any distributional assumptions by manipulating the conditions of the experiment. Specifically, if target and nontarget are presented in two distinct spatial conditions, for example, ipsilateral and contralateral, one would expect Δ to take on two different values, Δi and Δc, whereas P(W) · κ, the expected nonspatial warning effect, should remain the same under both conditions. Subtracting the corresponding cross-modal interaction terms then gives, after canceling the warning effect terms (Equation 14.17),

CIi(τ) − CIc(τ) = P(I) · (Δi − Δc).   (14.18)

This expression is an observable function of SOA and, because the factor Δi – Δc does not depend on SOA by Assumption B3, it should exhibit the same functional form as P(I): increasing and then decreasing (see Figure 14.1, middle left panel).

Context effects. The magnitude of the warning effect may be influenced by the experimental design. Specifically, suppose nontargets from different modalities are presented in two distinct presentation modes, e.g., with the modality of the auditory and tactile nontargets blocked or mixed within an experimental block of trials, such that no changes in the expected amount of multisensory integration should occur. Subtracting the corresponding CI values then yields, after canceling the integration effect terms,

CIblocked(τ) − CImixed(τ) = P(W) · (κblocked − κmixed),   (14.19)

a quantity that should decrease monotonically with SOA because P(W) does.

The extension of the model to include warning effects has been probed for both auditory and tactile nontargets. Concerning the warning assumptions, no clear superiority of version A over version B was found in the data. For detailed results on all of the tests described above, we refer the reader to Diederich and Colonius (2008).

SOA and intensity: quantitative predictions. To illustrate the predictions of TWIN with warning for mean SRT, we choose the following set of parameters. As before, the intensity parameter for the visual modality is set to 1/λV = 50 (ms) and to 1/λA = 10, 30, 70, or 90 (ms) for the (auditory) nontarget, the parameter for second stage processing time when no integration and no warning occurs, μ ≡ E[S2|IcWc], is set to 100 ms, and the TWIN to 200 ms. The parameter for multisensory integration is set to Δi = 20 ms for bimodal stimuli presented ipsilaterally, and κ is set to 5 ms (Figure 14.3).

FIGURE 14.3. TWIN predictions for FAP when only warning occurs (left panels) and when both integration and warning occur (right panels). Parameters are chosen as before: 1/λV = 50 and μ = 100, resulting in a mean RT for visual stimulus of 150 ms. Peripheral processing times for auditory stimuli are 1/λA = 10, 30, 70, or 90 ms.


The main contribution of the TWIN framework thus far is to provide an estimate of the multisensory integration effect—and, for the extended model, also of a possible warning effect—that is “contaminated” neither by a specific SOA nor by intramodal stimulus properties such as intensity. This is achieved through factorizing* expected cross-modal interaction into the probability of interaction in a given trial, P(I), times the amount of interaction Δ (cf. Equation 14.2), the latter being measured in milliseconds. Some potential extensions of the TWIN framework are discussed next.

Although the functional dependence of P(I) on SOA and stimulus parameters is made explicit in the rules governing the opening and closing of the time window, the TWIN model framework as such does not stipulate a mechanism for determining the actual amount of interaction. By Assumption B4, Δ depends on cross-modal features like, for example, spatial distance between the stimuli of different modalities, and by systematically varying the spatial configuration, some insight into the functional dependence can be gained (e.g., Diederich and Colonius 2007b). Given the diversity of intersensory interaction effects, however, it would be presumptuous to aim at a single universal mechanism for predicting the amount of Δ. This does not preclude incorporating multisensory integration mechanisms into the TWIN framework within a specific context such as a spatial orienting task. Such an approach, which includes stipulating distributional properties of second stage processing time in a given situation, would bring along the possibility of a stronger quantitative model test, namely at the level of the entire observable reaction time distribution rather than at the level of means only.

In line with the framework of modeling multisensory integration as (nearly) optimal decision making (Körding et al. 2007), we have recently suggested a decision rule that determines an optimal window width as a function of (1) the prior odds in favor of a common multisensory source, (2) the likelihood of arrival time differences, and (3) the payoff for making correct or wrong decisions (Colonius and Diederich 2010).

Another direction is to extend the TWIN framework to account for additional experimental paradigms. For example, in many studies, a subject’s task is not simply to detect the target but to perform a speeded discrimination task between two stimuli (Driver and Spence 2004). Modeling this task implies a prediction not only of reaction time but also of the frequency of correct and incorrect discrimination responses. Traditionally, such data have been accommodated by assuming an evidence accumulation mechanism sequentially sampling information from the stimulus display favoring either response option A or B, for example, and stopping as soon as a criterion threshold for one or the other alternative has been reached. A popular subclass of these models is that of the diffusion models, which were considered as models of multisensory integration early on (Diederich 1995, 2008). At this point, however, it is an open question how this approach can be reconciled with the TWIN framework.

One of the most intriguing neurophysiological findings has been the suppression of multisensory integration ability of superior colliculus neurons by a temporary suspension of corticotectal inputs from the anterior ectosylvian sulcus and the lateral suprasylvian sulcus (Clemo and Stein 1986; Jiang et al. 2001). A concomitant effect on multisensory orientation behavior observed in the cat (Jiang et al. 2002) suggests the existence of more general cortical influences on multisensory integration. Currently, there is no explicit provision of a top-down mechanism in the TWIN framework. Note, however, that the influence of task instruction (FAP vs. RTP) is implicitly incorporated in TWIN because the probability of integration is supposed to be computed differently under otherwise identical stimulus conditions (cf. Section 14.4.4). It is a challenge for future development to demonstrate that the explicit incorporation of top-down processes can be reconciled with the two-stage structure of the TWIN framework.



The peripheral processing times V for the visual and A for the auditory stimulus have an exponential distribution with parameters λV and λA, respectively. That is,

fV(t) = λV exp(−λV t)   and   fA(t) = λA exp(−λA t)

for t ≥ 0, and fV(t) = fA(t) ≡ 0 for t < 0. The corresponding distribution functions are referred to as FV(t) and FA(t).

A.1.1. Focused Attention Paradigm

The visual stimulus is the target and the auditory stimulus is the nontarget. By definition,

P(IFAP) = P(A + τ < V < A + τ + ω) = ∫0∞ fA(a) [FV(a + τ + ω) − FV(a + τ)] da,

where τ denotes the SOA value and ω is the width of the integration window. Computing the integral expression requires that we distinguish three cases, according to the signs of τ and τ + ω:

(1) τ < τ + ω < 0:

P(IFAP) = [λV/(λV + λA)] · [exp(λA(τ + ω)) − exp(λA τ)]

(2) τ < 0 < τ + ω:

P(IFAP) = 1 − [λA/(λV + λA)] · exp(−λV(τ + ω)) − [λV/(λV + λA)] · exp(λA τ)

(3) 0 < τ < τ + ω:

P(IFAP) = [λA/(λV + λA)] · [exp(−λV τ) − exp(−λV(τ + ω))]

The mean RT for cross-modal stimuli is

E[RTVA,τ] = 1/λV + μ − P(IFAP) · Δ,

and the mean RT for the visual target is

E[RTV] = 1/λV + μ,

where 1/λV, the mean of the exponential distribution, is the mean RT of the first stage and μ is the mean RT of the second stage when no interaction occurs.
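These case distinctions translate directly into code. The sketch below implements the three-case closed form of P(IFAP) under the exponential assumptions and checks it against Monte Carlo samples, one SOA per case:

```python
import math, random

def p_int_fap(tau, omega, lam_v, lam_a):
    # P(A + tau < V < A + tau + omega) for independent exponentials;
    # branches correspond to the three sign cases of tau and tau + omega
    s = lam_v + lam_a
    if tau + omega <= 0:                                   # case (1)
        return lam_v / s * (math.exp(lam_a * (tau + omega))
                            - math.exp(lam_a * tau))
    if tau < 0:                                            # case (2)
        return (1 - lam_a / s * math.exp(-lam_v * (tau + omega))
                - lam_v / s * math.exp(lam_a * tau))
    return lam_a / s * (math.exp(-lam_v * tau)             # case (3)
                        - math.exp(-lam_v * (tau + omega)))

random.seed(8)
N = 200_000
lam_v, lam_a, omega = 1 / 50, 1 / 30, 100.0
checks = []
for tau in (-150.0, -50.0, 50.0):
    hits = 0
    for _ in range(N):
        v = random.expovariate(lam_v)
        a = random.expovariate(lam_a) + tau
        if a < v < a + omega:
            hits += 1
    checks.append((hits / N, p_int_fap(tau, omega, lam_v, lam_a)))
print([(round(m, 3), round(c, 3)) for m, c in checks])
```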

A.1.2. Redundant Target Paradigm

The visual stimulus is presented first and the auditory stimulus second. By definition,

P(IRTP) = P(max(V, A + τ) < min(V, A + τ) + ω).

If the visual stimulus wins:

(1) 0 ≤ τ ≤ ω:

P(V < A + τ < V + ω) = 1 − [λV/(λV + λA)] · exp(−λA(ω − τ)) − [λA/(λV + λA)] · exp(−λV τ)

(2) 0 < ω ≤ τ:

P(V < A + τ < V + ω) = [λA/(λV + λA)] · [exp(−λV(τ − ω)) − exp(−λV τ)]

If the auditory stimulus wins: 0 < τ ≤ τ + ω and

P(A + τ < V < A + τ + ω) = [λA/(λV + λA)] · [exp(−λV τ) − exp(−λV(τ + ω))].

The probability that the visual or the auditory stimulus wins is therefore

P(IRTP) = P(V < A + τ < V + ω) + P(A + τ < V < A + τ + ω).

The mean RT for cross-modal stimuli is

E[RTVA,τ] = E[min(V, A + τ)] + μ − P(IRTP) · Δ,   with   E[min(V, A + τ)] = [1 − exp(−λV τ)]/λV + exp(−λV τ)/(λV + λA),

and the mean RT for the visual and auditory stimulus is

E[RTV] = 1/λV + μ


E[RTA] = 1/λA + μ.

A.1.3. Focused Attention and Warning

By definition,

P(W) = P(A + τ + γA < V).

Again, we need to consider different cases:

(1) τ + γA < 0:

P(W) = 1 − [λV/(λV + λA)] · exp(λA(τ + γA))

(2) τ + γA ≥ 0:

P(W) = [λA/(λV + λA)] · exp(−λV(τ + γA))

The mean RT for cross-modal stimuli is

E[RTVA,τ] = 1/λV + μ − P(IFAP) · Δ − P(W) · κ,

where 1/λV is the mean RT of the first stage, μ is the mean RT of the second stage when no interaction occurs, P(IFAP) · Δ is the expected amount of intersensory interaction, and P(W) · κ is the expected amount of warning.


  1. Amlôt R, Walker R, Driver J, Spence C. Multimodal visual-somatosensory integration in saccade generation. Neuropsychologia. 2003;41:1–15. [PubMed: 12427561]
  2. Anastasio T.J, Patton P.E, Belkacem-Boussaid K. Using Bayes’ rule to model multisensory enhancement in the superior colliculus. Neural Computation. 2000;12:1165–1187. [PubMed: 10905812]
  3. Arndt A, Colonius H. Two separate stages in crossmodal saccadic integration: Evidence from varying intensity of an auditory accessory stimulus. Experimental Brain Research. 2003;150:417–426. [PubMed: 12728291]
  4. Bell A.H, Meredith A, Van Opstal A.J, Munoz D.P. Crossmodal integration in the primate superior colliculus underlying the preparation and initiation of saccadic eye movements. Journal of Neurophysiology. 2005;93:3659–3673. [PubMed: 15703222]
  5. Clemo H.R, Stein B.E. Effects of cooling somatosensory corticotectal influences in cat. Journal of Neurophysiology. 1986;55:1352–1368. [PubMed: 3734860]
  6. Colonius H, Arndt P. A two-stage model for visual-auditory interaction in saccadic latencies. Perception & Psychophysics, 2001;63:126–147. [PubMed: 11304009]
  7. Colonius H, Diederich A. A maximum-likelihood approach to modeling multisensory enhancement. In: Dietterich T.G, Becker S, Ghahramani Z, editors. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press; 2002. p. 14.
  8. Colonius H, Diederich A. Multisensory interaction in saccadic reaction time: A time-window-of-integration model. Journal of Cognitive Neuroscience. 2004;16:1000–1009. [PubMed: 15298787]
  9. Colonius H, Diederich A. Race model inequality: Interpreting a geometric measure of the amount of violation. Psychological Review. 2006;113(1):148–154. [PubMed: 16478305]
  10. Colonius H, Diederich A. The optimal time window of visual–auditory integration: A reaction time analysis. Frontiers in Integrative Neuroscience. 2010;4:11. doi:10.3389/fnint.2010.00011. [PMC free article: PMC2871715] [PubMed: 20485476]
  11. Colonius H, Vorberg D. Distribution inequalities for parallel models with unlimited capacity. Journal of Mathematical Psychology. 1994;38:35–58.
  12. Colonius H, Diederich A, Steenken R. Time-window-of-integration (TWIN) model for saccadic reaction time: Effect of auditory masker level on visual-auditory spatial interaction in elevation. Brain Topography. 2009;21:177–184. [PubMed: 19337824]
  13. Corneil B.D, Munoz D.P. The influence of auditory and visual distractors on human orienting gaze shifts. Journal of Neuroscience. 1996;16:8193–8207. [PubMed: 8987844]
  14. Corneil B.D, Van Wanrooij M, Munoz D.P, Van Opstal A.J. Auditory-visual interactions subserving goal-directed saccades in a complex scene. Journal of Neurophysiology. 2002;88:438–454. [PubMed: 12091566]
  15. Diederich A. Intersensory facilitation of reaction time: Evaluation of counter and diffusion coactivation models. Journal of Mathematical Psychology. 1995;39:197–215.
  16. Diederich A. A further test on sequential sampling models accounting for payoff effects on response bias in perceptual decision tasks. Perception & Psychophysics. 2008;70(2):229–256. [PubMed: 18372746]
  17. Diederich A, Colonius H. Modeling the time course of multisensory interaction in manual and saccadic responses. In: Calvert G, Spence C, Stein B.E, editors. Handbook of multisensory processes. Cambridge, MA: MIT Press; 2004a. pp. 395–408.
  18. Diederich A, Colonius H. Bimodal and trimodal multisensory enhancement: Effects of stimulus onset and intensity on reaction time. Perception & Psychophysics. 2004b;66(8):1388–1404. [PubMed: 15813202]
  19. Diederich A, Colonius H. Why two "distractors" are better than one: Modeling the effect of nontarget auditory and tactile stimuli on visual saccadic reaction time. Experimental Brain Research. 2007a;179:43–54. [PubMed: 17216154]
  20. Diederich A, Colonius H. Modeling spatial effects in visual-tactile saccadic reaction time. Perception & Psychophysics. 2007b;69(1):56–67. [PubMed: 17515216]
  21. Diederich A, Colonius H. Crossmodal interaction in saccadic reaction time: Separating multisensory from warning effects in the time window of integration model. Experimental Brain Research. 2008;186:1–22. [PubMed: 18004552]
  22. Diederich A, Colonius H, Bockhorst D, Tabeling S. Visual–tactile spatial interaction in saccade generation. Experimental Brain Research. 2003;148:328–337. [PubMed: 12541144]
  23. Diederich A, Colonius H, Schomburg A. Assessing age-related multisensory enhancement with the time-window-of-integration model. Neuropsychologia. 2008;46:2556–2562. [PubMed: 18490033]
  24. Doyle M.C, Walker R. Multisensory interactions in saccade target selection: Curved saccade trajectories. Experimental Brain Research. 2002;142:116–130. [PubMed: 11797089]
  25. Driver J, Spence C. Crossmodal spatial attention: Evidence from human performance. In: Spence C, Driver J, editors. Crossmodal space and crossmodal attention. Oxford: Oxford Univ. Press; 2004. pp. 179–220.
  26. Eimer M. Crossmodal links in spatial attention between vision, audition, and touch: Evidence from event-related brain potentials. Neuropsychologia. 2001;39:1292–1303. [PubMed: 11566312]
  27. Frens M.A, Van Opstal A.J, Van der Willigen R.F. Spatial and temporal factors determine auditory- visual interactions in human saccadic eye movements. Perception & Psychophysics. 1995;57:802–816. [PubMed: 7651805]
  28. Harrington L.K, Peck C.K. Spatial disparity affects visual-auditory interactions in human sensorimotor processing. Experimental Brain Research. 1998;122:247–252. [PubMed: 9776523]
  29. Hershenson M. Reaction time as a measure of intersensory facilitation. Journal of Experimental Psychology. 1962;63:289–293. [PubMed: 13906889]
  30. Hughes H.C, Reuter-Lorenz P.-A, Nozawa G, Fendrich R. Visual–auditory interactions in sensorimotor processing: Saccades versus manual responses. Journal of Experimental Psychology: Human Perception and Performance. 1994;20:131–153. [PubMed: 8133219]
  31. Hughes H.C, Nelson M.D, Aronchick D.M. Spatial characteristics of visual-auditory summation in human saccades. Vision Research. 1998;38:3955–3963. [PubMed: 10211387]
  32. Jiang W, Wallace M.T, Jiang H, Vaughan J.W, Stein B.E. Two cortical areas mediate multisensory integration in superior colliculus neurons. Journal of Neurophysiology. 2001;85:506–522. [PubMed: 11160489]
  33. Jiang W, Jiang H, Stein B.E. Two cortical areas facilitate multisensory orientation behaviour. Journal of Cognitive Neuroscience. 2002;14:1240–1255. [PubMed: 12495529]
  34. Körding K.P, Beierholm U, Ma W.J, Quartz S, Tenenbaum J.B, et al. Causal inference in multisensory perception. PLoS ONE. 2007;2(9):e943. doi:10.1371/journal.pone.0000943. [PMC free article: PMC1978520] [PubMed: 17895984]
  35. Klein R, Kingstone A. Why do visual offsets reduce saccadic latencies? Behavioral and Brain Sciences. 1993;16(3):583–584.
  36. Luce R.D. Response times: Their role in inferring elementary mental organization. New York: Oxford Univ. Press; 1986.
  37. Meredith M.A. On the neural basis for multisensory convergence: A brief overview. Cognitive Brain Research. 2002;14:31–40. [PubMed: 12063128]
  38. Meredith M.A, Stein B.E. Visual, auditory, and somatosensory convergence on cells in superior colliculus results in multisensory integration. Journal of Neurophysiology. 1986;56:640–662. [PubMed: 3537225]
  39. Meredith M.A, Nemitz J.W, Stein B.E. Determinants of multisensory integration in superior colliculus neurons. I. Temporal factors. Journal of Neuroscience. 1987;7(10):3215–3229. [PubMed: 3668625]
  40. Miller J.O. Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology. 1982;14:247–279. [PubMed: 7083803]
  41. Munoz D.P, Wurtz R. H. Saccade-related activity in monkey superior colliculus. I. Characteristics of burst and buildup cells. Journal of Neurophysiology. 1995;73:2313–2333. [PubMed: 7666141]
  42. Navarra J, Vatakis A, Zampini M, Soto-Faraco S, Humphreys W, Spence C. Exposure to asynchronous audiovisual speech extends the temporal window for audiovisual integration. Cognitive Brain Research. 2005;25:499–507. [PubMed: 16137867]
  43. Nickerson R.S. Intersensory facilitation of reaction time: Energy summation or preparation enhancement. Psychological Review. 1973;80:489–509. [PubMed: 4757060]
  44. Raab D.H. Statistical facilitation of simple reaction times. Transactions of the New York Academy of Sciences. 1962;24:574–590. [PubMed: 14489538]
  45. Reuter-Lorenz P.A, Hughes H.C, Fendrich R. The reduction of saccadic latency by prior offset of the fixation point: An analysis of the gap effect. Perception & Psychophysics. 1991;49(2):167–175. [PubMed: 2017353]
  46. Ross S.M, Ross L.E. Saccade latency and warning signals: Effects of auditory and visual stimulus onset and offset. Perception & Psychophysics. 1981;29(5):429–437. [PubMed: 7279569]
  47. Rowland B.A, Stein B.E. Temporal profiles of response enhancement in multisensory integration. Frontiers in Neuroscience. 2008;2:218–224. [PMC free article: PMC2622754] [PubMed: 19225595]
  48. Schweickert R, Fisher D.L, Sung K. Discovering cognitive architecture by selectively influencing mental processes. London: World Scientific Publishing; in press.
  49. Sinclair C, Hammond G.R. Excitatory and inhibitory processes in primary motor cortex during the foreperiod of a warned reaction time task are unrelated to response expectancy. Experimental Brain Research. 2009;194:103–113. [PubMed: 19139864]
  50. Spence C, Squire S. Multisensory integration: Maintaining the perception of synchrony. Current Biology. 2003;13:R519–R521. [PubMed: 12842029]
  51. Stein B.E, Meredith M.A. The Merging of the Senses. Cambridge, MA: MIT Press; 1993.
  52. Stein B.E, Huneycutt W.S, Meredith M.A. Neurons and behavior: The same rules of multisensory integration apply. Brain Research. 1988;448:355–358. [PubMed: 3378157]
  53. Stein B.E, Jiang W, Stanford T.R. Multisensory integration in single neurons in the midbrain. In: Calvert G, Spence C, Stein B.E, editors. Handbook of multisensory processes. Cambridge, MA: MIT Press; 2004. pp. 243–264.
  54. Sternberg S. Separate modifiability, mental modules, and the use of pure and composite measures to reveal them. Acta Psychologica. 2001;106:147–246. [PubMed: 11256336]
  55. Todd J.W. Reaction to multiple stimuli. In: Woodworth R.S, editor. Archives of Psychology, No. 25; Columbia Contributions to Philosophy and Psychology, Vol. XXI, No. 8. New York: The Science Press; 1912.
  56. Townsend J.T, Nozawa G. Spatio-temporal properties of elementary perception: An investigation of parallel, serial, and coactive theories. Journal of Mathematical Psychology. 1995;39:321–359.
  57. Van Opstal A.J, Munoz D.P. Auditory–visual interactions subserving primate gaze orienting. In: Calvert G, Spence C, Stein B.E, editors. Handbook of multisensory processes. Cambridge, MA: MIT Press; 2004. pp. 373–393.
  58. Van Wassenhove V, Grant K.W, Poeppel D. Temporal window of integration in auditory-visual speech perception. Neuropsychologia. 2007;45:598–607. [PubMed: 16530232]
  59. Van Zandt T. Analysis of response time distributions. In: Pashler H, editor. Stevens’ handbook of experimental psychology. 3rd edn. vol. 4. New York: Wiley & Sons, Inc; 2002.
  60. Whitchurch E.A, Takahashi T.T. Combined auditory and visual stimuli facilitate head saccades in the barn owl (Tyto alba). Journal of Neurophysiology. 2006;96:730–745. [PubMed: 16672296]



See Section 14.6 for possible extensions to other measures of performance.


This follows from a defining property of the exponential distribution: its mean and standard deviation are equal (both 1/λ for rate parameter λ).
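This property is easy to verify numerically; the following is a minimal sketch (not from the chapter), using an illustrative rate corresponding to a 50 ms mean processing time:

```python
import random
import statistics

# For an exponential distribution with rate lambda, mean and standard
# deviation both equal 1/lambda, so their sample estimates nearly coincide.
# The rate value below is purely illustrative.
random.seed(1)
lam = 1 / 50.0  # rate corresponding to a mean of 50 ms
samples = [random.expovariate(lam) for _ in range(200_000)]

mean = statistics.fmean(samples)
sd = statistics.stdev(samples)
print(f"mean = {mean:.1f} ms, SD = {sd:.1f} ms")
# Both estimates cluster around 1/lambda = 50 ms.
```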


In the study of Diederich and Colonius (2008), an alternative version of this assumption was considered as well (version B). If the nontarget wins the processing race in the first stage by a wide enough margin, then subsequent processing is facilitated or inhibited in part independently of the spatial configuration of the stimuli. This version is less restrictive: all that is required for the nontarget to act as a warning signal is a "large enough" head start over the target in the race, so that P(I ∩ W) can be larger than 0, that is, integration and warning may occur in the same trial. Assuming that the effects on RT of the two events I (integration) and W (warning) combine additively, it can be shown that the cross-modal interaction prediction of this model version is captured by the same equation as under the original version, i.e., Equation 14.17 below. The only difference lies in the order restriction on the parameters, γ ≥ ω. So far, no empirical evidence has been found favoring either version over the other.
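A minimal Monte Carlo sketch of the first-stage race can illustrate how, under version B, the integration event I and the warning event W can both have positive probability in the same design. The exponential peripheral processing times and all parameter values below are hypothetical, chosen for illustration only:

```python
import random

# First-stage race in the TWIN framework, version B (illustrative sketch).
# I occurs if the nontarget wins and the target terminates within the
# time window; W occurs if the nontarget wins by a wide enough margin.
random.seed(7)

SOA = 50.0       # nontarget leads the target by 50 ms (hypothetical)
WINDOW = 200.0   # width of the time window of integration (hypothetical)
MARGIN = 150.0   # head start required for the warning event W (hypothetical)
N = 100_000

n_I = n_W = 0
for _ in range(N):
    v = SOA + random.expovariate(1 / 60.0)  # target (visual) peripheral time
    a = random.expovariate(1 / 80.0)        # nontarget (auditory) peripheral time
    win = v - a                             # nontarget's head start in the race
    if 0 < win <= WINDOW:                   # nontarget wins, target within window
        n_I += 1
    if win > MARGIN:                        # wide-margin win: warning event
        n_W += 1

p_I, p_W = n_I / N, n_W / N
print(f"P(I) = {p_I:.3f}, P(W) = {p_W:.3f}")
```

Because MARGIN is here smaller than WINDOW, trials with a head start between the two thresholds contribute to both events, so P(I ∩ W) > 0, as the less restrictive version B allows.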


Strictly speaking, this only holds for the focused attention version of TWIN; for the redundant target version, an estimate of the amount of statistical facilitation is required, which can be obtained empirically (cf. Colonius and Diederich 2006).
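The statistical-facilitation baseline itself can be sketched as follows: under a race model with independent channels, the predicted redundant-target RT is the minimum of the two unimodal RTs. The distributions and parameters below are hypothetical stand-ins, not estimates from the chapter:

```python
import random
import statistics

# Race-model (statistical facilitation) baseline for the redundant-target
# paradigm, assuming independent channels. Unimodal RTs are simulated with
# an illustrative ex-Gaussian-like shape (Gaussian base + exponential tail).
random.seed(3)

def unimodal_rt(mu, sigma):
    return random.gauss(mu, sigma) + random.expovariate(1 / 40.0)

N = 50_000
visual = [unimodal_rt(180, 20) for _ in range(N)]
auditory = [unimodal_rt(160, 20) for _ in range(N)]
race = [min(v, a) for v, a in zip(visual, auditory)]  # winner of the race

facilitation = min(statistics.fmean(visual), statistics.fmean(auditory)) \
               - statistics.fmean(race)
print(f"statistical facilitation ≈ {facilitation:.1f} ms")
```

Observed bimodal means faster than this baseline would point to an integration effect beyond mere statistical facilitation.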