Causal Inference in Multisensory Perception


Konrad P. Körding (1), Ulrik Beierholm (2)*, Wei Ji Ma (3), Steven Quartz (2,4), Joshua B. Tenenbaum (5), Ladan Shams (6)

1 Rehabilitation Institute of Chicago, Northwestern University, Chicago, Illinois, United States of America; 2 Computation and Neural Systems, California Institute of Technology, Pasadena, California, United States of America; 3 Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York, United States of America; 4 Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, California, United States of America; 5 Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America; 6 Department of Psychology, University of California at Los Angeles, Los Angeles, California, United States of America

Perceptual events derive their significance to an animal from their meaning about the world, that is, from the information they carry about their causes. The brain should thus be able to efficiently infer the causes underlying our sensory events. Here we use multisensory cue combination to study causal inference in perception. We formulate an ideal-observer model that infers whether two sensory cues originate from the same location and that also estimates their location(s). This model accurately predicts the nonlinear integration of cues by human subjects in two auditory-visual localization tasks. The results show that humans can indeed efficiently infer the causal structure as well as the location of causes. By combining insights from the study of causal inference with the ideal-observer approach to sensory cue combination, we show that the capacity to infer causal structure is not limited to conscious, high-level cognition; it is also performed continually and effortlessly in perception.
Citation: Körding KP, Beierholm U, Ma WJ, Quartz S, Tenenbaum JB, et al. (2007) Causal Inference in Multisensory Perception. PLoS ONE 2(9): e943. doi:10.1371/journal.pone.0000943

INTRODUCTION

Imagine you are walking in the forest and you see a sudden movement in the bushes. You may infer that this movement was caused by a hidden animal, but you may also consider a gust of wind as an alternative and possibly more probable cause. If you are a hungry predator, or a life-loving prey, this estimation may be critical to your survival. However, you may also hear an animal vocalization coming from a similar direction. Combining both pieces of sensory information, you will be better at judging whether there is an animal in the bushes and, if so, where exactly it is hiding. Importantly, the way you combine pieces of information must depend on the causal relationships you have inferred. This example illustrates that perceptual cues are seldom ecologically relevant by themselves, but rather acquire their significance through their meaning about their causes. It also illustrates how cues from multiple sensory modalities can be used to infer underlying causes. The nervous system is constantly engaged in combining uncertain information from different sensory modalities into an integrated understanding of the causes of sensory stimulation.

The study of multisensory integration has a long and fruitful history in experimental psychology, neurophysiology, and psychophysics. In the late 19th century, von Helmholtz began considering cue combination, formalizing perception as unconscious probabilistic inference of a best guess of the state of the world [1]. Since then, numerous studies have analyzed the way people use and combine cues for perception [e.g., 2,3], highlighting the rich set of effects that occur in multimodal perception. Over the last decade, many scientists have returned to a probabilistic interpretation of cue combination as had been proposed by von Helmholtz.
These probabilistic models formalize the problem of cue combination in an elegant way. It is assumed that there is a single variable in the outside world (e.g., the position of an animal) that is causing the cues (auditory and visual information). Each of the cues is assumed to be a noisy observation of this underlying variable. Due to noise in sensation, there is some uncertainty about the information conveyed by each cue, and Bayesian statistics is the systematic way of predicting how subjects could optimally infer the underlying variable from the cues. Several recent studies have demonstrated impressive fits to psychophysical data, starting from the assumption that human performance is close to the ideal defined by probabilistic models [4–7]. In these experiments, cues tend to be close to each other in time, space, and structure, providing strong evidence for there being only a single cause for both cues. In situations where there is only a single underlying cause, these models formalize the central idea of probabilistic inference of a hidden cause.

A range of experiments have shown effects that are hard to reconcile with the single-cause (i.e., forced-fusion) idea. Auditory-visual integration breaks down when the difference between the presentation of the visual and the auditory stimulus is large [8–10]. Such a distance or inconsistency is called disparity. Increasing disparity, for example by moving an auditory stimulus farther away from the position of a visual stimulus, reduces the influence each stimulus has on the perception of the other [11–14]. Throughout this paper we only consider spatial disparity along the azimuthal axis. When subjects are asked to report their percepts in both modalities on the same trial, one can measure the influence that the two senses have on each other [13].
The data from such a dual-report paradigm show that, although at small disparities there is a tendency to integrate, greater disparity makes it more likely that a subject responds differently in the two modalities.

Academic Editor: Olaf Sporns, Indiana University, United States of America
Received: June 15, 2007; Accepted: September 3, 2007; Published: September 26, 2007
Copyright: © 2007 Körding et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: KPK was supported by a DFG Heisenberg Stipend. UB and SQ were supported by the David and Lucille Packard Foundation as well as by the Moore Foundation. JBT was supported by the P. E. Newton Career Development Chair. LS was supported by UCLA Academic Senate and Career Development grants.
Competing Interests: The authors have declared that no competing interests exist.
* To whom correspondence should be addressed. E-mail: .
These authors contributed equally to this work.
PLoS ONE | September 2007 | Issue 9 | e943

Moreover, when people are simply asked whether they perceive a single cue or several cues, they give answers that intuitively make a lot of sense: if two events are close to each other in space, time, and structure, subjects tend to perceive a single underlying cause, while if they are far away from one another, subjects tend to infer two independent causes [15,16]. If cues are close to one another, they interact and influence the perception of each other, whereas they are processed independently when the discrepancy is large.

New modeling efforts have made significant progress at formalizing the interactions between two cues. These models assume that there exist two relevant variables in the world, for example the position of a visual and the position of an auditory stimulus.
The visual and auditory cues that reach the nervous system are noisy versions of the underlying visual and auditory variables. The models further assume an "interaction prior", a joint prior distribution that defines how likely each combination of visual and auditory stimuli is in the absence of any evidence. This prior formalizes the notion that the probability of both positions being the same (related to a common cause) is high in comparison to the positions being different from one another. This prior in effect determines the way in which the two modalities influence each other. Very good fits to human performance have been shown for the combination of two cues [13,14,17,18]. These studies assume an interaction between processing in each modality and derive predictions of human performance from this idea.

Von Helmholtz stressed not only the issue of probabilistic inference but also that multiple objects may be the causes of our sensations [19,20]. In other words, any two sensory signals may either have a common cause, in which case they should be integrated, or have different, independent causes, in which case they should be processed separately. Further evidence for this idea comes from a study showing that providing evidence that two signals are related can incite subjects to combine two cues more strongly [21]. The within-modality binding problem is another example where causal inference is necessary and the nervous system has to determine which set of stimuli correspond to the same object and should be bound together [22–24]. We are usually surrounded by many sights, sounds, odors, and tactile stimuli, and our nervous system constantly needs to estimate which signals have a common cause.
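The interaction-prior idea can be made concrete with a small numerical sketch. The following is an illustration, not the implementation used in any of the cited studies: the joint prior over the two source positions is a Gaussian ridge along the diagonal plus a constant floor, combined with Gaussian likelihoods on a grid. All parameter values are assumptions chosen for illustration.

```python
import math

# Illustrative "interaction prior" over the two source positions (s_V, s_A):
# a Gaussian ridge along s_V = s_A plus a constant floor. All numbers here
# are assumptions for illustration, not fitted values from the paper.
SIGMA_V, SIGMA_A = 2.0, 8.0   # assumed sensory noise widths (degrees)
SIGMA_C = 5.0                 # assumed width of the ridge along the diagonal
CONST = 0.05                  # constant floor; 0 gives a ridge-only prior

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def interaction_prior(s_v, s_a):
    # High probability near s_V = s_A, non-vanishing probability elsewhere.
    return gauss(s_v - s_a, 0.0, SIGMA_C) + CONST

def auditory_estimate(x_v, x_a):
    # Posterior mean of s_A on a grid, given noisy measurements x_V and x_A.
    grid = [0.5 * g for g in range(-60, 61)]  # -30 .. +30 degrees
    num = den = 0.0
    for s_v in grid:
        for s_a in grid:
            p = gauss(x_v, s_v, SIGMA_V) * gauss(x_a, s_a, SIGMA_A) \
                * interaction_prior(s_v, s_a)
            num += s_a * p
            den += p
    return num / den

# A visual cue 5 degrees to the right pulls the auditory estimate to the
# right, but only partially: the prior favors fusion without forcing it.
print(auditory_estimate(x_v=5.0, x_a=0.0))
```

Setting the constant to zero makes large discrepancies between the two positions extremely unlikely, whereas a non-zero constant leaves them a fixed residual probability; this is the qualitative difference between the two families of ad hoc priors compared later in the paper.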
The nervous system thus frequently faces the problem of interpreting sensory signals in terms of their potential causes. In this paper we formalize the problem of causal inference, as well as integration versus segregation in multisensory perception, as an optimal Bayesian observer that not only infers source location from two sensory signals (visual, s_V, and auditory, s_A) but also whether the signals have a common cause (C). This inference is complicated by the fact that the nervous system does not have access to the source locations of the signals but only to noisy measurements thereof (visual, x_V, and auditory, x_A). From these noisy observations it needs to infer the best estimates of the source locations (ŝ_V and ŝ_A). All this needs to happen in the presence of uncertainty about the presence of a common cause (C). To take into account multiple possible causal structures, we need a so-called mixture model [e.g., 24], but one of a very specific form. The model assumes that the underlying variables (azimuthal stimulus positions) cause the sensory inputs, and it considers two hypotheses: either there is a common cause, or there are independent causes. The optimal observer model defines how the cues might actually be combined (i.e., in a statistically optimal manner). In the model, cues are fused if they have one common cause and segregated if they have independent causes. The model typically has uncertainty about the causal interpretation, in which case it adjusts its cue combination continuously depending on the degree of belief about the causal structure.

This model makes three important predictions: (1) it predicts the circumstances under which subjects should perceive a common cause or independent causes; (2) it predicts whether the individual cues should be fused or processed separately; (3) it predicts how the cues are combined if they are combined.
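The inference over the two causal structures can be sketched in a few lines. The following is a minimal sketch assuming Gaussian noise on each cue and a zero-mean Gaussian prior over positions; the parameter values are placeholders, not the fitted values reported in the Results.

```python
import math

# Minimal sketch of the causal inference step: given noisy measurements
# x_V and x_A, compute the posterior probability that they share one cause.
# Sigma values and p_common below are placeholder assumptions.

def gauss(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def posterior_common(x_v, x_a, sig_v=2.0, sig_a=9.0, sig_p=12.0, p_common=0.5):
    var_v, var_a, var_p = sig_v ** 2, sig_a ** 2, sig_p ** 2
    # Likelihood under C = 1: a single source s, drawn from the central
    # prior, generates both measurements (s integrated out analytically).
    var1 = var_v * var_a + var_v * var_p + var_a * var_p
    like1 = math.exp(-0.5 * ((x_v - x_a) ** 2 * var_p
                             + x_v ** 2 * var_a
                             + x_a ** 2 * var_v) / var1) / (2 * math.pi * math.sqrt(var1))
    # Likelihood under C = 2: two independent sources, each from the prior.
    like2 = gauss(x_v, 0.0, var_v + var_p) * gauss(x_a, 0.0, var_a + var_p)
    # Bayes' rule over the two causal structures.
    return like1 * p_common / (like1 * p_common + like2 * (1 - p_common))

# The inferred probability of a common cause falls as the cues move apart:
for disparity in (0, 10, 20):
    print(disparity, round(posterior_common(disparity / 2, -disparity / 2), 3))
```

The same two likelihoods also yield the position estimates: a reliability-weighted fusion of the cues under C = 1, the single-cue estimate under C = 2, and, when the causal structure is uncertain, a mixture of the two weighted by this posterior.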
Here we test the predictions of the model and analyze how well it predicts human behavior.

RESULTS

Causal Bayesian inference

We model situations in which observers are presented with simultaneous auditory and visual stimuli and are asked to report their location(s). If the visual and the auditory stimuli have a common cause (Fig. 1, left), subjects could use the visual cue to improve the auditory estimate, and vice versa. However, in the real world we are usually surrounded by multiple sources of sensory stimulation and hence multiple sights and sounds. Therefore the nervous system cannot simply combine all signals into a joint estimate; it must infer which signals have a common cause and only integrate those. Specifically, for any pair of visual and auditory stimuli, it should also consider the alternative possibility that they are unrelated and co-occurred randomly (Fig. 1, right).

Figure 1. The causal inference model. Left: One cause can be responsible for both cues. In this case the visually perceived position x_V will be the common position s perturbed by visual noise with width σ_V, and the auditory perceived position will be the common position perturbed by auditory noise with width σ_A. Right: Alternatively, two distinct causes may be relevant, decoupling the problem into two independent estimation problems. The causal inference model infers the probability of a causal structure with a common cause (left, C=1) versus the causal structure with two independent causes (right, C=2) and then derives optimal predictions from this. We introduce a single variable C which determines which sub-model generates the data. doi:10.1371/journal.pone.0000943.g001

Here we developed an ideal observer that estimates the positions of cues and also whether they have a common cause. This causal inference model uses two pieces of information.
One piece is the likelihood: the sensed visual and auditory positions, which are corrupted by noise. Because of this noise, a sensory stimulus does not reveal the true position but rather induces a distribution over where the stimulus could be. The other piece of information is the prior: from experience we may know how likely two co-occurring signals are to have a common cause versus two independent causes. The causal inference model combines these pieces of information to estimate whether there is a common cause and to estimate the positions of the cues (see the Methods section and Supporting Information, Text S1, for details).

The causal inference model depends on four parameters characterizing the knowledge about the environment and the observer's sensory systems: the uncertainty of vision (σ_V) and audition (σ_A); the knowledge the observer has about the spatial layout of objects, in particular how strongly the observer expects objects to be located centrally (σ_P, introduced to formalize that subjects have a bias to perceive stimuli straight ahead); and the prior probability that there is a single cause versus two causes (p_common). These four parameters are fit to human behavior in psychophysical experiments (see Methods for details).

Experiment 1: Auditory-visual spatial localization

Experienced ventriloquists move a puppet's mouth in synchrony with their speech patterns, creating a powerful illusion of a talking puppet. This effect is a classical demonstration of auditory-visual integration, where subjects infer that there is only a single cause (the puppet's talking) for both the visual (puppet's facial movements) and auditory (speech) stimuli. Numerous experimental studies have analyzed this kind of auditory-visual cue integration and found situations in which the cues are combined and situations in which they are processed separately [2,3,7,8,10,15,25–27].
To test the causal inference model, we use a laboratory version of the ventriloquist illusion, in which brief auditory and visual stimuli are presented simultaneously with varying amounts of spatial disparity. We use the dual-report paradigm, which was recently introduced to study auditory-visual numerosity judgment [13], because it provides information about the joint auditory-visual percepts of subjects.

Nineteen subjects participated in the experiment. On each trial, subjects were presented with either a brief visual stimulus in one of five locations along the azimuth, or a brief tone at one of the same five locations, or both simultaneously (see Methods for details). The task was to report the location of the visual stimulus as well as the location of the sound on each trial using two button presses (Fig. 2a).

We found that subjects show large variability across trials, shown in Fig. 2b for an auditory stimulus at the central location. If subjects were not affected by noise, we would expect 100% of trials to yield a press of the button corresponding to the central location. Instead we see a wide distribution, highlighting the presence of uncertainty in auditory perception. All modern theories of cue combination predict that two cues, presented simultaneously, will influence one another and lead to a bimodal (two-cue) precision that is better than the unimodal precision, as the other cue can be used to reduce uncertainty. Indeed, we found that the visual stimulus influences the estimate of the auditory stimulus when the auditory stimulus is held at a fixed location (Fig. 2b, yellow versus gray). Moreover, vision has a directional influence on auditory perception: a visual stimulus to the left biases perception to the left (Fig. 2b, red versus yellow). Subjects thus base their estimate of the auditory position on both visual and auditory cues.
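The precision benefit that all fusion theories predict can be illustrated with the classical reliability-weighted (forced-fusion) estimator. This is a sketch with assumed noise widths, not an analysis of the actual data:

```python
import random
import statistics

# Sketch of the forced-fusion (reliability-weighted) prediction: fusing a
# precise visual cue with an imprecise auditory cue both shifts the auditory
# estimate toward the visual stimulus and tightens its distribution.
# Noise widths here are assumed values, not the fitted ones.
random.seed(1)
SIGMA_V, SIGMA_A = 2.0, 9.0
W_V = (1 / SIGMA_V ** 2) / (1 / SIGMA_V ** 2 + 1 / SIGMA_A ** 2)  # visual weight

def fused_estimate(s_v, s_a):
    x_v = random.gauss(s_v, SIGMA_V)      # noisy visual measurement
    x_a = random.gauss(s_a, SIGMA_A)      # noisy auditory measurement
    return W_V * x_v + (1 - W_V) * x_a    # minimum-variance combination

# Auditory stimulus at 0 degrees; visual stimulus 5 degrees to the left.
alone = [random.gauss(0.0, SIGMA_A) for _ in range(20000)]   # audition alone
fused = [fused_estimate(-5.0, 0.0) for _ in range(20000)]

print(statistics.mean(fused))                             # pulled toward -5
print(statistics.stdev(fused) < statistics.stdev(alone))  # more precise
```

Because vision is the far more reliable cue in this regime, the fused estimate is dominated by the visual measurement, which is the forced-fusion account of visual capture; the causal inference model departs from this sketch by down-weighting the visual cue when a common cause is unlikely.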
Moreover, subjects’estimates of the auditory position often differ from their estimatesof the visual position.To examine whether the causal inference model can account forthe full range of cue combination observed in human multisensoryperception, wemakeuse ofboththe auditory and thevisualresponsefrequencies we measured in our experiment (Fig. 2c). Fourparameters were used to fit 250 data points (25 bisensory conditions,2 modalities, 5 buttons per modality). The causal inference modelaccounts for the data very well (  R  2 =0.97;  R  2 is calculated as theexplained variance divided by the total variance) (Fig. 2c,d). Oneinteresting finding is that the response distribution generally only hasone peak (in Fig 2c), but its position and skewness is affected by theposition of the other stimulus. The model shows this effect because itdoes not simply decide if there is a common cause or individualcauses but considers both possibilities on each trial.To facilitate quantitative comparison of the causal inferencemodel with other models, we fitted the parameters individually toeach subject’s data using a maximum-likelihood procedure: wemaximized the probability of the data under the model. For eachsubject, the best fit from 6 different sets of initial parameter valueswas used, to reduce the effect of these initial values. We did this forseveral different models that use previously proposed interactionpriors as well as the prior derived from causal inference. We firstconsidered two special cases of the causal inference model: pureintegration (causal inference with  p common =1) and pure segrega-tion (causal inference with  p common =0). We then considered twotwo-dimensional ad hoc priors that have been proposed in otherpapers. Roach et al. [18] proposed a two-dimensional (auditory- visual) prior that is defined as the sum of a Gaussian ridge along the diagonal, and a constant. 
This prior is somewhat similar to the causal inference prior: the constant relates to events that are independent, and the Gaussian relates to sensory events that are related and thus have a common cause. Bresciani et al. [14] used a special case of the Roach et al. prior in which no constant is added to the Gaussian. (The Shams et al. model [Shams et al., 2005] was not considered, as it involves a prior specified by a large number of parameters (25).) According to the Bresciani prior, visual and auditory positions that are very far away from each other are extremely unlikely. According to the Roach prior, two such positions have a fixed, non-vanishing probability.

In the comparison, we obtain the predicted response distribution by integrating out the internal variables instead of equating it to the posterior distribution. This is the correct way of solving this Bayesian problem and differs from the approach taken in previous papers [13,14,18] (although it only affects predictions in the Roach et al. model). We measure the goodness of fit obtained from these priors relative to that obtained from the causal inference prior, using the log likelihood over the entire data set. The resulting log likelihood ratios are shown in Table 1. The causal inference model fits the data better than the other models. We also compared with an alternative model that maximizes the probability of being correct instead of minimizing the mean squared error, and we can exclude this model based on the presented evidence.

The parameters found in the likelihood optimization of the causal inference model are as follows. We found the visual system to be relatively precise (σ_V = 2.14 ± 0.22°) and the auditory system to be much less precise (σ_A = 9.2 ± 1.1°). We found that people have a modest prior expecting stimuli to be more likely to be central (σ_P = 12.3 ± 1.1°). Subjects have a tendency to indicate a direction that is straight ahead, and this prior allows the model to show such behavior as well. The average prior probability of a common cause for visual and auditory stimuli is relatively low (p_common = 0.28 ± 0.05). This explains why the observed biases are small in comparison to the values predicted if subjects were certain that there is a common cause (Fig. 2e). In summary, the causal inference model provides precise predictions of the way people combine cues in an auditory-visual spatial localization task, and it does so better than earlier models.

Figure 2. Combination of visual and auditory cues. a) The experimental paradigm. On each trial a visual and an auditory stimulus are presented simultaneously, and subjects report both the position of the perceived visual stimulus and the position of the perceived auditory stimulus by button presses. b) The influence of vision on the perceived position of an auditory stimulus in the center. Different colors correspond to the visual stimulus at different locations (sketched in warm to cold colors from left to right). The unimodal auditory case is shown in gray. c) The averaged responses of the subjects (solid lines) along with the predictions of the ideal observer (broken lines) for each of the 35 stimulus conditions. These plots show how often, on average, each button was pressed in each condition. d) The model responses from c) plotted against the human responses from c). e) The average auditory bias (ŝ_A − s_A)/(s_V − s_A), i.e., the influence of deviations of the visual position on the perceived auditory position, as a function of spatial disparity (solid line), along with the model prediction (dashed line). doi:10.1371/journal.pone.0000943.g002

In the cue combination literature, bias is commonly used as an index of crossmodal interactions.
In our experiment, auditory localization bias is a measure of the influence of vision on audition and can be plotted as a function of the spatial disparity [15,16]. Like other authors, we find that the bias decreases with increasing spatial disparity (Fig. 2e). Thus, the larger the distance between visual and auditory stimuli, the smaller the influence of vision on audition. This result is naturally predicted by the causal inference model: larger discrepancies make the single-cause model less likely, as it would need to assume large noise values, which are unlikely. A model in which no combination happens at all (p_common = 0) cannot explain the observed biases (Fig. 2e), as it predicts a very small bias (p < 0.0001, t-test). The traditional forced-fusion model [4–7,29] fails to explain much of the variance in Fig. 2c (R² = 0.56). Moreover, because vision is much more precise than audition in our experiment, this model would predict a very high bias, and it is ruled out by the bias data (Fig. 2e) (p < 0.0001, t-test). Neither the traditional nor the no-interaction model can explain the data, whereas the causal inference model can explain the observed patterns of partial combination well (see Supporting Information, Text S1 and Fig. S2, for a comparison with some other recent models of cue combination).

Experiment 2: Auditory-visual spatial localization with measured perception of causality

While the causal inference model accounts for the cue combination data described above, it makes a prediction that goes beyond the estimates of positions: if people infer the probability of a common cause, then it should be possible to ask them whether they perceive a common cause or two causes. A recent experiment asked this question [15]. We compare the predictions of the causal inference model with the reported data from this experiment. These experiments differed in a number of important respects from our experiment.
Subjects were asked to report their perception of unity (i.e., whether the two stimuli have a common cause or two independent causes) on each trial. Only the location of the auditory stimulus was probed. Subjects pointed towards the location of the auditory stimulus instead of choosing a button to indicate the position (see Methods, data analysis for Figure 3, for details).

The results of these experiments [15] indicate that the closer the visual stimulus is to the auditory stimulus, the more often people perceive them as having a common cause (Fig. 3a). However, even if the two stimuli are close to one another, the noise in the perception of the auditory stimulus will sometimes lead to the perception of distinct causes. For example, a subject may hear the sound at 10° even when both stimuli really are at 0°; on such a trial, the large perceived disparity may lead the subject to report distinct causes. The model also shows the trend that the probability of perceiving a common cause decreases with increasing disparity. It explains 72% of the variance in human performance (Fig. 3a) and thus models the human perception of causality well.

We next examined how the perception of a common versus distinct causes affects the estimation of the position of auditory stimuli. The results indicate that when people perceive a common cause, they point to a position that is on average very close to the position of the visual stimulus, and therefore the bias is high (Fig. 3b). If, on the other hand, subjects perceive distinct causes, they seem to not only rely on the auditory stimulus but seem to be pushed away from the visual stimulus, exhibiting negative bias. This is a counterintuitive finding, as previous models [4–7,29] predict only positive bias. Causal inference shows very similar behavior, as it also exhibits negative biases, and it explains 87% of the variance of the bias.
Table 1. Maximal log likelihood ratios (base e) across subjects of models relative to the causal inference model (mean ± s.e.m.; see Methods for details).

Model | Relative log likelihood
Causal inference | 0
Causal inference with maximization | −11 ± 3
Full integration | −311 ± 28
Full segregation | −25 ± 7
Roach et al. | −18 ± 6
Bresciani et al. | −22 ± 6

For the last two entries, we used the priors proposed by Roach et al. and Bresciani et al. together with correct inference (see text for more detail). All of the maximal likelihood ratios in the table constitute decisive evidence in favor of the causal inference prior, even when correcting for the number of parameters using the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) [28]. These criteria enable fair comparison between models: models with more parameters always fit data better than models with fewer, and AIC and BIC correct for this bias. doi:10.1371/journal.pone.0000943.t001

Figure 3. Reports of causal inference. a) The relative frequency of subjects reporting one cause (black) is shown (reprinted with permission from [15]) together with the prediction of the causal inference model (red). b) The bias, i.e., the influence of vision on the perceived auditory position (gray and black). The predictions of the model are shown in red. c) A schematic illustration explaining the finding of negative biases. Blue and black dots represent the perceived visual and auditory stimuli, respectively. In the pink area people perceive a common cause. doi:10.1371/journal.pone.0000943.g003

The causal inference model thus accounts for the counterintuitive negative biases. How can an optimal system exhibit negative bias? We argue that this is a selection bias stemming from restricting ourselves to trials in which causes were perceived as distinct. To clarify this, we consider, as an example, the case where the visual stimulus is 5° to the right of the center and the auditory stimulus is in the center.
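This selection effect can be illustrated with a small simulation. The report rule and all parameter values below are simplifying assumptions: the observer reports distinct causes whenever the perceived disparity exceeds a fixed threshold, and on those trials estimates the auditory position from the auditory measurement alone.

```python
import random
import statistics

# Illustrative simulation of the selection-bias argument. All parameters and
# the report rule are simplifying assumptions, not the paper's fitted model.
random.seed(7)
SIGMA_V, SIGMA_A = 2.0, 9.0   # assumed sensory noise widths (degrees)
THRESHOLD = 10.0              # assumed perceived disparity needed to report C = 2
S_V, S_A = 5.0, 0.0           # true positions: vision 5 deg right, audition central

estimates_c2 = []
for _ in range(50000):
    x_v = random.gauss(S_V, SIGMA_V)
    x_a = random.gauss(S_A, SIGMA_A)
    if abs(x_v - x_a) > THRESHOLD:    # trial perceived as having two causes
        estimates_c2.append(x_a)      # segregated auditory estimate

mean_est = statistics.mean(estimates_c2)
bias = (mean_est - S_A) / (S_V - S_A)  # bias measure used in the text
print(round(bias, 2))                  # negative on these selected trials
```

Because trials on which the auditory measurement falls close to the visual one are removed by the selection, the surviving measurements, and hence the conditional estimates, lie disproportionately on the side away from the visual stimulus, producing the negative bias.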