TY - JOUR AU - Sawai, Ken-ichi AU - Sato, Yoshiyuki AU - Aihara, Kazuyuki PY - 2012 M3 - Original Research TI - Auditory Time-Interval Perception as Causal Inference on Sound Sources JO - Frontiers in Psychology UR - https://www.frontiersin.org/articles/10.3389/fpsyg.2012.00524 VL - 3 SN - 1664-1078 N2 - Perception of a temporal pattern in a sub-second time scale is fundamental to conversation, music perception, and other kinds of sound communication. However, its mechanism is not fully understood. A simple example is hearing three successive sounds with short time intervals. The following misperception of the latter interval is known: underestimation of the latter interval when the former is a little shorter or much longer than the latter, and overestimation of the latter when the former is a little longer or much shorter than the latter. Although this misperception of auditory time intervals for simple stimuli might be a cue to understanding the mechanism of time-interval perception, there exists no model that comprehensively explains it. Considering a previous experiment demonstrating that illusory perception does not occur for stimulus sounds with different frequencies, it might be plausible to think that the underlying mechanism of time-interval perception involves a causal inference on sound sources: herein, different frequencies provide cues for different causes. We construct a Bayesian observer model of this time-interval perception. We introduce a probabilistic variable representing the causality of sounds in the model. As prior knowledge, the observer assumes that a single sound source produces periodic and short time intervals, which is consistent with several previous works. We conducted numerical simulations and confirmed that our model can reproduce the misperception of auditory time intervals. A similar phenomenon has also been reported in visual and tactile modalities, though the time ranges for these are wider. This suggests the existence of a common mechanism for temporal pattern perception over modalities. This is because these different properties can be interpreted as a difference in time resolutions, given that the time resolutions for vision and touch are lower than those for audition. ER -