%0 JOURNAL ARTICLE
%A Aminoff, Elissa M.
%A Toneva, Mariya
%A Shrivastava, Abhinav
%A Chen, Xinlei
%A Misra, Ishan
%A Gupta, Abhinav
%A Tarr, Michael J.
%T Applying artificial vision models to human scene understanding
%! Scene-space in PPA, RSC, TOS
%J Frontiers in Computational Neuroscience
%V 9
%D 2015
%8 2015-February-04
%9 Original Research
%@ 1662-5188
%R 10.3389/fncom.2015.00008
%U https://www.frontiersin.org/articles/10.3389/fncom.2015.00008
%G English
%K scene processing, parahippocampal place area, retrosplenial cortex, transverse occipital sulcus, Computer Vision
%+ Dr Elissa M. Aminoff, Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA, USA, eaminoff@fordham.edu
%+ Dr Elissa M. Aminoff, Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA, eaminoff@fordham.edu
%X How do we understand the complex patterns of neural responses that underlie scene understanding? Studies of the network of brain regions held to be scene-selective, comprising the parahippocampal place area (PPA), the retrosplenial complex (RSC), and the transverse occipital sulcus (TOS), have typically focused on single visual dimensions (e.g., size) rather than on the high-dimensional feature space in which scenes are likely to be neurally represented. Here we leverage well-specified artificial vision systems to explicate a more complex understanding of how scenes are encoded in this functional network. We correlated similarity matrices within three different scene-spaces arising from: (1) BOLD activity in scene-selective brain regions; (2) behaviorally measured judgments of visually perceived scene similarity; and (3) several different computer vision models. These correlations revealed that: (1) models relying on mid- and high-level scene attributes showed the highest correlations with the patterns of neural activity within the scene-selective network; (2) NEIL and SUN, the models that best accounted for the patterns obtained from the PPA and TOS, differed from the GIST model that best accounted for the pattern obtained from the RSC; and (3) the best-performing models outperformed behaviorally measured judgments of scene similarity in accounting for the neural data. One computer vision method, NEIL ("Never-Ending-Image-Learner"), which incorporates visual features learned as statistical regularities across web-scale numbers of scenes, showed significant correlations with neural activity in all three scene-selective regions and was one of the two models best able to account for variance in the PPA and TOS. We suggest that these results are a promising first step toward more fine-grained models of neural scene understanding, including a clearer picture of the division of labor among the components of the functional scene-selective brain network.
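
The abstract's core analysis is a form of representational similarity comparison: build a pairwise similarity matrix for each scene-space (neural, behavioral, or computer-vision features), then correlate the matrices. The sketch below is a minimal, hypothetical Python illustration of that idea only; the array names, shapes, and random data are assumptions for demonstration, not the authors' actual data or pipeline.

```python
# Minimal sketch of comparing two "scene-spaces" via their similarity
# matrices, in the spirit of the analysis the abstract describes.
# All inputs here are illustrative assumptions, not the study's data.
import numpy as np
from scipy.stats import spearmanr


def similarity_matrix(features: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlation between rows (one row per scene)."""
    return np.corrcoef(features)


def compare_spaces(sim_a: np.ndarray, sim_b: np.ndarray) -> float:
    """Spearman correlation between the upper triangles (diagonal
    excluded) of two scene-by-scene similarity matrices."""
    iu = np.triu_indices_from(sim_a, k=1)
    rho, _ = spearmanr(sim_a[iu], sim_b[iu])
    return rho


# Hypothetical inputs: 100 scenes, arbitrary feature dimensionalities.
rng = np.random.default_rng(0)
bold_patterns = rng.normal(size=(100, 500))   # e.g., voxel responses in a region
model_features = rng.normal(size=(100, 512))  # e.g., features from a vision model

print(compare_spaces(similarity_matrix(bold_patterns),
                     similarity_matrix(model_features)))
```

In this framing, each of the three scene-spaces in the abstract (BOLD activity, behavioral similarity judgments, and computer vision models such as NEIL, SUN, or GIST) would supply one similarity matrix, and the pairwise matrix correlations quantify how well each model space accounts for each neural space.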