Edited by: Tomoki Fukai, RIKEN Brain Science Institute, Japan
Reviewed by: Tomoki Fukai, RIKEN Brain Science Institute, Japan; J. Michael Herrmann, University of Edinburgh, UK; Katsunori Kitano, Ritsumeikan University, Japan
*Correspondence: Ehud Ahissar, Department of Neurobiology, Weizmann Institute of Science, Rehovot 76100, Israel. e-mail:
During natural viewing, the eyes are never still. Even during fixation, miniature movements of the eyes move the retinal image across tens of foveal photoreceptors. Most theories of vision implicitly assume that the visual system ignores these movements and somehow overcomes the resulting smearing. However, evidence has accumulated to indicate that fixational eye movements cannot be ignored by the visual system if fine spatial details are to be resolved. We argue that the only way the visual system can achieve its high resolution given its fixational movements is by seeing
During fixation, the eyes move across several arcminutes with amplitudes that fall off with the scanning frequency (Findlay,
Since time constants of retinal responses are on the order of 30–100 ms (Sakuranaga et al.,
In fact, our vision of stationary images depends crucially on these movements. Virtually all retinal ganglion cells respond to transients of light, and many respond only to transients (Hartline,
What are the relative contributions of spatial and temporal codes to retinal outputs? There is no doubt that spatial coding, in which spatial location of external stimuli is coded by ganglion cell identity and illumination intensity by the cell’s firing intensity, would work just fine for low resolution vision, where low resolution here means resolution coarser than the extent of FeyeM (around 10′). This is because integration of retinal responses over fields larger than the extent of FeyeM is hardly affected by FeyeM. The question is whether spatial coding would work for fine vision, i.e., for spatial details smaller than 10′, and specifically for hyperacuity vision, i.e., for spatial details smaller than 0.3′. It might be argued that the visual system is still using the spatial code for fine and hyperacuity vision, while applying some error-correction mechanisms to overcome the effect of FeyeM. It appears that error-correction mechanisms proposed in the past for the visual system, such as shifter circuits (Anderson and Van Essen,
We present here a mechanism for temporal encoding and decoding in the visual system. We consider it a new dynamic theory of vision, following the original suggestions of Marshall and Talbot (
We refer here to the entire spectrum of FeyeM, excluding microsaccades, and to all two-dimensional visual cues that define visual objects (we term those 2D cues), excluding depth cues. Our analysis that follows will show that different 2D cues can be encoded by different variables of retinal responses: shape by inter-receptor temporal phases, texture by instantaneous intra-burst rates, and motion by inter-burst temporal frequencies. The information carried by these retinal signals is valid only during an individual cycle of eye movement, and thus must be read out during that cycle. We will show how an NPLL-like locking-and-decoding process, implemented by thalamocortical loops of the visual system, can set the timing for meaningful cortical processing of 2D cues and decode some of these cues. Decoding of relative depth information is postulated to be implemented by an independent mechanism, possibly a mechanism that relies on receptor activation during microsaccades, since these movements are predominantly in the horizontal direction (Liang et al.,
Visual perception is assumed to be a process that builds on sensory data acquired during either fixation or fixational pauses, i.e., pauses between adjacent saccades (Barlow,
Eye movements during fixation and fixational pauses typically consist of bursts of a few (2–4) cycles, and each burst has a dominant oscillatory frequency (Barlow,
Responses of visual neurons during natural viewing cannot be predicted from responses to flashed stimuli on paralyzed eyes. During natural vision there are no flashes. Rapid activations that are induced by saccades, and could potentially be compared to flashed stimuli, always involve retinal motion. Furthermore, downstream processing of the retinal signals evoked immediately after saccades likely involves significant suppression and distortion that are not present when flashing on passive eyes (Carpenter,
Retinal activations that are relevant for perception during natural viewing are generated by retinal motions caused by object motion or eye rotations (Ahissar and Arieli,
Functional anatomy exposed by flashed stimuli (Hubel,
Temporal precision at the retina and LGN is on the order of 1 ms. Retinal temporal precision was measured in salamanders and rabbits and was found to depend on stimulus contrast. The temporal jitter (for repeated identical stimuli) of fast OFF cells in the salamander retina is a power function of the stimulus contrast, with an exponent close to −0.5, such that at a contrast of 35% the jitter is 4.4 ms (Berry et al.,
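The quoted power law can be made concrete with a small sketch. The calibration constant below is ours, derived by assumption from the single quoted data point (4.4 ms jitter at 35% contrast); it is illustrative only, not a fit reported in the cited study.

```python
import math

# Power-law model of retinal temporal jitter: jitter ~ k * contrast**(-0.5).
# k is calibrated (our assumption) so that jitter(0.35) = 4.4 ms, matching
# the quoted salamander fast-OFF-cell measurement.
def jitter_ms(contrast, exponent=-0.5, k=4.4 * math.sqrt(0.35)):
    """Predicted temporal jitter (ms) at a given stimulus contrast (0-1)."""
    return k * contrast ** exponent

print(round(jitter_ms(0.35), 1))  # 4.4 ms, the calibration point
print(round(jitter_ms(1.00), 1))  # ~2.6 ms extrapolated to full contrast
```

The negative exponent captures the qualitative point of the paragraph: higher contrast yields tighter (smaller) jitter.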
The primary description of the hypothesis uses statements and derivations expressed in common terminology of experimental neuroscience, supported by schematic figures. The internal consistency of the hypothesis is demonstrated using formal definitions and derivations, which are presented in Appendix 1 and referenced in the relevant places in the text, and computer simulations. Data that support our assumptions are presented in Appendix 2.
2D spatial details are sampled by each eye independently. The information conveyed to the visual system depends on the external scene, the pattern of FeyeM of that eye, and the structure of afferent RFs. According to the afferent scheme suggested by Hubel and Wiesel (
The size of simple RFs decreases as they get closer to the fovea. However, as recordings approach the fovea, measurements of RF size become difficult due to eye movements, especially in awake primates. Measurements done with the aid of image stabilization, i.e., while moving the image along the on-line-recorded trajectory of the FeyeM, show that at eccentricities of 2–9°, the width of subfields of simple RFs in layer 4 averages around 12 arcmin, and can be as low as 5 arcmin or less (Kagan et al.,
In the natural case, the entire visual field is sampled synchronously by all retinal cells at the frequency of the FeyeM, and each location in the visual field is sampled in succession by neighboring retinal cells (Figure
Feedforward responses contain information about fine spatial relationships within the image. The resolution of this information depends on the orientation of the sRF with respect to the scanned image. Consider, for example, the schematic example in Figure
Another important factor that enormously constrains the way this information can be read is the existence of FeyeM. Simple cell responses depend crucially on the direction of eye movement. Thus, for example, if the movement of the eye is perpendicular to the orientation of the edge (e.g., Figure
So how does the visual system restore the information that is lost by FeyeM smearing? The simple answer is that the visual system does not need to restore this information because it is not lost. This information is conveyed in a form of a temporal code that is encoded
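The translation from spatial offsets to relative spike times can be sketched numerically. This is a minimal illustration assuming a constant drift velocity; the function name and numbers are ours, not the paper's.

```python
# Sketch: with a constant drift velocity v, a receptor whose contrast edge
# lies a spatial offset dx further along the scan path crosses that edge
# dt = dx / v later than its neighbor. Spatial structure is thereby
# re-expressed as relative spike timing.
def crossing_times(edge_positions_arcmin, v_arcmin_per_s):
    """Times (s) at which a drifting eye carries each receptor across its edge."""
    return [x / v_arcmin_per_s for x in edge_positions_arcmin]

# Example: a 0.3-arcmin (hyperacuity-scale) vernier offset scanned at 60 arcmin/s
times = crossing_times([10.0, 10.3], v_arcmin_per_s=60.0)
dt_ms = (times[1] - times[0]) * 1000
print(f"temporal offset: {dt_ms:.1f} ms")  # 0.3' / (60'/s) = 5 ms
```

A 5-ms offset is comfortably above the ~1-ms temporal precision of retinal outputs quoted earlier, which is the crux of the argument that the temporal code preserves hyperacuity-scale detail.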
While the mechanisms of spike generation at the retina are not completely understood, it is known that they can involve integration times of tens of ms (Meister et al.,
Figure
Movements in the visual field produce Doppler-shift effects on the retinal activation frequencies. For example, a light-to-dark edge that is moving away from an OFF-type sRF (Figure
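The Doppler-like shift can be sketched for a grating of spatial period λ: the temporal frequency at which contrasts cross a receptor depends on the relative velocity of eye and object. The function and sign convention below are our simplification for illustration, not the paper's Appendix formulation.

```python
# Doppler-like shift of retinal activation frequency (our simplified
# 1-D sketch): a grating of spatial period `spatial_period` (arcmin)
# activates a receptor at |v_eye - v_object| / spatial_period Hz.
def retinal_frequency(v_eye, v_object, spatial_period):
    """Temporal frequency (Hz) of contrast crossings for a drifting eye
    viewing a (possibly moving) grating; velocities in arcmin/s."""
    return abs(v_eye - v_object) / spatial_period

f_static  = retinal_frequency(60.0, 0.0, 6.0)    # stationary grating: 10 Hz
f_away    = retinal_frequency(60.0, 30.0, 6.0)   # object receding from scan: 5 Hz
f_towards = retinal_frequency(60.0, -30.0, 6.0)  # object opposing scan: 15 Hz
```

Object motion with the scan lowers the activation frequency; motion against it raises the frequency, which is the "Doppler-shift" effect referred to in the text.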
Encoding of shape, i.e., of the outline of an object, is demonstrated by the scanning of a stationary terraced edge (Figure
Encoding of texture, i.e., of the characteristics of surfaces such as patterning and spatial frequencies, is demonstrated by the scanning of two gratings (Figure
Most sRFs contain a “rod” of one polarity (ON or OFF) surrounded by flanks of the opposite polarity (Hubel and Wiesel,
Decoding of temporally encoded signals can be implemented by various neuronal algorithms (Carr,
For the description of the proposed decoding process, we depict an example similar to that of Figure
Thus, shape and velocity are encoded at the retina by the temporal phases and frequencies of retinal bursts, respectively. Texture, as shown above, is encoded by the intra-burst firing pattern. The temporal structure of retinal activities is extremely reliable (Berry et al.,
A constraint of the FeyeM-driven encoding scheme is that decoding must be locked to movements of the eye. Comparison of onset times or temporal periods of neighboring retinal cells is meaningless if done across different cycles of FeyeM, because eye position and velocity are not constant across cycles. Moreover, decoding of shape and velocity requires an accurate comparison, in each FeyeM cycle, of the burst onset times of retinal sRFs. These constraints suggest that the brain should either use efferent copies of the signals that control the FeyeM (Shakhnovich,
Existing data from the mammalian visual system indicate that the visual cortex employs oscillatory mechanisms that can be used by the visual system to decode the information encoded by FeyeM (see
Below (text and Figures
The core idea behind oscillation-based decoding is that the oscillations establish a local predictor for input timing (Figure
A simplified model for a thalamocortical NPLL, which includes the minimal number of basic components and the constraints of known thalamocortical circuitry (White,
In order to obtain temporal decoding, the following three basic principles should be obeyed by a single loop (see Gardner,
As long as a neuronal loop obeys the requirements mentioned above, its exact implementation is not important, and could vary among different species or even among different individuals. For example, the circuit can be simple in cases where simple cells are by themselves inhibitory or when the oscillatory neurons project directly to the thalamus (as in Figure
In our model, the cortical oscillations produce internal expectations with regard to the timing of the next retinal output. These temporal expectations are manifested by the output of the oscillatory neurons, and thus also by the onset time of the corticothalamic gating signal (blue rectangles in Figure
With less optimal parameters, complete stabilization might require more than a single cycle (Ahissar,
Cortical oscillations can track input frequencies if their frequencies (i) are in the range of the input frequencies, and (ii) can be modulated by local cortical inputs (Cortical Oscillations in Appendix 1, Eq.
Besides frequency ranges, the main difference observed between visual and other cortical oscillations is that visual oscillations have not been observed so far in the absence of visual stimuli. This might indicate that expression of cortical oscillations (e.g., translation of sub-threshold oscillations to spike activity) requires an additional excitatory input (Gray and McCormick,
Thalamocortical neurons of NPLL are required to generate an output whose spike count decreases as the retino-cortical delay increases. This function is easily implemented by corticothalamic gating (Figure
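The gating computation can be sketched as an interval-overlap calculation. This is our toy formulation of the mechanism described above, with hypothetical widths and spike counts chosen only for illustration.

```python
# Corticothalamic gating sketch (our formulation, illustrative parameters):
# the thalamic relay fires in proportion to the overlap between the retinal
# input burst [delay, delay + burst_width] and the cortical gating window
# [0, gate_width], so the output spike count falls as the retino-cortical
# delay grows. All times in ms.
def gated_output(delay, burst_width=20.0, gate_width=20.0, max_spikes=5):
    """Spike count of the thalamic relay for a given retino-cortical delay."""
    overlap = max(0.0, min(delay + burst_width, gate_width) - max(delay, 0.0))
    return round(max_spikes * overlap / burst_width)

print([gated_output(d) for d in (0.0, 5.0, 10.0, 20.0)])  # [5, 4, 2, 0]
```

With full overlap the relay emits its maximal count; once the burst falls entirely outside the gate, the output vanishes, giving the required monotonic decrease of spike count with delay.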
In each oscillation cycle, loop variables depend on the input and on their values during the previous cycle (Appendix 1, Eqs
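The cycle-by-cycle dependence can be sketched as a discrete-time loop. The update rule below is a toy phase-locked loop with our own parameter names and a simplified phase-reset step; the paper's Appendix 1 equations are the authoritative formulation.

```python
# Toy NPLL dynamics (our simplification): each cycle, the phase detector
# measures the timing error between the retinal burst and the local
# rate-controlled oscillator (RCO), and the RCO corrects its period by a
# fraction g of that error, converging onto the input period.
def npll(input_period, rco_period=100.0, g=0.5, cycles=20):
    """Track a periodic input; returns the RCO period (ms) after each cycle."""
    periods = []
    input_time = rco_time = 0.0
    for _ in range(cycles):
        input_time += input_period
        rco_time += rco_period
        error = input_time - rco_time   # phase-detector reading
        rco_period += g * error         # RCO rate correction
        rco_time = input_time           # simplification: phase re-locks each burst
        periods.append(rco_period)
    return periods

trace = npll(input_period=125.0)        # e.g., 8-Hz retinal bursts
print(round(trace[-1], 3))              # converges toward 125.0 ms
```

With gain g between 0 and 2 the residual error shrinks geometrically by a factor of (1 − g) per cycle, which is the sense in which each cycle's state depends on the input and on the previous cycle's values.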
The dynamics of the loop following changes in FeyeM frequency are similar to those occurring following external motion (Figure
Thalamic gating is not operational following sustained silent periods (McCormick and Bal,
Retinal periodicities are translated into latencies and spike counts of cortical simple cells (Figure
Central processing of 2D details, as that of depth cues, is often described in terms of spatiotemporal filtering (Barlow,
Although typically cyclic, FeyeM are not purely periodic. Trajectories of FeyeM are strongly modulated in both amplitude and frequency (see Examples of Human FeyeM in Appendix 2). Since the induced eye movement is the same for the entire retina, the nature of relative time coding, i.e., coding spatial offsets by relative temporal delays across neighboring cells, should not depend crucially on the exact pattern of FeyeM. We tested this prediction using computer simulations. Three gray images with patterned left edges (Figure
The dynamics of the simulated responses were analyzed in relation to FeyeM cycles. In each cycle, delay to the first spike (del) and total spike count (sp) were calculated. The vertical sRFs were blind to the fine offsets of the image’s left edge. The delays and spike counts conveyed by such sRFs (Figure
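The two per-cycle measures can be computed straightforwardly from a spike train and the cycle onsets. The sketch below uses hypothetical spike times; it reproduces the analysis described (delay to first spike, del, and spike count, sp, per FeyeM cycle), not the paper's actual simulation code.

```python
# Per-cycle response measures: delay to first spike ("del") and total
# spike count ("sp") within each FeyeM cycle, given cycle onset times.
def per_cycle_measures(spike_times, cycle_onsets):
    """Return one {"del", "sp"} dict per cycle (times in ms)."""
    out = []
    for start, end in zip(cycle_onsets, cycle_onsets[1:]):
        cycle_spikes = [t for t in spike_times if start <= t < end]
        delay = cycle_spikes[0] - start if cycle_spikes else None
        out.append({"del": delay, "sp": len(cycle_spikes)})
    return out

# Hypothetical spike train over two 100-ms FeyeM cycles:
measures = per_cycle_measures(
    spike_times=[12.0, 15.0, 112.5, 116.0, 119.0],
    cycle_onsets=[0.0, 100.0, 200.0],
)
# cycle 1: del = 12.0 ms, sp = 2; cycle 2: del = 12.5 ms, sp = 3
```

Comparing del across neighboring sRFs within the same cycle is what recovers the fine spatial offsets, as described in the following paragraphs.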
The dynamics of spike generation in relevant horizontal NPLLs are depicted in Figure
These computer simulations demonstrate the validity of temporal encoding of shape by FeyeM, the superiority of temporal coding over rate coding along the elongated axes of sRFs for fine spatial details, and the ability of NPLLs to phase-lock to retinal outputs, and thus to recode shape by relative time coding, with natural FeyeM. These simulations do not address the encoding-decoding of object motion or texture.
One long-standing unresolved puzzle in vision is: how come the world appears stable if our eyes move all the time? The answer to this puzzle is surprisingly simple. This puzzle exists
Optimal functioning of NPLLs requires their operation within a motor-sensory feedback loop (Ahissar and Vaadia,
At any given viewing period there should be one or several NPLLs, out of the many NPLLs included in the bank, that dominate the motor-sensory loop; the motor-sensory loop functions to optimize the input for these NPLLs during that period. Given the dynamics of frequency variation in FeyeM (e.g., Moller et al.,
Natural vision is a continuous process that, at any given moment, has to deal with retinal activity that has been affected by recent FeyeM and by optical blurring. FeyeM enforce temporal encoding at the retina. Spikes of retinal outputs indicate times in which image contrasts are crossed by moving receptors, much like spikes of mechanoreceptors indicate times in which their RF crosses a ridge. Such temporal encoding has a hyperacuity resolution and is resistant to optical blurring. Metaphorically, vision can be described as an active process in which the retina “palpates” external objects like rat whiskers and human fingers do. In both vision and touch, the sensory organs are mainly activated when they encounter changes during their scanning movements.
We suggest here a specific way in which the visual system could decode temporally encoded retinal signals. The proposed active decoding scheme involves cortical oscillations (which function as “temporal rulers”) and thalamic phase comparators (which compare input timing against the “ruler”). This scheme is consistent with a large body of anatomical and physiological data. Although the data supporting it are compelling, we do not claim that this proposed scheme is the only possible decoding mechanism or that there is a single decoding mechanism for retinal outputs. Several such mechanisms are available to the visual system, and brains must emphasize one or another mechanism, depending on the visual stimulus, context, and previous experience. During natural viewing, we suggest, the active decoding mechanism described here plays a major role.
The temporal encoding-decoding scheme presented here provides a mechanism that is free from retinal smearing that would otherwise be caused by FeyeM. According to this scheme, local hyperacuity is resistant to optical blurring because of the differential nature of retinal temporal encoding. Utilization of the temporal domain by the visual system allows information to be accumulated, rather than averaged, during the entire fixation period. Furthermore, such utilization of temporal coding enables control of visual resolution by simply controlling eye velocity (Saig et al.,
Accurate vision, if mediated by NPLLs, should require continuous fixation. This is not inconsistent with common experience and controlled studies indicating that visual acuity does improve with longer fixational periods (Riggs et al.,
To prevent aliasing (which occurs when the sampling frequency is too close to the sampled frequency), temporal encoding should, and probably does, rely on multiple frequencies (see Ahissar and Arieli,
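The aliasing hazard, and why multiple sampling frequencies defeat it, can be shown with a frequency-folding sketch (our illustration; the numbers are hypothetical).

```python
# Aliasing sketch: a signal of frequency f sampled at rate fs appears at
# the "folded" frequency |f - round(f/fs)*fs|. Frequencies close to the
# sampling rate therefore masquerade as slow ones.
def apparent_frequency(f_signal, f_sample):
    """Apparent (aliased) frequency in Hz after sampling."""
    return abs(f_signal - round(f_signal / f_sample) * f_sample)

# A hypothetical 82-Hz retinal modulation sampled by an 80-Hz process
# looks like a 2-Hz signal:
print(apparent_frequency(82.0, 80.0))  # 2.0
# The same signal checked against a second, 50-Hz sampling process is
# unmasked, since the two aliases disagree:
print(apparent_frequency(82.0, 50.0))  # 18.0
```

A single sampling frequency cannot distinguish the true signal from its alias, but two (or more) concurrent frequencies yield inconsistent aliases for a spurious interpretation, which is one way multiple FeyeM frequencies could protect the code.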
In fact, the complication induced by slow eye drifts can be better handled by temporal decoders of the type presented here than by spatial decoders that are based on cell identity. Since temporal changes induced by eye drifts are common to the entire visual field, they can be decoded by widely tuned NPLLs, low-pass integration of NPLL outputs (Ahissar,
Reading out cortical differential representations should involve lateral comparisons of outputs of simple cells, which could be ambiguous if the response polarities (i.e., ON vs. OFF) of the compared cells are not known. This problem is probably circumvented by segregation of thalamocortical circuits to ON-center and OFF-center clusters (McConnell and LeVay,
Cortical (and retinal) representations of external velocities are unique only if the amplitude of the FeyeM is smaller than both the sRF length and the external spatial periods. Otherwise, the transformation is not unique; different combinations of external spatial periods and external velocities could induce similar cortical spike counts. When the sRF is longer than the external spatial periods, aliasing problems are introduced, which cause additional ambiguities. The visual system could avoid such ambiguous coding by relying on high-frequency, low-amplitude FeyeM for foveal vision, and on FeyeM with increasing amplitudes (associated with decreasing frequencies) for increasingly eccentric vision. The finding that rod monochromat subjects, who lack foveal receptors, exhibit large-amplitude, low-frequency nystagmus (Yarbus,
During natural fixation, drift and tremor movements are often interrupted by brief microsaccades (see Introduction), which bounce retinal RFs to new locations. Obviously, computations of 2D details by NPLLs cannot continue across microsaccades, and should not include data acquired during microsaccades. Interestingly, many cortical neurons respond either during microsaccades (“saccade cells”) or during drift (“position/drift cells”) but not during both (Snodderly et al.,
Every sensory system contains multiple motor-sensory loops that together perceive components of the external world. Such loops contain sub-cortical and cortical stations and pathways which interact in complex ways (see for example the scheme of the vibrissal motor-sensory system in Kleinfeld et al.,
We describe the principles of the encoding-decoding scheme using examples of periodic FeyeM (e.g., Figures
Electrophysiological and anatomical:
In some conditions, cortical alpha oscillations correlate with ocular oscillations (Lippold,
Visual cortical activity can be locked to a periodic visual stimulus (a phenomenon called “photic driving,” Walter and Walter,
Visual cortical activity can remain “locked” to the stimulus frequency after the cessation of the stimulus (Narici et al.,
Retinal, thalamic, and cortical neurons exhibit phase-locked activity during stimulus presentations, during which cortical neurons often phase-lead thalamic neurons (Castelo-Branco et al.,
Frequencies of cortical oscillations change when stimulus velocity changes (Eckhorn et al.,
Retinal spike times convey more visual information than spike counts (Berry et al.,
Thalamic transfer and timing depend on cortical feedback (reviewed in Rauschecker,
Responses of cat retinal ganglion cells to tiny (= cone separation) retinal motion are locked to movement onset (Shapley and Victor,
Psychophysical:
Temporal aliasing occurs during daylight vision (Purves et al.,
Metacontrast: consecutive non-overlapping visual stimuli cannot be perceived in isolation for temporal intervals smaller than 50–150 ms (Bachmann,
Flicker fusion is not integrated between the eyes (Andrews et al.,
Movement processing is often based on dot elements rather than on line elements, i.e., on two-dimensional discontinuities such as corners, intersections, and endpoints of contours (Rubin et al.,
Motion smear disappears once motion is perceived (Burr,
In some experiments, 2D acuity was found to be limited by temporal differential delays, not by spatial offsets or velocities (Carney et al.,
Hyperacuity thresholds are not affected by retinal image degradation (Williams et al.,
Temporal asynchrony interferes with Vernier acuity: judgment of the vertical alignment of two dots is impaired if the two dots are not presented synchronously (Wehrhahn and Westheimer,
Elimination of retinal motion induced by FeyeM selectively impairs the discrimination of fine spatial details, while leaving the discrimination of coarse spatial details unaffected (Rucci et al.,
Humans show a power-law dependency on stimulus contrast in various accuracy and hyper-accuracy tasks, with exponents similar to those observed at the retina (−0.5 to −1; Wilson,
EEG recordings reveal that (i) neuronal oscillations code sensory information relevant for visual perception, (ii) frequency, phase, and amplitude play differential roles in coding behaviorally relevant information in the brain, and (iii) phase contains more information than power (Schyns et al.,
Alleged inconsistencies with relevant experimental data:
Keesey (
Vernier acuity of 2 ms flashed stimuli is as good as that of longer stimuli, provided that the intensity × duration product is constant (Hadani et al.,
Critical for temporal encoding at the retina:
Small or slow movements (with amplitudes or velocities smaller than those of FeyeM) of the entire image should not impair local (hyper) acuity. However, spatially-non-coherent movements of details of the image, even if their average locations are kept constant, should.
Small or slow movements (with amplitudes or velocities smaller than those of FeyeM) of the entire visual field should not be perceived, while spatially-non-coherent movements should.
Synchronous temporal fluctuations of image intensity should not impair local acuity, while asynchronous fluctuations of details of the image, even if their locations are kept constant, should.
Retinal latencies depend on contrast (Gawne et al.,
Critical for decoding by thalamocortical NPLLs:
Cortical simple cells should represent local spatial-phase relationships (i.e., fine determinants of shape and texture) by temporal phase relationships, and relative velocities by relative spike counts. When the temporal frequency of the retinal output increases, retino-cortical delays should increase, and the spike counts of simple cells should decrease.
Non-critical:
If FeyeM can be centrally controlled (Shakhnovich,
Such adaptations should occur during long fixations (>200 ms), and are expected to stabilize at conditions that produce temporal frequencies (at the retinal output) within the alpha or gamma ranges, which are probably preferred by thalamocortical loops.
Opening eyes in full darkness, or against a uniform image, should not desynchronize cortical EEG. Cortical EEG is expected to desynchronize only when viewing a patterned image, in which case different cortical oscillators are expected to track different temporal patterns.
End stopping: end-stopped cells are tuned for bar lengths: they are excited by short bars and are gradually inhibited as the length of the bar increases (see, e.g., Bolz and Gilbert,
Snodderly’s “position/drift cells” (Snodderly et al.,
Additional predictions are described in Ahissar (
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
A given spatial offset in a stationary image, Δx, is translated to a neuronal temporal delay, Δ
for clarity, we ignore here the dependence of these variables on time.
Determination of retinal frequencies depends on the relationship between the texture of the image (
Note that due to the dependency of
With moving images, the resulting retinal period for a given sRF will be:
where
We use a linear version of the equation presented by Perkel et al. (
where
In the gating mode, thalamic neurons fire only when the retinal and cortical inputs overlap. Thalamic output would be maximal when the two inputs fully overlap (τ
where
The retino-cortical delay at cycle
This equation shows that the only condition in which τD(
The combination of Eqs
where
Thus,
If
The loop can converge on a limited set of input frequencies; a set termed “the working range”. The working range depends on
The dynamics of the loop while tracking can be revealed by combining Eqs
The only case in which τ
Retinal periodicities [
These cortical (and retinal) representations depend on the ratios between the amplitude of the FeyeM (
The visual motor-sensory loop (Figure
where
Our ability to detect relative offsets is an order of magnitude better than the separation power of the eye. For example, we can detect an offset of a few arcseconds between two co-linear lines (vernier hyperacuity) while we can detect a separation between two parallel lines only if they are separated by about an arcminute (separation acuity; Westheimer,
Thus, with the aid of optical smearing, and perhaps overlapping RFs, the challenge of obtaining hyperacuity resolution can be transferred to the intensity domain. However, with all-or-none spikes, response intensity can only be measured along time. How much time is required for collecting enough spikes to detect a vernier offset? A retinal network that is simulated under realistic conditions can generate a one to two spikes difference for a near-threshold vernier offset, if it is given 60 ms (Wachtler et al.,
The main consideration that invalidates a pure spatial encoding for fine details is the following. As the above observations show, it would take 60 ms or more to obtain a minimally detectable difference in spike counts of ganglion cells, whose RF diameters are smaller than 1 arcmin (foveal RFs get as small as 0.3 arcmin). However, during these 60 ms the eye had traveled over a significant number of such RFs [estimations would vary between 2 and 10 arcmin, equivalent to 6–30 RFs, depending on the study: (Ratliff and Riggs,
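The back-of-envelope arithmetic behind this argument can be made explicit. The RF diameter below uses the foveal lower bound quoted above (0.3 arcmin); the drift distances span the range of estimates cited.

```python
# Sanity check of the anti-spatial-coding argument: during the ~60 ms
# needed to accumulate a detectable spike-count difference, fixational
# drift carries the image across many foveal RFs, scrambling any purely
# spatial (cell-identity) code at hyperacuity scale.
def rfs_traversed(drift_arcmin, rf_diameter_arcmin=0.33):
    """Number of RF diameters crossed for a given drift distance."""
    return drift_arcmin / rf_diameter_arcmin

low, high = rfs_traversed(2.0), rfs_traversed(10.0)
print(f"{low:.0f}-{high:.0f} RFs")  # roughly 6-30 RFs, matching the text
```

Since the image moves across an entire population of distinct ganglion cells within the required integration window, the spike-count difference cannot be read out from fixed cell identities, which is the stated reason for rejecting pure spatial encoding of fine details.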
High resolution depth information depends on fine coordination between the eyes. During fixation, eye movements are well coordinated only during microsaccades, during which the eyes move simultaneously to the same direction and roughly at the same amplitude (Krauskopf et al.,
How relative 2D and relative depth computations interact in the visual system is not yet known. However, the complementary nature of the responses of cortical “position/drift cells” and “saccade cells” (Snodderly et al.,
FeyeM were recorded from the right eye of human subjects at 240 Hz and a resolution of 0.01° by the iView-X system (SensoMotoric Instruments, GMBH). Subjects looked at a computer screen distanced 1 m ahead of them and either tried to keep fixation on a single spot marked with a cross or viewed an image freely. Figure
Dependency of
Excellent vernier alignment is achieved with just two or three dots (e.g., Westheimer,
We express the vernier offset,
Since 2
Thus, the difference does not depend at all on
The time window required for achieving one spike difference in such retinal populations is thus:
The maximal retinal response rate observed with line vernier stimuli was
We wish to thank Merav Ahissar, Shabtai Barash, Rafi Malach, Nava Rubin, and Dov Sagi for stimulating discussions and comments on the manuscript, Flemming Møller for providing the data for Figure
FeyeM, fixational eye movements; FF, feedforward; NPLL, neuronal phase-locked loop; PD, phase detector; RCO, rate-controlled oscillator; SC, simple cell; sRF, subfield of a simple receptive field.