Convolved images as hypotheses
From A conversation about the brain
) to estimate the retinal location of the dark feature. A simple example of this procedure is finding the centroid of a zero-bounded region of responses from a centre-surround filter. The same principle applies to a 'face detector' convolved with an image: the response in the convolved output may rise gradually to a peak centred on the face, but there is no separate hypothesis or signal for each $(x,y)$ location in the image. Instead, there is one hypothesis supported by evidence from many $(x,y)$ locations. Many locations contribute to a single 'centroid', which is itself a hypothesis about the most likely location of the feature to which the filter is tuned: in this case, a dark feature. One might argue that the location of the centroid must be reported in a retinotopic coordinate frame, but that leads to another story [expand later]. The important point here is that many pixels (or firing neurons) contribute to one maximum-likelihood estimate that is reported. The combination of neural firing rates across space can be extended to combination across time and across eye movements (e.g. micro-saccades). Since the eye has moved in the latter case, this is a critical step in abandoning the retinal frame.
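The centre-surround case above can be sketched in a few lines. This is a minimal illustration, not a model from the text: the toy image, blob position, and filter scales are all arbitrary choices, and the centre-surround filter is approximated as a difference of Gaussians with its sign flipped so that a dark feature yields positive responses.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy image: uniform bright background with one dark blob (the 'dark feature').
# The blob centre (row 40, column 25) is an arbitrary choice for illustration.
yy, xx = np.mgrid[0:64, 0:64]
img = 1.0 - 0.8 * np.exp(-((yy - 40) ** 2 + (xx - 25) ** 2) / (2 * 3.0 ** 2))

# Centre-surround filter as a difference of Gaussians, signed so that
# dark features produce positive responses (an off-centre cell).
response = gaussian_filter(img, sigma=4.0) - gaussian_filter(img, sigma=1.5)

# Zero-bounded region: the patch of positive responses. Its response-weighted
# centroid is the single location hypothesis -- many pixels, one estimate.
w = np.where(response > 0, response, 0.0)
cy = (yy * w).sum() / w.sum()
cx = (xx * w).sum() / w.sum()
print(cy, cx)  # close to the blob centre (40, 25)
```

Note that no individual pixel 'knows' where the feature is: the estimate only exists at the level of the pooled, weighted response, which is the point the paragraph above is making.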
Back to Hypotheses
- Watt, R. J., & Morgan, M. J. (1985). A theory of the primitive spatial code in human vision. Vision Research, 25(11), 1661-1674.
- Hinton, G. E. (1999). Products of experts. In Ninth International Conference on Artificial Neural Networks (ICANN 99) (Conf. Publ. No. 470, Vol. 1, pp. 1-6). IET.