Larger translations

From A conversation about the brain

Larger translations of the camera/eye, e.g. in navigation, are generally dealt with in computer vision by building a 3D reconstruction (Simultaneous Localisation and Mapping, SLAM[1][2]). Biologists have explored view-based approaches and compared them with 3D reconstruction approaches; Mallot, Bülthoff and colleagues were among the first to advocate view-based approaches as a model of human navigation[3][4]. The aim in these pages on 3D vision is to describe how a system of rotations and translations of the camera (small and large) could be united in a common framework based only on images related by motor outputs. Such a framework needs to explain not only the rules for navigating from one image to the next (and so on)[3] but also the ability to discriminate different surface slants, relative depths and locations of objects viewed from a wide variety of vantage points.
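The core of a view-based (snapshot) scheme of the kind advocated in [3] can be sketched in a few lines: the agent stores an image at the goal, and thereafter moves so as to reduce the difference between its current image and that stored snapshot, with no 3D reconstruction involved. The sketch below is illustrative only, not the algorithm of [3]: the "image" is a toy 1-D vector of landmark bearings (an assumed stand-in for a panoramic camera image), and the descent is a simple greedy search over candidate steps.

```python
import math

def render_view(pos, landmarks):
    """Toy 'panoramic image': the bearing (radians) of each landmark
    as seen from pos. A stand-in for a real camera image."""
    return [math.atan2(ly - pos[1], lx - pos[0]) for lx, ly in landmarks]

def image_distance(a, b):
    """Sum of squared angular differences, wrapped to (-pi, pi]."""
    return sum(((x - y + math.pi) % (2 * math.pi) - math.pi) ** 2
               for x, y in zip(a, b))

def home_to_snapshot(start, snapshot, landmarks, step=0.25, n_steps=100):
    """Greedy snapshot homing: at each step, try small moves in eight
    directions and accept the one whose rendered view best matches the
    stored snapshot; shrink the step when no move improves the match."""
    pos = start
    score = lambda p: image_distance(render_view(p, landmarks), snapshot)
    for _ in range(n_steps):
        moves = [(pos[0] + step * math.cos(a), pos[1] + step * math.sin(a))
                 for a in (k * math.pi / 4 for k in range(8))]
        best = min(moves, key=score)
        if score(best) < score(pos):
            pos = best
        else:
            step *= 0.5  # refine the search near the goal
    return pos

# Hypothetical landmark layout and goal, for illustration only.
landmarks = [(5, 0), (0, 5), (-4, -3), (3, 4)]
goal = (0.0, 0.0)
snapshot = render_view(goal, landmarks)       # image stored at the goal
final = home_to_snapshot((3.0, 3.0), snapshot, landmarks)
```

Note that the agent never recovers 3D structure or even its own coordinates; it only compares images and issues motor outputs, which is the sense in which rotations and translations could be united in a purely image-based framework.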

Back to 3D vision.


  1. Davison, A. J., Reid, I. D., Molton, N. D., & Stasse, O. (2007). MonoSLAM: Real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6), 1052–1067.
  2. Newman, P., & Ho, K. (2005). SLAM-loop closing with visually salient features. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation (ICRA 2005) (pp. 635–642). IEEE.
  3. Franz, M. O., Schölkopf, B., Mallot, H. A., & Bülthoff, H. H. (1998). Where did I take that snapshot? Scene-based homing by image matching. Biological Cybernetics, 79(3), 191–202.
  4. Gillner, S., & Mallot, H. A. (1998). Navigation and acquisition of spatial knowledge in a virtual maze. Journal of Cognitive Neuroscience, 10(4), 445–463.