Minimizing binding errors using learned conjunctive features

Authors

Mel, BW; Fiser, J

Citation

B.W. Mel and J. Fiser, Minimizing binding errors using learned conjunctive features, NEURAL COMP, 12(2), 2000, pp. 247-278

Citations number

Subject categories

Neurosciences & Behavior; AI, Robotics and Automatic Control

Journal title

NEURAL COMPUTATION

ISSN journal

0899-7667

Volume

12

Issue

2

Year of publication

2000

Pages

247 - 278

Database

ISI

SICI code

0899-7667(200002)12:2<247:MBEULC>2.0.ZU;2-Y

Abstract

We have studied some of the design trade-offs governing visual representations based on spatially invariant conjunctive feature detectors, with an emphasis on the susceptibility of such systems to false-positive recognition errors (Malsburg's classical binding problem). We begin by deriving an analytical model that makes explicit how recognition performance is affected by the number of objects that must be distinguished, the number of features included in the representation, the complexity of individual objects, and the clutter load, that is, the amount of visual material in the field of view in which multiple objects must be simultaneously recognized, independent of pose, and without explicit segmentation. Using the domain of text to model object recognition in cluttered scenes, we show that with corrections for the nonuniform probability and nonindependence of text features, the analytical model achieves good fits to measured recognition rates in simulations involving a wide range of clutter loads, word sizes, and feature counts. We then introduce a greedy algorithm for feature learning, derived from the analytical model, which grows a representation by choosing those conjunctive features that are most likely to distinguish objects from the cluttered backgrounds in which they are embedded. We show that the representations produced by this algorithm are compact, decorrelated, and heavily weighted toward features of low conjunctive order. Our results provide a more quantitative basis for understanding when spatially invariant conjunctive features can support unambiguous perception in multiobject scenes, and lead to several insights regarding the properties of visual representations optimized for specific recognition tasks.
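The abstract does not spell out the feature-learning procedure, so the following is only a toy sketch under our own assumptions, not the authors' algorithm. Objects are words; the candidate conjunctive features are position-free letter n-grams of order 1 to 3; a scene is several words concatenated without separators (no segmentation); and a word counts as recognized when every selected feature it contains is active anywhere in the scene. The hypothetical `greedy_select` function then adds, one at a time, the candidate feature that most reduces the number of false-positive recognitions over a set of sample scenes.

```python
import random


def ngrams(text, max_order=3):
    """Position-free n-grams of order 1..max_order (assumed 'conjunctive features')."""
    return {text[i:i + n] for n in range(1, max_order + 1)
            for i in range(len(text) - n + 1)}


def false_positives(vocab, scenes, selected, max_order=3):
    """Count (scene, word) pairs where an absent word is nonetheless 'recognized'.

    Hypothetical rule: a word is recognized when every selected feature it
    contains is active somewhere in the scene; positions are ignored.
    """
    sel = set(selected)
    word_feats = {w: ngrams(w, max_order) & sel for w in vocab}
    errors = 0
    for words in scenes:
        # Clutter: words run together, so cross-word n-grams are active too.
        active = ngrams("".join(words), max_order)
        for w in vocab:
            if w not in words and word_feats[w] <= active:
                errors += 1
    return errors


def greedy_select(vocab, scenes, max_features, max_order=3):
    """Toy greedy feature selection that minimizes false positives."""
    # Sort so that, on ties, lower-order features are tried (and kept) first.
    candidates = sorted(set().union(*(ngrams(w, max_order) for w in vocab)),
                        key=lambda f: (len(f), f))
    selected = []
    current = false_positives(vocab, scenes, selected, max_order)
    while len(selected) < max_features and current > 0:
        best, best_err = None, current
        for f in candidates:
            if f in selected:
                continue
            err = false_positives(vocab, scenes, selected + [f], max_order)
            if err < best_err:  # strict improvement only
                best, best_err = f, err
        if best is None:  # no candidate lowers the error any further
            break
        selected.append(best)
        current = best_err
    return selected, current


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["cat", "act", "tac", "dog", "god", "cod", "tag", "gat"]
    scenes = [rng.sample(vocab, 3) for _ in range(50)]
    features, errors = greedy_select(vocab, scenes, max_features=8)
    print(features, errors)
```

Because every feature of a word that is truly present is also active in the scene, this toy never misses a present word, and adding a feature can only remove false positives. The loop therefore stops when the error reaches zero, when no candidate lowers it, or when the feature budget is spent. Some errors are irreducible here: "cat" followed by "act" contains "tac" as a literal substring. Breaking ties toward lower-order features is a choice made for determinism, not a detail taken from the paper.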