Deconvolutional Networks

Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus

Computer Vision and Pattern Recognition (June 13-18, 2010)

Abstract
Building robust low- and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information, which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features that capture these mid-level cues spontaneously emerge from image data. Our approach is based on the convolutional decomposition of images under a sparsity constraint and is totally unsupervised. By building a hierarchy of such decompositions we can learn rich feature sets that are a robust image representation for both the analysis and synthesis of images.
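The decomposition described in the abstract models an image as a sum of latent feature maps, each convolved with a learned filter, with a sparsity penalty on the maps. As an illustrative sketch (the function and variable names below are ours, not from the paper's released code), the single-layer reconstruction cost can be written as a squared reconstruction error plus an L1 penalty on the feature maps:

```python
import numpy as np
from scipy.signal import convolve2d

def deconv_cost(image, feature_maps, filters, lam=1.0):
    """Illustrative cost of a one-layer convolutional decomposition:
        lam/2 * || sum_k (z_k * f_k) - y ||^2  +  sum_k |z_k|_1
    where y is the image, z_k the k-th feature map, f_k the k-th filter,
    and * denotes 2D convolution. Each feature map is larger than the
    image by (filter size - 1) in each dimension, so a 'valid'
    convolution yields an image-sized reconstruction.
    """
    recon = np.zeros_like(image, dtype=float)
    for z_k, f_k in zip(feature_maps, filters):
        # Accumulate each map's contribution to the reconstruction.
        recon += convolve2d(z_k, f_k, mode="valid")
    residual = recon - image
    data_term = 0.5 * lam * np.sum(residual ** 2)
    sparsity_term = sum(np.abs(z_k).sum() for z_k in feature_maps)
    return data_term + sparsity_term
```

Learning alternates between inferring sparse feature maps (minimizing this cost over the maps) and updating the filters; stacking layers, with layer 2 decomposing the layer-1 maps, gives the hierarchy whose filters are visualized in the videos below.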

Supplemental Materials

Videos

Real-time learning of layer 1 filters on city dataset – Shows the progression of the layer 1 filters during training on a dataset of city images. Video is shown at 2.1 fps.

Real-time learning of layer 2 filters on city dataset – Shows the progression of the layer 2 filters during training on a dataset of city images. Video is shown at 0.48 fps.

Real-time learning of layer 1 filters on fruit dataset – Shows the progression of the layer 1 filters during training on a dataset of fruit images. Video is shown at 2.1 fps.

Real-time learning of layer 2 filters on fruit dataset – Shows the progression of the layer 2 filters during training on a dataset of fruit images. Video is shown at 0.48 fps.

Images


Most of the images are available in the PDF of the paper; however, the following two images are included here as supplemental material. They show the filters, trained on a combination of the city and fruit datasets, that were used to infer feature-map activations for the Caltech-101 recognition task.

Layer 1 Filters used for Caltech-101 experiments:

Layer 2 Filters used for Caltech-101 experiments: