Deconvolutional Networks for Feature Learning

Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus

The Learning (Snowbird) Workshop (April 6-9, 2010)

Abstract
Building robust low-level image representations, beyond edge primitives, is a long-standing goal in vision. In its most basic form, an image is a matrix of intensities. How we should progress from this matrix to stable mid-level representations, useful for high-level vision tasks, remains unclear. Popular feature representations such as SIFT or HOG spatially pool edge information to form descriptors that are invariant to local transformations. However, in doing so important cues such as edge intersections, grouping, parallelism and symmetry are lost…
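As a rough illustration of the model behind these materials: a deconvolutional layer represents an image as a sum of latent feature maps, each convolved with a small learned filter, under a sparsity penalty on the maps. The sketch below computes that reconstruction-plus-sparsity cost; the function name, the choice of an L1 penalty, and the exact weighting are assumptions for illustration, not the paper's verbatim formulation.

```python
import numpy as np
from scipy.signal import convolve2d

def deconv_cost(y, feature_maps, filters, lam=1.0):
    """Sketch of a single-layer deconvolutional objective (assumed form):
    lam/2 * ||sum_k z_k * f_k - y||^2  +  sum_k |z_k|_1,
    where * is 2-D convolution. Each feature map z_k is larger than the
    image y by (filter size - 1), so a 'valid' convolution with filter
    f_k yields an image-sized reconstruction."""
    recon = np.zeros_like(y, dtype=float)
    for z, f in zip(feature_maps, filters):
        # Accumulate this map's contribution to the reconstruction.
        recon += convolve2d(z, f, mode='valid')
    data_term = 0.5 * lam * np.sum((recon - y) ** 2)
    sparsity_term = sum(np.abs(z).sum() for z in feature_maps)
    return data_term + sparsity_term
```

Inference (the "feature map activations" mentioned below) would then amount to minimizing this cost over the maps with the filters held fixed, and learning alternates that with updates to the filters.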

Supplemental Materials

Videos

Real-time learning of layer 1 filters on city dataset – Shows the progression of the layer 1 filters during training on a dataset of city images. Video is shown at 2.1 fps.

Real-time learning of layer 2 filters on city dataset – Shows the progression of the layer 2 filters during training on a dataset of city images. Video is shown at 0.48 fps.

Real-time learning of layer 1 filters on fruit dataset – Shows the progression of the layer 1 filters during training on a dataset of fruit images. Video is shown at 2.1 fps.

Real-time learning of layer 2 filters on fruit dataset – Shows the progression of the layer 2 filters during training on a dataset of fruit images. Video is shown at 0.48 fps.

Images


Most of the images are available in the PDF of the paper; however, the following two images are included here as supplemental material. They show the filters trained on a combination of the city and fruit datasets, which were used to infer feature map activations for the Caltech-101 recognition task.

Layer 1 Filters used for Caltech-101 experiments:

Layer 2 Filters used for Caltech-101 experiments: