Welcome to my site
This is my personal site, which hosts my publications, videos, and some information about me.
I am interested in machine learning and computer vision and am beginning a PhD in Computer Science at New York University, where I plan to concentrate my research on both areas.
News & Events
Recent Publications
Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
Matthew D. Zeiler, Graham W. Taylor, Leonid Sigal, Iain Matthews, and Rob Fergus
Neural Information Processing Systems (December 12-17, 2011)
Supplemental Material:  Supplementary  Code  Videos
Abstract
We present a type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence. It shares the desirable properties of RBMs: efficient exact inference, an exponentially more expressive latent state than HMMs, and the ability to model nonlinear structure and dynamics. We apply our model to a challenging real-world graphics problem: facial expression transfer. Our results demonstrate improved performance over several baselines modeling high-dimensional 2D and 3D data.
Adaptive Deconvolutional Networks for Mid and High Level Feature Learning
Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus
International Conference on Computer Vision (November 6-13, 2011)
Supplemental Material:  Code  Slides
Abstract
We present a hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling. When trained on natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts and complete objects. To build our model we rely on a novel inference scheme that ensures each layer reconstructs the input, rather than just the output of the layer directly beneath, as is common with existing hierarchical approaches. This makes it possible to learn multiple layers of representation and we show models with 4 layers, trained on images from the Caltech-101 and 256 datasets. When combined with a standard classifier, features extracted from these models outperform SIFT, as well as representations from other feature learning methods.
Recent Software Added
Adaptive Deconvolutional Network Toolbox
This toolbox includes code that implements an Adaptive Deconvolutional Network as described in the paper Adaptive Deconvolutional Networks for Mid and High Level Feature Learning. It may also be used to implement a Deconvolutional Network as described in the paper Deconvolutional Networks, though this is no longer the recommended method. The toolbox has functions to train a Deconvolutional Network, to visualize the learned filters, and to reconstruct a new image from a trained model. It also includes files for making descriptors that can be used with Svetlana Lazebnik's Spatial Pyramid Matching code with a few minor modifications. The Deconvolutional Network Toolbox also works with (and now includes) the IPP Convolutions Toolbox, which drastically improves performance; just ensure that the IPP Convolutions Toolbox files are in your MATLAB path and that they are compiled with your IPP libraries.
Download (.zip) 
Documentation (html)
Suggested Software
Eero Simoncelli's Matlab Pyramid Toolbox
The MEX files contained within this package significantly speed up the LUT performance of the Deconvolutional Network (used for hyper-Laplacian priors other than L1-norm).
Related Publications
Adaptive Deconvolutional Networks for Mid and High Level Feature Learning
Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus
International Conference on Computer Vision (November 6-13, 2011)
Supplemental Material:  Slides
Abstract
We present a hierarchical model that learns image decompositions via alternating layers of convolutional sparse coding and max pooling. When trained on natural images, the layers of our model capture image information in a variety of forms: low-level edges, mid-level edge junctions, high-level object parts and complete objects. To build our model we rely on a novel inference scheme that ensures each layer reconstructs the input, rather than just the output of the layer directly beneath, as is common with existing hierarchical approaches. This makes it possible to learn multiple layers of representation and we show models with 4 layers, trained on images from the Caltech-101 and 256 datasets. When combined with a standard classifier, features extracted from these models outperform SIFT, as well as representations from other feature learning methods.
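As an illustration of the max pooling step mentioned in the abstract, here is a minimal NumPy sketch of non-overlapping 2D max pooling that also records where each maximum came from, so the pooled values can be placed back at their original positions. Recording these "switch" locations is an assumption made here for illustration; the function names are hypothetical and this is not the toolbox's MATLAB code.

```python
import numpy as np

def max_pool_with_switches(x, k):
    """Non-overlapping k x k max pooling over a 2D map.

    Returns the pooled map and, for each block, the flat index
    (0 .. k*k-1) of the maximum inside that block.
    """
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "map size must be a multiple of k"
    # Rearrange into blocks of shape (h//k, w//k, k*k).
    blocks = (x.reshape(h // k, k, w // k, k)
               .transpose(0, 2, 1, 3)
               .reshape(h // k, w // k, k * k))
    switches = blocks.argmax(axis=2)
    pooled = np.take_along_axis(blocks, switches[..., None], axis=2)[..., 0]
    return pooled, switches

def unpool(pooled, switches, k):
    """Place each pooled value back at its recorded location; zeros elsewhere."""
    ph, pw = pooled.shape
    blocks = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    np.put_along_axis(blocks, switches[..., None], pooled[..., None], axis=2)
    # Undo the block rearrangement to recover the full-size map.
    return (blocks.reshape(ph, pw, k, k)
                  .transpose(0, 2, 1, 3)
                  .reshape(ph * k, pw * k))
```

Pooling the unpooled map again with the same block size returns the same pooled values, which is what makes the recorded locations useful when a higher layer has to be projected back down toward the input.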
Deconvolutional Networks
Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus
Computer Vision and Pattern Recognition (June 13-18, 2010)
Supplemental Material:  Images  Videos
Abstract
Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information which destroys cues such as edge intersections, parallelism and symmetry. We present a learning framework where features that capture these mid-level cues spontaneously emerge from image data. Our approach is based on the convolutional decomposition of images under a sparsity constraint and is totally unsupervised. By building a hierarchy of such decompositions we can learn rich feature sets that are a robust image representation for both the analysis and synthesis of images.
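To make "convolutional decomposition under a sparsity constraint" concrete, the following is a minimal sketch that infers sparse feature maps for a 1D signal with fixed filters. Several things here are assumptions made for illustration and are not taken from the paper: the signal is 1D rather than an image, the filters are given rather than learned, the penalty is an L1 norm with weight `lam`, and the solver is ISTA (iterative shrinkage-thresholding), swapped in as a simple, well-known method. All function names are hypothetical.

```python
import numpy as np

def reconstruct(z, filters):
    """Sum over feature maps of each map convolved with its filter."""
    return sum(np.convolve(zk, fk, mode="full") for zk, fk in zip(z, filters))

def soft_threshold(a, t):
    """Shrink every entry toward zero by t; this is what produces sparsity."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def objective(y, z, filters, lam):
    """0.5 * squared reconstruction error + lam * L1 norm of the feature maps."""
    r = reconstruct(z, filters) - y
    return 0.5 * (r @ r) + lam * np.abs(z).sum()

def ista(y, filters, lam=0.1, n_iter=200):
    """Infer sparse feature maps z so that reconstruct(z, filters) approximates y."""
    n_filters, m = filters.shape
    n = y.size - m + 1            # map length whose full convolution matches y
    z = np.zeros((n_filters, n))
    # Upper bound on the Lipschitz constant of the data-term gradient.
    lipschitz = sum(np.abs(fk).sum() ** 2 for fk in filters)
    for _ in range(n_iter):
        residual = reconstruct(z, filters) - y
        # Gradient w.r.t. each map is the residual correlated with its filter.
        grad = np.stack([np.correlate(residual, fk, mode="valid") for fk in filters])
        z = soft_threshold(z - grad / lipschitz, lam / lipschitz)
    return z
```

With a signal built from a few isolated filter activations, the inferred maps end up mostly zero while still reconstructing the signal closely; increasing `lam` trades reconstruction accuracy for sparser maps.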
Deconvolutional Networks for Feature Learning
Matthew D. Zeiler, Dilip Krishnan, Graham W. Taylor, and Rob Fergus
The Learning (Snowbird) Workshop (April 6-9, 2010)
Supplemental Material:  Images  Videos
Abstract
Building robust low-level image representations, beyond edge primitives, is a long-standing goal in vision. In its most basic form, an image is a matrix of intensities. How we should progress from this matrix to stable mid-level representations, useful for high-level vision tasks, remains unclear. Popular feature representations such as SIFT or HOG spatially pool edge information to form descriptors that are invariant to local transformations. However, in doing so important cues such as edge intersections, grouping, parallelism and symmetry are lost...
Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
This toolbox provides MATLAB implementations of ioTRBM and FIOTRBM models for use in facial retargeting experiments.
Download (.zip) 
Documentation (html)
Related Publications
Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
Matthew D. Zeiler, Graham W. Taylor, Leonid Sigal, Iain Matthews, and Rob Fergus
Neural Information Processing Systems (December 12-17, 2011)
Supplemental Material:  Supplementary  Videos
Abstract
We present a type of Temporal Restricted Boltzmann Machine that defines a probability distribution over an output sequence conditional on an input sequence. It shares the desirable properties of RBMs: efficient exact inference, an exponentially more expressive latent state than HMMs, and the ability to model nonlinear structure and dynamics. We apply our model to a challenging real-world graphics problem: facial expression transfer. Our results demonstrate improved performance over several baselines modeling high-dimensional 2D and 3D data.
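The "efficient exact inference" property can be illustrated with a generic conditional RBM that has binary hidden units. This is a sketch under assumptions, not the ioTRBM from the paper: the parameter names `W`, `B`, and `c` and both function names are hypothetical, and the model is reduced to a single output frame `v` and a single input context vector `u`. Because the hidden units are independent given `v` and `u`, each posterior is one sigmoid, and the brute-force enumeration below is included only to check that claim on a tiny model.

```python
from itertools import product

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def hidden_posterior(v, u, W, B, c):
    """Exact p(h_j = 1 | v, u) for every hidden unit j, in closed form.

    v: output frame (n_out,)      u: input context (n_in,)
    W: (n_out, n_hid)             B: (n_in, n_hid)      c: (n_hid,)
    """
    return sigmoid(v @ W + u @ B + c)

def brute_force_posterior(v, u, W, B, c):
    """Same marginals by enumerating all 2**n_hid hidden states (tiny models only)."""
    n_hid = c.size
    states = np.array(list(product([0, 1], repeat=n_hid)), dtype=float)
    activation = v @ W + u @ B + c
    # Unnormalized log-probability of each joint hidden state.
    log_p = states @ activation
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    # Marginal probability that each unit is on.
    return p @ states
```

The closed form costs one matrix-vector product per frame, while the enumeration grows exponentially with the number of hidden units; the two agree to numerical precision on small models.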