Stochastic Pooling for Regularization of Deep Convolutional Neural Networks

Matthew D. Zeiler and Rob Fergus
ICLR 2013 (May 2, 2013)


We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.
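To make the sampling step in the abstract concrete, here is a minimal NumPy sketch of stochastic pooling on a single 2-D feature map. It is an illustration under stated assumptions, not the authors' implementation: the function name `stochastic_pool_2d`, the non-overlapping square regions, the requirement of non-negative activations (for example after a ReLU), and the choice to output zero for an all-zero region are all assumptions made for this example.

```python
import numpy as np


def stochastic_pool_2d(activations, pool=2, rng=None):
    """Stochastic pooling over non-overlapping pool x pool regions.

    Assumes `activations` is a 2-D array of non-negative values whose
    height and width are both divisible by `pool` (hypothetical setup).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = activations.shape
    assert h % pool == 0 and w % pool == 0, "map must tile evenly"

    # Rearrange so that regions[i, j] holds the flattened pooling region.
    regions = (activations.reshape(h // pool, pool, w // pool, pool)
               .transpose(0, 2, 1, 3)
               .reshape(h // pool, w // pool, pool * pool))

    out = np.zeros((h // pool, w // pool), dtype=activations.dtype)
    for i in range(h // pool):
        for j in range(w // pool):
            a = regions[i, j]
            total = a.sum()
            if total == 0:
                continue  # assumption: an all-zero region pools to zero
            p = a / total                 # multinomial probabilities
            k = rng.choice(a.size, p=p)   # sample one location in the region
            out[i, j] = a[k]              # pooled value is that activation
    return out


if __name__ == "__main__":
    x = np.array([[1.0, 2.0, 0.0, 0.0],
                  [3.0, 4.0, 0.0, 6.0],
                  [0.0, 0.0, 5.0, 5.0],
                  [0.0, 0.0, 5.0, 5.0]])
    print(stochastic_pool_2d(x, rng=np.random.default_rng(0)))
```

Larger activations are more likely to be selected, but any non-zero activation in a region can be picked, which is where the randomness comes from.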



We include several videos here of top-down visualizations of the convolutional network's third-layer feature maps, produced by using a deconvolutional network to reconstruct down to pixel space. Each of the 100 frames in a video is generated by sampling a new set of pooling locations throughout the convolutional network and reconstructing downward from them with the deconvolutional network. The sampling uses the probabilities computed during the feedforward (FF) pass of the convolutional network, to show how the higher layers of the network could be perceiving the input image. Every frame of each video, along with the original image input to the convolutional network, is shown below the videos.
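The per-frame sampling described above can be sketched roughly as follows. This covers only the sampling of pooling locations for one feature map and a simple unpooling step that places each pooled value back at its sampled location; the deconvolutional filtering that completes the reconstruction is not shown. The function names `sample_pooling_locations` and `unpool`, the non-overlapping 2x2 regions, and the uniform fallback for all-zero regions are assumptions made for this example, not details taken from the authors' code.

```python
import numpy as np


def sample_pooling_locations(activations, pool=2, n_samples=100, seed=0):
    """Compute pooling probabilities once, then sample locations n_samples times.

    Returns (probs, locs), where locs has shape
    (n_samples, H // pool, W // pool) and holds an index into each
    flattened pool x pool region.
    """
    rng = np.random.default_rng(seed)
    h, w = activations.shape
    ph, pw = h // pool, w // pool
    regions = (activations.reshape(ph, pool, pw, pool)
               .transpose(0, 2, 1, 3)
               .reshape(ph * pw, pool * pool))
    totals = regions.sum(axis=1, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)
    # Assumption: fall back to a uniform distribution for all-zero regions.
    probs = np.where(totals > 0, regions / safe, 1.0 / (pool * pool))
    locs = np.stack([
        np.array([rng.choice(pool * pool, p=p) for p in probs])
        for _ in range(n_samples)
    ])
    return probs, locs.reshape(n_samples, ph, pw)


def unpool(pooled, locs, pool=2):
    """Place each pooled value at its sampled location; zeros elsewhere."""
    ph, pw = pooled.shape
    out = np.zeros((ph, pw, pool * pool), dtype=pooled.dtype)
    ii, jj = np.meshgrid(np.arange(ph), np.arange(pw), indexing="ij")
    out[ii, jj, locs] = pooled
    return (out.reshape(ph, pw, pool, pool)
            .transpose(0, 2, 1, 3)
            .reshape(ph * pool, pw * pool))


if __name__ == "__main__":
    fmap = np.random.default_rng(0).random((4, 4))
    probs, locs = sample_pooling_locations(fmap, n_samples=100)
    print(locs.shape)  # one set of pooling locations per video frame
```

Because the probabilities are fixed by the feedforward pass, the frames differ only in which locations were drawn, which is consistent with the subtle frame-to-frame changes described for the videos.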


Links to download videos:

[Top-down Visualization of an Airplane FF-FF-FF]

[Top-down Visualization of a Car FF-FF-FF]

[Top-down Visualization of a Horse FF-FF-FF]


All of the frames of each video above are shown here in a single image so that the subtle frame-to-frame changes can be examined closely. All images are best viewed in a new window at full resolution. The original image that was input to the convolutional network is also provided.

Airplane image input to convolutional network:
Original Airplane Image.

All 100 frames of the airplane video:
Airplane with 100 samples.

Car image input to convolutional network:
Original Car Image.

All 100 frames of the car video:
Car with 100 samples.

Horse image input to convolutional network:
Original Horse Image.

All 100 frames of the horse video:
Horse with 100 samples.