We include several videos here of top-down visualizations of the convolutional network's 3rd-layer feature maps, using a deconvolutional network to reconstruct downward to pixel space. Each of the 100 frames of the videos shown below is generated by sampling a new set of pooling locations throughout the convolutional network and using them to reconstruct downward with the deconvolutional network. The sampling uses the probabilities produced by the feedforward (FF) pass of the convolutional network, and shows how the high levels of the network could be perceiving the input image. Each frame of each video, as well as the original images input to the convolutional network, is shown below the videos.
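The sampling step can be illustrated with a small sketch. This is not the code used to produce the videos. It assumes that each pooling region has been flattened into a row, and that the feedforward pass has already supplied one probability per location in each region; how those probabilities are computed is not specified here. The function names `sample_pooling_locations`, `unpool`, and `make_frames` are hypothetical, and the real reconstruction would continue down through the deconvolutional filters to pixel space rather than stopping at a single unpooling step.

```python
import numpy as np


def sample_pooling_locations(probs, rng):
    """Draw one location index per pooling region.

    probs: array of shape (num_regions, region_size); each row is assumed
    to be a probability distribution over locations in that region.
    """
    cdf = np.cumsum(probs, axis=1)
    u = rng.random((probs.shape[0], 1))
    # Inverse-CDF sampling: count how many cumulative bins u has passed.
    idx = (u >= cdf).sum(axis=1)
    # Guard against rounding that leaves the last cumulative value below 1.
    return np.minimum(idx, probs.shape[1] - 1)


def unpool(pooled, locations, region_size):
    """Place each pooled value at its sampled location; zeros elsewhere."""
    out = np.zeros((pooled.shape[0], region_size))
    out[np.arange(pooled.shape[0]), locations] = pooled
    return out


def make_frames(pooled, probs, num_frames=100, seed=0):
    """Build one unpooled map per frame, resampling locations each time."""
    rng = np.random.default_rng(seed)
    frames = []
    for _ in range(num_frames):
        locations = sample_pooling_locations(probs, rng)
        frames.append(unpool(pooled, locations, probs.shape[1]))
    return np.stack(frames)
```

Because the pooled values stay fixed and only the sampled locations change, consecutive frames built this way differ only in where each activation is placed within its region, which is consistent with the subtle frame-to-frame changes described on this page.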
All of the frames of the videos above are shown here in a single image for each video, so that the subtle frame-to-frame changes can be examined closely. All images are best viewed in a new window at full resolution. The original image that was input to the convolutional network is also provided.
Airplane image input to convolutional network:
All 100 frames of the airplane video:
Car image input to convolutional network:
All 100 frames of the car video:
Horse image input to convolutional network:
All 100 frames of the horse video: