
Summary of Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

In this paper, the authors begin by describing GANs as an attractive alternative to
maximum-likelihood techniques for unsupervised learning of reusable image
representations, which can later be applied to supervised tasks such as image
classification. Since GANs are known to be unstable to train, the paper puts forth a new
family of architectures, called Deep Convolutional GANs (DCGANs), that makes training
stable in most settings. The authors also address related goals, such as generating
higher-quality images and visualizing the internals of neural networks, where prior
attempts have had limited success.

Three essential changes are made to the standard CNN architecture: 1. an all
convolutional net that replaces deterministic spatial pooling functions with strided
convolutions in the discriminator and fractional-strided convolutions in the generator,
allowing each network to learn its own spatial downsampling or upsampling; 2.
eliminating fully connected layers on top of the convolutional features, which also helps
when building deeper architectures; and 3. batch normalisation in both networks. ReLU
activation is used in the generator for all layers except the output, which uses the Tanh
function, while LeakyReLU activation is used in the discriminator for all layers. A
minimal sketch of these guidelines follows.
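The following is a minimal PyTorch sketch of the three guidelines; PyTorch itself is an
assumption (the paper's experiments predate it), and the layer widths follow the widely
used 64x64 DCGAN configuration rather than the authors' exact code.

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Maps a noise vector z (shape: nz x 1 x 1) to a 64x64 RGB image using
    # fractional-strided (transposed) convolutions instead of pooling.
    def __init__(self, nz=100, ngf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),       # -> 4x4
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),  # -> 8x8
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),  # -> 16x16
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),      # -> 32x32
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),           # -> 64x64
            nn.Tanh(),  # output layer uses Tanh; no batch norm here
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    # Classifies 64x64 images as real or fake; strided convolutions learn
    # the spatial downsampling, and LeakyReLU is used in every layer.
    def __init__(self, ndf=64, nc=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),                    # -> 32x32
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),               # -> 16x16
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),           # -> 8x8
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),           # -> 4x4
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),                 # -> 1x1 score
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1)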

DCGANs were trained on three datasets: Large-scale Scene Understanding (LSUN),
Imagenet-1k, and a faces dataset. No data augmentation was applied to the images.
Images were scaled to the range of the tanh activation, [-1, 1]. All models were trained
with mini-batch SGD with a mini-batch size of 128. All weights were initialized from a
zero-centered normal distribution with standard deviation 0.02. The slope of the leak in
the LeakyReLU was set to 0.2 in all models. The Adam optimizer was used with tuned
hyperparameters to accelerate training: a learning rate of 0.0002 and the momentum
term beta1 reduced to 0.5 for stability.
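These settings map directly onto a few lines of framework code. A short sketch, reusing
the Generator and Discriminator classes above (still PyTorch, still an assumption):

import torch
import torch.nn as nn

def weights_init(m):
    # All convolutional weights drawn from a zero-centered normal, std 0.02.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)

G, D = Generator(), Discriminator()
G.apply(weights_init)
D.apply(weights_init)

# Adam with learning rate 0.0002; beta1 = 0.5 is the paper's tuned momentum term.
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Mini-batches of 128 images, pre-scaled to the Tanh range [-1, 1],
# e.g. x = x / 127.5 - 1.0 for 8-bit pixel values; no data augmentation.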

The quality of unsupervised representation learning algorithms is evaluated by
applying them as feature extractors on supervised datasets and measuring the
performance of linear models fitted on top of those features. To evaluate the quality of
the representations learned by DCGAN for supervised tasks, the model was trained on
Imagenet-1k, and the discriminator's convolutional features from all layers were pooled
and concatenated into a single feature vector. A linear classifier on these features
achieves 82.8% accuracy on CIFAR-10, outperforming all K-means based approaches,
although DCGAN's performance still falls short of Exemplar CNNs. DCGANs also
perform well on the SVHN dataset.
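A hedged sketch of this evaluation pipeline follows, assuming features are read out right
after each convolution of the discriminator above and that scikit-learn's LinearSVC
stands in for the paper's regularized linear L2-SVM; the variable names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_features(D, images):
    # Max-pool each conv layer's activations to a 4x4 grid, flatten, and
    # concatenate into one feature vector per image; the final 1x1 score
    # map is skipped by the shape guard.
    feats, x = [], images
    for layer in D.net:
        x = layer(x)
        if isinstance(layer, nn.Conv2d) and x.shape[-1] >= 4:
            feats.append(F.adaptive_max_pool2d(x, 4).flatten(start_dim=1))
    return torch.cat(feats, dim=1)

# Hypothetical usage: train_x / train_y hold CIFAR-10 tensors scaled to
# [-1, 1]; the 32x32 images are resized to the discriminator's 64x64 input
# (a detail assumed here, not taken from the paper).
x64 = F.interpolate(train_x, size=64, mode="bilinear", align_corners=False)
features = extract_features(D, x64).cpu().numpy()
clf = LinearSVC()  # stand-in for the paper's regularized L2-SVM
clf.fit(features, train_y.numpy())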

The authors conclude by describing future work: tackling the remaining instability, in
which a subset of filters sometimes collapses to a single oscillating mode under longer
training, applying DCGANs to other domains such as video and audio, and further
investigating the properties of the learned latent space.
