
NEURAL MACHINE TRANSLATION

USING BIDIRECTIONAL
GENERATIVE ADVERSARIAL
NETWORK

PRESENTED TO: Prof. A. K. SINGH
PRESENTED BY: SHIVALI GUPTA, B.Tech (4th Year)
ABSTRACT:

We use a conditional sequence bidirectional generative
adversarial net. Each generative adversarial network
comprises two adversarial sub-models, a generator and a
discriminator. The generator aims to generate sentences
which are hard to distinguish from human-translated
sentences (i.e., the golden target sentences), while the
discriminator tries to discriminate the machine-generated
sentences from the human-translated ones.
ABSTRACT CONTD….

The two sub-models play a minimax game and reach a
win-win situation when they arrive at a Nash equilibrium.
Additionally, static sentence-level BLEU is used as the
reinforced objective for the generator, which biases
generation towards high BLEU scores.
The Generative Adversarial Network (GAN) has been proposed
to tackle the exposure bias problem of Neural Machine
Translation (NMT). However, the discriminator typically
makes GAN training unstable due to the inadequate-training
problem: the search space is so huge that the sampled
translations are not sufficient to train the discriminator.
ABSTRACT CONTD……
To address this issue and stabilize GAN training, this
paper proposes a novel Bidirectional Generative Adversarial
Network for Neural Machine Translation (BGAN-NMT), which
introduces a generator model to act as the discriminator;
the discriminator then naturally considers the entire
translation space, so the inadequate-training problem
can be alleviated.
RELATED RESEARCH PAPERS:
 Generative Adversarial Nets by
Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu,
David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua
Bengio
 Improving Neural Machine Translation with Conditional
Sequence Generative Adversarial Nets by
Zhen Yang, Wei Chen, Feng Wang, Bo Xu
 Bidirectional Generative Adversarial Networks for Neural
Machine Translation by
Zhirui Zhang, Shujie Liu, Mu Li, Ming Zhou, Enhong Chen
GENERATIVE ADVERSARIAL
NETWORKS:

 Generative Adversarial Networks are a powerful class of
neural networks used for unsupervised learning.
 They were developed and introduced by Ian J. Goodfellow in
2014.
 A Generative Adversarial Network is basically made up of
two competing neural network models which are able to
analyze, capture and copy the variations within a dataset.
GENERATIVE ADVERSARIAL
NETWORK:

 The mapping between input and output in a Generative
Adversarial Network is highly non-linear, since both
sub-models are deep neural networks.
 Generative Adversarial Networks have a very wide range of
applications: text generation, text summarization, image
captioning and image generation.
 In a Generative Adversarial Network we use a generator and
a discriminator. The generator generates fake samples of
data and tries to fool the discriminator. The discriminator,
on the other hand, tries to distinguish between the real and
the fake samples.
GENERATIVE ADVERSARIAL
NETWORK:

 The discriminator and the generator are both neural
networks, and both run in competition with each other
during the training phase.
 The generative model captures the distribution of the
data and is trained to maximize the probability of the
discriminator making a mistake, while the discriminator
is trained to minimize that probability.
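This minimax game can be sketched in a few lines of code. The sketch below is a toy illustration (not from the paper): a one-parameter affine generator and a logistic discriminator stand in for real neural networks, and only the GAN value function V(D, G) is evaluated, not the full training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w, b):
    """Logistic 'real vs. fake' classifier: D(x) in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def generator(z, theta):
    """Maps noise z to a sample; here a simple affine map."""
    return theta[0] * z + theta[1]

def value_function(real, noise, w, b, theta):
    """V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    The discriminator ascends this value; the generator
    descends it -- that is the minimax game."""
    fake = generator(noise, theta)
    return (np.mean(np.log(discriminator(real, w, b)))
            + np.mean(np.log(1.0 - discriminator(fake, w, b))))

real = rng.normal(4.0, 1.0, size=256)   # "human" data samples
noise = rng.normal(0.0, 1.0, size=256)  # generator input noise
v = value_function(real, noise, w=1.0, b=-2.0, theta=(1.0, 0.0))
```

Since D outputs values strictly between 0 and 1, both expectation terms are negative, so V is always a finite negative number here.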
TYPES OF GENERATIVE ADVERSARIAL
NETWORK:

 VANILLA GAN: This is the simplest type of Generative
Adversarial Network. Here, the generator and the
discriminator are simple multi-layer perceptrons.
In a Vanilla Generative Adversarial Network the algorithm
is straightforward: it optimizes the minimax objective
using stochastic gradient descent.
TYPES OF GENERATIVE
ADVERSARIAL NETWORKS:

 CONDITIONAL GAN: A CGAN is a deep learning method in
which some conditional parameters are put in place. In a
CGAN, an additional parameter Y (for example, a class
label) is fed to the generator so that it generates data
for the corresponding label, and the same condition is
also fed to the discriminator to help it distinguish the
real data from the fake generated data.
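The conditioning mechanism amounts to concatenating the condition Y onto both networks' inputs. The sketch below illustrates this; the one-hot label encoding and the vector sizes are illustrative assumptions, not values from any of the cited papers.

```python
import numpy as np

def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    v = np.zeros(num_classes)
    v[label] = 1.0
    return v

def conditioned_generator_input(z, label, num_classes):
    """CGAN: the condition Y (here a one-hot class label) is
    concatenated to the noise vector before it enters the generator."""
    return np.concatenate([z, one_hot(label, num_classes)])

def conditioned_discriminator_input(x, label, num_classes):
    """The same condition is appended to the (real or fake) sample,
    so the discriminator judges the pair (sample, condition)."""
    return np.concatenate([x, one_hot(label, num_classes)])

z = np.random.default_rng(1).normal(size=100)  # 100-dim noise
g_in = conditioned_generator_input(z, label=3, num_classes=10)
# g_in now has 100 noise dimensions + 10 label dimensions
```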
TYPES OF GENERATIVE ADVERSARIAL
NETWORK CONTD….

 DEEP CONVOLUTIONAL GAN: The Deep Convolutional
Generative Adversarial Network is one of the most popular
and most successful implementations of the Generative
Adversarial Network. It is composed of convolutional
networks in place of multi-layer perceptrons, and its
convolutional layers are not fully connected.
TYPES OF GENERATIVE ADVERSARIAL
NETWORK:

 SUPER RESOLUTION GAN: A Super Resolution Generative
Adversarial Network, as the name suggests, is a way of
designing a generative adversarial network in which a deep
neural network is used along with an adversarial network
to produce higher-resolution data.
This type of generative adversarial network is particularly
useful for optimally upscaling native low-resolution images,
enhancing their details while minimizing errors.
BIDIRECTIONAL GENERATIVE ADVERSARIAL
NETWORK:
 To address the issue of the stability of Generative
Adversarial Network training, a novel Bidirectional
Generative Adversarial Network is proposed for Neural
Machine Translation (BGAN-NMT), which introduces a
generator model to act as the discriminator; the
discriminator then naturally considers the entire
translation space, so the inadequate-training problem
can be alleviated. To satisfy this property, the generator
and the discriminator are both designed to model sentence
pairs, with the difference that the generator is decomposed
into a source language model and a source-to-target
translation model, while the discriminator is formulated as
a target language model and a target-to-source translation
model.
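The two decompositions are just the chain rule applied to the same joint probability in opposite directions, which can be made concrete with toy numbers. The log-probability values below are hypothetical stand-ins for the outputs of the neural language and translation models.

```python
# Hypothetical toy log-probabilities for one sentence pair (x, y);
# in the real model these come from neural LM / translation models.
log_p_src_lm = -12.3      # log P(x)      source language model
log_p_s2t = -8.7          # log P(y | x)  source-to-target translation
log_p_tgt_lm = -11.9      # log P(y)      target language model
log_p_t2s = -9.1          # log P(x | y)  target-to-source translation

def generator_joint_logprob():
    """Generator side: log P(x, y) = log P(x) + log P(y | x)."""
    return log_p_src_lm + log_p_s2t

def discriminator_joint_logprob():
    """Discriminator side: log P(x, y) = log P(y) + log P(x | y)."""
    return log_p_tgt_lm + log_p_t2s

# Both sides model the SAME joint probability, just factorized in
# opposite directions; this symmetry is what lets the auxiliary GAN
# swap the two models' roles during alternate training.
```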
BIDIRECTIONAL GENERATIVE
ADVERSARIAL NETWORK:

 To further leverage their symmetry, an auxiliary GAN is
introduced which adopts the generator and discriminator
models of the original one as its own discriminator and
generator, respectively. The two GANs are alternately
trained to update the parameters.
 The exposure bias problem arises because, during training,
the model only conditions on ground-truth prefixes, while
at inference time it must condition on its own (possibly
erroneous) predictions. A generative adversarial network
scores whole generated sentences, so it does not suffer
from this mismatch.
BIDIRECTIONAL GENERATIVE
ADVERSARIAL NETWORK:

 If we use a single Generative Adversarial Network, it
will suffer from the inadequate-training problem, leading
to instability of GAN training. In practice, sampling a
large number of translation candidates is time-consuming
for an NMT system, so only a few samples are used to train
the discriminator.

 For a given source sentence, there is usually only one
positive example (the real target sentence). If the sampled
negative examples are also few, the discriminator will
easily overfit to the data. To tackle this problem, one of
the GANs uses a generator model as both the generator and
the discriminator.
MODEL ARCHITECTURE:
 The Bidirectional Generative Adversarial Network consists of two
Generative Adversarial Networks. One GAN contains a generative
model which works as both the generator and the discriminator.
 In the auxiliary GAN, both generative and discriminative
models are used.
 An additional layer is used to convert the input text into a
vector representation, i.e. to create the embeddings of the text.
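The embedding layer mentioned above is essentially a table lookup that turns token ids into dense vectors. The sketch below illustrates the idea; the toy vocabulary and embedding dimension are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3}  # toy vocabulary
embed_dim = 8
# In a real model this table is a trained parameter matrix.
embedding_table = rng.normal(size=(len(vocab), embed_dim))

def embed(tokens):
    """Embedding layer: map each token to its dense vector,
    turning a sentence into a (length, embed_dim) matrix."""
    ids = [vocab[t] for t in tokens]
    return embedding_table[ids]

vectors = embed(["the", "cat", "sat"])
# vectors has one 8-dimensional row per input token
```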
DATASET USED:
 For the German-English translation task, following previous work
(Ranzato et al., 2015; Bahdanau et al., 2016), the original research
paper uses data from the German-English machine translation track of the
IWSLT 2014 evaluation campaign, which consists of sentence-aligned
subtitles of TED and TEDx talks. The training corpus contains 153K
sentence pairs with 2.83M English words and 2.68M German words. The
validation set comprises 6,969 sentence pairs taken from the training
data, and the test set is a combination of dev2010, dev2012, tst2010,
tst2011 and tst2012, with 6,750 sentence pairs in total.
 For the Chinese-English translation task, the training data consists
of a set of LDC datasets with around 2.6M sentence pairs, containing
65.1M Chinese words and 67.1M English words. Sentences longer than 80
words are removed from the training data. The NIST Open MT 2006
evaluation set is used as the validation set, and the NIST 2005, 2008
and 2012 datasets as test sets. The vocabulary is limited to the 50K
most frequent words on both the source and target sides.
DATASET USED:

We used the IIT Bombay English-Hindi corpus, which contains a parallel
English-Hindi corpus as well as a monolingual Hindi corpus collected
from a variety of existing sources and corpora developed at the Center
for Indian Language Technology, IIT Bombay, over the years. This corpus
was used at the Workshop on Asian Language Translation Shared Task in
2016 and 2017 for the Hindi-to-English and English-to-Hindi language
pairs.
The dataset contains 273,885 Hindi-English sentence pairs, which are
divided into two parts: the development pairs and the testing pairs.
The development set contains about 520 segments and the test set
about 5,207 segments.
REASON FOR CHOOSING THE MODEL:
 The generative adversarial network can avoid the exposure bias
problem of neural machine translation and is a promising approach
for producing relatively good results. It can tackle the stability
problem of GAN training by using the generator model in place of
both the generator and the discriminator, since the generator model
takes the whole input sequence into consideration.
BLEU SCORE:
 BLEU, or the Bilingual Evaluation Understudy, is a score for
comparing a candidate translation of text to one or more reference
translations.
 Although developed for translation, it can be used to evaluate text
generated for a suite of natural language processing tasks.
 A perfect match results in a score of 1.0, whereas a perfect
mismatch results in a score of 0.0.
 The BLEU score was proposed by Kishore Papineni, et al. in their
2002 paper “BLEU: a Method for Automatic Evaluation of
Machine Translation“.
 It measures how close the text generated by a machine system is
to text produced by humans.
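A simplified sentence-level BLEU can be computed as below: clipped n-gram precisions are combined by a geometric mean and multiplied by a brevity penalty. This is a minimal single-reference sketch, not the full multi-reference metric of Papineni et al.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions times a brevity penalty (one reference only)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each n-gram's count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    # Penalize candidates shorter than the reference.
    bp = (1.0 if len(candidate) > len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat is on the mat".split()
assert bleu(ref, ref) == 1.0                        # perfect match
assert bleu("dog barks loud".split(), ref) == 0.0   # perfect mismatch
```

The two assertions mirror the bullet above: an exact match scores 1.0 and a complete mismatch scores 0.0.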
RESULT:
 ACCORDING TO THE RESEARCH PAPER:
ACCORDING TO THE USED DATASET:
 The BLEU score is calculated after training the model on
5,000 Hindi-English sentence pairs; the BLEU score for this
dataset turns out to be 5.26.
 Enlarging the dataset and retraining the same model should
increase the BLEU score by a considerable amount.
 When a bigger dataset of 10,000 sentence pairs is used,
the BLEU score turns out to be 8.26.
BYTE PAIR ENCODING:
 Byte pair encoding, or digram coding, is a simple form of data
compression in which the most common pair of consecutive bytes
of data is replaced with a byte that does not occur within that
data.
 A table of the replacements is kept and is required to rebuild
the original data.
 A variant of the technique has been shown to be useful in several
natural language processing applications.
 The algorithm was first described publicly by Philip Gage in a
February 1994 article, "A New Algorithm for Data Compression",
in the C Users Journal.
BYTE PAIR ENCODING CONTD….
 For Example:
Let the data be: aaabdaaabac
Let Z = aa
the data thus becomes: ZabdZabac
Let Y = ab
the data thus becomes: ZYdZYac
Let X = ZY
the data thus becomes: XdXac

 This data cannot be compressed further by byte pair encoding
because no pair of bytes occurs more than once.
 To decompress the data, simply perform the replacements in
reverse order.
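The worked example above can be reproduced in code. Note that when two pairs tie in frequency, this sketch may pick a different intermediate pair than the slide does, but decompression still recovers the original data exactly.

```python
from collections import Counter

def bpe_compress(data):
    """Repeatedly replace the most frequent pair of adjacent symbols
    with a fresh symbol, recording the replacement table."""
    table = {}
    fresh = iter("ZYXWVU")  # unused symbols, enough for this toy case
    while True:
        pairs = Counter(data[i:i + 2] for i in range(len(data) - 1))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair occurs more than once: fully compressed
        symbol = next(fresh)
        table[symbol] = pair
        data = data.replace(pair, symbol)
    return data, table

def bpe_decompress(data, table):
    """Undo the replacements in reverse order."""
    for symbol, pair in reversed(table.items()):
        data = data.replace(symbol, pair)
    return data

compressed, table = bpe_compress("aaabdaaabac")
restored = bpe_decompress(compressed, table)
```

Running this on the slide's string yields the same final compressed form, "XdXac", and the round trip restores "aaabdaaabac".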
CONCLUSION:
 The results obtained on the Hindi-English corpus show that they
are much better than those of the other machine translation
approaches, and thus the model can be used for low-resource
language machine translation.
 This paper presents the Bidirectional Generative Adversarial
Network for Neural Machine Translation, consisting of an original
GAN and an auxiliary GAN. Both the generator and the discriminator
in the original GAN are designed to model the joint probability of
sentence pairs. The auxiliary GAN adopts the generator and
discriminator models of the original one but exchanges their roles
to fully utilize their symmetry. The two GANs are then alternately
updated using a joint training algorithm. Experimental results on
German-English and Chinese-English translation tasks demonstrate
that the proposed approach not only stabilizes GAN training but
also leads to significant improvements. This model can be used for
neural machine translation of low-resource languages.
CONCLUSION CONTD….
 To further elevate the BLEU score as well as the quality of
machine translation with the generative adversarial network, we can
use the byte pair encoding technique.
 This technique, combined with the bidirectional generative
adversarial network, will increase the chance of getting a
better-quality machine-translated sentence at much lower cost and
effort.
 THANK YOU
