
Introduction of Generative

Adversarial Network (GAN)


李宏毅
Hung-yi Lee
Generative Adversarial Network
(GAN)
• How to pronounce “GAN”?

Google 小姐 (the Google Translate text-to-speech voice)
Yann LeCun’s comment

https://www.quora.com/What-are-some-recent-and-
potentially-upcoming-breakthroughs-in-unsupervised-learning
Yann LeCun’s comment

……

https://www.quora.com/What-are-some-recent-and-potentially-upcoming-breakthroughs-
in-deep-learning
Deep Learning in One Slide (Review)
Many kinds of network structures:
➢ Fully connected feedforward network
➢ Convolutional neural network (CNN)
➢ Recurrent neural network (RNN)
Different networks can take different kinds of input/output: a vector, a matrix, or a sequence (speech, video, sentence) goes into the network as x, and y comes out.
How to find the function? Given example inputs/outputs as training data: {(x1,y1),(x2,y2), ……, (x1000,y1000)}
Creation
Anime Face
Generation

Drawing?

何之源's Zhihu article: https://zhuanlan.zhihu.com/p/24767059
DCGAN: https://github.com/carpedm20/DCGAN-tensorflow
Powered by: http://mattya.github.io/chainer-DCGAN/

Basic Idea of GAN
The generator is a neural network (NN), or a function: it takes a vector as input and outputs an image (a matrix).
Each dimension of the input vector represents some characteristic of the output: changing one dimension gives, for example, longer hair, blue hair, or an open mouth.
Basic Idea of GAN
The discriminator is also a neural network (NN), or a function: it takes an image as input and outputs a scalar. A larger value means the image looks real; a smaller value means it looks fake (e.g. a realistic face gets 1.0, a poor one gets 0.1).
Basic Idea of GAN
Analogy: the generator is like a species of butterfly evolving, and the discriminator is like its predator.
The predator first learns “butterflies are not brown”, then “butterflies do not have veins”, ……; the butterflies keep evolving to fool it.
This is where the term “adversarial” comes from.
Basic Idea of GAN
You can explain the process in different ways ……
NN Generator v1 → v2 → v3 evolves against Discriminator v1 → v2 → v3, which is trained with real images.
Another view: the generator is the student and the discriminator is the teacher.
Basic Idea of GAN
Generator v1 → Discriminator v1 points out “no eyes” → Generator v2 → Discriminator v2 points out “no mouth” → Generator v3 → ……
Questions

Q1: Why can't the generator learn by itself?

Q2: Why doesn't the discriminator generate objects itself?

Q3: How do the discriminator and generator interact?
Anime Face Generation

100 updates
Anime Face Generation

1000 updates
Anime Face Generation

2000 updates
Anime Face Generation

5000 updates
Anime Face Generation

10,000 updates
Anime Face Generation

20,000 updates
Anime Face Generation

50,000 updates
(Figure: images produced by G while interpolating the input vector with mixing weights 0.0, 0.1, ……, 0.9.)
Thanks to 陳柏文 for providing the experimental results.
Outline

Lecture 1: Introduction of GAN

Lecture 2: Variants of GAN

Lecture 3: Making Decision and Control


Lecture I:
Introduction of GAN
To learn more theory:
https://www.youtube.com/watch?v=0CKeqXl5IY0&lc=z13zuxbgl
pvsgbgpo04cg1bxuoraejdpapo0k
https://www.youtube.com/watch?v=KSN4QYgAtao&lc=z13kz1nq
vuqsipqfn23phthasre4evrdo
Lecture I

When can I use GAN?

Generation by GAN

Improving GAN
Structured Learning
Machine learning is to find a function f

f : X → Y
Regression: output a scalar
Classification: output a “class” (one-hot vector)
  Class 1: [1 0 0]   Class 2: [0 1 0]   Class 3: [0 0 1]
Structured Learning/Prediction: output a sequence, a matrix, a graph, a tree ……
The output is composed of components with dependency, which goes beyond regression and classification.
Output Sequence f : X → Y
Machine Translation
X : “機器學習及其深層與 Y : “Machine learning and
結構化” having it deep and structured”
(sentence of language 1) (sentence of language 2)
Speech Recognition

X : (speech)   Y : “感謝大家來上課” (transcription)
Chat-bot

X: “How are you?” Y: “I’m fine.”


(what a user says) (response of machine)
Output Matrix f : X → Y
Image to Image Colorization:

X: Y:

Ref: https://arxiv.org/pdf/1611.07004v1.pdf

Text to Image

X : “this white and yellow flower Y:


have thin white petals and a
round yellow stamen”
ref: https://arxiv.org/pdf/1605.05396.pdf
Decision Making and Control

Actions: “right”, “fire”, “left”

Playing Go is the same.


Why Structured Learning
Interesting?
• One-shot/Zero-shot Learning:
• In classification, each class has some examples.
• In structured learning,
• If you consider each possible output as a
“class” ……
• Since the output space is huge, most “classes”
do not have any training data.
• Machine has to create new stuff during
testing.
• Need more intelligence
Why Structured Learning
Interesting?
• Machine has to learn to plan
• Machine can generate objects component-by-
component, but it should have a big picture in its mind.
• Because the output components have dependency,
they should be considered globally.
Image Generation

Sentence Generation: 這個婆娘不是人,九天玄女下凡塵
(“This woman is not a human being; she is a goddess descended from heaven” — the second line completes the first.)
Structured Learning Approach
Generator (Bottom Up): learn to generate the object at the component level.
Discriminator (Top Down): evaluate the whole object, and find the best one.
Generative Adversarial Network (GAN) combines both.
Lecture I

When can I use GAN?

Generation by GAN

Improving GAN
Generation
We will control what to generate later → Conditional Generation

Image Generation: random vectors (sampled from a specific range) → NN Generator → images

Sentence Generation: random vectors (sampled from a specific range) → NN Generator → “How are you?”, “Good morning.”, “Good afternoon.”
So many questions ……

Q1: Why can't the generator learn by itself?

Q2: Why doesn't the discriminator generate objects itself?

Q3: How do the discriminator and generator interact?
Q1: Can the generator learn by itself?
Collect (code, image) pairs: each image is paired with a code vector (e.g. [0.1, −0.5, ……], [0.1, 0.9, ……]). Where do the codes come from?
Train the NN Generator so that, given a code, its output image is as close as possible to the paired target image.
c.f. training an NN Classifier, where the output [y1, y2, ……] should be as close as possible to the one-hot target [1, 0, ……].
Where do the code vectors paired with each image come from? The encoder in an auto-encoder provides the code ☺

Auto-encoder
• NN Encoder: input object (e.g. a 28 × 28 = 784-pixel image) → a low-dimensional code c, a compact representation of the input object.
• NN Decoder: code c → reconstruction of the original object.
• The encoder and the decoder are trainable and learned together: input → NN Encoder → c → NN Decoder → output.
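A minimal sketch of the auto-encoder described above, assuming PyTorch; the 784-dim input and 32-dim code are illustrative choices, not values fixed by the slides.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # NN Encoder: object -> compact code c
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        # NN Decoder: code c -> reconstruction of the object
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        c = self.encoder(x)
        return self.decoder(c), c

# Encoder and decoder are learned together by minimizing reconstruction error.
model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                  # stand-in batch of flattened images
x_hat, c = model(x)
loss = nn.functional.mse_loss(x_hat, x)  # "as close as possible" to the input
loss.backward()
opt.step()
```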
Auto-encoder - Example
NN Encoder → code c (compared with reducing the pixels to 32-dim by PCA; pixels visualized with t-SNE).
Unsupervised Learning: no annotation needed.
Auto-encoder
Input → NN Encoder → code → NN Decoder → output, trained so the output is as close as possible to the input.
The NN Decoder can then be used as a generator: randomly generate a vector as the code, feed it to the decoder, and get an image?
Example with a 2D code: sweep the code over a grid (e.g. from −1.5 to 1.5 in each dimension), feed each code to the NN Decoder, and inspect the generated images; compare with where the real examples lie in the code space.
Problem with using the auto-encoder decoder as a generator:
Recall the structure: input → NN Encoder → code → NN Decoder (= Generator) → output.
Given code a the NN Generator produces one image and given code b another, but what does it produce given 0.5 × a + 0.5 × b? There is no guarantee the interpolated code yields a sensible image, because such codes never appear during training. This motivates the Variational Auto-encoder.
Variational Auto-encoder (VAE)
The NN Encoder outputs two vectors, $(m_1, m_2, m_3)$ and $(\sigma_1, \sigma_2, \sigma_3)$. Sample $(e_1, e_2, e_3)$ from a normal distribution and form the code
$c_i = \exp(\sigma_i) \times e_i + m_i$,
which is fed to the NN Decoder to produce the output.
Training minimizes the reconstruction error plus the regularizer
$\sum_{i=1}^{3} \left( \exp(\sigma_i) - (1 + \sigma_i) + (m_i)^2 \right)$.
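A minimal sketch of this VAE step, assuming PyTorch; the toy encoder/decoder layers and the 3-dim code are placeholders, and the loss follows the two terms on the slide.

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 3)   # toy encoder producing m (3-dim) and sigma (3-dim)
dec = nn.Linear(3, 784)       # toy decoder mapping the 3-dim code back to an image

x = torch.rand(16, 784)
m, sigma = enc(x).chunk(2, dim=1)
e = torch.randn_like(sigma)                 # e sampled from a normal distribution
c = torch.exp(sigma) * e + m                # c_i = exp(sigma_i) * e_i + m_i
x_hat = torch.sigmoid(dec(c))

recon = nn.functional.mse_loss(x_hat, x, reduction="sum")   # reconstruction error
reg = (torch.exp(sigma) - (1 + sigma) + m ** 2).sum()       # regularizer from the slide
loss = recon + reg
loss.backward()
```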
Why VAE?
(Figure: because noise is added around each encoded point, codes between the training codes are also trained to decode to sensible outputs.)
VAE - Writing Poetry

sentence RNN RNN


sentence
Encoder Decoder
code
Code Space
i went to the store to buy some groceries.
i store to buy some groceries.
i were to buy any groceries.

……
"come with me," she said.
"talk to me," she said.
"don’t worry about it," she said.

Ref: http://www.wired.co.uk/article/google-artificial-intelligence-poetry
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio, “Generating Sentences from a Continuous Space”, arXiv preprint, 2015
What do we miss?
The generator G produces an image that should be as close as possible to the target.
It would be fine if the generator could truly copy the target image.
What if the generator makes some mistakes ……
Some mistakes are serious, while some are fine.
Example (comparing generated digits with the target):
• 1 pixel error, badly placed → “I think this is not OK” (我覺得不行)
• 6 pixel errors, consistently placed → “I think this is actually OK” (我覺得其實 OK)
What do we miss?
Each neuron in the output layer corresponds to a pixel (empty or ink).
A neuron in layer L decides its pixel independently: it cannot tell its neighbor “let's produce the same color” — as far as it is concerned, “who cares about you”.
The relation between the components is critical.
The last layer generates each component independently.
We need a deep structure to catch the relation between components.
Thanks to 黃淞楓 for providing the results.

(Variational) Auto-encoder
(Figure: samples from a (Variational) Auto-encoder with G mapping $(z_1, z_2)$ to $(x_1, x_2)$, plotted in the $x_1$–$x_2$ plane.)
So many questions ……

Q1: Why can't the generator learn by itself?

Q2: Why doesn't the discriminator generate objects itself?

Q3: How do the discriminator and generator interact?
Discriminator
Also called: evaluation function, potential function, ……

• The discriminator is a function D (a network, which can be deep)

D : X → R
• Input x: an object (e.g. an image)
• Output D(x): a scalar which represents how “good” the object x is
  (e.g. D gives 1.0 to a realistic image and 0.1 to a poor one)

Can we use the discriminator to generate objects? Yes.
Discriminator
• It is easier to catch the relation between the components by top-down evaluation.
  (For the “not OK” v.s. “actually OK” examples above, a single CNN filter is good enough.)
Discriminator
• Suppose we already have a good discriminator D(x) …
  Generate by enumerating all possible x and picking the one with the largest D(x) !!! Is that feasible ???
How to learn the discriminator?
Discriminator - Training
• I have some real images: feeding each to D should give the scalar 1 (real).
• With only real images, the discriminator simply learns to output “1” (real) for everything.
• Discriminator training needs some negative examples.
Discriminator - Training
Object x → Discriminator D → scalar D(x).
We want D(x) to be large on the real examples and small elsewhere, but in practice you cannot decrease D(x) for all x other than the real examples.
Discriminator - Training
• Negative examples are critical.
  Real images → D → 1 (real); fake images → D → 0 (fake).
  With weak negative examples, D may still give 0.9 (“pretty real”) to an unrealistic image.
  How to generate realistic negative examples?
Discriminator - Training
• General Algorithm
  • Given a set of positive examples, randomly generate a set of negative examples.
  • In each iteration
    • Learn a discriminator D that can discriminate positive and negative examples.
    • Generate new negative examples with the discriminator D:
      $\tilde{x} = \arg\max_{x \in X} D(x)$
(Figure: in each iteration D(x) is pushed up on the real examples and down on the generated ones; in the end the generated examples match the real distribution.)
Structured Learning — this idea has a long history:
➢ Structured Perceptron
➢ Structured SVM
➢ Gibbs sampling
➢ Hidden information
➢ Applications: sequence labelling, summarization
Graphical Models: Bayesian Network (directed graph) and Markov Random Field (undirected graph), e.g. Conditional Random Field, Segmental CRF, Markov Logic Network, Boltzmann Machine, Restricted Boltzmann Machine.
(Only some of the approaches are listed.)
Energy-based Model: http://www.cs.nyu.edu/~yann/research/ebm/
Generator v.s. Discriminator
• Generator
  • Pros: easy to generate, even with a deep model
  • Cons: imitates the appearance; hard to learn the correlation between components
• Discriminator
  • Pros: considers the big picture
  • Cons: generation is not always feasible, especially when the model is deep; how to do negative sampling?
So many questions ……

Q1: Why can't the generator learn by itself?

Q2: Why doesn't the discriminator generate objects itself?

Q3: How do the discriminator and generator interact?
Discriminator - Training
• General Algorithm
  • Given a set of positive examples, randomly generate a set of negative examples.
  • In each iteration
    • Learn a discriminator D that can discriminate positive and negative examples.
    • Generate negative examples with the discriminator D, now using a generator G:
      $\tilde{x} = G(z) \approx \arg\max_{x \in X} D(x)$
Generating Negative Examples
code → NN Generator → image → Discriminator → scalar (e.g. 0.13 → 0.9).
Update the generator by gradient ascent on the discriminator's output, keeping the discriminator fixed.
Algorithm
• Initialize the generator G and the discriminator D
• In each training iteration:
  • Learning D: sample some real objects (labelled 1) and generate some fake objects with G from random codes (labelled 0); update D to separate them.
  • Learning G: feed codes to G to produce images, pass them through D, and update G so that D outputs 1 on them, while keeping D fixed.
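A minimal sketch of one iteration of this algorithm, assuming PyTorch; `G` and `D` are placeholder MLPs over flattened images, not the architectures used in the slides.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(64, 784)          # stand-in for a batch of real images

# --- Learning D: real objects -> 1, fake objects -> 0 (G is fixed) ---
z = torch.randn(64, 100)
fake = G(z).detach()
loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# --- Learning G: update G so that D outputs 1 on generated images (D is fixed) ---
z = torch.randn(64, 100)
loss_G = bce(D(G(z)), torch.ones(64, 1))
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```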
Benefit of GAN
• From the Discriminator’s point of view
  • Use the generator to produce negative samples: $\tilde{x} = G(z) \approx \arg\max_{x \in X} D(x)$, which is efficient.
• From the Generator’s point of view
  • It still generates the object component-by-component,
  • but it is learned from the discriminator, which has a global view.
Thanks to 段逸林 for providing the results.

(Figure: images generated by GAN v.s. VAE for comparison.)
https://arxiv.org/abs/1512.09300
Lecture I

When can I use GAN?

Generation by GAN

Improving GAN
GAN
• The discriminator leads the generator: D(x) is high around the real data and low around the generated data, and the generated distribution moves toward the real one.
Binary Classifier as Discriminator
A typical binary classifier uses a sigmoid at the output layer: 1 is the largest value (real), 0 is the smallest (fake).
Problem: once the sigmoid saturates, the generated samples “don’t move” — you cannot train your classifier too well ……


Binary Classifier as Discriminator
• Don’t let the discriminator perfectly separate the real and generated data
  • Weaken your discriminator? (but it is unclear how strong is "strong" and how weak is "weak")
  • Add noise to the inputs or labels? (so the real and generated distributions overlap)
Least Square GAN (LSGAN)
• Replace the sigmoid with a linear output (replace classification with regression): regress real examples to the scalar 1 and fake examples to 0.
• With a saturated sigmoid the generated examples “don’t move”; with a linear output they always receive a gradient toward the real ones.
WGAN
• We want the scores of the real examples to be as large as possible and the scores of the generated examples to be as small as possible.
• Any problem? Without a constraint, the discriminator can push the real scores towards +∞ and the generated scores towards −∞.
WGAN
• The discriminator should be a 1-Lipschitz function: it should be smooth. How to realize this?

Lipschitz Function
$\| D(x_1) - D(x_2) \| \le K \| x_1 - x_2 \|$
(output change ≤ K × input change)
K = 1 for “1-Lipschitz”: the function does not change fast.
WGAN — the discriminator should be smooth enough.
• Original WGAN → Weight Clipping
  Force the parameters w to stay between −c and c: after each parameter update, if w > c set w = c; if w < −c set w = −c.
  This does not truly maximize (minimize) D on the real (generated) examples.
• Improved WGAN → Gradient Penalty
  Keep the gradient norm of D close to 1, so D(x) slopes gently from the generated data towards the real data and the generated examples are easy to move.
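A minimal sketch of the improved WGAN critic loss with gradient penalty, assuming PyTorch; `D` is a critic with a linear (unbounded) output and `real`/`fake` are batches of flattened images.

```python
import torch

def wgan_gp_d_loss(D, real, fake, lam=10.0):
    # Critic loss: make D(real) large and D(fake) small (scores are unbounded).
    loss = D(fake).mean() - D(real).mean()
    # Gradient penalty: keep the gradient norm of D close to 1 on points
    # interpolated between real and generated samples.
    eps = torch.rand(real.size(0), 1)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    penalty = ((grad.norm(2, dim=1) - 1) ** 2).mean()
    return loss + lam * penalty
```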
(Figure: generated samples comparing DCGAN, LSGAN, original WGAN, and improved WGAN under different architectures: G: CNN, D: CNN; G/D: CNN without normalization; G/D: CNN with tanh; G: MLP, D: CNN; G: CNN with a bad structure, D: CNN; G and D with 101 layers.)


Sentence Generation (I will talk about RNNs later.)
• Example sentence: “good bye.”
1-of-N encoding of the characters “g o o d <space> ……”:

            g  o  o  d  <space>
  d         0  0  0  1  0
  g         1  0  0  0  0
  o         0  1  1  0  0
  <space>   0  0  0  0  1

Consider this matrix as an “image”.


Sentence Generation (I will talk about RNNs later.)
• A real sentence is a matrix of 1-of-N columns, e.g.
  1 0 0 0 0
  0 1 0 0 0
  0 0 1 0 0
  0 0 0 1 0
  0 0 0 0 1
• A generated matrix can never be exactly 1-of-N, e.g.
  0.9 0.1 0.1 0   0
  0.1 0.9 0.1 0   0
  0   0   0.7 0.1 0
  0   0   0.1 0.8 0.1
  0   0   0   0.1 0.9
Because the two kinds of matrices have no overlap, a binary classifier can immediately find the difference. WGAN is helpful here.
Improved WGAN successfully generates sentences: https://arxiv.org/abs/1704.00028
Thanks to 李仲翊 for providing the experimental results.
W-GAN – Generating Tang Poetry (each output is 32 characters, including punctuation)

• 升雲白遲丹齋取,此酒新巷市入頭。黃道故海歸中後,不驚入得韻子門。
• 據口容章蕃翎翎,邦貸無遊隔將毬。外蕭曾臺遶出畧,此計推上呂天夢。
• 新來寳伎泉,手雪泓臺蓑。曾子花路魏,不謀散薦船。
• 功持牧度機邈爭,不躚官嬉牧涼散。不迎白旅今掩冬,盡蘸金祇可停。
• 玉十洪沄爭春風,溪子風佛挺橫鞋。盤盤稅焰先花齋,誰過飄鶴一丞幢。
• 海人依野庇,為阻例沉迴。座花不佐樹,弟闌十名儂。
• 入維當興日世瀕,不評皺。頭醉空其杯,駸園凋送頭。
• 鉢笙動春枝,寶叅潔長知。官爲宻爛去,絆粒薛一靜。
• 吾涼腕不楚,縱先待旅知。楚人縱酒待,一蔓飄聖猜。
• 折幕故癘應韻子,徑頭霜瓊老徑徑。尚錯春鏘熊悽梅,去吹依能九將香。
• 通可矯目鷃須浄,丹迤挈花一抵嫖。外子當目中前醒,迎日幽筆鈎弧前。
• 庭愛四樹人庭好,無衣服仍繡秋州。更怯風流欲鴂雲,帛陽舊據畆婷儻。
Loss-sensitive GAN (LSGAN)
(Figure: in WGAN, D(x) is simply pushed up on real x and down on generated x′, x′′; in loss-sensitive GAN, the required margin between D(x) and D(x′) depends on the distance Δ(x, x′), so generated samples that are already close to the real data are pushed down less.)
Energy-based GAN (EBGAN)
• Use an autoencoder as the discriminator D.
  “An image is good” = “it can be reconstructed by the autoencoder”.
  The generator is the same as before; the discriminator is an encoder–decoder (EN → DE) whose score is the negative reconstruction error, so 0 is the best possible value.
EBGAN
• An autoencoder-based discriminator only gives a large value to a limited region: images are hard to reconstruct but easy to destroy.
• Since 0 is the best score, the scores of the generated examples do not have to be pushed very negative.
Mode Collapse
(Figure: the generated distribution concentrates on a small part of the real data distribution.)
Missing Mode?
• Mode collapse is easy to detect, but a missing mode is not: the generator may simply never produce some kinds of real data.
• E.g. BEGAN on CelebA
  (Experimental results provided by 陳柏文; images from 柯達方.)
One remedy: Ensemble — train several generators (Generator 1, Generator 2, ……) and sample from all of them.
Lecture II:
Variants of GAN
Lecture II

Conditional Generation

Sequence Generation

A Little Bit of Theory (optional)


Conditional Generation
Generation: random vectors (sampled from a specific range) → NN Generator → images.
Conditional Generation: a condition such as “Girl with red hair and red eyes” or “Girl with yellow ribbon” → NN Generator → an image matching the condition.
Conditional Generation
• We don’t want to simply generate some random stuff.
• Generate objects based on conditions:
  Caption Generation — given an image as the condition, output “A young girl is dancing.”
  Chat-bot — given the condition “Hello”, output “Hello. Nice to see you.”
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
Modifying Input Code
Each dimension of the input vector represents some characteristic of the output (e.g. longer hair, blue hair, open mouth).
➢ The input code determines the generator output.
➢ Understand the meaning of each dimension to control the output.
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
InfoGAN
In a regular GAN, modifying a specific dimension of the code often has no clear meaning. (In the figure, the colors represent the characteristics: what we expect v.s. what actually happens.)
What is InfoGAN?
• Split the input into z = (c, z′). The Generator maps z to x; the Discriminator still outputs a scalar.
• A Classifier predicts the code c that generated x; it shares parameters with the discriminator (only the last layer is different).
• The generator and the classifier form an “auto-encoder”: the generator acts as the decoder and the classifier as the encoder.
• Because the classifier must be able to recover c from x, c must have a clear influence on x.
https://arxiv.org/abs/1606.03657
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
Connecting Code and Attribute (dataset: CelebA)
GAN + Autoencoder
• We have a generator (input z, output x).
• However, given x, how can we find z? Learn an encoder (input x, output z).
• x → Encoder → z → Generator (Decoder) → x, trained so the reconstruction is as close as possible to the input; the generator is fixed, and the encoder can be initialized from the discriminator (although the two may have different structures).
Attribute Representation
$z_{long} = \frac{1}{N_1} \sum_{x \in long} En(x) - \frac{1}{N_2} \sum_{x' \notin long} En(x')$
To give a short-hair face x long hair: $z = En(x) + z_{long}$, then $Gen(z)$ is the long-hair version.
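A minimal sketch of this attribute-vector trick, assuming PyTorch and that `En` (the learned encoder) and `Gen` (the generator/decoder) are existing modules; the tensor names are hypothetical.

```python
import torch

def attribute_vector(En, long_hair_imgs, other_imgs):
    # z_long = mean code of images with the attribute - mean code of images without it
    return En(long_hair_imgs).mean(0) - En(other_imgs).mean(0)

def add_attribute(En, Gen, x, z_long):
    # z = En(x) + z_long, then Gen(z) is the edited image
    return Gen(En(x) + z_long)
```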
Photo
Editing

https://www.youtube.com/watch?v=kPEIJJsQr7U
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
Conditional GAN
• Generating images based on a text description:
  sentence (“a dog is sleeping”) → NN Encoder → code → NN Generator → image?
• Training data: pairs such as c1: “a dog is running” with image x̂1, and c2: “a bird is flying” with image x̂2.
• Text to image by traditional supervised learning: train the NN so that its output image is as close as possible to the target x̂.
  Problem: for the text “train” there are many different target images, so the NN output is their average — a blurry image!
Conditional GAN
• Generator: condition c (“train”) plus z sampled from a prior distribution → image x = G(c, z).
  The output is a distribution, which approximates the distribution of real images for that condition instead of averaging them into a blurry image.
• Discriminator:
  Type 1: D(x) → scalar, checking whether x is realistic or not.
  Type 2: D(c, x) → scalar, checking whether x is realistic AND whether c and x are matched.
  For type 2, a positive example is a matched pair such as (“train”, a real train image); negative examples include (“train”, a generated image) and a mismatched pair such as (“cat”, a real train image).
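A minimal sketch of a "type 2" conditional discriminator, assuming PyTorch; the condition c is assumed to be already encoded as a 128-dim vector and the image flattened to 784 dims — the dimensions are illustrative, not from the slides.

```python
import torch
import torch.nn as nn

class ConditionalD(nn.Module):
    def __init__(self, c_dim=128, x_dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(c_dim + x_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, c, x):
        # Scores whether x is realistic AND matched with the condition c.
        return self.net(torch.cat([c, x], dim=1))

# Training uses positive pairs (c, real matched image) and negative pairs
# (c, generated image) as well as (mismatched c, real image).
```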
Text to Image - Results (“red flower with black center”)
Text to Image - Results
Conditional GAN — MLDS Assignment 3 (TA in charge: 曾柏翔)
• Draw anime character portraits from text descriptions, e.g.
  Red hair, long hair
  Black hair, blue eyes
  Blue hair, green eyes
Data Collection (thanks to TAs 曾柏翔 and 樊恩宇 for collecting the data)
http://konachan.net/post/show/239400/aikatsu-clouds-flowers-hikami_sumire-hiten_goane_r
Released Training Data (example tags: blue eyes, red hair, short hair; images are 96 x 96)
• Data download link:
  https://drive.google.com/open?id=0BwJmB7alR-AvMHEtczZZN0EtdzQ
• Anime Dataset:
  • training data: 33.4k (image, tags) pairs
  • Training tags file format (tags.csv):
    img_id <comma> tag1 <colon> #_post <tab> tag2 <colon> …
Image-to-image (https://arxiv.org/pdf/1611.07004)
• Condition c (an input image) plus z → G → x = G(c, z).
• Traditional supervised approach: train an NN so its output image is as close as possible to the target.
  Testing: the output is blurry because it is the average of several possible images.
• GAN approach: G produces the image and D scores it.
  Testing: experimental results compare “close only”, “GAN”, and “GAN + close”.


Image super resolution
• Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
Totz, Zehan Wang, Wenzhe Shi, “Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network”, CVPR, 2016
Video Generation
• The generator predicts the next frame: minimize the distance to the target frame, and also make the discriminator think the generated frame is real.
• The discriminator judges whether the last frame of the sequence is real or generated.
https://github.com/dyelax/Adversarial_Video_Generation
Speech Enhancement
• Typical deep learning approach: noisy speech → (e.g. a CNN) → clean speech, trained on (noisy, clean) pairs.
• Conditional GAN: G maps noisy speech to an enhanced output; D takes the (noisy, output) pair and outputs a scalar judging whether it is a fake pair or not.
• (Spectrogram examples: noisy speech v.s. enhanced speech — which enhanced speech is better?)
Thanks to 廖峴峰 for providing the experimental results (co-advised with Dr. Yu Tsao, Academia Sinica).


More about Speech Processing
• Speech synthesis
• Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru
Hiramatsu, Kunio Kashino, “Generative Adversarial Network-based
Postfiltering for Statistical Parametric Speech Synthesis”, ICASSP 2017
• Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Training
algorithm to deceive anti-spoofing verification for DNN-based speech
synthesis, ”, ICASSP 2017

• Voice Conversion
• Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang,
Voice Conversion from Unaligned Corpora using Variational Autoencoding
Wasserstein Generative Adversarial Networks, Interspeech 2017

• Speech Enhancement
• Santiago Pascual, Antonio Bonafonte, Joan Serrà, SEGAN: Speech
Enhancement Generative Adversarial Network, Interspeech 2017
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
Cycle GAN, Disco GAN
Transform an object from one domain to another without paired data
(e.g. Domain X: photos, Domain Y: van Gogh paintings)

Cycle GAN
https://arxiv.org/abs/1703.10593
https://junyanz.github.io/CycleGAN/
• With only $G_{X \to Y}$ and a discriminator $D_Y$ (which judges whether the input image belongs to domain Y or not), the output becomes similar to domain Y but may ignore the input — not what we want.
• Cycle GAN adds $G_{Y \to X}$: the image is mapped X → Y → X and the reconstruction should be as close as possible to the original, so $G_{X \to Y}$ cannot throw the input's content away (otherwise there is a lack of information for reconstruction).
Cycle GAN (c.f. Dual Learning)
• Train in both directions: X → $G_{X \to Y}$ → Y → $G_{Y \to X}$ → X, and Y → $G_{Y \to X}$ → X → $G_{X \to Y}$ → Y, each reconstruction as close as possible to its original.
• $D_X$ outputs a scalar: does the image belong to domain X or not; $D_Y$ does the same for domain Y.
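A minimal sketch of the two cycle-consistency terms described above, assuming PyTorch modules `G_XY`, `G_YX` (the two generators) and batches `x`, `y` from the two domains.

```python
import torch

def cycle_loss(G_XY, G_YX, x, y):
    # X -> Y -> X and Y -> X -> Y reconstructions should match the originals.
    loss_x = torch.nn.functional.l1_loss(G_YX(G_XY(x)), x)
    loss_y = torch.nn.functional.l1_loss(G_XY(G_YX(y)), y)
    return loss_x + loss_y

# The full objective also adds the adversarial terms from D_X and D_Y.
```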
An anime-style world (input photo → output in the anime domain)
• Using the code: https://github.com/Hi-king/kawaii_creator
• It is not Cycle GAN or Disco GAN.
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
https://www.youtube.com/watch?v=9c4z6YsBGQ0

Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman and Alexei A. Efros. "Generative
Visual Manipulation on the Natural Image Manifold", ECCV, 2016.
Andrew Brock, Theodore Lim, J.M. Ritchie, Nick
Weston, Neural Photo Editing with Introspective
Adversarial Networks, arXiv preprint, 2017
Basic Idea
• Editing is done in the space of the code z: find a point that fulfills the user's constraint.
• Why move on the code space instead of directly on the pixels? Moving in the code space keeps the output realistic.


Back to z
Given an image $x^T$, how do we find its code $z^*$?
• Method 1: $z^* = \arg\min_z L(G(z), x^T)$, solved by gradient descent; the difference L between $G(z)$ and $x^T$ can be pixel-wise or measured by another network.
• Method 2: learn an encoder, as in the auto-encoder setup: x → Encoder → z → Generator (Decoder) → x, trained to reconstruct as closely as possible.
• Method 3: use the result of method 2 as the initialization of method 1.


Editing Photos
• z0 is the code of the input image. Find
  $z^* = \arg\min_z \; U(G(z)) + \lambda_1 \| z - z_0 \|^2 - \lambda_2 D(G(z))$
  where $U(G(z))$ measures whether the image fulfills the editing constraint, $\| z - z_0 \|^2$ keeps it not too far away from the original image, and the discriminator term $D(G(z))$ checks that the image is realistic.
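A minimal sketch of optimizing this editing objective, assuming PyTorch modules `G` (generator) and `D` (discriminator), a user-defined constraint term `U` (e.g. matching brush strokes), and `z0`, the code of the input image; all names are hypothetical.

```python
import torch

def edit(G, D, U, z0, lam1=0.1, lam2=0.1, steps=100, lr=0.05):
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        # U(G(z)) + lam1 * ||z - z0||^2 - lam2 * D(G(z))
        loss = U(G(z)) + lam1 * ((z - z0) ** 2).sum() - lam2 * D(G(z)).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z)   # the edited image: realistic, near the original, fulfilling the constraint
```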


Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
Domain Independent Features
• Training and testing data are in different domains: feed both through the same generator (feature extractor) and require the extracted features to have the same distribution.
• If the feature extractor only has to fool a domain classifier, that is too easy for it (it can produce degenerate features).
Domain-adversarial training
• Feature extractor: maximize the label classification accuracy AND minimize the domain classification accuracy — not only cheat the domain classifier, but satisfy the label classifier at the same time.
• Label predictor: maximize the label classification accuracy.
• Domain classifier: maximize the domain classification accuracy.
• This is a big network, but different parts have different goals.
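A minimal sketch of domain-adversarial training with a gradient reversal layer, assuming PyTorch; the feature extractor F, label predictor C and domain classifier DC are placeholders, and x, y, d are inputs, class labels and domain labels.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        # Flip the gradient: the extractor is updated to fool the domain classifier.
        return -g

F = nn.Sequential(nn.Linear(784, 128), nn.ReLU())   # feature extractor
C = nn.Linear(128, 10)                               # label predictor
DC = nn.Linear(128, 2)                               # domain classifier

x = torch.rand(32, 784)
y = torch.randint(0, 10, (32,))
d = torch.randint(0, 2, (32,))

feat = F(x)
loss = nn.functional.cross_entropy(C(feat), y) \
     + nn.functional.cross_entropy(DC(GradReverse.apply(feat)), d)
loss.backward()   # one backward pass pushes the three parts toward their different goals
```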
Domain-adversarial training

Domain classifier fails in the end


It should struggle ……
Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Domain-adversarial training

Train:
Test:

Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,


ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Conditional Generation
Modifying the input code
• Making the code have influence (InfoGAN)
• Connecting the code space with attributes
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain-independent features
• Improving the auto-encoder (VAE-GAN, BiGAN)
VAE-GAN
Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther, “Autoencoding beyond pixels using a learned similarity metric”, ICML, 2016
Pipeline: x → Encoder → z → Generator (Decoder) → reconstruction → Discriminator → scalar, with the reconstruction as close as possible to the input.
➢ Encoder: minimize the reconstruction error; keep z close to a normal distribution.
➢ Generator (Decoder): minimize the reconstruction error; cheat the discriminator.
➢ Discriminator: discriminate real, generated, and reconstructed images.
VAE + GAN: the discriminator provides the similarity measure for the VAE.
BiGAN
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell, “Adversarial Feature Learning”, ICLR, 2017
Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville, “Adversarially Learned Inference”, ICLR, 2017
• Encoder: real image x → code z.
• Decoder: code z (from the prior distribution) → generated image x.
• Discriminator: given a pair (image x, code z), decide whether it comes from the encoder or from the decoder.
Lecture II

Conditional Generation

Sequence Generation

A Little Bit of Theory (optional)


Sentence Generation (with an RNN)
• Training: the reference is “A B B”; at each step the RNN is fed <BOS> and then the previous reference token, and is trained to output A, B, B.
• Generation (Testing): at each step the RNN is fed its own previous output, so it may produce a different sequence, e.g. “B A A”.
Sentence Generation
• Original GAN setup: Generator → sentence x; Discriminator(sentence x) → real or fake.
• Initialize generator G and discriminator D.
• In each iteration:
  • Sample real sentences x from the database.
  • Generate sentences x̃ with G.
  • Update D to increase D(x) and decrease D(x̃).
  • Update G so that the discriminator's scalar output on the generated sentence increases.
• Can we do backpropagation through the generator? NO! The output words are discrete, so tuning the generator a little bit will not change the output sentence. (The improved WGAN paper discusses this, ignoring the sampling process.)
Sentence Generation - SeqGAN
• Use policy gradient: Generator → Discriminator → scalar, and the scalar is used to update the generator.
• Using reinforcement learning:
  • Consider the discriminator as the reward function.
  • Consider the output of the discriminator as the total reward.
  • Updating the generator to increase the discriminator's output = getting the maximum total reward.
Ref: Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu, “SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient”, AAAI, 2017
Conditional Generation — sequence-to-sequence learning
• Represent the input condition as a vector, and use that vector as the input of the RNN generator.
• E.g. machine translation / chat-bot: the encoder reads “How are you ?” into a vector carrying the information of the whole sentence, and the decoder outputs “I’m fine .”; the encoder and decoder are jointly trained.


Chat-bot with GAN
Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, Dan Jurafsky, “Adversarial Learning for Neural Dialogue Generation”, arXiv preprint, 2017
• Chatbot (En–De): input sentence/history h → response sentence x.
• Discriminator: given the input sentence/history h and the response sentence x, output real or fake, trained with human dialogues (a conditional GAN).
Thanks to 段逸林 for providing the experimental results.
Example Results (thanks to 王耀賢 for providing the experimental results)
Cycle GAN
✘ Negative sentence to positive sentence:
it's a crappy day -> it's a great day
i wish you could be here -> you could be here
it's not a good idea -> it's good idea
i miss you -> i love you
i don't love you -> i love you
i can't do that -> i can do that
i feel so sad -> i happy
it's a bad day -> it's a good day
it's a crappy day -> it's a great day
sorry for doing such a horrible thing -> thanks for doing a
great thing
my doggy is sick -> my doggy is my doggy
i am so hungry -> i am so
my little doggy is sick -> my little doggy is my little doggy
Lecture II

Conditional Generation

Sequence Generation

A Little Bit of Theory (optional)


Theory behind GAN
• The data we want to generate has a distribution $P_{data}(x)$: some regions of the image space have high probability, others low.
• A generator G is a network, and the network defines a probability distribution $P_G(x)$: sample z from a normal distribution and set $x = G(z)$.
• We want $P_G(x)$ to be as close as possible to $P_{data}(x)$, but it is difficult to compute $P_G(x)$ and we do not know what the distribution looks like.
https://blog.openai.com/generative-models/
When the discriminator is well trained, its loss represents a specific divergence measure between $P_G$ and $P_{data}$.
Updating the generator then minimizes this divergence measure, pulling $P_G(x)$ (defined by z ~ normal distribution, x = G(z)) as close as possible to $P_{data}(x)$.
A binary-classifier discriminator evaluates the JS divergence; you can design the discriminator to evaluate other divergences.
https://blog.openai.com/generative-models/
http://www.guokr.com/post/773890/
Why is GAN hard to train?
(Figure: evolution proceeds through intermediate stages, each slightly “better” than the last.)
Why is GAN hard to train?
When $P_{G_0}(x)$ and $P_{data}(x)$ do not overlap, $JS(P_{G_0} \| P_{data}) = \log 2$.
After 50 updates, $P_{G_{50}}(x)$ may be closer to $P_{data}(x)$ but still not overlapping, so $JS(P_{G_{50}} \| P_{data}) = \log 2$ — according to the measure it is not really better ……
Only when $P_{G_{100}}(x)$ matches $P_{data}(x)$ does $JS(P_{G_{100}} \| P_{data}) = 0$.
Evaluating JS divergence (https://arxiv.org/abs/1701.07875)
• The JS divergence estimated by the discriminator tells little information: it stays at log 2 whether the generator is weak or strong, as long as the distributions do not overlap.
Earth Mover’s Distance
• Consider one distribution P as a pile of earth and another distribution Q as the target.
• The earth mover’s distance is the average distance the earth mover has to move the earth; if P is a point mass at distance d from Q, then $W(P, Q) = d$.
Why Earth Mover’s Distance?
Compare the f-divergence $D_f(P_{data} \| P_G)$ with the earth mover’s distance $W(P_{data}, P_G)$ as the generator improves:
• $P_{G_0}$ at distance $d_0$ from $P_{data}$: $JS(P_{G_0}, P_{data}) = \log 2$, $W(P_{G_0}, P_{data}) = d_0$
• $P_{G_{50}}$ at distance $d_{50}$: $JS(P_{G_{50}}, P_{data}) = \log 2$, $W(P_{G_{50}}, P_{data}) = d_{50}$
• $P_{G_{100}}$ overlapping $P_{data}$: $JS(P_{G_{100}}, P_{data}) = 0$, $W(P_{G_{100}}, P_{data}) = 0$
The JS divergence does not change until the distributions overlap, while the earth mover’s distance decreases smoothly.
https://arxiv.org/abs/1701.07875
To Learn more ……

• GAN Zoo
• https://github.com/hindupuravinash/the-gan-zoo
• Tricks: https://github.com/soumith/ganhacks
• Tutorial of Ian Goodfellow:
https://arxiv.org/abs/1701.00160
Lecture III:
Decision Making
and Control
Decision Making and Control
• The machine observes some inputs, takes an action, and finally achieves the target.
• E.g. playing Go

Target: win the game


observation action
State: summarization of observation
Decision Making and Control
• Machine plays video games
• Widely studied:
• Gym: https://gym.openai.com/
• Universe: https://openai.com/blog/universe/

Machine learns to play video


games as human players
➢ What machine observes are pixels
➢ Machine learns to take
proper action itself
Decision Making and Control
• E.g. self-driving car: observation → Object Recognition → “red light” (紅燈) → Decision Making → Action: Stop!
• Could this be one giant network?
Decision Making and Control
• E.g. dialogue system (could it be one giant network?)
  User: “我想訂11月5日到台北的機票” (I want to book a flight to Taipei on November 5)
  → Slot Filling: arrival time: November 5; destination: Taipei
  → Decision Making: ask for the departure city
  → Natural Language Generation: “請問您要從哪裡出發呢?” (Where will you be departing from?)
How to solve this problem?
• Use the network as a function and learn it as a typical supervised task:
  observation → Network → target output, e.g. target “Stop!” for the self-driving car, target move “3-3” for Go, target “請問您要從哪裡出發呢?” for the dialogue input “我想訂11月5日到台北的機票”.
• This is Behavior Cloning. The machine does not know which behaviors must be copied and which can be ignored.
https://www.youtube.com/watch?v=j2FSB3bseek
Properties of Decision Making and Control
What do we miss with supervised learning? The machine does not know the influence of each action.
• The agent’s actions affect the subsequent data it receives.
• Reward delay:
  • In Space Invaders, only “fire” obtains a reward, although the moves before “fire” are important.
  • In Go, the machine may sacrifice immediate reward to gain more long-term reward.
Better way: treat it as a sequence of decisions, not a single classification (e.g. Network → next move among the 19 x 19 positions).
Two Learning Scenarios
• Scenario 1: Reinforcement Learning
• Machine interacts with the environment.
• Machine obtains the reward from the environment, so it
knows its performance is good or bad.
• Scenario 2: Learning by demonstration
• Also known as imitation learning, apprenticeship
learning
• An expert demonstrates how to solve the task, and
machine learns from the demonstration.
Lecture III

Reinforcement Learning

Inverse Reinforcement
Learning
RL approaches:
• Model-free approach
  • Policy-based: learning an Actor
  • Value-based: learning a Critic
  • Actor + Critic
• Model-based approach
Alpha Go: policy-based + value-based + model-based.
Basic Components
• Actor, Environment, Reward Function. You cannot control the environment or the reward function.
  • Video game: the reward function is defined by the game, e.g. killing a monster gives 20 points.
  • Go: the reward function is the rules of Go.
Scenario
• The Actor/Policy is a function π: Observation (input) → Action (output), i.e. Action = π(Observation).
• The Reward is used to pick the best function.
• The Environment produces the observations.
Learning to play Go
• Observation: the board; Action: the next move; the Environment provides the observations and the reward.
• The agent learns to take actions maximizing the expected reward.
• Reward: 0 in most cases; if win, reward = 1; if lose, reward = −1.
Actor, Environment, Reward
The environment produces state $s_1$; the actor takes action $a_1$; the environment returns $s_2$ and reward $r_1$; the actor takes $a_2$; and so on.
Trajectory: $\tau = \{s_1, a_1, s_2, a_2, \cdots, s_T, a_T\}$
Total reward: $R(\tau) = \sum_{t=1}^{T} r_t$
Example: Playing a Video Game
• Start with observation $s_1$; take action $a_1$: “right”; obtain reward $r_1 = 0$; see observation $s_2$; take action $a_2$: “fire” (kill an alien); obtain reward $r_2 = 5$; see observation $s_3$; ……
• Usually there is some randomness in the environment.
• After many turns the game is over (the spaceship is destroyed); taking action $a_T$ and obtaining reward $r_T$ ends the episode.
• Total reward of the episode: $R = \sum_{t=1}^{T} r_t$. We want the total reward to be maximized.
Neural network as Actor
• Input of the neural network: the observation of the machine, represented as a vector or a matrix (e.g. pixels).
• Output of the neural network: each action corresponds to a neuron in the output layer, giving a score for that action (e.g. left 0.7, right 0.2, fire 0.1). Take the action based on the probability.
• What is the benefit of using a network instead of a lookup table? Generalization.
Actor, Environment, Reward
• The actor is a network (the action can be taken by argmax); the environment and the reward function are not networks, so $R(\tau) = \sum_{t=1}^{T} r_t$ cannot be maximized by ordinary backpropagation.
• “If you find it is not differentiable, just train it with policy gradient anyway.” (如果發現不能微分就用 policy gradient 硬 train 一發)
Ref: https://www.youtube.com/watch?v=W8XF3ME8G2I
Warning of Math
Policy Gradient
Three Steps for Deep Learning
• Step 1: define a set of functions — a Neural Network as the Actor
• Step 2: define the goodness of a function
• Step 3: pick the best function
Deep Learning is so simple ……
Goodness of Actor
• Given an actor $\pi_\theta(s)$ with network parameter $\theta$, use it to play the video game:
  • Start with observation $s_1$; the machine decides to take $a_1$ and obtains reward $r_1$; sees $s_2$, takes $a_2$, obtains $r_2$; ……; takes $a_T$, obtains $r_T$; END.
• Total reward: $R_\theta = \sum_{t=1}^{T} r_t$. Even with the same actor, $R_\theta$ is different each time because of randomness in the actor and the game.
• We define $\bar{R}_\theta$ as the expected value of $R_\theta$; $\bar{R}_\theta$ evaluates the goodness of the actor $\pi_\theta(s)$.


Goodness of Actor — we define $\bar{R}_\theta$ as the expected value of $R_\theta$.
• $\tau = \{s_1, a_1, r_1, s_2, a_2, r_2, \cdots, s_T, a_T, r_T\}$
$P(\tau|\theta) = p(s_1)\, p(a_1|s_1, \theta)\, p(r_1, s_2|s_1, a_1)\, p(a_2|s_2, \theta)\, p(r_2, s_3|s_2, a_2) \cdots = p(s_1) \prod_{t=1}^{T} p(a_t|s_t, \theta)\, p(r_t, s_{t+1}|s_t, a_t)$
The terms $p(r_t, s_{t+1}|s_t, a_t)$ are not related to your actor; the terms $p(a_t|s_t, \theta)$ are controlled by your actor $\pi_\theta$, e.g. $p(a_t = \text{"fire"}|s_t, \theta) = 0.7$ when the actor outputs left 0.1, right 0.2, fire 0.7.
Goodness of Actor
• An episode is considered as a trajectory $\tau = \{s_1, a_1, r_1, s_2, a_2, r_2, \cdots, s_T, a_T, r_T\}$ with $R(\tau) = \sum_{t=1}^{T} r_t$.
• If you use an actor to play the game, each $\tau$ has a probability to be sampled; the probability depends on the actor parameter $\theta$: $P(\tau|\theta)$.
$\bar{R}_\theta = \sum_{\tau} R(\tau) P(\tau|\theta) \approx \frac{1}{N} \sum_{n=1}^{N} R(\tau^n)$
(Sum over all possible trajectories; approximate by using $\pi_\theta$ to play the game N times, obtaining $\tau^1, \tau^2, \cdots, \tau^N$ sampled from $P(\tau|\theta)$.)
Three Steps for Deep Learning
• Step 3: pick the best function
Gradient Ascent
• Problem statement: $\theta^* = \arg\max_\theta \bar{R}_\theta$, where $\theta = \{w_1, w_2, \cdots, b_1, \cdots\}$.
• Gradient ascent:
  • Start with $\theta^0$
  • $\theta^1 \leftarrow \theta^0 + \eta \nabla \bar{R}_{\theta^0}$
  • $\theta^2 \leftarrow \theta^1 + \eta \nabla \bar{R}_{\theta^1}$
  • ……
  where $\nabla \bar{R}_\theta = \left[ \partial \bar{R}_\theta / \partial w_1, \; \partial \bar{R}_\theta / \partial w_2, \; \cdots, \; \partial \bar{R}_\theta / \partial b_1, \; \cdots \right]^\top$.

Policy Gradient
$\bar{R}_\theta = \sum_{\tau} R(\tau) P(\tau|\theta)$, so what is $\nabla \bar{R}_\theta$?
$\nabla \bar{R}_\theta = \sum_{\tau} R(\tau) \nabla P(\tau|\theta) = \sum_{\tau} R(\tau) P(\tau|\theta) \frac{\nabla P(\tau|\theta)}{P(\tau|\theta)}$
$R(\tau)$ does not have to be differentiable; it can even be a black box.
Using $\frac{d \log f(x)}{dx} = \frac{1}{f(x)} \frac{d f(x)}{dx}$:
$\nabla \bar{R}_\theta = \sum_{\tau} R(\tau) P(\tau|\theta) \nabla \log P(\tau|\theta) \approx \frac{1}{N} \sum_{n=1}^{N} R(\tau^n) \nabla \log P(\tau^n|\theta)$
(Use $\pi_\theta$ to play the game N times and obtain $\tau^1, \tau^2, \cdots, \tau^N$.)
Policy Gradient — what is $\nabla \log P(\tau|\theta)$?
• $\tau = \{s_1, a_1, r_1, s_2, a_2, r_2, \cdots, s_T, a_T, r_T\}$
$P(\tau|\theta) = p(s_1) \prod_{t=1}^{T} p(a_t|s_t, \theta)\, p(r_t, s_{t+1}|s_t, a_t)$
$\log P(\tau|\theta) = \log p(s_1) + \sum_{t=1}^{T} \left[ \log p(a_t|s_t, \theta) + \log p(r_t, s_{t+1}|s_t, a_t) \right]$
Ignoring the terms not related to $\theta$:
$\nabla \log P(\tau|\theta) = \sum_{t=1}^{T} \nabla \log p(a_t|s_t, \theta)$
Policy Gradient
$\theta^{new} \leftarrow \theta^{old} + \eta \nabla \bar{R}_{\theta^{old}}$
$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{n=1}^{N} R(\tau^n) \nabla \log P(\tau^n|\theta) = \frac{1}{N} \sum_{n=1}^{N} R(\tau^n) \sum_{t=1}^{T_n} \nabla \log p(a_t^n|s_t^n, \theta) = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_n} R(\tau^n) \nabla \log p(a_t^n|s_t^n, \theta)$
If in $\tau^n$ the machine takes $a_t^n$ when seeing $s_t^n$:
• $R(\tau^n)$ is positive → tune $\theta$ to increase $p(a_t^n|s_t^n)$
• $R(\tau^n)$ is negative → tune $\theta$ to decrease $p(a_t^n|s_t^n)$
What if we replace $R(\tau^n)$ with $r_t^n$? It is very important to consider the cumulative reward $R(\tau^n)$ of the whole trajectory $\tau^n$ instead of the immediate reward $r_t^n$.
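A minimal sketch of this policy gradient update, assuming PyTorch; `policy` is a toy actor and `trajectories` is a hypothetical list of (states, actions, R) tuples where R is the total reward $R(\tau^n)$ of that episode.

```python
import torch
import torch.nn as nn

policy = nn.Linear(4, 3)            # toy actor: 4-dim state -> 3 action scores
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

def update(trajectories):
    loss = 0.0
    for states, actions, R in trajectories:
        logp = torch.log_softmax(policy(states), dim=1)                 # log p(a_t | s_t, theta)
        loss = loss - R * logp.gather(1, actions.unsqueeze(1)).sum()    # weighted by R(tau^n)
    loss = loss / len(trajectories)
    opt.zero_grad(); loss.backward(); opt.step()   # minimizing -R_bar = gradient ascent on R_bar

# Example with one fake trajectory of length 5:
update([(torch.rand(5, 4), torch.randint(0, 3, (5,)), 1.0)])
```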
Policy Gradient — alternate between data collection and model update:
• Data collection: given the actor parameter $\theta$, play the game and record $\tau^1: (s_1^1, a_1^1), (s_2^1, a_2^1), \cdots$ with $R(\tau^1)$; $\tau^2: (s_1^2, a_1^2), (s_2^2, a_2^2), \cdots$ with $R(\tau^2)$; ……
• Model update: $\theta \leftarrow \theta + \eta \nabla \bar{R}_\theta$ with $\nabla \bar{R}_\theta = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_n} R(\tau^n) \nabla \log p(a_t^n|s_t^n, \theta)$.
Policy Gradient — considered as a classification problem:
Given state s, treat the taken action (e.g. “left”) as the class label $\hat{y}$ and minimize the cross-entropy $-\sum_{i=1}^{3} \hat{y}_i \log y_i$, i.e. maximize $\log y_i = \log P(\text{"left"}|s)$; the update $\theta \leftarrow \theta + \eta \nabla \log P(\text{"left"}|s)$ is one term of $\theta \leftarrow \theta + \eta \nabla \bar{R}_\theta$.
Policy Gradient — implementation
$\theta \leftarrow \theta + \eta \nabla \bar{R}_\theta$, $\nabla \bar{R}_\theta = \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_n} R(\tau^n) \nabla \log p(a_t^n|s_t^n, \theta)$
• Each state-action pair is a classification training example: e.g. $(s_1^1, a_1^1 = \text{left})$ has target (left 1, right 0, fire 0); $(s_1^2, a_1^2 = \text{fire})$ has target (left 0, right 0, fire 1).
• Each training example is weighted by the total reward $R(\tau^n)$ of the trajectory it comes from: e.g. if $R(\tau^1) = 2$ and $R(\tau^2) = 1$, the pairs from $\tau^1$ appear with weight 2 and those from $\tau^2$ with weight 1.
End of Warning
Critic
• A critic does not determine the action.
• Given an actor π, it evaluates how good the actor is.
• An actor can be found from a critic, e.g. Q-learning.
http://combiboilersleeds.com/picaso/critics/critics-4.html
Critic
• State value function $V^\pi(s)$: when using actor π, the cumulated reward expected to be obtained after seeing observation (state) s.
• s → $V^\pi$ → scalar $V^\pi(s)$; it is large for good positions and smaller for bad ones.
• Example (Hikaru no Go): the same move (大馬步飛) is bad for the earlier, weaker Hikaru (以前的阿光) but good for the stronger Hikaru (變強的阿光) — the value depends on the actor.
How to estimate $V^\pi(s)$
• Monte-Carlo based approach: the critic watches π playing the game.
  • After seeing $s_a$, the cumulated reward until the end of the episode is $G_a$: train $V^\pi(s_a) \to G_a$.
  • After seeing $s_b$, the cumulated reward until the end of the episode is $G_b$: train $V^\pi(s_b) \to G_b$.
• Temporal-difference approach: given $\cdots s_t, a_t, r_t, s_{t+1} \cdots$, use $V^\pi(s_t) = r_t + V^\pi(s_{t+1})$, i.e. train so that $V^\pi(s_t) - V^\pi(s_{t+1}) \to r_t$.
  • Some applications have very long episodes, so delaying all learning until an episode's end is too slow.
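A minimal sketch of the MC and TD targets for $V^\pi$, assuming PyTorch; `V` is a toy value network and the states, reward and cumulated reward are placeholder values standing in for data collected while watching π play.

```python
import torch
import torch.nn as nn

V = nn.Linear(4, 1)                        # toy critic: 4-dim state -> scalar V(s)
s_t, s_next = torch.rand(1, 4), torch.rand(1, 4)
r_t, G = 1.0, 5.0                           # immediate reward and cumulated reward

mc_loss = (V(s_t) - G) ** 2                                   # Monte-Carlo: V(s_a) -> G_a
td_loss = (V(s_t) - (r_t + V(s_next).detach())) ** 2          # TD: V(s_t) -> r_t + V(s_t+1)
```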
MC v.s. TD
• Monte-Carlo: $V^\pi(s_a) \to G_a$ — larger variance, unbiased.
• Temporal-difference: $V^\pi(s_t) \to r + V^\pi(s_{t+1})$ — smaller variance, but $V^\pi(s_{t+1})$ may be biased.
MC v.s. TD — Example [Sutton, v2, Example 6.4]
• The critic has observed the following 8 episodes (the actions are ignored here):
  • $s_a, r = 0, s_b, r = 0$, END
  • $s_b, r = 1$, END (×6)
  • $s_b, r = 0$, END
• $V^\pi(s_b) = 3/4$. What is $V^\pi(s_a)$? 0? 3/4?
  • Monte-Carlo: $V^\pi(s_a) = 0$.
  • Temporal-difference: $V^\pi(s_a) = r + V^\pi(s_b) = 0 + 3/4 = 3/4$.
Actor-Critic
Actor = Generator? Critic = Discriminator? (https://arxiv.org/abs/1610.01945)
• π interacts with the environment → learn the critic with TD or MC → update the actor from π to π′ based on the critic → set π = π′ and repeat.
Playing an On-line Game (by 劉廷緯 and 溫明浩)
https://www.youtube.com/watch?v=8iRD1w73fDo&feature=youtu.be
Lecture III

Reinforcement Learning

Inverse Reinforcement
Learning
Imitation Learning
• The actor interacts with the environment as before, but the reward function is not available.
• We have demonstrations of the expert: $\hat{\tau}_1, \hat{\tau}_2, \cdots, \hat{\tau}_N$, where each $\hat{\tau}$ is a trajectory of the expert.
  • Self-driving: record human drivers.
  • Robot: grab the arm of the robot to demonstrate.
Motivation
• It is hard to define the reward in some tasks.
• Hand-crafted rewards can lead to uncontrolled behavior.
The Three Laws of Robotics:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
“Flawless” logic: therefore, to protect humanity as a whole, it is necessary to control human freedom.
Inverse Reinforcement Learning
demonstration
of the expert
Environment
𝜏Ƹ1 , 𝜏Ƹ 2 , ⋯ , 𝜏Ƹ 𝑁

Reward Inverse Reinforcement


Reinforcement Optimal
Function
Expert
Actor
Learning

➢ Using the reward function to find the optimal actor.


➢ Modeling reward can be easier. Simple reward
function can lead to complex policy.
Inverse Reinforcement Learning
• Principle: The teacher is always the best.
• Basic idea:
• Initialize an actor
• In each iteration
• The actor interacts with the environment to obtain some trajectories
• Define a reward function, which makes the
trajectories of the teacher better than the actor
• The actor learns to maximize the reward based on
the new reward function.
• Output the reward function and the actor learned from
the reward function
Framework of IRL
• The expert $\hat{\pi}$ provides demonstrations $\hat{\tau}_1, \hat{\tau}_2, \cdots, \hat{\tau}_N$; the actor π produces trajectories $\tau_1, \tau_2, \cdots, \tau_N$.
• Obtain a Reward Function R such that $\sum_{n=1}^{N} R(\hat{\tau}_n) > \sum_{n=1}^{N} R(\tau_n)$.
• Find an actor based on the reward function R by reinforcement learning, and repeat.
• Analogy with GAN: the actor = the generator, the reward function = the discriminator.
  • GAN: D gives a high score to real samples and a low score to generated ones; find a G whose output obtains a large score from D.
  • IRL: the reward function gives a larger reward to $\hat{\tau}_n$ and a lower reward to τ; find an actor that obtains a large reward.
Teaching a Robot
• In the past …… https://www.youtube.com/watch?v=DEGbtjTOIB0
Robot: Chelsea Finn, Sergey Levine, Pieter Abbeel, “Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization”, ICML, 2016
http://rll.berkeley.edu/gcl/
Parking a Car
https://pdfs.semanticscholar.org/27c3/6fd8e6aeadd50ec1e180399df70c46dede00.pdf
Path Planning
http://martin.zinkevich.org/publications/maximummarginplanning.pdf
Third Person Imitation Learning
• Ref: Bradly C. Stadie, Pieter Abbeel, Ilya Sutskever, “Third-Person Imitation
Learning”, arXiv preprint, 2017

First Person Third Person

http://lasa.epfl.ch/research_new/ML/index.php https://kknews.cc/sports/q5kbb8.html

http://sc.chinaz.com/Files/pic/icons/1913/%E6%9C%BA%E5%99%A8%E4%BA%BA%E5%9B
%BE%E6%A0%87%E4%B8%8B%E8%BD%BD34.png
Third Person Imitation Learning
Concluding Remarks

Lecture 1: Introduction of GAN

Lecture 2: Variants of GAN

Lecture 3: Making Decision and Control


To Learn More …
• Machine Learning
  • Slides: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML16.html
  • Video: https://www.youtube.com/watch?v=fegAeph9UaA&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
• Machine Learning and Having it Deep and Structured
  • Slides: http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLDS17.html
  • Video: https://www.youtube.com/watch?v=IzHoNwlCGnE&list=PLJV_el3uVTsPMxPbjeX7PicgWbY7F8wW9
