生成式對抗網路 (Generative Adversarial Network, GAN) 顯然是深度學習領域的下一個熱點,Yann LeCun 說這是機器學習領域這十年來最有趣的想法 (the most interesting idea in the last 10 years in ML),又說這是有史以來最酷的東西 (the coolest thing since sliced bread)。生成式對抗網路解決了什麼樣的問題呢?在機器學習領域,回歸 (regression) 和分類 (classification) 這兩項任務的解法人們已經不再陌生,但是如何讓機器更進一步創造出有結構的複雜物件 (例如:圖片、文句) 仍是一大挑戰。用生成式對抗網路,機器已經可以畫出以假亂真的人臉,也可以根據一段敘述文字,自己畫出對應的圖案,甚至還可以畫出二次元人物頭像 (左邊的動畫人物頭像就是機器自己生成的)。本課程希望能帶大家認識生成式對抗網路這個深度學習最前沿的技術。
2. Outline
Lecture 1: Brief Review of Deep Learning
Lecture 2: Introduction of Generative
Adversarial Network
Lecture 3: Conditional Generation
Lecture 4: Making Decision and Control
9. Anime Face Generation
何之源的知乎: https://zhuanlan.zhihu.com/p/24767059
DCGAN: https://github.com/carpedm20/DCGAN-tensorflow
Drawing?
10. Basic Idea of GAN
Generator
It is a neural network
(NN), or a function.
Generator
0.1
−3
⋮
2.4
0.9
imagevector
Generator
3
−3
⋮
2.4
0.9
Generator
0.1
2.1
⋮
2.4
0.9
Generator
0.1
−3
⋮
2.4
3.5
matrix
Powered by: http://mattya.github.io/chainer-DCGAN/
Each dimension of input vector
represents some characteristics.
Longer hair
blue hair Open mouth
11. Discri-
minator
scalar
image
Basic Idea of GAN It is a neural network
(NN), or a function.
Larger value means real,
smaller value means fake.
Discri-
minator
Discri-
minator
Discri-
minator
Discri-
minator
1.0 1.0
0.1 ?
12. Basic Idea of GAN
Brown veins
Butterflies are
not brown
Butterflies do
not have veins
……..
Generator
Discriminator
13. Basic Idea of GAN
NN
Generator
v1
Discri-
minator
v1
Real images:
NN
Generator
v2
Discri-
minator
v2
NN
Generator
v3
Discri-
minator
v3
This is where the term
“adversarial” comes from.
You can explain the process
in different ways…….
15. So many questions ……
Q1: Why generator cannot learn by itself?
Q2: Why discriminator don’t generate
object itself?
Q3: How discriminator and generator
interact?
26. Machine Learning
≈ Looking for a Function
• Binary Classification (是
非題)
• Multi-class
Classification (選擇題)
Function f Function f
Input Input
Yes/No Class 1, Class 2, … Class N
27. Machine Learning
≈ Looking for a Function
• Structured input/output
“大家好,歡迎大
家來修機器學習”
f
Speech Recognition
“girl with red hair
and red eyes”
f
f
Summarization
“近半選民挺脫歐
公投 ……”
(title, summary)
28. Framework
A set of
function 21, ff
1f “cat”
1f “dog”
2f “money”
2f “snake”
Model
f “cat”
Image Recognition:
29. Framework
A set of
function 21, ff
f “cat”
Image Recognition:
Model
Training
Data
Goodness of
function f
Better!
“monkey” “cat” “dog”
function input:
function output:
Supervised Learning
30. Framework
A set of
function 21, ff
f “cat”
Image Recognition:
Model
Training
Data
Goodness of
function f
“monkey” “cat” “dog”
*
f
Pick the “Best” Function
Using
f
“cat”
Training Testing
31. Example Application
Input Output
16 x 16 = 256
1x
2x
256x
……
Ink → 1
No ink → 0
……
y1
y2
y10
Each dimension represents
the confidence of a digit.
is 1
is 2
is 0
……
0.1
0.7
0.2
The image
is “2”
32. Example Application
• Handwriting Digit Recognition
Function “2”
1x
2x
256x
……
……
y1
y2
y10
is 1
is 2
is 0
……
Input:
256-dim vector
output:
10-dim vector
Neural
Network
33. Training Data
• Preparing training data: images and their labels
The learning target is defined on
the training data.
“5” “0” “4” “1”
“3”“1”“2”“9”
34. bwawawaz KKkk 11
Neural Network
z
1w
kw
Kw
…
1a
ka
Ka
b
z
bias
a
weights
Neuron…
……
A simple function
Activation
function
36. Fully Connect Feedforward
Network
Input Output
1x
2x
Layer 1
……
Nx
……
Layer 2
……
Layer L
……
……
……
……
……
y1
y2
yM
Simple
Function 1
is 1
is 2
is 0
……
a1 Simple
Function 2
a2 Simple
Function L
y…
A very complex function
How can we know the weights and
biases of the neurons?
37. 1x
2x
……
Nx
……
……
……
……
Ink → 1
No ink → 0
……
y1
y2
y10
y1 has the maximum value
Find a set of weights and bias such that
Input:
y2 has the maximum valueInput:
is 1
is 2
is 0
Gradient descent is the algorithm finding the weights
and biases of the neurons that can achieve the target.
Gradient descent has lots of problems ……
Learning/
Training:
39. Convolutional Layer……
1
1
2
2
Sparse Connectivity
3
4
5
Each neural only connects to
part of the output of the
previous layer
3
4
Parameter Sharing
The neurons with different
receptive fields can use the
same set of parameters.
Less parameters then
fully connected layer
40. Convolutional Layer……
1
1
2
2
3
4
5
3
4
Considering neuron 1 and 3 as
“filter 1” (kernel 1)
filter (kernel) size: size of the
receptive field of a neuron
Stride = 2
Considering neuron 2 and 4 as
“filter 2” (kernel 2)
Kernel size, no. of filter,
stride are all designed by
the developers.
41. Pooling Layer
Group the neurons corresponding
to the same filter with nearby
receptive fields
……
1
1
2
2
3
4
5
3
4
Convolutional
Layer
Pooling
Layer
1
2
Subsampling
42. Recurrent Neural Network
• Recurrent Structure: usually used when the input is a
sequence
Sentiment Analysis
我 覺 太得 糟 了
超好雷
好雷
普雷
負雷
超負雷
看了這部電影覺
得很高興 …….
這部電影太糟了
…….
這部電影很
棒 …….
Positive (正雷) Negative (負雷) Positive (正雷)
……
Function
43. Recurrent Neural Network
• Recurrent Structure: usually used when the input is a
sequence
fh0
h1
x1
f h2 f h3
g
No matter how long the
input sequence is, we
only need one function f
y
x2 x3
Func
f ht
xt
ht-1
46. Reference: Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. "Reducing the
dimensionality of data with neural networks." Science 313.5786 (2006): 504-507
• NN encoder + NN decoder = a deep network
Deep Auto-encoderInputLayer
Layer
Layer
bottle
OutputLayer
Layer
Layer
Layer
Layer
… …
Code
As close as possible
𝑥 ො𝑥
Encoder Decoder
47. Deep Auto-encoder - Example
Pixel -> tSNE
𝑐
NN
Encoder
PCA 降到
32-dim
Unsupervised Leaning
without annotation
48. Sequence-to-sequence
Auto-encoder
• Machine represents each audio segment also by a
vector
audio segment
vector
(word-level)
Learn from lots of audio
without supervision
[Chung, Wu, Lee, Lee, Interspeech 16)
Used in the following
spoken language
understanding applications
52. Network
Many kinds of
network structures:
Matrix
How to find
the function?
Given the example inputs/outputs as
training data: {(x1,y1),(x2,y2), ……, (x1000,y1000)}
➢ Fully connected feedforward network
➢ Convolutional neural network (CNN)
➢ Recurrent neural network (RNN)
Vector
Vector Seq
𝑥 𝑦
Deep Learning in One Slide (Review)
(speech, video, sentence)
Different networks can take input/output in different domains.
53. Lecture II:
Introduction of GAN
To learn more theory:
https://www.youtube.com/watch?v=0CKeqXl5IY0&lc=z13zuxbgl
pvsgbgpo04cg1bxuoraejdpapo0k
https://www.youtube.com/watch?v=KSN4QYgAtao&lc=z13kz1nq
vuqsipqfn23phthasre4evrdo
54. Outline
When can I use GAN?
Generation by GAN
Improving GAN
Evaluation
A little bit theory
55. Structured Learning
YXf :
Regression: output a scalar
Classification: output a “class”
Structured Learning/Prediction: output a
sequence, a matrix, a graph, a tree ……
(one-hot vector)
1 0 0
Machine learning is to find a function f
Output is composed of components with dependency
0 1 0 0 0 1
Class 1 Class 2 Class 3
57. Output Sequence
(what a user says)
YXf :
“機器學習及其深層與
結構化”
“Machine learning and
having it deep and structured”
:X :Y
感謝大家來上課”
(response of machine)
:X :Y
Machine Translation
Speech Recognition
Chat-bot
(speech)
:X :Y
(transcription)
(sentence of language 1) (sentence of language 2)
“How are you?” “I’m fine.”
58. Output Matrix YXf :
ref: https://arxiv.org/pdf/1605.05396.pdf
“this white and yellow flower
have thin white petals and a
round yellow stamen”
:X :Y
Text to Image
Image to Image
:X :Y
Colorization:
Ref: https://arxiv.org/pdf/1611.07004v1.pdf
59. Decision Making and Control
Action:
“right”
GO Playing is the same.
Action:
“fire”
Action:
“left”
60. Why Structured Learning
Interesting?
• One-shot/Zero-shot Learning:
• In classification, each class has some examples.
• In structured learning,
• If you consider each possible output as a
“class” ……
• Since the output space is huge, most “classes”
do not have any training data.
• Machine has to create new stuff during
testing.
• Need more intelligence
61. Why Structured Learning
Interesting?
• Machine has to learn to planning
• Machine can generate objects component-by-
component, but it should have a big picture in its mind.
• Because the output components have dependency,
they should be considered globally.
Image
Generation
Sentence
Generation
這個婆娘不是人
九天玄女下凡塵
65. So many questions ……
Q1: Why generator cannot learn by itself?
Q2: Why discriminator don’t generate
object itself?
Q3: How discriminator and generator
interact?
66. Generator
Image:
code: 0.1
0.9(where does them
come from?)
NN
Generator
0.1
−0.5
0.2
−0.1
0.3
0.2
NN
Generator
code
vectors
code
code0.1
0.9
image
As close as possible
c.f. NN
Classifier
𝑦1
𝑦2
⋮
As close as possible
1
0
⋮
67. Generator
Encoder in auto-encoder
provides the code ☺
𝑐
NN
Encoder
NN
Generator
code
vectors
code
code
Image:
code:
(where does them
come from?)
0.1
0.9
0.1
−0.5
0.2
−0.1
0.3
0.2
68. Remember Auto-encoder?
As close as possible
NN
Encoder
NN
Decoder
code
NN
Decoder
code
Randomly generate
a vector as code Image ?
= Generator
= Generator
74. Ref: http://www.wired.co.uk/article/google-artificial-intelligence-poetry
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy
Bengio, Generating Sentences from a Continuous Space, arXiv prepring, 2015
VAE - Writing Poetry
RNN
Encoder
sentence RNN
Decoder
sentence
code
Code Space
i went to the store to buy some groceries.
"come with me," she said.
i store to buy some groceries.
i were to buy any groceries.
"talk to me," she said.
"don’t worry about it," she said.
……
75. What do we miss?
G
as close as
possible
TargetGenerated Image
It will be fine if the generator can truly copy the target image.
What if the generator makes some mistakes …….
Some mistakes are serious, while some are fine.
76. What do we miss? Target
1 pixel error 1 pixel error
6 pixel errors 6 pixel errors
我覺得不行 我覺得不行
我覺得
其實 OK
我覺得
其實 OK
77. What do we miss?
我覺得不行 我覺得其實 OK
The relation between the components are critical.
The last layer generates each components independently.
Need deep structure to catch the relation between
components.
Layer L-1
……
Layer L
……
……
……
……
Each neural in output layer
corresponds to a pixel.
ink
empty
旁邊的,我們產
生一樣的顏色
誰管你
79. So many questions ……
Q1: Why generator cannot learn by itself?
Q2: Why discriminator don’t generate
object itself?
Q3: How discriminator and generator
interact?
80. Discriminator
• Discriminator is a function D (network, can deep)
• Input x: an object x (e.g. an image)
• Output D(x): scalar which represents how “good” an
object x is
R:D X
D 1.0 D 0.1
Can we use the discriminator to generate objects?
Yes.
Evaluation function, Potential
Function, Evaluation Function …
81. Discriminator
• It is easier to catch the relation between the
components by top-down evaluation.
我覺得不行 我覺得其實 OK
This CNN filter is
good enough.
82. Discriminator
• Suppose we already have a good discriminator
D(x) …
How to learn the discriminator?
Enumerate all possible x !!!
It is feasible ???
83. Discriminator - Training
• I have some real images
Discriminator training needs some negative examples.
D scalar 1
(real)
D scalar 1
(real)
D scalar 1
(real)
D scalar 1
(real)
Discriminator only learns to output “1” (real).
84. Discriminator - Training
D(x)
real examples
In practice, you cannot decrease all the x
other than real examples.
Discrimi
nator D
object
x
scalar
D(x)
85. Discriminator - Training
• Negative examples are critical.
D scalar 1
(real)
D scalar 0
(fake)
D 0.9 Pretty real
D scalar 1
(real)
D scalar 0
(fake)
How to generate realistic
negative examples?
86. Discriminator - Training
• General Algorithm
• Given a set of positive examples, randomly
generate a set of negative examples.
• In each iteration
• Learn a discriminator D that can discriminate
positive and negative examples.
• Generate negative examples by discriminator D
D
xDx
Xx
maxarg~
v.s.
88. Graphical Model
Bayesian Network
(Directed Graph)
Markov Random Field
(Undirected Graph)
Markov Logic
Network
Boltzmann
Machine
Restricted
Boltzmann Machine
Structured Learning ➢Structured Perceptron
➢Structured SVM
➢Gibbs sampling
➢Hidden information
➢Application: sequence
labelling, summarization
Conditional
Random Field
Segmental CRF
(Only list some of
the approaches)
Energy-based
Model:
http://www.cs.nyu
.edu/~yann/resear
ch/ebm/
89. Generator v.s. Discriminator
• Generator
• Pros:
• Easy to generate even
with deep model
• Cons:
• Imitate the appearance
• Hard to learn the
correlation between
components
• Discriminator
• Pros:
• Considering the big
picture
• Cons:
• Generation is not
always feasible
• Especially when
your model is deep
• How to do negative
sampling?
90. So many questions ……
Q1: Why generator cannot learn by itself?
Q2: Why discriminator don’t generate
object itself?
Q3: How discriminator and generator
interact?
91. Discriminator - Training
• General Algorithm
• Given a set of positive examples, randomly
generate a set of negative examples.
• In each iteration
• Learn a discriminator D that can discriminate
positive and negative examples.
• Generate negative examples by discriminator D
D
xDx
Xx
maxarg~
v.s.
G x~ =
93. • Initialize generator and discriminator
• In each training iteration:
DG
Sample some
real objects:
Generate some
fake objects:
G
Algorithm
D
Update
G D
image
code
code
code
code
0000
1111
code
code
code
code
image
image
image
1
update fix
94. • Initialize generator and discriminator
• In each training iteration:
DG
Learning
D
Sample some
real objects:
Generate some
fake objects:
G
Algorithm
D
Update
Learning
G G D
image
code
code
code
code
1111
code
code
code
code
image
image
image
1
update fix
0000
95. Benefit of GAN
• From Discriminator’s point of view
• Using generator to generate negative samples
• From Generator’s point of view
• Still generate the object component-by-
component
• But it is learned from the discriminator with
global view.
xDx
Xx
maxarg~G x~ =
efficient
97. Outline
When can I use GAN?
Generation by GAN
Improving GAN
Evaluation
A little bit theory
98. Binary Classifier
as Discriminator
Binary
Classifier
Real
Typical binary classifier uses sigmoid function
at the output layer
You cannot train your classifier too good……
They don’t
move.
Binary
Classifier
Fake
1 is the largest, 0 is the smallest
real
generated
99. Binary Classifier
as Discriminator
• Don’t let the discriminator perfectly separate real
and generated data
• Weaken your discriminator?
real
generated
strong
weak
100. Binary Classifier
as Discriminator
• Don’t let the discriminator perfectly separate real
and generated data
• Add noise to input or label?
real
generated
101. Least Square GAN (LSGAN)
• Replace sigmoid with linear (replace classification
with regression)
1 (Real) 0 (Fake)
Binary
Classifier
scalar
Binary
Classifier
scalar
They don’t
move.
0
1
real
generated
102. WGAN
• We want the scores of the real examples as large as
possible, generated examples as small as possible.
Any
problem?
Binary
Classifier
scalar
Binary
Classifier
scalar
as large as possible as small as possible
real
generated
Move towards Max +∞
−∞
103. WGAN
Move towards Max
𝐷 𝑥1 − 𝐷 𝑥2 ≤ 𝐾 𝑥1 − 𝑥2
Lipschitz Function
K=1 for "1 − 𝐿𝑖𝑝𝑠𝑐ℎ𝑖𝑡𝑧"
Output
change
Input
change
Do not change fast
The discriminator should
be a 1-Lipschitz function.
It should be smooth.
How to realize?
104. WGAN
• Original WGAN → Weight Clipping
• Improved WGAN → Gradient Penalty
Force the parameters w between c and -c
After parameter update, if w > c, w = c; if w < -c, w = -c
Do not truly maximize (minimize) the real (generated) examples
It should be
smooth enough.
𝐷 𝑥𝐷 𝑥
Keep the gradient close to 1
Easy to move
105. DCGAN
G: CNN, D: CNN
G: CNN (no normalization), D: CNN (no normalization)
LSGAN Original
WGAN
Improved
WGAN
G: CNN (tanh), D: CNN(tanh)
107. Sentence Generation
• good bye.
g o o d <space> ……
0
1
0
0
…………
d
g
o
<space>
0
0
1
0
0
0
1
0
1
0
0
0
1
0
0
1
Consider this matrix as an “image”
1-of-N
Encoding
I will talk about
RNN later.
108. Sentence Generation
• Real sentence
• Generated
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0.9
0.1
0
0
0
0.1
0.9
0
0
0
0.1
0.1
0.7
0.1
0
0
0
0.1
0.8
0.1
0
0
0
0.1
0.9
Can never
be 1-of-N
A binary classifier
can immediately
find the difference.
WGAN is helpful
No overlap
I will talk about
RNN later.
112. Energy-based GAN (EBGAN)
• Using an autoencoder as discriminator D
scalar
Discriminator
An image
is good.
It can be reconstructed
by autoencoder.
=
0 for best
images
Generator is
the same.
-
0.1
EN DE
Autoencoder
X -1 -0.1
113. EBGAN
realgen gen
Hard to reconstruct, easy to destroy
m
0 is for
the best.
Do not have to
be very negative
Auto-encoder based discriminator
only give limited region large value.
116. Solution?
• Feature matching, Minibatch discrimination
• https://arxiv.org/abs/1606.03498
• Unroll GAN
• https://arxiv.org/abs/1611.02163
• InfoGAN
• Next lecture
• Ensemble GAN
• Next lecture
117. Outline
When can I use GAN?
Generation by GAN
Improving GAN
Evaluation
A little bit theory
Lucas Theis, Aäron van den Oord, Matthias Bethge,
“A note on the evaluation of generative models”,
arXiv preprint, 2015
118. Likelihood
: real data (not observed
during training)
: generated data
𝑥 𝑖
Log Likelihood:
Generator
PG
𝐿 =
1
𝑁
𝑖
𝑙𝑜𝑔𝑃𝐺 𝑥 𝑖
We cannot compute 𝑃𝐺 𝑥 𝑖 . We can only
sample from 𝑃𝐺.
Prior
Distribution
119. Likelihood
- Kernel Density Estimation
• Estimate the distribution of PG(x) from sampling
Generator
PG
https://stats.stackexchange.com/questions/244012/can-
you-explain-parzen-window-kernel-density-estimation-in-
laymans-terms/244023
Now we have an approximation of 𝑃𝐺, so we can
compute 𝑃𝐺 𝑥 𝑖 for each real data 𝑥 𝑖
Then we can compute the likelihood.
Each sample is the mean of a
Gaussian with the same covariance.
120. Likelihood v.s. Quality
• Low likelihood, high quality?
• High likelihood, low quality?
Considering a model generating good images (small variance)
Generator
PG
Generated data Data for evaluation 𝑥 𝑖
𝑃𝐺 𝑥 𝑖 = 0
G1
𝐿 =
1
𝑁
𝑖
𝑙𝑜𝑔𝑃𝐺 𝑥 𝑖
G2
X 0.99
X 0.01
= −𝑙𝑜𝑔100 +
1
𝑁
𝑖
𝑙𝑜𝑔𝑃𝐺 𝑥 𝑖
4.6100
121. Evaluate by Other Networks
Well-trained
CNN
𝑥 𝑃 𝑦|𝑥
Lower entropy means
higher visual quality
CNN𝑥1
𝑃 𝑦1|𝑥1
Higher entropy means
higher variety
CNN𝑥2
𝑃 𝑦2|𝑥2
CNN𝑥3
𝑃 𝑦3
|𝑥3
…
1
𝑁
𝑛
𝑃 𝑦 𝑛|𝑥 𝑛
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen,
“Improved Techniques for Training GANs”, arXiv prepring, 2016
123. Outline
When can I use GAN?
Generation by GAN
Improving GAN
Evaluation
A little bit theory Can skip this part
124. Theory behind GAN
• The data we want to generate has a distribution
𝑃𝑑𝑎𝑡𝑎 𝑥
𝑃𝑑𝑎𝑡𝑎 𝑥
High
Probability
Low
ProbabilityImage
Space
125. Theory behind GAN
• A generator G is a network. The network defines a
probability distribution.
generator
G𝑧 𝑥 = 𝐺 𝑧
Normal
Distribution
𝑃𝐺 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥
As close as
possible
https://blog.openai.com/generative-models/
It is difficult to compute 𝑃𝐺 𝑥
We do not know what the distribution
looks like.
126. generator
G𝑧 𝑥 = 𝐺 𝑧
Normal
Distribution
𝑃𝐺 𝑥 𝑃𝑑𝑎𝑡𝑎 𝑥
As close as
possible
https://blog.openai.com/generative-models/
Discri-
minator
When the discriminator is well trained, its
loss represents a specific divergence
measure between PG and Pdata.
Update discriminator is to minimize
the divergence measure
Binary classifier evaluates the
JS divergence
You can design the discriminator to evaluate other divergence.
127. Why GAN is hard to train?
http://www.guokr.com/post/773890/
Better
128. Why GAN is hard to train?
𝑃𝑑𝑎𝑡𝑎 𝑥
𝑃𝐺0
𝑥
𝑃𝑑𝑎𝑡𝑎 𝑥
𝑃𝐺50
𝑥
𝐽𝑆 𝑃𝐺1
||𝑃𝑑𝑎𝑡𝑎 = 𝑙𝑜𝑔2
𝐽𝑆 𝑃𝐺2
||𝑃𝑑𝑎𝑡𝑎 = 𝑙𝑜𝑔2
𝑃𝑑𝑎𝑡𝑎 𝑥
𝑃𝐺100
𝑥
𝐽𝑆 𝑃𝐺2
||𝑃𝑑𝑎𝑡𝑎 = 0
?
…………
Not really better ……
129. Evaluating JS divergence
• JS divergence estimated by discriminator telling
little information
https://arxiv.org/a
bs/1701.07875
Weak Generator Strong Generator
130. Earth Mover’s Distance
• Considering one distribution P as a pile of earth,
and another distribution Q as the target
• The average distance the earth mover has to move
the earth.
𝑃 𝑄
d
𝑊 𝑃, 𝑄 = 𝑑
133. When can I use GAN?
• I consider GAN as a new approach for structured learning.
Generation by GAN
• Generator: generate locally
• Discriminator: evaluate globally
• Generator + Discriminator → GAN
Improving GAN
Evaluation
A Little Bit Theory
Concluding Remarks
140. Conditional Generation
• We don’t want to simply generate some random stuff.
• Generate objects based on conditions:
Given
condition:
Caption Generation
Chat-bot
Given
condition:
“Hello”
“A young girl
is dancing.”
“Hello. Nice
to see you.”
141. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
151. GAN+Autoencoder
• We have a generator (input z, output x)
• However, given x, how can we find z?
• Learn an encoder (input x, output z)
Generator
(Decoder)
Encoder z xx
as close as possible
fixed
Discriminator
init
Autoencoder
different structures?
155. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
156. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
157. Conditional GAN
• Generating images based on text description
NN
Encoder
NN
Generator
sentence code
code image
“a dog is sleeping”
?
158. Conditional GAN
• Generating images based on text description
NN
Encoder
NN
Generator
sentence code
code
“a bird is flying”
c1: a dog is running ො𝑥1:
ො𝑥2:c2: a bird is flying
“a dog is running”
159. Target of
NN output
Conditional GAN
• Text to image by traditional supervised learning
NN Image
Text: “train”
c1: a dog is running ො𝑥1:
ො𝑥2:c2: a bird is flying
A blurry image!
c1: a dog is running ො𝑥1:
as close as
possible
160. Conditional GAN
G
𝑧Prior distribution
x = G(c,z)
c: train
Target of
NN output
Text: “train”
A blurry image!
Approximate the
distribution below
It is a distribution.
Image
161. Conditional GAN
D
(type 2)
scalar
𝑐
𝑥
D
(type 1)
scalar𝑥
Positive example:
Negative example:
G
𝑧Prior distribution
x = G(c,z)
c: train
x is realistic or not
Image
x is realistic or not +
c and x are matched or not
(train , )
(train , ) (cat , )
Positive example:
Negative example:
165. as close as
possible
Image-to-image
• Traditional supervised approach
NN Image
It is blurry because it is the
average of several images.
Testing:
input close
167. Image super resolution
• Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
Totz, Zehan Wang, Wenzhe Shi, “Photo-Realistic Single Image Super-Resolution
Using a Generative Adversarial Network”, CVPR, 2016
173. More about Speech Processing
• Speech synthesis
• Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo, Yusuke Ijima, Kaoru
Hiramatsu, Kunio Kashino, “Generative Adversarial Network-based
Postfiltering for Statistical Parametric Speech Synthesis”, ICASSP 2017
• Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari, "Training
algorithm to deceive anti-spoofing verification for DNN-based speech
synthesis, ”, ICASSP 2017
• Voice Conversion
• Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang,
Voice Conversion from Unaligned Corpora using Variational Autoencoding
Wasserstein Generative Adversarial Networks, Interspeech 2017
• Speech Enhancement
• Santiago Pascual, Antonio Bonafonte, Joan Serrà, SEGAN: Speech
Enhancement Generative Adversarial Network, Interspeech 2017
174. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
175. Cycle GAN, Disco GAN
Transform an object from one domain to another without paired data
photo van Gogh
Domain X Domain Y
paired data
176. Cycle GAN
𝐺 𝑋→𝑌
Domain X Domain Y
𝐷 𝑌
Domain Y
Domain X
scalar
Input image
belongs to
domain Y or not
Become similar
to domain Y
Not what we want
ignore input
https://arxiv.org/abs/1703.10593
https://junyanz.github.io/CycleGAN/
177. Cycle GAN
Domain X Domain Y
𝐺 𝑋→𝑌
𝐷 𝑌
Domain Y
scalar
Input image
belongs to
domain Y or not
𝐺Y→X
as close as possible
Lack of information
for reconstruction
178. Cycle GAN
Domain X Domain Y
𝐺 𝑋→𝑌 𝐺Y→X
as close as possible
𝐺Y→X 𝐺 𝑋→𝑌
as close as possible
𝐷 𝑌𝐷 𝑋
scalar: belongs to
domain Y or not
scalar: belongs to
domain X or not
c.f. Dual Learning
179. 動畫化的世界
• Using the code:
https://github.com/Hi-
king/kawaii_creator
• It is not cycle GAN,
Disco GAN
input output domain
180. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
184. Generator
(Decoder)
Encoder z xx
as close as possible
Back to z
• Method 1
• Method 2
• Method 3
𝑧∗ = 𝑎𝑟𝑔 min
𝑧
𝐿 𝐺 𝑧 , 𝑥 𝑇
𝑥 𝑇
𝑧∗
Difference between G(z) and xT
➢ Pixel-wise
➢ By another network
Using the results from method 2 as the initialization of method 1
Gradient Descent
185. Editing Photos
• z0 is the code of the input image
𝑧∗ = 𝑎𝑟𝑔 min
𝑧
𝑈 𝐺 𝑧 + 𝜆1 𝑧 − 𝑧0
2 − 𝜆2 𝐷 𝐺 𝑧
Does it fulfill the constraint of editing?
Not too far away from
the original image
Using discriminator
to check the image is
realistic or notimage
186. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
187. Domain Independent Features
• Training and testing data are in different domains
Training
data:
Testing
data:
Generator
Generator
The same
distribution
feature
feature
190. Domain-adversarial training
Not only cheat the domain
classifier, but satisfying label
classifier at the same time
This is a big network, but different parts have different goals.
Maximize label
classification accuracy
Maximize domain
classification accuracy
Maximize label classification accuracy +
minimize domain classification accuracy
feature extractor
Domain classifier
Label predictor
191. Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Domain classifier fails in the end
It should struggle ……
192. Domain-adversarial training
Yaroslav Ganin, Victor Lempitsky, Unsupervised Domain Adaptation by Backpropagation,
ICML, 2015
Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand,
Domain-Adversarial Training of Neural Networks, JMLR, 2016
Train:
Test:
193. Conditional Generation
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
194. VAE-GAN
Discrimi
nator
Generator
(Decoder)
Encoder z x scalarx
as close as possible
VAE
GAN
➢ Minimize
reconstruction error
➢ z close to normal
➢ Minimize
reconstruction error
➢ Cheat discriminator
➢ Discriminate real,
generated and
reconstructed images
Discriminator provides the similarity measure
Anders Boesen, Lindbo Larsen, Søren Kaae
Sønderby, Hugo Larochelle, Ole Winther, “Autoencoding
beyond pixels using a learned similarity metric”, ICML. 2016
195. BiGAN
Encoder Decoder Discriminator
Image x
code z
Image x
code z
Image x code z
from encoder
or decoder?
(real) (generated)
(from prior
distribution)
Jeff Donahue, Philipp Krähenbühl, Trevor Darrell,
“Adversarial Feature Learning”, ICLR, 2017
Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier
Mastropietro, Alex Lamb, Martin Arjovsky, Aaron
Courville, “Adversarially Learned Inference”, ICLR, 2017
196. BiGAN
• Initialize encoder En, decoder De, discriminator Dis
• In each iteration:
• Sample M images 𝑥1, 𝑥2, ⋯ , 𝑥 𝑀 from database
• Generate M codes ǁ𝑧1, ǁ𝑧2, ⋯ , ǁ𝑧 𝑀 from encoder
• ǁ𝑧 𝑖 = 𝐸𝑛 𝑥 𝑖
• Sample M codes 𝑧1, 𝑧2, ⋯ , 𝑧 𝑀 from prior P(z)
• Generate M codes 𝑥1, 𝑥2, ⋯ , 𝑥 𝑀 from decoder
• 𝑥 𝑖 = 𝐷𝑒 𝑧 𝑖
• Update Discriminator to increase 𝐷 𝑥 𝑖, ǁ𝑧 𝑖 , decrease
𝐷 𝑥 𝑖, 𝑧 𝑖
• Update En and De to decrease 𝐷 𝑥 𝑖
, ǁ𝑧 𝑖
, increase
𝐷 𝑥 𝑖, 𝑧 𝑖
197. Encoder Decoder Discriminator
Image x
code z
Image x
code z
Image x code z
from encoder
or decoder?
(real) (generated)
(from prior
distribution)
𝑃 𝑥, 𝑧 𝑄 𝑥, 𝑧
Evaluate the
difference
between P and QTo make P and Q the same
Optimal encoder
and decoder:
En(x’) = z’ De(z’) = x’
En(x’’) = z’’De(z’’) = x’’
For all x’
For all z’’
198. BiGAN
En(x’) = z’ De(z’) = x’
En(x’’) = z’’De(z’’) = x’’
For all x’
For all z’’
How about?
Encoder
Decoder
x
z
𝑥
Encoder
Decoder
x
z
ǁ𝑧
En, De
En, De
Optimal encoder
and decoder:
199.
200. Concluding Remarks
Modifying input code
• Making code has influence (InfoGAN)
• Connection code space with attribute
Controlling by input objects
• Paired data
• Unpaired data
• Unsupervised
Feature extraction
• Domain Independent Feature
• Improving Auto-encoder (VAE-GAN, BiGAN)
202. Anime Face Generation
• My Course at NTU: “Machine Learning and having
it Deep and Structured”
• This spring I opened this course the third time
• I want to update the course.
• Including the HW for sequence-to-sequence
learning, reinforcement learning, GAN
• What is the HW for GAN?
“girl with red hair
and red eyes”
machine
203. Anime Face Generation
• https://github.com/mattya/chainer-DCGAN
• https://zhuanlan.zhihu.com/p/24767059
• https://github.com/jayleicn/animeGAN
204. Anime Face Generation
- Girl Friend Factory
• http://qiita.com/Hiroshiba/items/d5749d8896613e6f0b48
206. Released Training Data
• Data download link:
https://drive.google.com/open?id=0BwJmB7alR-
AvMHEtczZZN0EtdzQ
• Anime Dataset:
• training data: 33.4k (image, tags) pair
• Training tags file format
• img_id <comma> tag1 <colon> #_post <tab> tag2
<colon> …
blue eyes
red hair
short hair
tags.csv
96 x 96
207. Data
• The students have to submit their codes.
• The code generates 5 64x64 images based on text
description.
• Input testing text only includes ‘color hair’ and ‘color
eyes’.
• Testing text file format
• testing_text_id <comma> testing_text
• Extra bonus if you can generate images beyond the
above description
sample
testing_text.txt
208. Evaluation
• Lots evaluation methods in the literature
• We will put all the generated images in the grading
platform
• Giving 2 scores for each image
• How the image fits the text?
• How the image looks real?
• Scores should be integer from 1 to 5
• 1 to 5 corressponding to (super bad, bad,
average, good, super good)
• It is double blind.
209. Implementation Details
• Updates between Generator and Discriminator
• 1 : 1 or 2 : 1
• ADAM with lr = 0.0002, momentum = 0.5
• gaussian noise dim = 100
• batch size = 64, epoch = 300
Paper: https://arxiv.org/pdf/1605.05396.pdf
Code: https://github.com/paarthneekhara/text-to-image
210. Results of TA
• input text: black hair blue eyes
• input text: green hair green eyes
• input text: blue hair green eyes
212. Sentence Representation
• Using RNN to encode the input text
• But you already know the testing input …...
One-hot encoding
Why not 15*15
one-hot encoding?
本技巧由江東峻、陳翰浩、鄭嘉文提供
213. Data Augmentation
• Total training data: 33.4K
• Only 14.8K is useful
• Not sufficient for image generation?
• Data Augmentation
• Horizonal Flipping + Rotation
本技巧由江東峻、陳翰浩、鄭嘉文提供
218. Bonus
• hat, twin tails, ribbon, horns, headdress, headphones,
glasses ……
• Whole images
• Deep Cleavage GAN (DCGAN)
• Successful …… but I will not show this ……
感謝 孫凡耕、羅啟心、許晉嘉、郭子生 提供結果
220. Decision Making and Control
• Machine observe some inputs, takes an action, and
finally achieve the target.
• E.g. Go playing
observation action
Target: win the game
State: summarization of observation
221. Decision Making and Control
• Machine plays video games
• Widely studies:
• Gym: https://gym.openai.com/
• Universe: https://openai.com/blog/universe/
Machine learns to play video
games as human players
➢ Machine learns to take
proper action itself
➢ What machine observes are pixels
222. Decision Making and Control
• E.g. self-driving car
Object
Recognition
紅燈
Decision
Making
observation
Stop!
Action:
a giant network?
223. Decision Making and Control
• E.g. dialogue system
Slot
Filling
Decision
Making
我想訂11月5日
到台北的機票
問出發地
a giant network?
抵達時間:11月5日
目的地:台北
Natural
Language
Generation
請問您要從
哪裡出發呢?
224. How to solve this problem?
• Network as a function, learn as typical supervised
tasks
Network
Target: 3-3
Network
Target:
Stop!
我想訂11月5日
到台北的機票
Network
Target: 請問您要
從哪裡出發呢?
226. Properties of
Decision Making and Control
• Agent’s actions affect the subsequent data it
receives
• Reward delay
• In space invader, only “fire” obtains reward
• Although the moving before “fire” is
important
• In Go playing, sacrificing immediate reward to
gain more long-term reward
Machine does not know the
influence of each action.
What do we miss?
227. Better Way ……
Network (19 x 19
positions)
Next move
A sequence
of decision
Classification
228. Two Learning Scenarios
• Scenario 1: Reinforcement Learning
• Machine interacts with the environment.
• Machine obtains the reward from the environment, so it
knows its performance is good or bad.
• Scenario 2: Learning by demonstration
• Also known as imitation learning, apprenticeship
learning
• An expert demonstrates how to solve the task, and
machine learns from the demonstration.
234. Start with
observation 𝑠1 Observation 𝑠2 Observation 𝑠3
Example: Playing Video Game
Action 𝑎1: “right”
Obtain reward
𝑟1 = 0
Action 𝑎2 : “fire”
(kill an alien)
Obtain reward
𝑟2 = 5
Usually there is some randomness in the environment
235. Start with
observation 𝑠1 Observation 𝑠2 Observation 𝑠3
Example: Playing Video Game
After many turns
Action 𝑎 𝑇
Obtain reward 𝑟𝑇
Game Over
(spaceship destroyed)
This is an episode.
We want the total
reward be maximized.
Total reward:
𝑅 =
𝑡=1
𝑇
𝑟𝑡
236. • Input of neural network: the observation of machine
represented as a vector or a matrix
• Output neural network : each action corresponds to a
neuron in output layer
……
NN as actor
pixels
fire
right
left
Score of an
action
0.7
0.2
0.1
Take the action
with the
highest score
Neural network as Actor
What is the benefit of using network instead of lookup table?
generalization
239. Step 1:
define a set
of function
Step 2:
goodness of
function
Step 3: pick
the best
function
Three Steps for Deep Learning
Deep Learning is so simple ……
Neural Network
as Actor
240. Step 1:
define a set
of function
Step 2:
goodness of
function
Step 3: pick
the best
function
Three Steps for Deep Learning
Deep Learning is so simple ……
Neural Network
as Actor
241. Goodness of Actor
• Given an actor 𝜋 𝜃 𝑠 with network parameter 𝜃
• Use the actor 𝜋 𝜃 𝑠 to play the video game
• Start with observation 𝑠1
• Machine decides to take 𝑎1
• Machine obtains reward 𝑟1
• Machine sees observation 𝑠2
• Machine decides to take 𝑎2
• Machine obtains reward 𝑟2
• Machine sees observation 𝑠3
• ……
• Machine decides to take 𝑎 𝑇
• Machine obtains reward 𝑟𝑇
Total reward: 𝑅 𝜃 = σ 𝑡=1
𝑇
𝑟𝑡
Even with the same actor,
𝑅 𝜃 is different each time
We define ത𝑅 𝜃 as the
expected value of Rθ
ത𝑅 𝜃 evaluates the goodness of an actor 𝜋 𝜃 𝑠
Randomness in the actor
and the game
END
242. Goodness of Actor
• 𝜏 = 𝑠1, 𝑎1, 𝑟1, 𝑠2, 𝑎2, 𝑟2, ⋯ , 𝑠 𝑇, 𝑎 𝑇, 𝑟𝑇
𝑃 𝜏|𝜃 =
𝑝 𝑠1 𝑝 𝑎1|𝑠1, 𝜃 𝑝 𝑟1, 𝑠2|𝑠1, 𝑎1 𝑝 𝑎2|𝑠2, 𝜃 𝑝 𝑟2, 𝑠3|𝑠2, 𝑎2 ⋯
= 𝑝 𝑠1 ෑ
𝑡=1
𝑇
𝑝 𝑎 𝑡|𝑠𝑡, 𝜃 𝑝 𝑟𝑡, 𝑠𝑡+1|𝑠𝑡, 𝑎 𝑡
Control by
your actor 𝜋 𝜃
not related
to your actor
Actor
𝜋 𝜃
left
right
fire
𝑠𝑡
0.1
0.2
0.7
𝑝 𝑎 𝑡 = "𝑓𝑖𝑟𝑒"|𝑠𝑡, 𝜃
= 0.7
We define ത𝑅 𝜃 as the
expected value of Rθ
243. Goodness of Actor
• An episode is considered as a trajectory 𝜏
• 𝜏 = 𝑠1, 𝑎1, 𝑟1, 𝑠2, 𝑎2, 𝑟2, ⋯ , 𝑠 𝑇, 𝑎 𝑇, 𝑟𝑇
• 𝑅 𝜏 = σ 𝑡=1
𝑇
𝑟𝑡
• If you use an actor to play the game, each 𝜏 has a
probability to be sampled
• The probability depends on actor parameter 𝜃:
𝑃 𝜏|𝜃
ത𝑅 𝜃 =
𝜏
𝑅 𝜏 𝑃 𝜏|𝜃
Sum over all
possible trajectory
Use 𝜋 𝜃 to play the
game N times,
obtain 𝜏1, 𝜏2, ⋯ , 𝜏 𝑁
Sampling 𝜏 from 𝑃 𝜏|𝜃
N times
≈
1
𝑁
𝑛=1
𝑁
𝑅 𝜏 𝑛
244. Step 1:
define a set
of function
Step 2:
goodness of
function
Step 3: pick
the best
function
Three Steps for Deep Learning
Deep Learning is so simple ……
Neural Network
as Actor
248. Policy Gradient
𝛻 ത𝑅 𝜃 ≈
1
𝑁
𝑛=1
𝑁
𝑅 𝜏 𝑛 𝛻𝑙𝑜𝑔𝑃 𝜏 𝑛|𝜃
𝛻𝑙𝑜𝑔𝑃 𝜏|𝜃
=
𝑡=1
𝑇
𝛻𝑙𝑜𝑔𝑝 𝑎 𝑡|𝑠𝑡, 𝜃
=
1
𝑁
𝑛=1
𝑁
𝑅 𝜏 𝑛
𝑡=1
𝑇𝑛
𝛻𝑙𝑜𝑔𝑝 𝑎 𝑡
𝑛
|𝑠𝑡
𝑛
, 𝜃
𝜃 𝑛𝑒𝑤 ← 𝜃 𝑜𝑙𝑑 + 𝜂𝛻 ത𝑅 𝜃 𝑜𝑙𝑑
=
1
𝑁
𝑛=1
𝑁
𝑡=1
𝑇𝑛
𝑅 𝜏 𝑛 𝛻𝑙𝑜𝑔𝑝 𝑎 𝑡
𝑛
|𝑠𝑡
𝑛
, 𝜃
If in 𝜏 𝑛
machine takes 𝑎 𝑡
𝑛
when seeing 𝑠𝑡
𝑛
in
𝑅 𝜏 𝑛 is positive Tuning 𝜃 to increase 𝑝 𝑎 𝑡
𝑛
|𝑠𝑡
𝑛
𝑅 𝜏 𝑛 is negative Tuning 𝜃 to decrease 𝑝 𝑎 𝑡
𝑛
|𝑠𝑡
𝑛
It is very important to consider the cumulative reward 𝑅 𝜏 𝑛 of
the whole trajectory 𝜏 𝑛 instead of immediate reward 𝑟𝑡
𝑛
What if we replace
𝑅 𝜏 𝑛
with 𝑟𝑡
𝑛
……
249. Add a Baseline
𝜃 𝑛𝑒𝑤
← 𝜃 𝑜𝑙𝑑
+ 𝜂𝛻 ത𝑅 𝜃 𝑜𝑙𝑑
𝛻 ത𝑅 𝜃 ≈
1
𝑁
𝑛=1
𝑁
𝑡=1
𝑇𝑛
𝑅 𝜏 𝑛 𝛻𝑙𝑜𝑔𝑝 𝑎 𝑡
𝑛
|𝑠𝑡
𝑛
, 𝜃
It is possible that 𝑅 𝜏 𝑛
is always positive.
Ideal
case
Sampling
……
a b c a b c
It is probability …
a b c
Not
sampled
a b c
The probability of the
actions not sampled
will decrease.
1
𝑁
𝑛=1
𝑁
𝑡=1
𝑇𝑛
𝑅 𝜏 𝑛 − 𝑏 𝛻𝑙𝑜𝑔𝑝 𝑎 𝑡
𝑛
|𝑠𝑡
𝑛
, 𝜃
258. RNN
Generation
• Images are composed of pixels
• Generating a pixel at each time by RNN
blue pink blue
<BOS> ……
……
……
red
Consider as a sentence
blue red yellow gray ……
Each pixel is a character.
red blue pink
~
~
~
~
c1 c2 c3
P(c1) P(c2|c1) P(c3|c1,c2) P(c4|c1,c2,c3)
259. 寶可夢鍊成
• Small images of 792 Pokémon's
• Can machine learn to create new Pokémons?
• Source of image:
http://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A
9mon_by_base_stats_(Generation_VI)
Don't catch them! Create them!
Original image is 40 x 40
Making them into 20 x 20
262. PixelRNN
Real
World
Ref: Aaron van den Oord, Nal Kalchbrenner, Koray
Kavukcuoglu, Pixel Recurrent Neural Networks,
arXiv preprint, 2016
263. More than images ……
Audio: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol
Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu,
WaveNet: A Generative Model for Raw Audio, arXiv preprint, 2016
Video: Nal Kalchbrenner, Aaron van den Oord, Karen Simonyan, Ivo
Danihelka, Oriol Vinyals, Alex Graves, Koray Kavukcuoglu, Video Pixel Networks ,
arXiv preprint, 2016
265. Sentence Generation
• Initialize generator G and discriminator D
• In each iteration:
• Sample real sentences 𝑥 from database
• Generate sentences 𝑥 by G
• Update D to increase 𝐷 𝑥 and decrease 𝐷 𝑥
• Update Gen such that
Generator
Discrimi
nator
scalar
update
266. Sentence Generation
A A
A
B
A
B
A
A
B
B
B
<BOS>
Can we do
backpropogation?
Tuning generator a
little bit will not
change the output.
Discriminator
scalar
NO!
In the paper of
improved WGAN …
(ignoring
sampling process)
268. Sentence Generation - SeqGAN
• Using Reinforcement learning
• Consider the discriminator as reward function
• Consider the output of discriminator as total reward
• Update generator to increase discriminator = to get
maximum total reward
Ref: Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu, “SeqGAN: Sequence
Generative Adversarial Nets with Policy Gradient”, AAAI, 2017
Generator
Discrimi
nator
scalar
update
Using Policy Gradient
269. Chat-bot with GAN
Discriminator
Input
sentence/history h response sentence x
Real or fake
human
dialogues
Chatbot
En De
Conditional GAN
response sentence x
Input
sentence/history h
Jiwei Li, Will Monroe, Tianlin
Shi, Sébastien Jean, Alan Ritter, Dan
Jurafsky, “Adversarial Learning for
Neural Dialogue Generation”, arXiv
preprint, 2017
272. Critic
• A critic does not determine the action.
• Given an actor π, it evaluates the how good the
actor is
http://combiboilersleeds.com/picaso/critics/critics-4.html
An actor can be
found from a critic.
e.g. Q-learning
Not suitable for
continuous action a
273. Actor-Critic
𝜋 interacts with
the environment
Learning Critic
Update actor from
𝜋 → 𝜋’ based on
Critic
TD or MC𝜋 = 𝜋′
Actor = Generator ?
Critic = Discriminator ?
https://arxiv.org/abs/1610.01945
274. Visual Doom AI Competition @ CIG 2016
https://www.youtube.com/watch?v=94EPSjQH38Y
277. Imitation Learning
We have demonstration of the expert.
Actor
𝑠1
𝑎1
Env
𝑠2
Env
𝑠1
𝑎1
Actor
𝑠2
𝑎2
Env
𝑠3
𝑎2
……
reward function is not available
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
Each Ƹ𝜏 is a trajectory
of the export.
Self driving: record
human drivers
Robot: grab the
arm of robot
278. Motivation
• It is hard to define reward in some tasks.
• Hand-crafted rewards can lead to uncontrolled
behavior.
機器人三大法則:
一、機器人不得傷害人類,或坐視人類受到
傷害而袖手旁觀。
二、除非違背第一法則,機器人必須服從人
類的命令。
三、在不違背第一法則及第二法則的情況下,
機器人必須保護自己。
因此為了保護人類整體,控制人類的自由是
必須的
神邏輯
280. Inverse Reinforcement Learning
• Principle: The teacher is always the best.
• Basic idea:
• Initialize an actor
• In each iteration
• The actor interacts with the environments to obtain
some trajectories
• Define a reward function, which makes the
trajectories of the teacher better than the actor
• The actor learns to maximize the reward based on
the new reward function.
• Output the reward function and the actor learned from
the reward function
281. Framework of IRL
Expert ො𝜋
Actor 𝜋
Obtain
Reward Function R
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
𝜏1, 𝜏2, ⋯ , 𝜏 𝑁
Find an actor based
on reward function R
By Reinforcement learning
𝑛=1
𝑁
𝑅 Ƹ𝜏 𝑛 >
𝑛=1
𝑁
𝑅 𝜏
Reward function
= Discriminator
Actor
= Generator
Reward
Function R
282. 𝜏1, 𝜏2, ⋯ , 𝜏 𝑁
GAN
IRL
G
D
High score for real,
low score for generated
Find a G whose output
obtains large score from D
Ƹ𝜏1, Ƹ𝜏2, ⋯ , Ƹ𝜏 𝑁
Expert
Actor
Reward
Function
Larger reward for Ƹ𝜏 𝑛,
Lower reward for 𝜏
Find a Actor obtains
large reward
283. Teaching Robot
• In the past …… https://www.youtube.com/watch?v=DEGbtjTOIB0
289. Concluding Remarks
Lecture 1: Brief Review of Deep Learning
Lecture 2: Introduction of Generative
Adversarial Network
Lecture 3: Conditional Generation
Lecture 4: Making Decision and Control
290. To Learn More …
• Machine Learning
• Slides:
http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML16.htm
l
• Video:
https://www.youtube.com/watch?v=fegAeph9UaA&list=
PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49
• Machine Learning and Having it Deep and Structured
• Slides:
http://speech.ee.ntu.edu.tw/~tlkagk/courses_MLDS17.h
tml
• Video:
https://www.youtube.com/watch?v=IzHoNwlCGnE&list=
PLJV_el3uVTsPMxPbjeX7PicgWbY7F8wW9