Session 608
© 2017 Apple Inc. All rights reserved. Redistribution or public display not permitted without written permission from Apple.
Metal 2 Ecosystem
GPU Tools
MetalKit
Metal 2
Metal Performance Shaders (MPS)
• Image Processing: Sobel, Transpose, Image Keypoints, Bilinear Rescale, Image Statistics
• Linear Algebra: Matrix-Matrix Multiplication, Matrix-Vector Multiplication
Data Representations
MPSVector
• Interprets data in MTLBuffer as a 1-dimensional array
MPSMatrix
• Interprets data in MTLBuffer as a rectangular array
• Row-major order
MPSTemporaryMatrix
• Allocated from MTLHeap
MPSMatrixMultiplication
https://developer.apple.com/library/content/samplecode/MPSMatrixMultiplicationSample
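MPSMatrixMultiplication computes C = alpha·A·B + beta·C on the GPU, reading each MPSMatrix as row-major data in its MTLBuffer. As a hedged, CPU-only sketch of that same computation and layout (plain Swift, no Metal; the function name and signature are illustrative, not MPS API):

```swift
// CPU reference for what MPSMatrixMultiplication computes on the GPU:
// C = alpha * A * B + beta * C, with every matrix stored row-major,
// which is how MPSMatrix interprets the bytes of its MTLBuffer.
func gemm(alpha: Float, a: [Float], rowsA: Int, colsA: Int,
          b: [Float], colsB: Int,
          beta: Float, c: inout [Float]) {
    for i in 0..<rowsA {
        for j in 0..<colsB {
            var acc: Float = 0
            for k in 0..<colsA {
                // Row-major indexing: element (row, col) lives at row * columns + col
                acc += a[i * colsA + k] * b[k * colsB + j]
            }
            c[i * colsB + j] = alpha * acc + beta * c[i * colsB + j]
        }
    }
}

// 2x2 example: A = [[1,2],[3,4]], B = [[5,6],[7,8]], C starts at zero.
var c: [Float] = [0, 0, 0, 0]
gemm(alpha: 1, a: [1, 2, 3, 4], rowsA: 2, colsA: 2,
     b: [5, 6, 7, 8], colsB: 2, beta: 0, c: &c)
// c == [19, 22, 43, 50]
```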
Applications
Domain-Specific Frameworks: Vision, NLP
ML Framework: Core ML
ML Performance Primitives: Accelerate, MPS
What Is Deep Learning?
[Figure: photos labeled with detected objects — panda, house, ocean, dress, dog, giraffe, girl, sunset, bicycle, ramp, horse, man, plant, skateboard, lights]
Training and Inference
[Figure: training runs many labeled images (cat, rabbit, dog, giraffe, horse) through the network to produce trained parameters; inference runs an input image through the CNN with those trained parameters to predict a label ("dog")]
Training to Classify Images
Agenda
Convolutional Neural Networks
Hierarchical representation
• Organized into a hierarchy of layers

Primitives available in iOS 10
• Convolution
• Pooling: Average, Max
• Fully-Connected
• Softmax
• Neuron: Linear, ReLU, Sigmoid, Absolute
• Normalization: Cross-Channel, Spatial
Convolution NEW
• 8-bit Integer
• Binary

Convolution Primitives
• Standard
• Dilated
• Sub-Pixel
• Transpose
Binary and XNOR Convolution
• Less memory
• Binary convolution: full-sized input, binary weights
• XNOR convolution: binary input, binary weights
[Figure: a row of real-valued weights (0.86, −0.66, −0.12, 0.19, 0.23, …) binarized to +1/−1 values such as +1 +1 +1 +1 −1 and +1 −1 −1 +1 +1]
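The payoff of binarizing both inputs and weights is that the multiply-accumulate at the heart of convolution reduces to XNOR plus popcount on bit vectors. A hedged sketch in plain Swift (the bit-packing scheme here is illustrative, not MPS's internal format):

```swift
// Dot product of two ±1 vectors packed as bits (bit i holds element i; 1 = +1, 0 = -1).
// For ±1 values, a * b == +1 exactly when the two bits agree, i.e. XNOR.
// dot = (#matching bits) - (#differing bits) = 2 * popcount(xnor) - n
func binaryDot(_ a: UInt32, _ b: UInt32, count n: Int) -> Int {
    let mask: UInt32 = n == 32 ? .max : (1 << n) - 1
    let xnor = ~(a ^ b) & mask          // 1 wherever the signs agree
    let matches = xnor.nonzeroBitCount  // popcount
    return 2 * matches - n
}

// The slide's vectors: weights +1 +1 +1 +1 -1 against inputs +1 -1 -1 +1 +1.
// Signs agree at positions 0 and 3, differ at 1, 2, 4: dot = 2*2 - 5 = -1.
let w: UInt32 = 0b01111   // +1 +1 +1 +1 -1 (bit 0 is element 0)
let x: UInt32 = 0b11001   // +1 -1 -1 +1 +1
let d = binaryDot(w, x, count: 5)
// d == -1
```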
Dilated Convolution
Comparison to regular convolution
[Figure: a 3 x 3 kernel applied to the input produces the output]

How it works
[Figure: with dilationFactorX = 2 and dilationFactorY = 2, the 3 x 3 kernel samples the input with gaps, covering a larger region than a regular 3 x 3 convolution]
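The dilation factors simply stretch the kernel's sampling pattern: tap k reads the input at offset k · dilationFactor instead of k. A minimal 1-D sketch in plain Swift (the dilationFactor name mirrors the dilationFactorX/Y parameters above; everything else is illustrative):

```swift
// 1-D dilated convolution over the valid region only: tap k of the kernel
// samples the input at offset k * dilationFactor, so a 3-tap kernel with
// factor 2 covers a window of 5 input samples.
func dilatedConv1D(input: [Float], kernel: [Float], dilationFactor: Int) -> [Float] {
    let span = (kernel.count - 1) * dilationFactor + 1   // receptive field
    guard input.count >= span else { return [] }
    return (0...(input.count - span)).map { x in
        zip(kernel.indices, kernel).reduce(Float(0)) { acc, tap in
            acc + tap.1 * input[x + tap.0 * dilationFactor]
        }
    }
}

let input: [Float] = [1, 2, 3, 4, 5, 6]
// 3-tap summing kernel, dilation 2: output[x] = input[x] + input[x+2] + input[x+4]
let out = dilatedConv1D(input: input, kernel: [1, 1, 1], dilationFactor: 2)
// out == [9, 12]  (1+3+5 and 2+4+6)
```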
Sub-Pixel Convolution and Convolution Transpose
Upscaling
• Using a box filter: W x H input → 2W x 2H output

Sub-Pixel Convolution
How it works
• Convolve the W x H input using trained parameters, then reshuffle the resulting channels into the upscaled output

Convolution Transpose
How it works
[Figure: the W x H input is expanded, then convolved to produce the upscaled output]
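The "reshuffle" step of sub-pixel convolution is a depth-to-space rearrangement: the r² output channels at each low-resolution pixel become an r x r patch of the high-resolution image. A hedged single-channel sketch in plain Swift (function name and layout conventions are illustrative):

```swift
// Depth-to-space reshuffle used by sub-pixel convolution: r*r channels of a
// w x h tensor are interleaved into a single (w*r) x (h*r) channel.
// channels[c][y*w + x] holds pixel (x, y) of channel c, row-major.
func pixelShuffle(channels: [[Float]], w: Int, h: Int, r: Int) -> [Float] {
    precondition(channels.count == r * r)
    var out = [Float](repeating: 0, count: w * r * h * r)
    for y in 0..<h {
        for x in 0..<w {
            for c in 0..<(r * r) {
                // Channel c lands at sub-pixel position (c % r, c / r) of the patch
                let ox = x * r + c % r
                let oy = y * r + c / r
                out[oy * (w * r) + ox] = channels[c][y * w + x]
            }
        }
    }
    return out
}

// 1x1 input, r = 2: four channels become one 2x2 image.
let up = pixelShuffle(channels: [[1], [2], [3], [4]], w: 1, h: 1, r: 2)
// up == [1, 2, 3, 4]  (2x2 row-major: channels fill the patch in scan order)
```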
New Convolution Primitives
Example: colorizing black and white images
• Convolution
• Dilated Convolution
• Batch Normalization
• Convolution Transpose
• SoftMax
Colorization network*
*Colorful Image Colorization, Richard Zhang, Phillip Isola, Alexei A. Efros, ECCV 2016, http://richzhang.github.io/colorization/
Demo
• Image colorization
Performance Improvements in iOS 11
Inception-v3 network*
[Chart: percentage improvement, higher is better, on iPhone 6S, iPhone 7 Plus, iPad Pro 9.7”, and iPad Pro 10.5"; around 29% improvement shown]
*Rethinking the Inception Architecture for Computer Vision, Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Zbigniew Wojna, CVPR 2015, https://arxiv.org/abs/1512.00567
Agenda
Neural Network Graph API
• Ease of use
• Compact representation
https://developer.apple.com/library/content/samplecode/MetalImageRecognition
Neural Network Graph API
Deliver best performance
// Example: create a graph
func makeGraph() -> MPSNNImageNode {
    // Chain the layer nodes: conv1 → pool1 → conv2 → pool2 → conv3 → pool3 →
    // conv4 → fc1 → fc2, each created as a filter node whose source is the
    // previous node's result image
    return fc2.resultImage
}
// Example: execute graph on the GPU
// Metal setup
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()
let commandBuffer = commandQueue.makeCommandBuffer()
// Initialize graph
let graph = MPSNNGraph(device: device, resultImage: makeGraph())
// Encode graph
let output = graph?.encode(to: commandBuffer,
sourceImages: [input])
// Tell GPU to start executing work and wait until GPU work is done
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
// Example: execute graph on the GPU
[CPU/GPU timeline: the CPU encodes each task, then commandBuffer.commit() and commandBuffer.waitUntilCompleted() block until the GPU finishes, leaving idle "bubbles" between tasks]
// Example: execute graph on the GPU asynchronously
// Metal setup
let device = MTLCreateSystemDefaultDevice()!
// Initialize graph
let graph = MPSNNGraph(device: device, resultImage: makeGraph())
// Encode graph
let output = graph?.executeAsync(sourceImages: [input]) {
resultImage, error in
// check for error and use resultImage inside closure
}
// Create input image
let input = MPSImage(texture: texture, …)
[CPU/GPU timeline: with asynchronous execution, the CPU encodes the next task while the GPU executes the previous one, eliminating the bubbles]
Demo
CNN
One-to-one
• One input (an image) → one inference (e.g. "dog", "grass", …)

RNN
Sequences: one-to-many
• One input → a sequence of inferences
RNN
Sequences: many-to-many
• A sequence of inputs (a sentence in English) → sequences of outputs (translations)
• Example: "A black and white dog laying in the grass" → Russian "Чёрно-белая собака лежит на траве", Finnish "Mustan ja valkoisen värinen koira makaa ruohikolla"
Single-Gate Recurrent Unit
[Figure: the input and the previous output pass through a single gate to produce the new output]

Long Short-Term Memory (LSTM)
Architecture
• A memory cell carries state from one step to the next (old memory → new memory)
• Forget gate: what to keep from the old memory
• Input and cell gates: how the new input affects the new memory
• Output gate: how the previous output, the current input, and the new memory affect the new output
Legend: M = matrix-matrix or matrix-vector multiply; *, + = point-wise operations
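The gate arithmetic above can be written out directly: each gate is an "M then point-wise" computation. A scalar (one-unit) sketch of the standard LSTM step in plain Swift; the weight names are illustrative and correspond only loosely to the descriptor's gate weights:

```swift
import Foundation

// One step of a single-unit LSTM. Each gate multiplies the input and the
// previous output by its weights (the M boxes), then applies a point-wise
// sigmoid or tanh.
struct LSTMStep {
    // Per-gate weights for [input, previous output] plus a bias (illustrative names).
    var forget: (wx: Double, wh: Double, b: Double)
    var input:  (wx: Double, wh: Double, b: Double)
    var cell:   (wx: Double, wh: Double, b: Double)
    var output: (wx: Double, wh: Double, b: Double)

    func sigmoid(_ z: Double) -> Double { 1 / (1 + exp(-z)) }

    // Returns (new memory, new output) from (input x, previous output h, old memory c).
    func step(x: Double, h: Double, c: Double) -> (c: Double, h: Double) {
        let f = sigmoid(forget.wx * x + forget.wh * h + forget.b) // keep old memory?
        let i = sigmoid(input.wx  * x + input.wh  * h + input.b)  // admit new input?
        let g = tanh(cell.wx * x + cell.wh * h + cell.b)          // candidate memory
        let o = sigmoid(output.wx * x + output.wh * h + output.b) // expose memory?
        let newC = f * c + i * g   // forget part of the old memory, add the new
        let newH = o * tanh(newC)  // new output gated from the new memory
        return (newC, newH)
    }
}

// With all weights and biases zero, every gate is 0.5 and the candidate is 0,
// so the memory is simply halved: c' = 0.5 * c.
let zero = (wx: 0.0, wh: 0.0, b: 0.0)
let lstm = LSTMStep(forget: zero, input: zero, cell: zero, output: zero)
let r = lstm.step(x: 1, h: 0, c: 2)
// r.c == 1.0
```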
// Create and initialize gate weights with trained parameters, using a data source provider
// for just-in-time loading and purging of weights
descriptor.forgetGateInputWeights = MyWeights(file: "forgetGateWeights.dat")
descriptor.cellGateInputWeights = MyWeights(file: "cellGateWeights.dat")
// Initialize the rest of the gates…

// Metal setup
let device = MTLCreateSystemDefaultDevice()! // Also get commandQueue and commandBuffer
Example: Image Captioning
Inference
• CNN (Inception-v3): determine what is depicted in the image
• RNN (LSTM memory cell): generate the image caption using trained parameters
• Example output: "a man riding a wave on top of a surfboard"
*Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, IEEE Transactions on Pattern Analysis and Machine Intelligence, https://arxiv.org/abs/1609.06647
Example: Image Captioning
LSTM initialization phase
• Run the image through Inception-v3 (Convolution, Pooling (Avg.), Pooling (Max.), Fully-Connected, SoftMax layers)
• The resulting feature vector initializes the LSTM memory cell
Example: Image Captioning
Caption generation phase
• Feed a sentence-start token as the first LSTM input
• Keep the 3 best one-word captions from the LSTM output
• Feed each kept caption back into its own copy of the LSTM, extending the caption word by word
Caption Probability Caption Probability
man 0.021986
a 0.862899
the 0.039906
Iteration 1 Iteration 2
Caption Probability Caption Probability
man 0.021986
a 0.862899
the 0.039906
Iteration 1 Iteration 2
Caption Probability Caption Probability
man 0.021986 man on 0.005290
a 0.862899 man in 0.004869
the 0.039906 man surfing 0.003914
Iteration 1 Iteration 2
Caption Probability Caption Probability
man 0.021986 man on 0.005290
a 0.862899 man in 0.004869
the 0.039906 man surfing 0.003914
a man 0.385814
Top three captions:
a person 0.136590
1.
2. a surfer 0.116651
3.
Caption Generation
Iteration 1 Iteration 2
Caption Probability Caption Probability
man 0.021986 man on 0.005290
a 0.862899 man in 0.004869
the 0.039906 man surfing 0.003914
a man 0.385814
Top three captions:
a person 0.136590
1.
2. a surfer 0.116651
3. the man 0.014275
the surfer 0.012315
the young 0.003500
Caption Generation
Iteration 1 Iteration 2
Caption Probability Caption Probability
man 0.021986 man on 0.005290
a 0.862899 man in 0.004869
the 0.039906 man surfing 0.003914
a man 0.385814
Top three captions:
a person 0.136590
1.
2. a surfer 0.116651
3. the man 0.014275
the surfer 0.012315
the young 0.003500
Caption Generation
Caption Generation

Iteration 2               Iteration 3
Caption   Probability     Caption          Probability
a man     0.385814        a man riding     0.115423
                          a man on         0.060142
                          a man is         0.048678
a person  0.136590        a person riding  0.041114
                          a person on      0.031153
                          a person in      0.014218
a surfer  0.116651        a surfer is      0.027462
                          a surfer riding  0.016631
                          a surfer in      0.015737

Top three captions: a man riding (0.115423), a man on (0.060142), a man is (0.048678)
Caption Generation

Iteration 3                  Iteration 4
Caption       Probability    Caption           Probability
a man riding  0.115423       a man riding a    0.100079
                             a man riding on   0.008264
                             a man riding the  0.002604
a man on      0.060142       a man on a        0.055997
                             a man on his      0.001288
                             a man on the      0.001211
a man is      0.048678       a man is surfing  0.021393
                             a man is riding   0.015222
                             a man is on       0.003434

Top three captions: a man riding a (0.100079), a man on a (0.055997), a man is surfing (0.021393)
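The iteration tables above are a beam search of width three: each surviving caption is extended by candidate next words, candidates are scored by the product of their word probabilities, and only the three most probable captions survive each iteration. A hedged plain-Swift sketch, with a toy next-word function standing in for the LSTM (its probabilities are illustrative, back-computed from the tables):

```swift
// Beam search with width 3, as in the caption-generation tables.
func beamSearch(start: [(String, Double)],
                expand: (String) -> [(word: String, p: Double)],
                width: Int, steps: Int) -> [(caption: String, p: Double)] {
    var beam = start.sorted { $0.1 > $1.1 }.prefix(width).map { (caption: $0.0, p: $0.1) }
    for _ in 0..<steps {
        var candidates: [(caption: String, p: Double)] = []
        for (caption, p) in beam {
            for next in expand(caption) {
                // Score is the product of the word probabilities along the caption
                candidates.append((caption + " " + next.word, p * next.p))
            }
        }
        beam = Array(candidates.sorted { $0.p > $1.p }.prefix(width))
    }
    return beam
}

// Toy model echoing iteration 1 -> 2 of the tables.
let nextWord: (String) -> [(word: String, p: Double)] = { caption in
    switch caption {
    case "a":   return [("man", 0.447), ("person", 0.158), ("surfer", 0.135)]
    case "the": return [("man", 0.358), ("surfer", 0.309), ("young", 0.088)]
    default:    return [("on", 0.241), ("in", 0.221), ("surfing", 0.178)]
    }
}
let top = beamSearch(start: [("a", 0.862899), ("the", 0.039906), ("man", 0.021986)],
                     expand: nextWord, width: 3, steps: 1)
// top captions: "a man", "a person", "a surfer"
```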