Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Q&A
2
AGENDA
Q&A
3
DIRECTX 12: MORE CONTROL
Gives expert programmers more explicit control over the GPU
4
AGENDA
Q&A
5
WHAT DOES THE DIRECTX11 DRIVER DO FOR YOU?
Worker Hardware Engines/Queues
threads
Resource
Residency
Resource
Barriers
GPU HW
6
DIRECTX 12: MORE RESPONSIBILITIES
Worker HW Queues
threads
Resource
Residency
MultiGPU
Resource
Barriers
GPU HW
7
AGENDA
DirectX 12: more control & responsibilities
Q&A
8
EFFICIENT DIRECTX 12 ON NVIDIA GPUS (1/2)
Construct balanced number of Command Lists (CLs) in parallel
9
EFFICIENT DIRECTX 12 ON NVIDIA GPUS (2/2)
Gracefully deal with the hardware tiers of NVIDIA GPUs
10
COMMAND LISTS
Use multiple threads to construct CLs in parallel
50-80
microsecs 11
BARRIERS
You need to get the use of barriers right!
Avoid redundancy
12
ROOT SIGNATURES
Dont just use one RS
Use a reasonably small set of RSs
Fill all the bindings and descriptor heap entries with sensible data before the CL
execution
Even if the used shaders do not reference all descriptors
Use nullCBVs and nullUAVs with descriptor tables
14
RESOURCE TIER 2 BINDING GONE WRONG
RootSignature RootSignature2
CBV: not init. CBV: gpuvadr1
Change Fill
UAV: not init. UAV: not init.
RS RS
Desc Table: Desc Table: Issue
not init. Desc Table X Drawcall
Change Shader does not
Shader not use UAV or filled
Desc Table X DescTable::CBV0 Desc Table X
CBV0: not init.
CBV0: not init.
CBV1: not init. Fill CBV1:CBVDsc1
UAV: not init. Table
UAVDsc2
Draw calls
15
RESOURCE TIER 2 BINDING GONE WRONG
RootSignature RootSignature2
CBV: not init. CBV: gpuvadr1
Change Fill
UAV: not init. UAV: nullptr
RS RS
Desc Table: Desc Table: Issue
not init. Desc Table X Drawcall
Change Shader does not
Shader not use UAV or filled
Desc Table X DescTable::CBV0 Desc Table X
CBV0: not init.
CBV0: not init.
CBV1: not init. Fill CBV1:CBVDsc1
UAV: not init. Table
UAVDsc2
Draw calls
16
RESOURCE TIER 2 BINDING DONE RIGHT
RootSignature RootSignature2
CBV: not init. CBV: gpuvadr1
Change Fill
UAV: not init. UAV: nullptr
RS RS
Desc Table: Desc Table: Issue
not init. Desc Table X Drawcall
Change Shader does
Shader not use UAV or
Desc Table X DescTable::CBV0 Desc Table X
CBV0: not init.
CBV0: nullCBV
CBV1: not init. Fill CBV1:CBVDsc1
UAV: not init. Table
UAVDsc2
Draw calls
17
RESOURCE HEAPS
Current NVidia GPUs support Resource Heap Tier 1
18
STRATEGIC CONSTANT FOLDING FOR SHADERS
DirectX 12 makes it harder for the driver to fold shader constants
If you detect a big DX11 vs DX12 perf delta for key shaders
Try to strategically fold constants manually
19
SHADERS FOLDING CONSTANTS MANUALLY
cbuffer
{
cbuffer #ifdef FOLD_CBSWITCH
{ float cfSpecWeightCB;
float cfSpecWeight; #define cfSpecWeight 0.0f
#else
manual transform
} float cfSpecWeight;
#endif
float4 computeLighting() cfSpecWeight == 0.0f
{ }
res=CalcLighting(cfSpecWeight); float4 computeLighting()
{
}
res=CalcLighting(cfSpecWeight);
}
20
SHADERS FOLDING CONSTANTS MANUALLY
cbuffer
{
cbuffer #ifdef FOLD_CBSWITCH
{ float cfSpecWeightCB;
float cfSpecWeight; #define cfSpecWeight 0.0f
#else
manual transform
} float cfSpecWeight;
#endif
float4 computeLighting() cfSpecWeight == 0.0f
{ }
res=CalcLighting(cfSpecWeight); float4 computeLighting()
{
}
res=CalcLighting(0.0f);
}
21
RESOURCE RESIDENCY
IDXGIAdapter3::QueryVideoMemoryInfo
Foreground app is guaranteed a subset of total vidmem this is your budget
App needs to deal with changes in available mem and Evict() resources
22
VIDEO MEMORY OVER-COMMITMENT
DX12 gives user a real advantage over the DX11 driver
You know whats more important to have in vidmem
Create overflow heaps in the system and move some resources from
vidmem heaps
Video memory System memory
COMPUTE
Use compute queues with care
COPY
Not all workloads pair up nicely
Remember IHV specific path for DX12!
Come and talk to us about getting this right
24
AGENDA
DirectX 12: more control & responsibilities
Q&A
25
NEW DIRECTX 12 PROGRAMMING MODEL USE CASES
Predication
Offers more flexibility than DirectX 11
ExecuteIndirect
More powerful than DirectX 11 DrawIndirect() or DispatchIndirect()
26
NEW DIRECTX 12 PREDICATION MODEL
Now fully decoupled from queries
Predication on the value at a location in a buffer
GPU reads buffer value when executing SetPredication
Predication
0 1 ... 1
Buffer
27
JUST FYI : CALLS THAT CAN BE PREDICATED
DrawInstanced, DrawIndexedInstanced, Dispatch,
CopyTextureRegion, CopyBufferRegion,
CopyResource, CopyTiles, ResolveSubresource,
ClearDepthStencilView, ClearRenderTargetView,
ClearUnorderedAccessViewUint,
ClearUnorderedAccessViewFloat, ExecuteIndirect
28
CASE: ASYNCHRONOUS CPU BASED OCCLUSION
CPU threads set 1: record command lists for objects
SetPredication(0) SetPredication(1) SetPredication(..) SetPredication(N)
CL DrawObj(0) DrawObj(1) DrawObj(..) DrawObj(N)
CPU threads set 2: perform software occlusion queries and fill in buf
Predication 0 1 ... 1
Buffer
29
EXECUTE INDIRECT (1/2)
Inbetween Draws/Dispatches:
Change
30
EXECUTE INDIRECT API
void ExecuteIndirect(
Defines the operations
ID3D12CommandSignature* pCommandSignature,
to be carried out repeatedly
Max count of
UINT MaxCommandCount,
repetitions
);
31
EXECUTE INDIRECT (2/2)
Draw thousands of different objects in one ExecuteIndirect
32
EXECUTE INDIRECT DRAWING SIMULATED TREES
Imagine large set of physically simulated unique trees
Perhaps even broken up into tree parts by destruction
For simplicity : All trees share the same texture atlas or texture array
Each tree has a unique mesh and unique vertex and index buffer
This also means vertex count can be unique as well
33
EXECUTE INDIRECT DRAWING SIMULATED TREES
DirectX 11
Solution 1:
Solution 2:
SetupMeshForAllTrees(VB,IB);
DrawTreesInstanced();
Needs to draw each tree with the
same numbers of vertices for
instancing to work
34
EXECUTE INDIRECT DRAWING SIMULATED TREES
DirectX 12
CommandSignature
Argument Type Data
VertexBufferView VirtualAddressVB
Size
Stride
IndexBufferView VirtualAddressIB
Size
Format
DrawIndexed IndexCount
InstanceCount
StartIndexLocation
BaseLocation
StartInstanceLocation
35
EXECUTE INDIRECT DRAWING SIMULATED TREES
DirectX 11 DirectX 12
Solution 1: CreateTreeCommandSignature();
foreach(tree) foreach(tree)
Slow too
SetupMesh(VB,IB); many API call
appendDrawArgsAndVB(tree,argbuffer);
DrawTree();
ExcuteIndirect(...,argbuffer,..)
Solution 2:
36
EXPLICIT MULTI GPU
Finally full control over what goes on each GPU
Create resources on specific GPUs
Execute command lists on specific GPUs
Explicitly copy resources between GPUs
Perfect usecase for DirectX 12 copy queues
37
MULTIGPU WORK DISTRIBUTION SAMPLE
Frame time
Check Juha Sjholms GDC talk Explicit Multi GPU Programming with DirectX 12
38
AGENDA
DirectX 12: more control & responsibilities
Q&A
39
CONSERVATIVE RASTER
Door opener to advanced AA
techniques
Enables the rasterizer to be
used to do triangle binning
40
DX12&11.3 FL3 HARDWARE FEATURES USE CASES
Volume Tiled Resources
see Latency Resistant Sparse Fluid Simulation: [Alex Dunn, D3D Day GDC 2015]
41
Q&A OLEG KUZNETSOV : OKUZNETSOV@NVIDIA.COM
NVIDIA GeForce
6xx series 9xx series
and above and above
Feature Level 11_0 12_1
Resource Binding Tier 2
Tiled Resources Tier 1 Tier 3
Typed UAV Loads No Yes
Conservative
No Tier 1
Rasterization
Rasterizer-Ordered
No Yes
Views
Stencil Reference Output No
UAV Slots 64
Resource Heap Tier 1
https://developer.nvidia.com/dx12-dos-and-donts 42