SlideShare uma empresa Scribd logo
1 de 42
Baixar para ler offline
Siggraph Asia 2014 
Tristan Lorach 
Manager of Devtech for Professional Visualization group 
OPENGL NVIDIA "COMMAND-LIST": "APPROACHING ZERO DRIVER OVERHEAD"
2 
Siggraph Asia 2014 
GPU Scalability GPUs are extremely powerful But only if used correctly Mix of Application and Graphic API (Driver) responsibility Increase amount of work per batch (Job) Minimize CPUGPU interactions Lower Memory traffic Lower API calls: Batch things together Factorize Data Re-use data uploaded to Video memory (instancing…)
3 
Siggraph Asia 2014 
Use of the GPU Games ~1500 to 3000 Drawcalls for a scene Intensive use of Multi-layer image processing over the scene Heavy shaders CAD/FCC/professional applications Hard to batch user’s works (CAD applications) ~10,000 to… 300,000 Drawcalls for a scene: heavy CPU workload for our driver Shaders simpler than games (but catching-up these days...) Post-Processing more and more used (SSAO)
4 
Siggraph Asia 2014 
Use of the GPU New Graphic APIs are trying to address these concerns Khronos (Announced it at Siggraph Vancouver 2014) Metal Mantle DX12 All propose better ways to issue commands and render-states But can we improve OpenGL to go the same path ? Yes: NV_command_list
5 
Siggraph Asia 2014 
OpenGL OpenGL is ~25 years old (!!) 1.x: fixed function 2.x: Shading language (GLSL) 3.x/4.x: more as a massive compute system More graphic stages (Geom.Shader; Tessellation) High flexibility & efficiency on memory accesses High programability Design "flaws": still “State Machine”; ServerClient; poor multi-threading management 
1992 
2012
6 
Siggraph Asia 2014 
Challenge of Issuing Commands Issuing drawcalls and state changes can be a real bottleneck 
650,000 Triangles 
68,000 Parts 
~ 10 Triangles per part 
3,700,000 Triangles 
98 000 Parts 
~ 37 Triangles per part 
14,338,275 Triangles/lines 
300,528 drawcalls (parts) 
~ 48 Triangles per part 
App + driver 
GPU 
GPU 
idle 
CPU 
Excessive Work from App & Driver On CPU 
! 
! 
courtesy of PTC
7 
Siggraph Asia 2014 
Big Picture – Typical Case 
Vertex Puller (IA) 
Vertex Shader 
TCS (Tessellation) 
TES (Tessellation) 
Tessellator 
Geometry Shader 
Transform Feedback 
Rasterization 
Fragment Shader 
Per-Fragment Ops 
Framebuffer 
Tr. Feedback buffer 
Uniform Block 
Texture Fetch 
Image Load/Store 
Atomic Counter 
Shader Storage 
Element buffer (EBO) 
Draw Indirect Buffer 
Vertex Buffer (VBO) 
Front-End (decoder) 
Cmd bundles 
OpenGL 
Driver 
Application 
Push-Buffer 
(FIFO) 
cmds 
FBO resources (Textures / RB) 
64 bits pointers 
Handles 
(IDs) 
Id  64 bits 
Addr. 
OpenGL 
Commands 
OpenGL 
resources
8 
Siggraph Asia 2014 
Application 
Token-buffer 
OpenGL 
Driver 
Big Picture 
Vertex Puller (IA) 
Vertex Shader 
TCS (Tessellation) 
TES (Tessellation) 
Tessellator 
Geometry Shader 
Transform Feedback 
Rasterization 
Fragment Shader 
Per-Fragment Ops 
Framebuffer 
Tr. Feedback buffer 
Uniform Block 
Texture Fetch 
Image Load/Store 
Atomic Counter 
Shader Storage 
Element buffer (EBO) 
Draw Indirect Buffer 
Vertex Buffer (VBO) 
Front-End (decoder) 
Cmd bundles 
Push-Buffer 
(FIFO) 
cmds 
FBO resources 
(Textures / RB) 
OpenGL 
Commands 
resources 
64 bits 
Pointers 
(bindless) 
64 bits GPU Address 
Offload cmd bundles creation to the App.
9 
Siggraph Asia 2014 
Application 
Command-list 
Token-buffer 
OpenGL 
Driver 
Big Picture 
Vertex Puller (IA) 
Vertex Shader 
TCS (Tessellation) 
TES (Tessellation) 
Tessellator 
Geometry Shader 
Transform Feedback 
Rasterization 
Fragment Shader 
Per-Fragment Ops 
Framebuffer 
Tr. Feedback buffer 
Uniform Block 
Texture Fetch 
Image Load/Store 
Atomic Counter 
Shader Storage 
Element buffer (EBO) 
Draw Indirect Buffer 
Vertex Buffer (VBO) 
Front-End 
(decoder) 
Cmd bundles 
Push-Buffer 
(FIFO) 
cmds 
FBO resources 
(Textures / RB) 
OpenGL 
Commands 
Token-buffers 
+ state objects 
resources 
64 bits 
Pointers 
(bindless) 
State Object 
More work for 
FE – but fast !
10 
Siggraph Asia 2014 
Demo – Quadro K5000 CAD Car model 14,338,275 Primitives 300,528 drawcalls 348,862 attribute update 12,004 uniform update Set of CAD models together 29,344,075 Primitives 26,144 drawcalls 18,371 attribute update 13,632 uniform update
11 
Siggraph Asia 2014 
DemoS – Quadro K5000 CAD Car model K5000: 920,363,200 primitives/S 19,290,668 drawcalls/S
12 
Siggraph Asia 2014 
DemoS – Quadro K5000 
Set of CAD models together K5000 1,211,684,096 primitives/S 1,079,545 drawcalls/S
13 
Siggraph Asia 2014 
DemoS – Maxwell CAD Car model K5000: 920,363,200 primitives/S 19,290,668 drawcalls/S Maxwell GM204 (GeForce): 1,782,257,920 primitives/S 37,355,848 drawcalls/S Drawcall time : 8 Micro- seconds on CPU (!) Set of CAD models together K5000 1,211,684,096 primitives/S 1,079,545 drawcalls/S Maxwell GM204 (GeForce) 3,012,795,392 primitives/S 2,684,239 drawcalls/S Drawcall time: 42 Micro- seconds on CPU 
!
14 
Siggraph Asia 2014 
More Performances Another test: always using bindless UBO and VBO CPU-Emulation: Token stream executed as CPU "switch-case" Strategies Un-grouped: ~ 1 MB, 79,000 Tokens, ~68,000 drawcalls Material-grouped: ~300 KB, 22,000 Tokens, ~11,000 drawcalls 
Strategy 
Draw time Material-grouped 
0.73 
HW TOKEN-Buffers 
0.56 
Technique 
GPU (ms) 
1.4 1.14 x 
~0 BIG x 
CPU (ms) 
Draw time un-grouped 
2.4 
1.1 
GPU (ms) 
4.5 1.6 x 
~0 BIG x 
CPU (ms) 
CPU-emulated TOKEN-buffers 
glBindBufferRange + drawcalls 
0.73 
1.6 
3.5 
7.2 
Preliminary results K6000
15 
Siggraph Asia 2014 
More Performances 5 000 shader changes: toggling between two shaders in „shaded & edges“ 5 000 fbo changes: similar as above but with fbo toggle instead of shader Almost no additional cost compared to rendering without fbo changes 
Timing 
GPU (ms) 
CPU (ms) 
CPU-emulated TOKEN-buffers 
12.7 
15.1 
2.9 4.3 x 
0.4 37 x 
HW TOKEN-buffers 
2.8 4.5 x 
0.005 BIG x 
Command-Lists 
Timer 
GPU (ms) 
CPU (ms) 
CPU-emulated TOKEN-buffers 
60.0 
60.0 
1.8 33 x 
0.9 66 x 
HW TOKEN-buffers 
1.7 35 x 
0.022 BIG x 
Command-Lists 
Preliminary results on K5000
16 
Siggraph Asia 2014 
GPU Virtual Memory 
Bindless Technology What is it about? Work from native GPU pointers/handles (NVIDIA pioneered this technology) A lot less CPU work (memory hopping, validation...) Allow GPU to use flexible data structures Bindless Buffers Vertex & Global memory since Tesla Generation (CUDA capable) Bindless Textures Since Kepler Bindless Constants (UBO) New driver feature, support for Fermi and above Bindless plays a central role for Command-List 
Vertex Puller (IA) 
Vertex Shader 
TCS (Tessellation) 
TES (Tessellation) 
Tessellator 
Geometry Shader 
Transform Feedback 
Rasterization 
Fragment Shader 
Per-Fragment Ops 
Framebuffer 
Uniform Block 
Texture Fetch 
Element buffer (EBO) 
Vertex Buffer (VBO) 
Front-End (decoder) 
Push buffer 
64 bits address 
Attr.s 
Idx 
Uniform Block 
Send Ptrs
17 
Siggraph Asia 2014 
#define UBA UNIFORM_BUFFER_ADDRESS_NV UpdateBuffers(); glEnableClientState(UNIFORM_BUFFER_UNIFIED_NV); glBufferAddressRangeNV (UBA, 0, addrView, viewSize); foreach (obj in scene) { ... // glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); glBufferAddressRangeNV(UBA, 1, addrMatrices + obj.matrixOffset, maSize); foreach ( batch in obj.primitive_group_material) { // glBindBufferRange (UBO, 2, uboMaterial, batch.materialOffset, mtlSize); glBufferAddressRangeNV(UBA, 2, addrMaterial + batch.materialOffset, mtlSize); ... } } 
Example On Using Bindless UBO 
New pointer for UBO#1 updated per Object 
New pointer for UBO#2 updated per Primitive group 
pointer for UBO#0 updated once for all objects 
regular UBO binds are now ignored!
18 
Siggraph Asia 2014 Pointers must be aligned So do Ptr Offsets glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &offsetAlignment); Normally: 256 bytes 
 gaps between each item 
Try to fit as much data as possible… Not the case if passing an index for array access (MDI + “Base-instance”) But requires special GLSL code (indexing) 
Bindless And Memory Alignement 
GPU 
Virtual 
Memory 
64 bits address 
Uniform Material 0 
Uniform Material 1 
Uniform Material 2 
… 
glBufferAddressRangeNV (UBA, 2, addrMaterial + batch.materialOffset , mtlSize); 
256b 
Uniform array materials[] 
Material 0 
Material 1 
Material 2 
… 
Uniform …
19 
Siggraph Asia 2014 
NV_command_list Key Concepts 
1.Tokenized Rendering: some state changes and draw commands are encoded into binary data stream Depends on bindless technology 
2.State Objects Whole OpenGL States (program, blending...) captured as an object Allows pre-validation of state combinations, later reuse of objects is very fast 
3.Command List Object „Display-list“ paradigm but more flexible: buffer are outside (referenced by address), so content can still be modified (matrics, vertices...)
20 
Siggraph Asia 2014 
UpdateBuffers(); 
glBindBufferBase (UBO, 0, uboView); 
foreach (obj in scene) { 
// redundancy filter for these (if (used != last)... 
glBindVertexBuffer (0, obj.geometry->vbo, 0, vtxSize); 
glBindBuffer (ELEMENT, obj.geometry->ibo); 
glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); 
// iterate over cached material groups 
foreach ( batch in obj.materialGroups) { 
glBindBufferRange (UBO, 2, uboMaterial, batch.materialOffset, mtlSize); 
glMultiDrawElements (...); 
} 
} 
Typical Scene Drawing Example 
~ 2 500 api drawcalls 
~11 000 drawcalls 
~55 triangles per call
21 
Siggraph Asia 2014 
UpdateBuffers(); 
glBindBufferBase (UBO, 0, uboView); 
foreach (obj in scene) { 
// redundancy filter for these (if (used != last)... 
glBindVertexBuffer (0, obj.geometry->vbo, 0, vtxSize); 
glBindBuffer (ELEMENT, obj.geometry->ibo); 
glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); 
// iterate over all parts individually 
foreach ( part in obj.parts) { 
if (part.material != lastMaterial){ 
glBindBufferRange (UBO, 2, uboMaterial, part.materialOffset, mtlSize); 
} 
glDrawElements (...); 
} 
} 
Typical Scene Drawing Example 
~68 000 drawcalls ~10 triangles per call
22 
Siggraph Asia 2014 
Scene Drawing Converted To Token Buffer 
glDrawCommandsNV (GL_TRIANGLES, tokenBuffer , offsets[], sizes[], count); // {0}, {bufferSize}, 1 
foreach (obj in scene) { 
glBufferAddressRangeNV(VERTEX.., 0, obj.geometry->addrVBO, ...); 
glBufferAddressRangeNV(ELEMENT..., 0, obj.geometry->addrIBO, ...); 
glBufferAddressRangeNV(UNIFORM..., 1, addrMatrices ...); 
foreach ( batch in obj.materialGroups) { 
glBufferAddressRangeNV(UNIFORM, 2, addrMaterial ...); 
glMultiDrawElements(...) 
} 
} 
becomes a single drawcall 
replaces 80k calls to GL 
(for graphic-card model) 
Token buffer 
Set Attr#0 on VBO address … 
Set Attr#1 on VBO address … 
Set Elements on EBO address … 
Uniform Matrix on UBO address … 
Uniform Material 
on UBO address … 
DrawElements 
Uniform Material 
on UBO address … 
DrawElements 
Obj#1 
Obj#2 
… 
For all objects 
…
23 
Siggraph Asia 2014 
Tokens-buffers are tightly packed structs in linear memory 
Token Buffer Structures 
TokenUbo 
{ 
GLuint header; 
UniformAddressCommandNV 
{ 
GLushort index; 
GLushort stage; 
GLuint64 address; 
} cmd; 
} 
TokenVbo { GLuint header; AttributeAddressCommandNV { GLuint index; GLuint64 address; } cmd; } 
TokenIbo 
{ 
GLuint header; 
ElementAddressCommandNV 
{ 
GLuint64 address; 
GLuint typeSizeInByte; 
} cmd; 
} 
TokenDrawElements 
{ 
Gluint header; 
DrawElementsCommandNV 
{ 
GLuint count; 
GLuint firstIndex; 
GLuint baseVertex; 
} cmd; 
} 
Token buffer 
Set Attr#0 on VBO address … 
Set Attr#1 on VBO address … 
Set Elements on EBO address … 
Uniform Matrix 
on UBO address … 
Uniform Material on UBO address … 
DrawElements 
Uniform Material 
on UBO address … 
DrawElements 
… 
Uniform Material 
on UBO address … 
DrawElements 
Uniform Material 
on UBO address … 
DrawElements 
Uniform Material 
on UBO address … 
DrawElements 
Uniform Material 
on UBO address … 
DrawElements 
Uniform Material 
on UBO address … 
DrawElements
24 
Siggraph Asia 2014 
Tokenized Rendering 
Token Headers are GPU commands, followed by arguments 
FRONTFACE_COMMAND_NV 
BLEND_COLOR_COMMAND_NV 
STENCIL_REF_COMMAND_NV 
LINE_WIDTH_COMMAND_NV 
POLYGON_OFFSET_COMMAND_NV 
ALPHA_REF_COMMAND_NV 
VIEWPORT_COMMAND_NV 
SCISSOR_COMMAND_NV 
ELEMENT_ADDRESS_COMMAND_NV 
ATTRIBUTE_ADDRESS_COMMAND_NV 
UNIFORM_ADDRESS_COMMAND_NV 
DRAW_ELEMENTS_COMMAND_NV 
DRAW_ARRAYS_COMMAND_NV 
DRAW_ELEMENTS_STRIP_COMMAND_NV 
DRAW_ARRAYS_STRIP_COMMAND_NV 
DRAW_ELEMENTS_INSTANCED_COMMAND_NV 
DRAW_ARRAYS_INSTANCED_COMMAND_NV 
TERMINATE_SEQUENCE_COMMAND_NV 
NOP_COMMAND_NV 
Final list might change 
Render States 
Index Buffer; 
Position Attribute; Normals; Texcoords... 
Matrices; Material data... 
Draw Commands ! 
Special Commands (see later)
25 
Siggraph Asia 2014 
Tokenized Rendering 
Example: UBO Address assignment structure 
FRONTFACE_COMMAND_NV 
BLEND_COLOR_COMMAND_NV 
STENCIL_REF_COMMAND_NV 
LINE_WIDTH_COMMAND_NV 
POLYGON_OFFSET_COMMAND_NV 
ALPHA_REF_COMMAND_NV 
VIEWPORT_COMMAND_NV 
SCISSOR_COMMAND_NV 
ELEMENT_ADDRESS_COMMAND_NV 
ATTRIBUTE_ADDRESS_COMMAND_NV 
UNIFORM_ADDRESS_COMMAND_NV 
DRAW_ELEMENTS_COMMAND_NV 
DRAW_ARRAYS_COMMAND_NV 
DRAW_ELEMENTS_STRIP_COMMAND_NV 
DRAW_ARRAYS_STRIP_COMMAND_NV 
DRAW_ELEMENTS_INSTANCED_COMMAND_NV 
DRAW_ARRAYS_INSTANCED_COMMAND_NV 
TERMINATE_SEQUENCE_COMMAND_NV 
NOP_COMMAND_NV 
TokenUbo 
{ 
GLuint header; 
UniformAddressCommandNV 
{ 
GLushort index; 
GLushort stage; 
GLuint64 address; 
} cmd; 
} 
Example 
Real Token to be queried 
header = glGetCommandHeaderNV( 
UNIFORM_ADDRESS_COMMAND_NV, Sizeof(TokenUbo) ) 
One Shading stage to target (not ‘program’) 
Stage = glGetStageIndexNV(S) 
(S: STAGE_VERTEX|GEOMETRY|FRAGMENT|TESS_CONTROL/EVAL) 
Hardware Index 
*not* Uniform Program index
26 
Siggraph Asia 2014 
TOKENIZED RENDERING Allows rendering of a mix of LIST, STRIP, FAN, LOOP modes in one go Pass „basic type“ as Token, use „mode“ for more in "Instanced" drawing 
TokenDrawElements 
{ 
Gluint header; 
DrawElementsCommandNV 
{ 
GLuint count; 
GLuint firstIndex; 
GLuint baseVertex; 
} cmd; 
} 
TokenDrawElementsInstanced 
{ 
Gluint header; 
DrawElementsInstancedCommandNV 
{ 
GLuint mode; 
GLuint count; 
GLuint instanceCount; 
GLuint firstIndex; 
GLuint baseVertex; 
GLuint baseInstance; 
} cmd; 
} 
LIST, STRIP, 
FAN, LOOP types 
(GL_LINE_LOOP...) 
DRAW_ELEMENTS_COMMAND_NV 
DRAW_ARRAYS_COMMAND_NV 
DRAW_ELEMENTS_STRIP_COMMAND_NV 
DRAW_ARRAYS_STRIP_COMMAND_NV 
glGetCommandHeaderNV() 
DRAW_ELEMENTS_INSTANCED_COMMAND_NV 
DRAW_ARRAYS_INSTANCED_COMMAND_NV 
glGetCommandHeaderNV()
27 
Siggraph Asia 2014 
TERMINATE_SEQUENCE: Occlusion Culling Now overcomes deficit of previous methods http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering- techniques.pdf Enqueue all tokens for entire object Shader to Check Visibility; Shader for SCAN operation; Add TERMINATE_SEQUENCE_COMMAND_NV to end 
Technique 
Draw time GROUPED 
Timing 
GPU (ms) 
CPU (ms) 
6.3 2.6 x 
0.9 6 x 
CULL Old Bindless MDI (*) 
CULL NEW TOKEN native 
0.2 27 x 
glBindBufferRange + drawcalls 
16.8 
5.4 
6.2 2.7 x 
6.3 0.85 x 
CULL Old Readback (stalls pipe) (*) 
K5000, Preliminary results, (*) taken from slightly differnt framework 
HW TOKEN-buffers 
~0 BIG x 
No instancing! 
(synthetic test, data replication) 
SCENE: 
materials: 138 
objects: 13,032 
geometries: 3,312 
parts: 789,464 
triangles: 29,527,840 
vertices: 27,584,376 
16.8 
5.3 3.1 x
28 
Siggraph Asia 2014 Just a little... “commandBindableNV” turns binding IDs to Hardware IDs Texturing must go through bindless 
ARB_bindless_texture Regular Texture binding Turned Off Changing UBO Ptr Addresses == texture changes (here ‘myBuf’) 
Does Shaders Need Special Code ? 
#extension GL_ARB_bindless_texture : require #extension GL_NV_command_list : enable #if GL_NV_command_list 
layout(commandBindableNV) uniform; #endif layout(std140,binding=2) uniform myBuf { 
sampler2D mySampler; 
//or: 
in int whichSampler; 
}; layout(std140,binding=1) uniform samplers { sampler2D allSamplers[NUM_TEXTURES]; }; 
… 
c = texture(mySampler, tc); 
//or 
c = texture(allSamplers [ whichSampler ] , tc); 
… 
Example:
29 
Siggraph Asia 2014 
State Objects StateObject Encapsulates majority of state (fbo format, active shader, blend, depth ...) State Object immutable: gives more control over validation cost no bindings captured: bindless used instead; textures passed via UBO Primitive type is a state: participates to important validation work 
Cannot be put in the token buffer 
glCaptureStateNV(stateobject, GL_TRIANGLES ); 
Primitive type as argument: 
because it is part of states
30 
Siggraph Asia 2014 Render entire scenes with different shaders/fbos... in one go 
Driver caches state transitions ! 
Draw With State Objects 
glDrawCommandsStatesNV( 
tokenBuffer, 
offsets[], sizes[], 
states[], fbos[], 
count); 
Token buffer 
VBO address … 
VBO address … 
EBO address … 
Uniform Matrix 
Uniform Material 
DrawElements 
Uniform Material 
DrawElements 
… 
EBO address … 
Uniform Matrix 
Uniform Material 
DrawElements 
Uniform Material 
DrawElements 
FBO 1 
FBO 2 
fbos[] 
FBO 1 
FBO 1 
State Obj 1 
State Obj 2 
State Obj 1 
State Obj 4 
Count = 4 
States[] 
offsets[] 
sizes[0] 
sizes[1] 
sizes[2] 
sizes[3]
31 
Siggraph Asia 2014 Driver bakes „Diff“ between states and apply only the difference 
Pseudo Code of glDrawCommandsStatesNV 
for i < count { 
if (i == 0) set state from states[i]; 
else set state transition states[i-1] to states[i] 
if (fbo[i]) { 
// fbo[i] must be compatible with states[i].fbo 
glBindFramebuffer( fbo[i] ) 
} else glBindFramebuffer( states[i].fbo ) 
ProcessCommandSequence(...tokenBuffer, offsets[i], sizes[i]) 
}
32 
Siggraph Asia 2014 Rendering must always happen in a FBO Backbuffer Ptr. Address is un-reliable (dynamic re-alloc when resizing…) Use glBlitFramebuffer() for FBO  final Backbuffer result FBO Resources Must be Resident (glMakeTextureHandleResidentARB()) Resources can be replaced/swapped if they keep the same base configuration (attachment formats, # drawbuffers...) I.e. Same configuration as captured FBO settings from stateobject This is what happens when resizing: 
Detach & Destroy resource  create new ones  attach them to FBO(s) 
Frame-Buffer Objects And State Objects
33 
Siggraph Asia 2014 Combine multiple token buffers & states into a CommandList object glCreateCommandListsNV(N, &list) glCommandListSegmentsNV(list, segs) Convenient for concatenation glListDrawCommandsStatesClientNV(list, segment, token-buffer-ptrs, token-buffer- sizes, states, FBOs, count) glCompileCommandListNV(list); glCallCommandListNV(list) 
Command-List Object 
System mem. 
VBO address … 
VBO address … 
EBO address … 
Uniform Matrix 
Uniform Material 
DrawElements 
… 
Uniform Material 
DrawElements 
FBO 
State Object 
System mem. 
VBO address … 
VBO address … 
EBO address … 
Uniform Matrix 
Uniform Material 
DrawElements 
… 
FBO 
State Object 
System mem. 
VBO address … 
VBO address … 
Uniform Matrix 
Uniform Material 
DrawArrays 
… 
Uniform Material 
DrawArrays 
FBO 
State Object 
… 
Command-List Segment #0 Segment #1
34 
Siggraph Asia 2014 
Command-List Object Less flexibilty compared to token buffer Token buffers taken from client pointers (CPU memory) Driver will optimize these token buffers and related state-objects Optimize State transitions; token cmds for the GPU Front-End Resulting command-list is immutable If any resource pointer changed, the command-list must be recompiled But still rather flexible possible to change content of referenced buffers! (matrices, materials, vertices...) Animation; skinning... No need for command-list recompilation
35 
Siggraph Asia 2014 Command-List doesn’t pretend to solve any OpenGL multi-threading Multi-threading can help if complex CPU work going-on update of Vertex / Element buffers: transform. Matrices; skinning… Update of Token Buffers: adding/removing characters in a game scene… OpenGL & Multithreading issues: OpenGL is bad at sharing its access from various threads Make-Current or Multiple Contexts aren’t good solutions Prefer to keep context ownership to one main thread 
OpenGL And Multi-threading ?
36 
Siggraph Asia 2014 Token Buffers and Vertex/Elements/Uniform Buffers (VBO/UBO) Token Buffer Objects; VBOs and UBOs owned by OpenGL Multi-threading can be used for data update if: Work on CPU system memory; then push data back to OpenGL (glBufferSubData) Or Request for a pointer (glMapBuffers); then work with it from the thread State Objects: State Object and their 'Capture' must be handled in OpenGL context No way to do it on a separate threads that don’t have OpenGL context 
Multi-threading And Command-List
37 
Siggraph Asia 2014 
Multi-threading Use-Case Example #1 
Thread #0 
OpenGL Ctxt 
Thread #1 
Thread #1 
Submit 
States 
PushAttrib() 
set states & and capture 
PopAttribs() 
Get State ID 
PushAttrib() 
set states & and capture 
PopAttribs() 
Submit States 
unmap Token buf. 
ask for 
Token buf. 
Ptr 
Build state list 
ask for 
Token buf. 
Ptr 
unmap 
Token buf. 
Build token cmds 
Get State ID 
Build token cmds 
Build token cmds 
ask for 
Token buf. 
Ptr 
unmap Token buf. 
ask for 
Token buf. 
Ptr 
... 
ask for 
Token buf. 
Ptr 
Build token 
cmds 
unmap Token buf. 
Build token cmds 
Build state 
list
38 
Siggraph Asia 2014 
Multi-threading Use-Case Example #2 
Split state capture from Token-buffer creation 
Thread 1 
Thread 2 
Thread 3 
Build Token buffers 
in CPU memory 
Thread 0 
OpenGL 
Ctxt 
Walk through whole scene 
and build State-Objects 
Receive Token-Buffers 
and issue them 
as Token Buffer Objects 
to OpenGL
39 
Siggraph Asia 2014 
Migration Steps All rendering via FBO (statecapture doesn‘t support default backbuffer) No legacy states in GLSL use custom uniforms no built-in uniforms (gl_ModelView...) use glVertexAttribPointer (no glTexCoordPointer etc.) No classic uniforms, pack them all in UBO ARB_bindless_texture for texturing Bindless Buffers ARB_vertex_attrib_binding combined with NV_vertex_buffer_unified_memory (VBUM)... Organize for StateObject reuse state capture expensive: Capture them at init. time; Factorize their use in the scene
40 
Siggraph Asia 2014 
Migration Tips Vertex Attributes and bindless VBO http://on-demand.gputechconf.com/siggraph/2014/presentation/SG4117- OpenGL-Scene-Rendering-Techniques.pdf (slide 11-16) GLSL 
// classic attributes 
... 
normal = gl_Normal; 
gl_Position = gl_Vertex; 
// generic attributes 
// ideally share this definition across C and GLSL 
#define VERTEX_POS 0 
#define VERTEX_NORMAL 1 
in layout(location= VERTEX_POS) vec4 attr_Pos; 
in layout(location= VERTEX_NORMAL) vec3 attr_Normal; 
... 
normal = attr_Normal; 
gl_Position = attr_Pos;
41 
Siggraph Asia 2014 
Migration Tips UBO Parameter management http://on-demand.gputechconf.com/siggraph/2014/presentation/SG4117- OpenGL-Scene-Rendering-Techniques.pdf ( 18-27, 44-47) Ideally group by frequency of change 
// classic uniforms 
uniform samplerCube viewEnvTex; 
uniform vec4 materialColor; 
uniform sampler2D materialTex; 
... 
// UBO usage, bindless texture inside UBO, grouped by change layout(std140, binding=0, commandBindableNV) uniform view { samplerCube texViewEnv; }; layout(std140, binding=1, commandBindableNV) uniform material { vec4 materialColor; sampler2D texMaterialColor; }; ...
42 
Siggraph Asia 2014 OpenGL Devtech 'Proviz' samples soon available on GitHub ! check https://github.com/nvpro-samples on a regular basis working on ramping-up https://developer.nvidia.com/samples & https://developer.nvidia.com/professional-graphics command-lists examples to come here this Dec. 2014 Many other samples will be pushed here as standalone repositories Beta Driver for Command-Lists: Release 346 for Dec.2014 Special Thanks to Pierre Boudier and Christoph Kubisch Questions/Feedback: tlorach@nvidia.com / ckubisch@nvidia.com 
Thanks!

Mais conteúdo relacionado

Mais procurados

Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Tiago Sousa
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunElectronic Arts / DICE
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
 
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsElectronic Arts / DICE
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
OpenGL 3.2 and More
OpenGL 3.2 and MoreOpenGL 3.2 and More
OpenGL 3.2 and MoreMark Kilgard
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The SurgeMichele Giacalone
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologyTiago Sousa
 
FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014Simon Green
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3guest11b095
 
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09) 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)Johan Andersson
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Philip Hammer
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsJohan Andersson
 

Mais procurados (20)

Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666Siggraph2016 - The Devil is in the Details: idTech 666
Siggraph2016 - The Devil is in the Details: idTech 666
 
Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)Rendering Technologies from Crysis 3 (GDC 2013)
Rendering Technologies from Crysis 3 (GDC 2013)
 
DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3DirectX 11 Rendering in Battlefield 3
DirectX 11 Rendering in Battlefield 3
 
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The RunFive Rendering Ideas from Battlefield 3 & Need For Speed: The Run
Five Rendering Ideas from Battlefield 3 & Need For Speed: The Run
 
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...
 
Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)Graphics Gems from CryENGINE 3 (Siggraph 2013)
Graphics Gems from CryENGINE 3 (Siggraph 2013)
 
Masked Occlusion Culling
Masked Occlusion CullingMasked Occlusion Culling
Masked Occlusion Culling
 
Frostbite on Mobile
Frostbite on MobileFrostbite on Mobile
Frostbite on Mobile
 
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open ProblemsHPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
HPG 2018 - Game Ray Tracing: State-of-the-Art and Open Problems
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
OpenGL 3.2 and More
OpenGL 3.2 and MoreOpenGL 3.2 and More
OpenGL 3.2 and More
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Screen Space Reflections in The Surge
Screen Space Reflections in The SurgeScreen Space Reflections in The Surge
Screen Space Reflections in The Surge
 
Secrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics TechnologySecrets of CryENGINE 3 Graphics Technology
Secrets of CryENGINE 3 Graphics Technology
 
FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
A Bit More Deferred Cry Engine3
A Bit More Deferred   Cry Engine3A Bit More Deferred   Cry Engine3
A Bit More Deferred Cry Engine3
 
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09) 	 Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
Shadows & Decals: D3D10 Techniques in Frostbite (GDC'09)
 
Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2Bindless Deferred Decals in The Surge 2
Bindless Deferred Decals in The Surge 2
 
The Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next StepsThe Rendering Pipeline - Challenges & Next Steps
The Rendering Pipeline - Challenges & Next Steps
 

Semelhante a OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead

Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02RubnCuesta2
 
GTC 2009 OpenGL Barthold
GTC 2009 OpenGL BartholdGTC 2009 OpenGL Barthold
GTC 2009 OpenGL BartholdMark Kilgard
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
Android RenderScript on LLVM
Android RenderScript on LLVMAndroid RenderScript on LLVM
Android RenderScript on LLVMJohn Lee
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityMark Kilgard
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Prabindh Sundareson
 
Siggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsSiggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsTristan Lorach
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareDaniel Blezek
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
 
OpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI PlatformsOpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI PlatformsPrabindh Sundareson
 
Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyMark Kilgard
 
Unite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsUnite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsナム-Nam Nguyễn
 
FGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie TycoonFGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie Tycoonmochimedia
 

Semelhante a OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead (20)

Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02Commandlistsiggraphasia2014 141204005310-conversion-gate02
Commandlistsiggraphasia2014 141204005310-conversion-gate02
 
GTC 2009 OpenGL Barthold
GTC 2009 OpenGL BartholdGTC 2009 OpenGL Barthold
GTC 2009 OpenGL Barthold
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
Android RenderScript on LLVM
Android RenderScript on LLVMAndroid RenderScript on LLVM
Android RenderScript on LLVM
 
NVIDIA's OpenGL Functionality
NVIDIA's OpenGL FunctionalityNVIDIA's OpenGL Functionality
NVIDIA's OpenGL Functionality
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)
 
Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011Advanced Graphics Workshop - GFX2011
Advanced Graphics Workshop - GFX2011
 
Siggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentialsSiggraph 2016 - Vulkan and nvidia : the essentials
Siggraph 2016 - Vulkan and nvidia : the essentials
 
OpenGL 4 for 2010
OpenGL 4 for 2010OpenGL 4 for 2010
OpenGL 4 for 2010
 
General Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics HardwareGeneral Purpose Computing using Graphics Hardware
General Purpose Computing using Graphics Hardware
 
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...
 
OpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI PlatformsOpenGL ES based UI Development on TI Platforms
OpenGL ES based UI Development on TI Platforms
 
Gpu
GpuGpu
Gpu
 
Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!Your Game Needs Direct3D 11, So Get Started Now!
Your Game Needs Direct3D 11, So Get Started Now!
 
NVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and TransparencyNVIDIA Graphics, Cg, and Transparency
NVIDIA Graphics, Cg, and Transparency
 
Unite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platformsUnite 2013 optimizing unity games for mobile platforms
Unite 2013 optimizing unity games for mobile platforms
 
FGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie TycoonFGS 2011: Making A Game With Molehill: Zombie Tycoon
FGS 2011: Making A Game With Molehill: Zombie Tycoon
 
FIR filter on GPU
FIR filter on GPUFIR filter on GPU
FIR filter on GPU
 
GPU - how can we use it?
GPU - how can we use it?GPU - how can we use it?
GPU - how can we use it?
 

Último

Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmDeepika Walanjkar
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
Levelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodLevelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodManicka Mamallan Andavar
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfisabel213075
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfBalamuruganV28
 
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书rnrncn29
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionSneha Padhiar
 
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfPaper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfNainaShrivastava14
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solidnamansinghjarodiya
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptJohnWilliam111370
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfChristianCDAM
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSsandhya757531
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Coursebim.edu.pl
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfManish Kumar
 

Último (20)

Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithmComputer Graphics Introduction, Open GL, Line and Circle drawing algorithm
Computer Graphics Introduction, Open GL, Line and Circle drawing algorithm
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitos
 
Levelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument methodLevelling - Rise and fall - Height of instrument method
Levelling - Rise and fall - Height of instrument method
 
List of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdfList of Accredited Concrete Batching Plant.pdf
List of Accredited Concrete Batching Plant.pdf
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
CS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdfCS 3251 Programming in c all unit notes pdf
CS 3251 Programming in c all unit notes pdf
 
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Cost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based questionCost estimation approach: FP to COCOMO scenario based question
Cost estimation approach: FP to COCOMO scenario based question
 
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfPaper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
Engineering Drawing section of solid
Engineering Drawing     section of solidEngineering Drawing     section of solid
Engineering Drawing section of solid
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
 
Ch10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdfCh10-Global Supply Chain - Cadena de Suministro.pdf
Ch10-Global Supply Chain - Cadena de Suministro.pdf
 
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMSHigh Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
High Voltage Engineering- OVER VOLTAGES IN ELECTRICAL POWER SYSTEMS
 
Katarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School CourseKatarzyna Lipka-Sidor - BIM School Course
Katarzyna Lipka-Sidor - BIM School Course
 
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdfModule-1-(Building Acoustics) Noise Control (Unit-3). pdf
Module-1-(Building Acoustics) Noise Control (Unit-3). pdf
 

OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead

  • 1. Siggraph Asia 2014 Tristan Lorach Manager of Devtech for Professional Visualization group OPENGL NVIDIA "COMMAND-LIST": "APPROACHING ZERO DRIVER OVERHEAD"
  • 2. 2 Siggraph Asia 2014 GPU Scalability GPUs are extremely powerful But only if used correctly Mix of Application and Graphic API (Driver) responsibility Increase amount of work per batch (Job) Minimize CPUGPU interactions Lower Memory traffic Lower API calls: Batch things together Factorize Data Re-use data uploaded to Video memory (instancing…)
  • 3. 3 Siggraph Asia 2014 Use of the GPU Games ~1500 to 3000 Drawcalls for a scene Intensive use of Multi-layer image processing over the scene Heavy shaders CAD/FCC/professional applications Hard to batch user’s works (CAD applications) ~10,000 to… 300,000 Drawcalls for a scene: heavy CPU workload for our driver Shaders simpler than games (but catching-up these days...) Post-Processing more and more used (SSAO)
  • 4. 4 Siggraph Asia 2014 Use of the GPU New Graphic APIs are trying to address these concerns Khronos (Announced it at Siggraph Vancouver 2014) Metal Mantle DX12 All propose better ways to issue commands and render-states But can we improve OpenGL to go the same path ? Yes: NV_command_list
  • 5. 5 Siggraph Asia 2014 OpenGL OpenGL is ~25 years old (!!) 1.x: fixed function 2.x: Shading language (GLSL) 3.x/4.x: more as a massive compute system More graphic stages (Geom.Shader; Tessellation) High flexibility & efficiency on memory accesses High programability Design "flaws": still “State Machine”; ServerClient; poor multi-threading management 1992 2012
  • 6. 6 Siggraph Asia 2014 Challenge of Issuing Commands Issuing drawcalls and state changes can be a real bottleneck 650,000 Triangles 68,000 Parts ~ 10 Triangles per part 3,700,000 Triangles 98 000 Parts ~ 37 Triangles per part 14,338,275 Triangles/lines 300,528 drawcalls (parts) ~ 48 Triangles per part App + driver GPU GPU idle CPU Excessive Work from App & Driver On CPU ! ! courtesy of PTC
  • 7. 7 Siggraph Asia 2014 Big Picture – Typical Case Vertex Puller (IA) Vertex Shader TCS (Tessellation) TES (Tessellation) Tessellator Geometry Shader Transform Feedback Rasterization Fragment Shader Per-Fragment Ops Framebuffer Tr. Feedback buffer Uniform Block Texture Fetch Image Load/Store Atomic Counter Shader Storage Element buffer (EBO) Draw Indirect Buffer Vertex Buffer (VBO) Front-End (decoder) Cmd bundles OpenGL Driver Application Push-Buffer (FIFO) cmds FBO resources (Textures / RB) 64 bits pointers Handles (IDs) Id  64 bits Addr. OpenGL Commands OpenGL resources
  • 8. 8 Siggraph Asia 2014 Application Token-buffer OpenGL Driver Big Picture Vertex Puller (IA) Vertex Shader TCS (Tessellation) TES (Tessellation) Tessellator Geometry Shader Transform Feedback Rasterization Fragment Shader Per-Fragment Ops Framebuffer Tr. Feedback buffer Uniform Block Texture Fetch Image Load/Store Atomic Counter Shader Storage Element buffer (EBO) Draw Indirect Buffer Vertex Buffer (VBO) Front-End (decoder) Cmd bundles Push-Buffer (FIFO) cmds FBO resources (Textures / RB) OpenGL Commands resources 64 bits Pointers (bindless) 64 bits GPU Address Offload cmd bundles creation to the App.
  • 9. 9 Siggraph Asia 2014 Application Command-list Token-buffer OpenGL Driver Big Picture Vertex Puller (IA) Vertex Shader TCS (Tessellation) TES (Tessellation) Tessellator Geometry Shader Transform Feedback Rasterization Fragment Shader Per-Fragment Ops Framebuffer Tr. Feedback buffer Uniform Block Texture Fetch Image Load/Store Atomic Counter Shader Storage Element buffer (EBO) Draw Indirect Buffer Vertex Buffer (VBO) Front-End (decoder) Cmd bundles Push-Buffer (FIFO) cmds FBO resources (Textures / RB) OpenGL Commands Token-buffers + state objects resources 64 bits Pointers (bindless) State Object More work for FE – but fast !
  • 10. 10 Siggraph Asia 2014 Demo – Quadro K5000 CAD Car model 14,338,275 Primitives 300,528 drawcalls 348,862 attribute update 12,004 uniform update Set of CAD models together 29,344,075 Primitives 26,144 drawcalls 18,371 attribute update 13,632 uniform update
  • 11. 11 Siggraph Asia 2014 DemoS – Quadro K5000 CAD Car model K5000: 920,363,200 primitives/S 19,290,668 drawcalls/S
  • 12. 12 Siggraph Asia 2014 DemoS – Quadro K5000 Set of CAD models together K5000 1,211,684,096 primitives/S 1,079,545 drawcalls/S
  • 13. 13 Siggraph Asia 2014 DemoS – Maxwell CAD Car model K5000: 920,363,200 primitives/S 19,290,668 drawcalls/S Maxwell GM204 (GeForce): 1,782,257,920 primitives/S 37,355,848 drawcalls/S Drawcall time : 8 Micro- seconds on CPU (!) Set of CAD models together K5000 1,211,684,096 primitives/S 1,079,545 drawcalls/S Maxwell GM204 (GeForce) 3,012,795,392 primitives/S 2,684,239 drawcalls/S Drawcall time: 42 Micro- seconds on CPU !
  • 14. 14 Siggraph Asia 2014 More Performances Another test: always using bindless UBO and VBO CPU-Emulation: Token stream executed as CPU "switch-case" Strategies Un-grouped: ~ 1 MB, 79,000 Tokens, ~68,000 drawcalls Material-grouped: ~300 KB, 22,000 Tokens, ~11,000 drawcalls Strategy Draw time Material-grouped 0.73 HW TOKEN-Buffers 0.56 Technique GPU (ms) 1.4 1.14 x ~0 BIG x CPU (ms) Draw time un-grouped 2.4 1.1 GPU (ms) 4.5 1.6 x ~0 BIG x CPU (ms) CPU-emulated TOKEN-buffers glBindBufferRange + drawcalls 0.73 1.6 3.5 7.2 Preliminary results K6000
  • 15. 15 Siggraph Asia 2014 More Performances 5 000 shader changes: toggling between two shaders in „shaded & edges“ 5 000 fbo changes: similar as above but with fbo toggle instead of shader Almost no additional cost compared to rendering without fbo changes Timing GPU (ms) CPU (ms) CPU-emulated TOKEN-buffers 12.7 15.1 2.9 4.3 x 0.4 37 x HW TOKEN-buffers 2.8 4.5 x 0.005 BIG x Command-Lists Timer GPU (ms) CPU (ms) CPU-emulated TOKEN-buffers 60.0 60.0 1.8 33 x 0.9 66 x HW TOKEN-buffers 1.7 35 x 0.022 BIG x Command-Lists Preliminary results on K5000
  • 16. 16 Siggraph Asia 2014 GPU Virtual Memory Bindless Technology What is it about? Work from native GPU pointers/handles (NVIDIA pioneered this technology) A lot less CPU work (memory hopping, validation...) Allow GPU to use flexible data structures Bindless Buffers Vertex & Global memory since Tesla Generation (CUDA capable) Bindless Textures Since Kepler Bindless Constants (UBO) New driver feature, support for Fermi and above Bindless plays a central role for Command-List Vertex Puller (IA) Vertex Shader TCS (Tessellation) TES (Tessellation) Tessellator Geometry Shader Transform Feedback Rasterization Fragment Shader Per-Fragment Ops Framebuffer Uniform Block Texture Fetch Element buffer (EBO) Vertex Buffer (VBO) Front-End (decoder) Push buffer 64 bits address Attr.s Idx Uniform Block Send Ptrs
  • 17. 17 Siggraph Asia 2014 #define UBA UNIFORM_BUFFER_ADDRESS_NV UpdateBuffers(); glEnableClientState(UNIFORM_BUFFER_UNIFIED_NV); glBufferAddressRangeNV (UBA, 0, addrView, viewSize); foreach (obj in scene) { ... // glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); glBufferAddressRangeNV(UBA, 1, addrMatrices + obj.matrixOffset, maSize); foreach ( batch in obj.primitive_group_material) { // glBindBufferRange (UBO, 2, uboMaterial, batch.materialOffset, mtlSize); glBufferAddressRangeNV(UBA, 2, addrMaterial + batch.materialOffset, mtlSize); ... } } Example On Using Bindless UBO New pointer for UBO#1 updated per Object New pointer for UBO#2 updated per Primitive group pointer for UBO#0 updated once for all objects regular UBO binds are now ignored!
  • 18. 18 Siggraph Asia 2014 Pointers must be aligned So do Ptr Offsets glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &offsetAlignment); Normally: 256 bytes  gaps between each item Try to fit as much data as possible… Not the case if passing an index for array access (MDI + “Base-instance”) But requires special GLSL code (indexing) Bindless And Memory Alignement GPU Virtual Memory 64 bits address Uniform Material 0 Uniform Material 1 Uniform Material 2 … glBufferAddressRangeNV (UBA, 2, addrMaterial + batch.materialOffset , mtlSize); 256b Uniform array materials[] Material 0 Material 1 Material 2 … Uniform …
  • 19. 19 Siggraph Asia 2014 NV_command_list Key Concepts 1.Tokenized Rendering: some state changes and draw commands are encoded into binary data stream Depends on bindless technology 2.State Objects Whole OpenGL States (program, blending...) captured as an object Allows pre-validation of state combinations, later reuse of objects is very fast 3.Command List Object „Display-list“ paradigm but more flexible: buffer are outside (referenced by address), so content can still be modified (matrics, vertices...)
  • 20. 20 Siggraph Asia 2014 UpdateBuffers(); glBindBufferBase (UBO, 0, uboView); foreach (obj in scene) { // redundancy filter for these (if (used != last)... glBindVertexBuffer (0, obj.geometry->vbo, 0, vtxSize); glBindBuffer (ELEMENT, obj.geometry->ibo); glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); // iterate over cached material groups foreach ( batch in obj.materialGroups) { glBindBufferRange (UBO, 2, uboMaterial, batch.materialOffset, mtlSize); glMultiDrawElements (...); } } Typical Scene Drawing Example ~ 2 500 api drawcalls ~11 000 drawcalls ~55 triangles per call
  • 21. 21 Siggraph Asia 2014 UpdateBuffers(); glBindBufferBase (UBO, 0, uboView); foreach (obj in scene) { // redundancy filter for these (if (used != last)... glBindVertexBuffer (0, obj.geometry->vbo, 0, vtxSize); glBindBuffer (ELEMENT, obj.geometry->ibo); glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); // iterate over all parts individually foreach ( part in obj.parts) { if (part.material != lastMaterial){ glBindBufferRange (UBO, 2, uboMaterial, part.materialOffset, mtlSize); } glDrawElements (...); } } Typical Scene Drawing Example ~68 000 drawcalls ~10 triangles per call
  • 22. 22 Siggraph Asia 2014 Scene Drawing Converted To Token Buffer glDrawCommandsNV (GL_TRIANGLES, tokenBuffer , offsets[], sizes[], count); // {0}, {bufferSize}, 1 foreach (obj in scene) { glBufferAddressRangeNV(VERTEX.., 0, obj.geometry->addrVBO, ...); glBufferAddressRangeNV(ELEMENT..., 0, obj.geometry->addrIBO, ...); glBufferAddressRangeNV(UNIFORM..., 1, addrMatrices ...); foreach ( batch in obj.materialGroups) { glBufferAddressRangeNV(UNIFORM, 2, addrMaterial ...); glMultiDrawElements(...) } } becomes a single drawcall replaces 80k calls to GL (for graphic-card model) Token buffer Set Attr#0 on VBO address … Set Attr#1 on VBO address … Set Elements on EBO address … Uniform Matrix on UBO address … Uniform Material on UBO address … DrawElements Uniform Material on UBO address … DrawElements Obj#1 Obj#2 … For all objects …
  • 23. 23 Siggraph Asia 2014 Tokens-buffers are tightly packed structs in linear memory Token Buffer Structures TokenUbo { GLuint header; UniformAddressCommandNV { GLushort index; GLushort stage; GLuint64 address; } cmd; } TokenVbo { GLuint header; AttributeAddressCommandNV { GLuint index; GLuint64 address; } cmd; } TokenIbo { GLuint header; ElementAddressCommandNV { GLuint64 address; GLuint typeSizeInByte; } cmd; } TokenDrawElements { Gluint header; DrawElementsCommandNV { GLuint count; GLuint firstIndex; GLuint baseVertex; } cmd; } Token buffer Set Attr#0 on VBO address … Set Attr#1 on VBO address … Set Elements on EBO address … Uniform Matrix on UBO address … Uniform Material on UBO address … DrawElements Uniform Material on UBO address … DrawElements … Uniform Material on UBO address … DrawElements Uniform Material on UBO address … DrawElements Uniform Material on UBO address … DrawElements Uniform Material on UBO address … DrawElements Uniform Material on UBO address … DrawElements
  • 24. 24 Siggraph Asia 2014 Tokenized Rendering Token Headers are GPU commands, followed by arguments FRONTFACE_COMMAND_NV BLEND_COLOR_COMMAND_NV STENCIL_REF_COMMAND_NV LINE_WIDTH_COMMAND_NV POLYGON_OFFSET_COMMAND_NV ALPHA_REF_COMMAND_NV VIEWPORT_COMMAND_NV SCISSOR_COMMAND_NV ELEMENT_ADDRESS_COMMAND_NV ATTRIBUTE_ADDRESS_COMMAND_NV UNIFORM_ADDRESS_COMMAND_NV DRAW_ELEMENTS_COMMAND_NV DRAW_ARRAYS_COMMAND_NV DRAW_ELEMENTS_STRIP_COMMAND_NV DRAW_ARRAYS_STRIP_COMMAND_NV DRAW_ELEMENTS_INSTANCED_COMMAND_NV DRAW_ARRAYS_INSTANCED_COMMAND_NV TERMINATE_SEQUENCE_COMMAND_NV NOP_COMMAND_NV Final list might change Render States Index Buffer; Position Attribute; Normals; Texcoords... Matrices; Material data... Draw Commands ! Special Commands (see later)
  • 25. 25 Siggraph Asia 2014 Tokenized Rendering Example: UBO Address assignment structure FRONTFACE_COMMAND_NV BLEND_COLOR_COMMAND_NV STENCIL_REF_COMMAND_NV LINE_WIDTH_COMMAND_NV POLYGON_OFFSET_COMMAND_NV ALPHA_REF_COMMAND_NV VIEWPORT_COMMAND_NV SCISSOR_COMMAND_NV ELEMENT_ADDRESS_COMMAND_NV ATTRIBUTE_ADDRESS_COMMAND_NV UNIFORM_ADDRESS_COMMAND_NV DRAW_ELEMENTS_COMMAND_NV DRAW_ARRAYS_COMMAND_NV DRAW_ELEMENTS_STRIP_COMMAND_NV DRAW_ARRAYS_STRIP_COMMAND_NV DRAW_ELEMENTS_INSTANCED_COMMAND_NV DRAW_ARRAYS_INSTANCED_COMMAND_NV TERMINATE_SEQUENCE_COMMAND_NV NOP_COMMAND_NV TokenUbo { GLuint header; UniformAddressCommandNV { GLushort index; GLushort stage; GLuint64 address; } cmd; } Example Real Token to be queried header = glGetCommandHeaderNV( UNIFORM_ADDRESS_COMMAND_NV, Sizeof(TokenUbo) ) One Shading stage to target (not ‘program’) Stage = glGetStageIndexNV(S) (S: STAGE_VERTEX|GEOMETRY|FRAGMENT|TESS_CONTROL/EVAL) Hardware Index *not* Uniform Program index
  • 26. 26 Siggraph Asia 2014 TOKENIZED RENDERING Allows rendering of a mix of LIST, STRIP, FAN, LOOP modes in one go Pass „basic type“ as Token, use „mode“ for more in "Instanced" drawing TokenDrawElements { Gluint header; DrawElementsCommandNV { GLuint count; GLuint firstIndex; GLuint baseVertex; } cmd; } TokenDrawElementsInstanced { Gluint header; DrawElementsInstancedCommandNV { GLuint mode; GLuint count; GLuint instanceCount; GLuint firstIndex; GLuint baseVertex; GLuint baseInstance; } cmd; } LIST, STRIP, FAN, LOOP types (GL_LINE_LOOP...) DRAW_ELEMENTS_COMMAND_NV DRAW_ARRAYS_COMMAND_NV DRAW_ELEMENTS_STRIP_COMMAND_NV DRAW_ARRAYS_STRIP_COMMAND_NV glGetCommandHeaderNV() DRAW_ELEMENTS_INSTANCED_COMMAND_NV DRAW_ARRAYS_INSTANCED_COMMAND_NV glGetCommandHeaderNV()
  • 27. 27 Siggraph Asia 2014 TERMINATE_SEQUENCE: Occlusion Culling Now overcomes deficit of previous methods http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering- techniques.pdf Enqueue all tokens for entire object Shader to Check Visibility; Shader for SCAN operation; Add TERMINATE_SEQUENCE_COMMAND_NV to end Technique Draw time GROUPED Timing GPU (ms) CPU (ms) 6.3 2.6 x 0.9 6 x CULL Old Bindless MDI (*) CULL NEW TOKEN native 0.2 27 x glBindBufferRange + drawcalls 16.8 5.4 6.2 2.7 x 6.3 0.85 x CULL Old Readback (stalls pipe) (*) K5000, Preliminary results, (*) taken from slightly differnt framework HW TOKEN-buffers ~0 BIG x No instancing! (synthetic test, data replication) SCENE: materials: 138 objects: 13,032 geometries: 3,312 parts: 789,464 triangles: 29,527,840 vertices: 27,584,376 16.8 5.3 3.1 x
  • 28. 28 Siggraph Asia 2014 Just a little... “commandBindableNV” turns binding IDs to Hardware IDs Texturing must go through bindless ARB_bindless_texture Regular Texture binding Turned Off Changing UBO Ptr Addresses == texture changes (here ‘myBuf’) Does Shaders Need Special Code ? #extension GL_ARB_bindless_texture : require #extension GL_NV_command_list : enable #if GL_NV_command_list layout(commandBindableNV) uniform; #endif layout(std140,binding=2) uniform myBuf { sampler2D mySampler; //or: in int whichSampler; }; layout(std140,binding=1) uniform samplers { sampler2D allSamplers[NUM_TEXTURES]; }; … c = texture(mySampler, tc); //or c = texture(allSamplers [ whichSampler ] , tc); … Example:
  • 29. 29 Siggraph Asia 2014 State Objects StateObject Encapsulates majority of state (fbo format, active shader, blend, depth ...) State Object immutable: gives more control over validation cost no bindings captured: bindless used instead; textures passed via UBO Primitive type is a state: participates to important validation work Cannot be put in the token buffer glCaptureStateNV(stateobject, GL_TRIANGLES ); Primitive type as argument: because it is part of states
  • 30. 30 Siggraph Asia 2014 Render entire scenes with different shaders/fbos... in one go Driver caches state transitions ! Draw With State Objects glDrawCommandsStatesNV( tokenBuffer, offsets[], sizes[], states[], fbos[], count); Token buffer VBO address … VBO address … EBO address … Uniform Matrix Uniform Material DrawElements Uniform Material DrawElements … EBO address … Uniform Matrix Uniform Material DrawElements Uniform Material DrawElements FBO 1 FBO 2 fbos[] FBO 1 FBO 1 State Obj 1 State Obj 2 State Obj 1 State Obj 4 Count = 4 States[] offsets[] sizes[0] sizes[1] sizes[2] sizes[3]
  • 31. 31 Siggraph Asia 2014 Driver bakes „Diff“ between states and apply only the difference Pseudo Code of glDrawCommandsStatesNV for i < count { if (i == 0) set state from states[i]; else set state transition states[i-1] to states[i] if (fbo[i]) { // fbo[i] must be compatible with states[i].fbo glBindFramebuffer( fbo[i] ) } else glBindFramebuffer( states[i].fbo ) ProcessCommandSequence(...tokenBuffer, offsets[i], sizes[i]) }
  • 32. 32 Siggraph Asia 2014 Rendering must always happen in a FBO Backbuffer Ptr. Address is un-reliable (dynamic re-alloc when resizing…) Use glBlitFramebuffer() for FBO  final Backbuffer result FBO Resources Must be Resident (glMakeTextureHandleResidentARB()) Resources can be replaced/swapped if they keep the same base configuration (attachment formats, # drawbuffers...) I.e. Same configuration as captured FBO settings from stateobject This is what happens when resizing: Detach & Destroy resource  create new ones  attach them to FBO(s) Frame-Buffer Objects And State Objects
  • 33. 33 Siggraph Asia 2014 Combine multiple token buffers & states into a CommandList object glCreateCommandListsNV(N, &list) glCommandListSegmentsNV(list, segs) Convenient for concatenation glListDrawCommandsStatesClientNV(list, segment, token-buffer-ptrs, token-buffer- sizes, states, FBOs, count) glCompileCommandListNV(list); glCallCommandListNV(list) Command-List Object System mem. VBO address … VBO address … EBO address … Uniform Matrix Uniform Material DrawElements … Uniform Material DrawElements FBO State Object System mem. VBO address … VBO address … EBO address … Uniform Matrix Uniform Material DrawElements … FBO State Object System mem. VBO address … VBO address … Uniform Matrix Uniform Material DrawArrays … Uniform Material DrawArrays FBO State Object … Command-List Segment #0 Segment #1
  • 34. 34 Siggraph Asia 2014 Command-List Object Less flexibilty compared to token buffer Token buffers taken from client pointers (CPU memory) Driver will optimize these token buffers and related state-objects Optimize State transitions; token cmds for the GPU Front-End Resulting command-list is immutable If any resource pointer changed, the command-list must be recompiled But still rather flexible possible to change content of referenced buffers! (matrices, materials, vertices...) Animation; skinning... No need for command-list recompilation
  • 35. 35 Siggraph Asia 2014 Command-List doesn’t pretend to solve any OpenGL multi-threading Multi-threading can help if complex CPU work going-on update of Vertex / Element buffers: transform. Matrices; skinning… Update of Token Buffers: adding/removing characters in a game scene… OpenGL & Multithreading issues: OpenGL is bad at sharing its access from various threads Make-Current or Multiple Contexts aren’t good solutions Prefer to keep context ownership to one main thread OpenGL And Multi-threading ?
  • 36. 36 Siggraph Asia 2014 Token Buffers and Vertex/Elements/Uniform Buffers (VBO/UBO) Token Buffer Objects; VBOs and UBOs owned by OpenGL Multi-threading can be used for data update if: Work on CPU system memory; then push data back to OpenGL (glBufferSubData) Or Request for a pointer (glMapBuffers); then work with it from the thread State Objects: State Object and their 'Capture' must be handled in OpenGL context No way to do it on a separate threads that don’t have OpenGL context Multi-threading And Command-List
  • 37. 37 Siggraph Asia 2014 Multi-threading Use-Case Example #1 Thread #0 OpenGL Ctxt Thread #1 Thread #1 Submit States PushAttrib() set states & and capture PopAttribs() Get State ID PushAttrib() set states & and capture PopAttribs() Submit States unmap Token buf. ask for Token buf. Ptr Build state list ask for Token buf. Ptr unmap Token buf. Build token cmds Get State ID Build token cmds Build token cmds ask for Token buf. Ptr unmap Token buf. ask for Token buf. Ptr ... ask for Token buf. Ptr Build token cmds unmap Token buf. Build token cmds Build state list
  • 38. 38 Siggraph Asia 2014 Multi-threading Use-Case Example #2 Split state capture from Token-buffer creation Thread 1 Thread 2 Thread 3 Build Token buffers in CPU memory Thread 0 OpenGL Ctxt Walk through whole scene and build State-Objects Receive Token-Buffers and issue them as Token Buffer Objects to OpenGL
  • 39. 39 Siggraph Asia 2014 Migration Steps All rendering via FBO (statecapture doesn‘t support default backbuffer) No legacy states in GLSL use custom uniforms no built-in uniforms (gl_ModelView...) use glVertexAttribPointer (no glTexCoordPointer etc.) No classic uniforms, pack them all in UBO ARB_bindless_texture for texturing Bindless Buffers ARB_vertex_attrib_binding combined with NV_vertex_buffer_unified_memory (VBUM)... Organize for StateObject reuse state capture expensive: Capture them at init. time; Factorize their use in the scene
  • 40. 40 Siggraph Asia 2014 Migration Tips Vertex Attributes and bindless VBO http://on-demand.gputechconf.com/siggraph/2014/presentation/SG4117- OpenGL-Scene-Rendering-Techniques.pdf (slide 11-16) GLSL // classic attributes ... normal = gl_Normal; gl_Position = gl_Vertex; // generic attributes // ideally share this definition across C and GLSL #define VERTEX_POS 0 #define VERTEX_NORMAL 1 in layout(location= VERTEX_POS) vec4 attr_Pos; in layout(location= VERTEX_NORMAL) vec3 attr_Normal; ... normal = attr_Normal; gl_Position = attr_Pos;
  • 41. 41 Siggraph Asia 2014 Migration Tips UBO Parameter management http://on-demand.gputechconf.com/siggraph/2014/presentation/SG4117- OpenGL-Scene-Rendering-Techniques.pdf ( 18-27, 44-47) Ideally group by frequency of change // classic uniforms uniform samplerCube viewEnvTex; uniform vec4 materialColor; uniform sampler2D materialTex; ... // UBO usage, bindless texture inside UBO, grouped by change layout(std140, binding=0, commandBindableNV) uniform view { samplerCube texViewEnv; }; layout(std140, binding=1, commandBindableNV) uniform material { vec4 materialColor; sampler2D texMaterialColor; }; ...
  • 42. 42 Siggraph Asia 2014 OpenGL Devtech 'Proviz' samples soon available on GitHub ! check https://github.com/nvpro-samples on a regular basis working on ramping-up https://developer.nvidia.com/samples & https://developer.nvidia.com/professional-graphics command-lists examples to come here this Dec. 2014 Many other samples will be pushed here as standalone repositories Beta Driver for Command-Lists: Release 346 for Dec.2014 Special Thanks to Pierre Boudier and Christoph Kubisch Questions/Feedback: tlorach@nvidia.com / ckubisch@nvidia.com Thanks!

Notas do Editor

  1. New APIs are meant to increase the efficiency of drivers by lowering their cost on CPU and thus shifting the workload over the GPU. This can translate to a better use of the GPU. But it can also translate to simply lowering the pressure over the CPU.