Add `multi-draw-indirect` feature. by mrshannon · Pull Request #1949 · gpuweb/gpuweb

mrshannon · 2021-07-15T19:29:44Z

Indirect drawing with multiple indirect draw commands is a common technique for rendering complex scenes that would otherwise be infeasible, either because they would need an excessive number of CPU-issued draw calls or because the scene is too complex for the CPU alone to build. It works by:

- Executing multiple draws with a single API call.
- Allowing the GPU to generate both the geometry and the draws needed to render it.
- Culling unnecessary draw calls on the GPU, in scenes more complex than CPU culling could handle.

This PR adds a multi-draw-indirect feature. In particular, it adds:

- `multiDrawIndirect` and `multiDrawIndexedIndirect` methods on `GPURenderEncoderBase`.
  - Allows submitting multiple draws with a single API call (multi-draw).
  - Allows the GPU to determine the number of draw calls (draw count).
  - Use cases:
    - GPU-derived scene data
    - GPU-based culling
    - GPU-based LOD
    - Efficient execution of complex scenes with a large number of draws
- Non-zero `firstInstance` for `drawIndirect`, `drawIndexedIndirect`, `multiDrawIndirect`, and `multiDrawIndexedIndirect`.
  - This is the only per-draw input, available without rebinds, that is readable in the shader.
  - Use cases:
    - Selecting instance-stepped vertex data
    - Indexing into per-object or per-draw data in storage buffers
    - Multi-material rendering in a single API call
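As a sketch of the per-object use case above, the following packs non-indexed draws in WebGPU's `drawIndirect` layout (vertexCount, instanceCount, firstVertex, firstInstance) and gives each draw its own `firstInstance` range, so instance indices never overlap between draws and can index a single per-object storage buffer. `DrawSpec` and `packDrawIndirectArgs` are hypothetical names, not part of the proposal.

```typescript
// One non-indexed draw as the application describes it (hypothetical type).
interface DrawSpec {
  vertexCount: number;
  instanceCount: number;
  firstVertex: number;
}

// u32 words per entry in WebGPU's drawIndirect layout:
// [vertexCount, instanceCount, firstVertex, firstInstance]
const DRAW_INDIRECT_WORDS = 4;

// Packs draws back to back and gives each draw a distinct instance range by
// setting firstInstance to the running total of earlier instance counts.
function packDrawIndirectArgs(draws: readonly DrawSpec[]): Uint32Array {
  const out = new Uint32Array(draws.length * DRAW_INDIRECT_WORDS);
  let nextInstance = 0;
  draws.forEach((d, i) => {
    const base = i * DRAW_INDIRECT_WORDS;
    out[base + 0] = d.vertexCount;
    out[base + 1] = d.instanceCount;
    out[base + 2] = d.firstVertex;
    out[base + 3] = nextInstance; // non-zero once any earlier draw has instances
    nextInstance += d.instanceCount;
  });
  return out;
}

// Three draws: a cube, three instanced quads, and a triangle.
const args = packDrawIndirectArgs([
  { vertexCount: 36, instanceCount: 1, firstVertex: 0 },
  { vertexCount: 6, instanceCount: 3, firstVertex: 36 },
  { vertexCount: 3, instanceCount: 1, firstVertex: 42 },
]);
// args: [36, 1, 0, 0,  6, 3, 36, 1,  3, 1, 42, 4]
```

The resulting array could be uploaded with `device.queue.writeBuffer` into a buffer created with `INDIRECT | COPY_DST` usage. Every draw after the first ends up with a non-zero `firstInstance`, which is exactly what this feature would need to allow.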

Compatibility

The required backend features to implement multi-draw-indirect are available on:

- Newer Apple devices (~2016+)
- All DX12 devices
- All Vulkan-capable desktops (with up-to-date drivers)
- 30% of Android devices

See the sections below for details.

Vulkan

Multi-Draw

Requires relaxing the restriction that the drawCount argument of vkCmdDrawIndirect and vkCmdDrawIndexedIndirect be 0 or 1, so that any non-negative integer is allowed. This requires the multiDrawIndirect feature, which is supported on:

- 99% of desktop GPUs
- 63% of Android devices

NOTE: The stride argument will always be set for tight packing, in order to maintain compatibility with DX12.
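With tight packing, the stride equals the entry size, so entry i sits at a fixed byte offset. A minimal sketch, assuming WebGPU's 16-byte `drawIndirect` and 20-byte `drawIndexedIndirect` entries (both multiples of 4, as Vulkan requires of the stride); the helper names are hypothetical:

```typescript
// Tightly packed entry sizes in bytes (stride == entry size, no padding).
const DRAW_INDIRECT_STRIDE = 4 * 4;         // vertexCount, instanceCount, firstVertex, firstInstance
const DRAW_INDEXED_INDIRECT_STRIDE = 5 * 4; // indexCount, instanceCount, firstIndex, baseVertex, firstInstance

// Byte offset of draw `i` when the first entry starts at `indirectOffset`.
function entryOffset(indirectOffset: number, i: number, indexed: boolean): number {
  const stride = indexed ? DRAW_INDEXED_INDIRECT_STRIDE : DRAW_INDIRECT_STRIDE;
  return indirectOffset + i * stride;
}

// One past the last byte read by `maxDrawCount` entries; this must not
// exceed the indirect buffer's size.
function indirectRangeEnd(indirectOffset: number, maxDrawCount: number, indexed: boolean): number {
  return entryOffset(indirectOffset, maxDrawCount, indexed);
}

console.log(entryOffset(0, 3, false));      // 48
console.log(entryOffset(8, 2, true));       // 48
console.log(indirectRangeEnd(0, 10, true)); // 200
```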

Draw Count

Requires the vkCmdDrawIndirectCount and vkCmdDrawIndexedIndirectCount functions which are provided by either the drawIndirectCount feature of Vulkan 1.2 or one of the following extensions:

- VK_AMD_draw_indirect_count
- VK_KHR_draw_indirect_count
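An implementation would need to decide where the count-buffer draw functions come from on a given device. A sketch of that decision, assuming a hypothetical `VulkanDeviceInfo` summary rather than a real Vulkan binding (the extension entry points are the suffixed variants, e.g. `vkCmdDrawIndirectCountKHR`):

```typescript
// Hypothetical summary of what an implementation might query from a Vulkan
// physical device; this is not a real Vulkan binding.
interface VulkanDeviceInfo {
  apiVersion: { major: number; minor: number };
  extensions: ReadonlySet<string>;
  // VkPhysicalDeviceVulkan12Features::drawIndirectCount (only meaningful on 1.2+).
  drawIndirectCountFeature: boolean;
}

type DrawIndirectCountSource =
  | "core-1.2"
  | "VK_KHR_draw_indirect_count"
  | "VK_AMD_draw_indirect_count";

// Returns where vkCmdDraw*IndirectCount can come from, or null if unavailable.
function drawIndirectCountSource(info: VulkanDeviceInfo): DrawIndirectCountSource | null {
  const { major, minor } = info.apiVersion;
  const atLeast12 = major > 1 || (major === 1 && minor >= 2);
  if (atLeast12 && info.drawIndirectCountFeature) return "core-1.2";
  if (info.extensions.has("VK_KHR_draw_indirect_count")) return "VK_KHR_draw_indirect_count";
  if (info.extensions.has("VK_AMD_draw_indirect_count")) return "VK_AMD_draw_indirect_count";
  return null;
}
```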

Because drawIndirectCount support was introduced in driver updates, the statistics at https://vulkan.gpuinfo.org cannot be relied upon. The following is based on the oldest card from each manufacturer that supports drawIndirectCount; if newer cards dropped support for drawIndirectCount, that is not captured here.

- Intel integrated GPUs (that support Vulkan) support drawIndirectCount.
- NVIDIA cards going back to Kepler support drawIndirectCount.
- AMD cards going back to the HD 8000 series support drawIndirectCount.

For Android:

- drawIndirectCount is supported on 100% of devices that support Vulkan 1.2.
- drawIndirectCount is supported, as an extension, on 28% of devices that do not support Vulkan 1.2.

Non-zero `firstInstance`

Requires the firstInstance member of VkDrawIndirectCommand and VkDrawIndexedIndirectCommand to be allowed to be non-zero. This requires the drawIndirectFirstInstance feature, which is supported on:

- 99% of desktop GPUs
- 64% of Android devices

DX12

All required features are core to DX12.

Multi-Draw

Uses ExecuteIndirect where the MaxCommandCount argument is greater than 1 and the pArgumentBuffer argument points to a GPU buffer containing an array of D3D12_DRAW_ARGUMENTS or D3D12_DRAW_INDEXED_ARGUMENTS.

NOTE: The binary layout of these structs is compatible with Vulkan.
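Because the field order matches across WebGPU's `drawIndexedIndirect`, `VkDrawIndexedIndirectCommand`, and `D3D12_DRAW_INDEXED_ARGUMENTS`, one buffer of entries can feed either backend unchanged. A sketch of writing such entries with a `DataView`; the only signed field is baseVertex (`vertexOffset` in Vulkan, `BaseVertexLocation` in D3D12). The names are hypothetical, and little-endian encoding is assumed for the GPU side.

```typescript
// One indexed indirect draw. Field order matches WebGPU's drawIndexedIndirect,
// VkDrawIndexedIndirectCommand and D3D12_DRAW_INDEXED_ARGUMENTS.
interface IndexedDraw {
  indexCount: number;
  instanceCount: number;
  firstIndex: number;
  baseVertex: number; // signed: vertexOffset (Vulkan), BaseVertexLocation (D3D12)
  firstInstance: number;
}

const INDEXED_ENTRY_BYTES = 20; // five 32-bit fields, tightly packed

// Serializes draws back to back, little-endian (assumed GPU byte order).
function writeIndexedEntries(draws: readonly IndexedDraw[]): ArrayBuffer {
  const buf = new ArrayBuffer(draws.length * INDEXED_ENTRY_BYTES);
  const view = new DataView(buf);
  draws.forEach((d, i) => {
    const o = i * INDEXED_ENTRY_BYTES;
    view.setUint32(o + 0, d.indexCount, true);
    view.setUint32(o + 4, d.instanceCount, true);
    view.setUint32(o + 8, d.firstIndex, true);
    view.setInt32(o + 12, d.baseVertex, true); // the one signed field
    view.setUint32(o + 16, d.firstInstance, true);
  });
  return buf;
}
```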

Draw Count

Uses ExecuteIndirect where the pCountBuffer argument is not NULL.
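Both ExecuteIndirect (with MaxCommandCount) and vkCmdDrawIndirectCount (with maxDrawCount) execute the minimum of the value in the count buffer and the maximum passed from the CPU. A small CPU-side model of that rule, assuming the count is a little-endian u32 at `drawCountOffset`; the function name is hypothetical:

```typescript
// Reads the u32 draw count stored at `drawCountOffset` and clamps it to
// `maxDrawCount`, the same min() rule ExecuteIndirect applies with
// MaxCommandCount and vkCmdDrawIndirectCount applies with maxDrawCount.
function effectiveDrawCount(countBytes: ArrayBuffer, drawCountOffset: number, maxDrawCount: number): number {
  const gpuCount = new DataView(countBytes).getUint32(drawCountOffset, true);
  return Math.min(gpuCount, maxDrawCount);
}

// Example: a count buffer whose second u32 holds 7.
const countBytes = new ArrayBuffer(8);
new DataView(countBytes).setUint32(4, 7, true);
console.log(effectiveDrawCount(countBytes, 4, 5));  // 5
console.log(effectiveDrawCount(countBytes, 4, 10)); // 7
```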

Non-zero `firstInstance`

This is the StartInstanceLocation member of the D3D12_DRAW_ARGUMENTS or D3D12_DRAW_INDEXED_ARGUMENTS structures, which natively supports values greater than 0.

Metal

Multi-Draw

Can be emulated with Indirect Command Buffers (ICBs) and an extra compute shader invocation to translate from the Vulkan-like indirect draw buffer to an ICB.

Requires

- iOS 12.0+
- macOS 10.14+
- MTLGPUFamilyMac2

Non-zero `firstInstance`

Natively supported with the baseInstance argument.

Draw Count

Don't record commands past the draw count in the ICB, and use optimizeIndirectCommandBuffer.

Requires

- iOS 12.0+
- macOS 10.14+
- MTLGPUFamilyMac2
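The ICB translation step can be modeled on the CPU as below. This is a sketch, not Metal code: `IcbDraw` is a hypothetical record holding the arguments of one `MTLIndirectRenderCommand` drawPrimitives call (vertexStart, vertexCount, instanceCount, baseInstance), and the loop stands in for one compute thread per possible command, assuming the ICB is sized for `maxDrawCount` commands.

```typescript
// Hypothetical record for the arguments of one MTLIndirectRenderCommand
// drawPrimitives call; this is a model, not a Metal binding.
interface IcbDraw {
  vertexStart: number;
  vertexCount: number;
  instanceCount: number;
  baseInstance: number;
}

// CPU model of the translation compute pass. Slot i plays the role of compute
// thread i: it reads Vulkan-style entry i
// ([vertexCount, instanceCount, firstVertex, firstInstance]) and encodes a
// draw only if i is below the effective draw count; other slots stay empty.
function translateToIcb(args: Uint32Array, drawCount: number, maxDrawCount: number): (IcbDraw | null)[] {
  const n = Math.min(drawCount, maxDrawCount);
  const commands: (IcbDraw | null)[] = [];
  for (let i = 0; i < maxDrawCount; i++) {
    if (i >= n) {
      commands.push(null); // empty slot, never recorded
      continue;
    }
    const b = i * 4;
    commands.push({
      vertexCount: args[b],
      instanceCount: args[b + 1],
      vertexStart: args[b + 2],
      baseInstance: args[b + 3], // non-zero firstInstance maps directly to baseInstance
    });
  }
  return commands;
}
```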


kvark

This looks very detailed, thank you for the proposal!

We'll probably not expose this on Metal for some time, but having the ICB backing it is an interesting concept.

I think polyfilling it entirely on base WebGPU takes a similar amount of effort:

- copy the indirect arguments into a temporary buffer
- run a compute shader, which reads the count value from the counts buffer and then zeroes out the instance count on all of the indirect argument entries (in the temporary copies) at or beyond the read count value
- record maxDrawCount consecutive drawIndirect invocations, advancing the offset in each, and pointing to the temporary indirect buffer

Given that it's roughly the same complexity as Metal's ICB workaround, I don't see the latter as feasible. We might as well polyfill this on all the platforms that don't support it natively, or not involve any compute-based polyfill at all (leaving Apple platforms behind and asking users to implement this polyfill on their side).
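The three polyfill steps can be sketched in TypeScript under some assumptions. Step 1 would be a `GPUCommandEncoder.copyBufferToBuffer` into a temporary buffer created with `COPY_DST | STORAGE | INDIRECT` usage, recorded before the render pass begins. Step 2 is shown as a CPU model of what the compute shader would do per entry. Step 3 records the draws against a minimal structural stand-in for `GPURenderPassEncoder`, so the sketch runs without a GPU. All helper names are hypothetical.

```typescript
// Minimal structural stand-in for the one GPURenderPassEncoder method used here.
interface DrawIndirectEncoder<B> {
  drawIndirect(indirectBuffer: B, indirectOffset: number): void;
}

const STRIDE_BYTES = 16;     // [vertexCount, instanceCount, firstVertex, firstInstance]
const WORDS_PER_ENTRY = 4;
const INSTANCE_COUNT_WORD = 1;

// Step 2, modeled on the CPU: every entry at index >= drawCount gets
// instanceCount = 0, which turns its draw into a no-op.
function zeroUnusedDraws(tempArgs: Uint32Array, drawCount: number, maxDrawCount: number): void {
  for (let i = drawCount; i < maxDrawCount; i++) {
    tempArgs[i * WORDS_PER_ENTRY + INSTANCE_COUNT_WORD] = 0;
  }
}

// Step 3: record maxDrawCount single drawIndirect calls over the temp buffer,
// advancing the offset by one entry each time.
function recordPolyfilledMultiDraw<B>(pass: DrawIndirectEncoder<B>, tempBuffer: B, maxDrawCount: number): void {
  for (let i = 0; i < maxDrawCount; i++) {
    pass.drawIndirect(tempBuffer, i * STRIDE_BYTES);
  }
}
```

Because the draws past the count still get recorded, the CPU cost scales with maxDrawCount rather than with the number of draws that actually render.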

kvark · 2021-07-16T00:09:19Z

spec/index.bs


-    undefined drawIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset);
-    undefined drawIndexedIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset);
+    undefined drawIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset,


Did you consider adding the extra methods (e.g. vkCmdDrawIndirectCount) instead of overloading the existing ones?

I considered adding multiDrawIndirect and multiDrawIndexedIndirect. No other feature added methods, so I was not sure what would be preferred. I can modify the PR if that is what the committee prefers. I see 3 options:

- Add the drawCount argument to the existing methods, and add new methods with drawCountBuffer and drawCountOffset.
- Put all multi-draw features in a new set of methods, multiDrawIndirect and multiDrawIndexedIndirect.
- Add both multiDrawIndirect and multiDrawIndexedIndirect, and multiDrawIndirectCount and multiDrawIndexedIndirectCount.

DX12 does not distinguish between a GPU-derived and a CPU-derived draw count, but Vulkan does.

kvark · 2021-07-16T00:11:05Z

spec/index.bs

-    undefined drawIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset);
-    undefined drawIndexedIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset);
+    undefined drawIndirect(GPUBuffer indirectBuffer, GPUSize64 indirectOffset,
+                     optional GPUSize32 maxDrawCount = 1,


can this number be anything, or do we need an extra limit?
Glancing at gpuinfo, some Android devices report it as 1.

Yes, I think we would need another limit (but maybe not for Android); maxDrawIndirectCount seems to be 1 only on devices that do not support multiDrawIndirect.

kvark · 2021-07-16T00:15:38Z

spec/index.bs

+                        - |drawCountBuffer| is `null`, unless the {{GPUFeatureName/"multi-draw-indirect"}} [=feature=] is enabled.
+                        - |drawCountBuffer| is [$valid to use with$] |this|.
+                        - |drawCountBuffer|.{{GPUBuffer/[[usage]]}} contains {{GPUBufferUsage/INDIRECT}}.
+                        - |drawCountOffset| + sizeof([=indirect draw count=]) &le;


Are we taking sizeof of the number here?

It will always be 32-bit, I could just specify 4 bytes.
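The validation rules discussed in this review can be collected into one check. A sketch under assumptions: buffers are modeled as plain `{ size, usage }` objects; the draw count is the 4-byte u32 discussed above; the 4-byte alignment of `drawCountOffset` is assumed by analogy with `indirectOffset`; `maxDrawCountLimit` is a hypothetical limit along the lines of Vulkan's maxDrawIndirectCount; and the "valid to use with" check is omitted.

```typescript
const GPU_BUFFER_USAGE_INDIRECT = 0x0100; // value of GPUBufferUsage.INDIRECT
const ENTRY_STRIDE = 16;                  // tightly packed drawIndirect entry
const DRAW_COUNT_BYTES = 4;               // the count is always a u32

// Hypothetical plain-data view of the buffers and arguments being validated.
interface BufferDesc { size: number; usage: number; }
interface MultiDrawCall {
  indirectBuffer: BufferDesc;
  indirectOffset: number;
  maxDrawCount: number;
  drawCountBuffer: BufferDesc | null;
  drawCountOffset: number;
}

// Returns validation errors (empty if the call is valid).
function validateMultiDraw(call: MultiDrawCall, featureEnabled: boolean, maxDrawCountLimit: number): string[] {
  const errors: string[] = [];
  if (!featureEnabled && (call.maxDrawCount !== 1 || call.drawCountBuffer !== null)) {
    errors.push("multi-draw-indirect feature not enabled");
  }
  if (call.maxDrawCount > maxDrawCountLimit) errors.push("maxDrawCount exceeds limit");
  if ((call.indirectBuffer.usage & GPU_BUFFER_USAGE_INDIRECT) === 0) errors.push("indirectBuffer lacks INDIRECT usage");
  if (call.indirectOffset % 4 !== 0) errors.push("indirectOffset not a multiple of 4");
  if (call.indirectOffset + call.maxDrawCount * ENTRY_STRIDE > call.indirectBuffer.size) {
    errors.push("indirect entries out of bounds");
  }
  const cb = call.drawCountBuffer;
  if (cb !== null) {
    if ((cb.usage & GPU_BUFFER_USAGE_INDIRECT) === 0) errors.push("drawCountBuffer lacks INDIRECT usage");
    if (call.drawCountOffset % 4 !== 0) errors.push("drawCountOffset not a multiple of 4");
    if (call.drawCountOffset + DRAW_COUNT_BYTES > cb.size) errors.push("draw count out of bounds");
  }
  return errors;
}
```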

Kangz · 2021-07-16T13:47:06Z

+1 on this PR being nicely detailed, and on the functionality already being polyfillable with compute shaders that copy the draw arguments into a buffer used by maxCount drawIndirect calls (and zero out unused draws).

Because of the compute shader validation that needs to happen, I don't think we'll be able to implement this any time soon (that's why Chromium won't have drawIndexedIndirect and dispatchIndirect enabled during the origin trial), but we could roll it out to more and more hardware gradually.

mrshannon · 2021-07-16T16:28:29Z

> I think, just polyfilling it entirely on base WebGPU takes a similar amount of effort:
>
> copy the indirect arguments into a temporary buffer
>
> run a compute shader, which will read the count value from the counts buffer, and then it will zero out the instance count on all of the indirect argument entries (in the temp copies) that are behind the read count value.
>
> record the maxDrawCount consecutive drawIndirect invocations, advancing the offset in each, and pointing to the temporary indirect buffer.
>
> Given that it's roughly the same complexity as Metal's ICB workaround, I'm not seeing the latter to be feasible. We might as well polyfill this on all the platforms (that don't support it natively), or not involve any compute-based polyfill at all (leaving Apple platforms behind, and asking users to implement this polyfill on their side).

On Metal this works because firstInstance can be non-zero. On Vulkan, wherever firstInstance can be non-zero you also have multiDrawIndirect, though not necessarily drawIndirectCount. However, you can work around a missing drawIndirectCount by zeroing out instanceCount, either in a compute shader on the user's behalf or by having the user do it themselves.

This raises the question: should

- multiDrawIndirect
- drawIndirectFirstInstance
- drawIndirectCount

all be separate features, letting the user zero out instanceCount and/or implement the polyfill themselves?

The issue with this is that it does not fit well with our current model of asking for what you need and either getting it or getting nothing. At first a user might ask for all of it. Then they fall back on zeroing out instanceCount (less overhead if the user does this), which means asking for multiDrawIndirect and drawIndirectFirstInstance. On Metal that fails as well, so they fall back to the polyfill and ask only for drawIndirectFirstInstance. This does not even consider that the requestAdapter call could have failed for other reasons.
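The fallback order described above can be sketched as a single decision over a set of supported feature names, which an application could evaluate once before choosing what to request. A sketch under assumptions: "indirect-first-instance" is the name used by #2022 and "multi-draw-indirect" is this PR's name, while "draw-indirect-count" is a hypothetical name for a separate count-buffer feature. In a browser, `adapter.features` is set-like and could be passed in, with the chosen list used as `requiredFeatures` for `requestDevice`.

```typescript
type Strategy =
  | "multi-draw-with-count-buffer"   // everything native
  | "multi-draw-zero-instance-count" // native multi-draw, user zeroes instanceCount
  | "polyfill-with-first-instance"   // one drawIndirect per entry (the Metal case)
  | "polyfill-with-rebinds";         // no non-zero firstInstance: rebind per draw

// Features each strategy would request (names partly hypothetical, see above).
const REQUIRED_FEATURES: Record<Strategy, string[]> = {
  "multi-draw-with-count-buffer": ["multi-draw-indirect", "draw-indirect-count", "indirect-first-instance"],
  "multi-draw-zero-instance-count": ["multi-draw-indirect", "indirect-first-instance"],
  "polyfill-with-first-instance": ["indirect-first-instance"],
  "polyfill-with-rebinds": [],
};

// Picks the most capable strategy the supported features allow.
function chooseStrategy(features: ReadonlySet<string>): Strategy {
  const firstInstance = features.has("indirect-first-instance");
  const multiDraw = features.has("multi-draw-indirect");
  const drawCount = features.has("draw-indirect-count");
  if (multiDraw && firstInstance && drawCount) return "multi-draw-with-count-buffer";
  if (multiDraw && firstInstance) return "multi-draw-zero-instance-count";
  if (firstInstance) return "polyfill-with-first-instance";
  return "polyfill-with-rebinds";
}
```

This only moves the fallback chain into application code; it does not remove the need for each strategy to exist, which is the concern raised above.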

kvark · 2021-07-16T19:24:32Z

It sounds to me that we can split the drawIndirectFirstInstance out of this feature, so that the user can polyfill the rest (and first instance semantic is truly orthogonal to the rest).

mrshannon · 2021-07-19T14:27:22Z

> It sounds to me that we can split the drawIndirectFirstInstance out of this feature, so that the user can polyfill the rest (and first instance semantic is truly orthogonal to the rest).

I will split that off, but what are the thoughts on overloading the existing draws versus adding new ones?

kainino0x · 2021-07-19T21:05:38Z

Editors chatted and we think the multidraw calls are too different in shape and functionality - so preference for using a different name instead of overloading with optional arguments.

kvark · 2021-07-19T22:11:26Z

> I will split that off, but what are the thoughts on overloading the existing draws vs adding new ones.

We talked about it some more with the editors, and we think it would really help to know how much the firstInstance feature is correlated with "multi-draw" in the native APIs.
The users can polyfill "multi-draw", while they can't polyfill "firstInstance" efficiently, but their code can expose an API with its own constraints; it doesn't have to be exactly the WebGPU API.

mrshannon · 2021-07-19T22:17:52Z

> how much the firstInstance feature is correlated with "multi-draw" in the native APIs.

On Vulkan, firstInstance is available everywhere multi-draw is (less than 1% difference), though they are two separate features. Only drawIndirectCount has less support. On Metal (without some sort of translation to ICBs), firstInstance is always available, but multi-draw never is. On DX12 everything is always available. So separating out firstInstance only matters for Metal.

kvark · 2021-07-21T16:50:57Z

> less than 1% difference

just to confirm, are we sure that the set of hardware is different by a small margin, or is it just the total percent/number that is different?

mrshannon · 2021-07-22T18:50:54Z

> just to confirm, are we sure that the set of hardware is different by a small margin, or is it just the total percent/number that is different?

multiDrawIndirect (63%) is mostly useless without drawIndirectFirstInstance (64%), which leads me to conclude that 1% of devices have drawIndirectFirstInstance but not multiDrawIndirect. I guess there could be some feature disparity, but I don't know what the point would be, since each of the draws would render the same instance.

litherum · 2021-07-24T08:38:20Z

Why are we considering adding a new feature ostensibly for performance without any performance data?

WebGL and WebGPU have very different CPU overhead characteristics. We can't use the existence of this extension in WebGL as motivation for it in WebGPU.

mrshannon · 2021-07-26T03:14:09Z

> Why are we considering adding a new feature ostensibly for performance without any performance data?
>
> WebGL and WebGPU have very different CPU overhead characteristics. We can't use the existence of this extension in WebGL as motivation for it in WebGPU.

I just did some benchmarking using Vulkan. I have primarily been an OpenGL user, so my assumption about the performance increase was wrong: based on my OpenGL experience, I had expected at least an order of magnitude. By reusing the command buffer and using non-zero firstInstance, I achieved a 0% to 6% performance increase (depending on the GPU) with drawCount > 1 compared to drawCount = 1. This was with a scene containing ~25,000 draws.

Next, I simulated culling 20,000 of those calls on the GPU, doing the culling by setting instanceCount to 0. This resulted in a 5% to 48% performance improvement with drawCount = 24,576 over drawCount = 1. Using drawIndirectCount (instead of zeroing instanceCount) gained another 11% to 13%. NOTE: The weaker the GPU, the smaller the improvements.

Therefore, when pipelines and bind groups are consistent between frames (so render bundles can be used) and the number of draws is fairly stable, a user could polyfill using firstInstance > 0 and render bundles with minimal performance loss. But when the final number of draws varies, true multi-draw performs significantly better. However, this only holds when the command buffer can be reused (render bundles). If the command buffer must be rebuilt every frame, there is certainly CPU overhead to calling drawIndirect 25,000 times versus calling multiDrawIndirect once.

I do not know what the performance loss would be with dynamic uniform/storage offset changes between indirect draws in the render bundle (which would be required without firstInstance > 0). I can't benchmark this because it is somewhat implementation-dependent (argument buffers, etc.), and since it is no longer in WebGPU, I can't benchmark it that way either.

mrshannon · 2021-07-26T03:26:55Z

Now that multi-draw-indirect is extracted into new methods I think first-instance-indirect should probably be a separate feature. This is because it seems strange that multi-draw-indirect changes the behavior of methods it does not add.


kainino0x · 2021-07-26T22:36:38Z

> We can't use the existence of this extension in WebGL as motivation for it in WebGPU.

This extension doesn't exist in WebGL, which doesn't have indirect draws at all. WebGL has non-indirect multi-draw as an extension.

However the presence of this feature in Vulkan is some evidence of its value.

kainino0x · 2021-07-26T22:38:03Z

> Now that multi-draw-indirect is extracted into new methods I think first-instance-indirect should probably be a separate feature. This is because it seems strange that multi-draw-indirect changes the behavior of methods it does not add.

This makes sense, though if we find that it's unnecessary to expose them separately, we could give it a more generic name like "draw-indirect2" or something.

litherum · 2021-08-02T18:31:11Z

> Supported on all versions which are still maintained.

The Metal Feature Set Tables indicates that support on Mac is limited to MTLGPUFamilyMac2 and is unavailable on MTLGPUFamilyMac1. Are you indicating that you're considering MTLGPUFamilyMac1 to be unmaintained?

kainino0x · 2021-08-02T19:19:23Z

Resolution:

- Tentatively two features: multi-draw-indirect, and first-instance-indirect.
- Think about browsers emulating feature(s) in the future, but ~assume no emulation for now.
- Make sure the out-of-bounds behavior is specified for this.

kainino0x · 2021-08-02T21:14:17Z

>> Supported on all versions which are still maintained.
>
> The Metal Feature Set Tables indicates that support on Mac is limited to MTLGPUFamilyMac2 and is unavailable on MTLGPUFamilyMac1. Are you indicating that you're considering MTLGPUFamilyMac1 to be unmaintained?

Just realized this was a confusing snip. From context, it's clear that "versions which are still maintained" was referring to OS releases, not hardware. The hardware requirement may have been overlooked.

mrshannon · 2021-08-02T21:42:37Z

> The Metal Feature Set Tables indicates that support on Mac is limited to MTLGPUFamilyMac2 and is unavailable on MTLGPUFamilyMac1. Are you indicating that you're considering MTLGPUFamilyMac1 to be unmaintained?

Corrected in the PR comment; I was unfamiliar with how Apple lists capabilities. The Metal docs only list OS versions.

kainino0x · 2021-08-09T19:49:24Z

Resolution: accepted, merge after #2022 has landed and this is rebased over it.

kainino0x · 2022-08-30T17:59:37Z

superseded by #2315

Add multi-draw-indirect feature.

c16910b

kvark reviewed Jul 16, 2021


kainino0x added the for webgpu editors meeting label Jul 19, 2021

kainino0x removed the for webgpu editors meeting label Jul 19, 2021

Extract multi-draw-indirect into new methods.

b80d5d9

mrshannon mentioned this pull request Aug 6, 2021

Add indirect-first-instance feature. #2022

Merged

rconde01 mentioned this pull request Nov 16, 2021

Add multi draw indirect feature #2315

Draft

kainino0x added for webgpu editors meeting copyediting and removed for webgpu editors meeting labels Aug 25, 2022

kainino0x closed this Aug 30, 2022

kainino0x mentioned this pull request Oct 24, 2023

Investigation: multi-draw-indirect #4349

Closed

kainino0x linked an issue Oct 24, 2023 that may be closed by this pull request

Investigation: multi-draw-indirect #4349

Closed

gpuweb deleted a comment from fwadnjar Oct 26, 2023

kainino0x mentioned this pull request Mar 26, 2024

DrawIndirectCount #1354

Open

vertver mentioned this pull request Apr 29, 2024

[RFE] Draw Indirect Count NVIDIA-RTX/NRI#65

Closed
