39 changes: 22 additions & 17 deletions _release-content/release-notes/partial_bindless_metal.md
@@ -1,36 +1,41 @@
---
title: Partial Bindless on Metal and Reduced Bind Group Overhead
title: Partial Bindless and Reduced Bind Group Overhead
authors: ["@holg"]
pull_requests: [23436]
---

Cross-platform game engines must constantly navigate real differences in platform APIs.
Bevy's goal is to let users write a single application and ship it everywhere —
Windows, Mac, Linux, mobile — with confidence that it will just work.
That's a tough promise to live up to: rendering complex scenes on Mac and iOS was markedly slower.
In an ideal world, Bevy users could write a single application and ship it everywhere, with every last one of the messy cross-platform differences beautifully abstracted away.
That can be a bit hard though.
Contributor

could probably omit this sentence, it doesn't really do anything imo

In this particular case, we found that rendering complex scenes on Mac and iOS was markedly slower than it should have been.

Looking into it, the lack of bindless rendering support was to blame.
Comment on lines +9 to +11
Contributor

these are fine, but feels a little handwavey? like "we found" and "looking into it" just feel wishywashy haha

it would come across stronger if there was more concrete background like "In previous versions of Bevy, user reports and benchmarks of complex scenes on Mac and iOS showed much worse performance than on other platforms. After investigating the graphics pipeline, we found that it wasn't taking advantage of bindless rendering where it could." (continues into line 12)

Bindless rendering is how modern engines handle scenes with many different materials efficiently: shaders index into shared pools of textures and buffers rather than rebinding them per draw call.
Bindless is not just a performance optimization — it's how modern renderers are structured.

Metal (Apple's GPU API) supports texture binding arrays but not buffer binding arrays.
Bevy required both to enable bindless, which previously excluded Metal entirely — even for materials that never use buffer arrays.
If you were shipping on Mac or iOS, your game was running on a slower, fundamentally different code path.
Both Metal (Apple's GPU API) and DX12 (an older Windows API) have partial bindless support:
Member

dx12 is also a modern api

Suggested change
Both Metal (Apple's GPU API) and DX12 (an older Windows API) have partial bindless support:
Both Metal (Apple's GPU API) and DX12 (a Windows graphics API) have partial bindless support:

they permit texture binding arrays but not buffer binding arrays.
Historically, Bevy required both to enable bindless, which excluded Metal entirely, even for materials that never use buffer arrays.

Most materials, including `StandardMaterial`, only use `#[data(...)]`, textures, and samplers — they never needed buffer array support.
Bevy now checks what each material actually needs;
if it only needs texture arrays, it gets bindless on Metal.
Materials using `#[uniform(..., binding_array(...))]` still fall back to non-bindless on Metal.
Most materials, including `StandardMaterial`, do not need buffer array support.
To ensure those materials take the fast path, Bevy now checks the actual needs of each material.
If you only need texture arrays, your material can be rendered efficiently across Bevy's desktop platforms.
If you use `#[uniform(..., binding_array(...))]`, expect unusually poor performance when using Metal or DX12.
Member

I feel "poor performance" is a bit too harsh.
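
For reviewers who want the per-material decision spelled out, here is a minimal sketch of the rule the paragraph above describes. It is not Bevy's implementation: `MaterialNeeds`, `BackendSupport`, and both functions are hypothetical names invented for illustration, and the real check may take more inputs into account.

```rust
/// Hypothetical summary of which binding arrays a material's layout needs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MaterialNeeds {
    texture_arrays: bool,
    buffer_arrays: bool,
}

/// Hypothetical summary of which binding arrays the graphics backend supports.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BackendSupport {
    texture_arrays: bool,
    buffer_arrays: bool,
}

/// Old rule as described in the note: both array kinds must be supported,
/// regardless of what the material actually uses.
fn bindless_all_or_nothing(support: BackendSupport) -> bool {
    support.texture_arrays && support.buffer_arrays
}

/// New rule as described in the note: only the array kinds the material
/// needs have to be supported.
fn bindless_per_material(needs: MaterialNeeds, support: BackendSupport) -> bool {
    (!needs.texture_arrays || support.texture_arrays)
        && (!needs.buffer_arrays || support.buffer_arrays)
}

fn main() {
    // A backend with partial bindless support: texture arrays only.
    let partial = BackendSupport { texture_arrays: true, buffer_arrays: false };
    let textures_only = MaterialNeeds { texture_arrays: true, buffer_arrays: false };
    let uses_buffer_array = MaterialNeeds { texture_arrays: true, buffer_arrays: true };

    println!("old rule, any material: {}", bindless_all_or_nothing(partial));
    println!("new rule, textures only: {}", bindless_per_material(textures_only, partial));
    println!("new rule, buffer array: {}", bindless_per_material(uses_buffer_array, partial));
}
```

Under these assumptions, a textures-only material takes the bindless path on a partial-support backend, while a material that needs a buffer binding array still falls back.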


Two correctness bugs were fixed in the process.
The sampler limit check was testing the wrong metric: `max_samplers_per_shader_stage` counts binding slots, but the relevant limit is `max_binding_array_sampler_elements_per_shader_stage`, the array element count, a mismatch that could silently exceed hardware limits.
Bevy now also skips creating binding array slots for resource types a material doesn't use, staying within Metal's hard 31 argument buffer slot limit and reducing overhead on all platforms.
We've also fixed two important correctness bugs in the process.
First, we discovered that the sampler limit check was testing the wrong metric: `max_samplers_per_shader_stage` counts binding slots, but the relevant limit is `max_binding_array_sampler_elements_per_shader_stage`, the array element count (a mismatch that could silently exceed hardware limits).
Member

@beicause beicause Jun 1, 2026

Seems max_binding_array_sampler_elements_per_shader_stage is larger than max_samplers_per_shader_stage so it's unlikely to exceed.

Suggested change
First, we discovered that the sampler limit check was testing the wrong metric: `max_samplers_per_shader_stage` counts binding slots, but the relevant limit is `max_binding_array_sampler_elements_per_shader_stage`, the array element count (a mismatch that could silently exceed hardware limits).
First, we discovered that the sampler limit check was testing the wrong metric: `max_samplers_per_shader_stage` counts binding slots, but the relevant limit is `max_binding_array_sampler_elements_per_shader_stage`, the array element count (a mismatch that could incorrectly disable bindless).
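
To make the difference between the two limits concrete, the sketch below compares both checks against the same request. `SamplerLimits` is a hypothetical stand-in that only borrows the two field names quoted in the note, and the numeric values are made up for illustration rather than taken from any real device.

```rust
/// Hypothetical stand-in for the two device limits named in the note.
struct SamplerLimits {
    /// Number of sampler binding slots available per shader stage.
    max_samplers_per_shader_stage: u32,
    /// Number of elements allowed across sampler binding arrays per shader stage.
    max_binding_array_sampler_elements_per_shader_stage: u32,
}

/// The check as it reportedly was: array elements compared against slot count.
fn old_check(limits: &SamplerLimits, sampler_array_elements: u32) -> bool {
    sampler_array_elements <= limits.max_samplers_per_shader_stage
}

/// The check as the note describes the fix: array elements compared against
/// the array element limit.
fn new_check(limits: &SamplerLimits, sampler_array_elements: u32) -> bool {
    sampler_array_elements <= limits.max_binding_array_sampler_elements_per_shader_stage
}

fn main() {
    // Illustrative values only; real limits vary by device and backend.
    let limits = SamplerLimits {
        max_samplers_per_shader_stage: 16,
        max_binding_array_sampler_elements_per_shader_stage: 1024,
    };
    let wanted = 64;
    println!("old check allows bindless: {}", old_check(&limits, wanted));
    println!("new check allows bindless: {}", new_check(&limits, wanted));
}
```

With a larger element limit than slot limit, as in these made-up numbers, the old comparison rejects a request that the hardware could serve, which matches the "incorrectly disable bindless" wording in the suggestion.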

Second, Bevy now also skips creating binding array slots for resource types a material doesn't use, staying within Metal's hard 31 argument buffer slot limit and reducing overhead on all platforms.
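
The slot-skipping idea can be sketched as follows. The `ResourceKind` enum, its variants, and both functions are hypothetical and chosen only for illustration; Bevy's actual resource types and layout code will differ.

```rust
/// Hypothetical set of resource types that could each get a binding array slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ResourceKind {
    Sampler,
    Texture2d,
    Texture3d,
    TextureCube,
    Buffer,
}

const ALL_KINDS: [ResourceKind; 5] = [
    ResourceKind::Sampler,
    ResourceKind::Texture2d,
    ResourceKind::Texture3d,
    ResourceKind::TextureCube,
    ResourceKind::Buffer,
];

/// Previous approach in this sketch: one slot per resource type, used or not.
fn slots_always() -> Vec<ResourceKind> {
    ALL_KINDS.to_vec()
}

/// Approach described in the note: only create slots for the resource types
/// the material actually uses. Duplicates in `used` collapse to one slot.
fn slots_for_used(used: &[ResourceKind]) -> Vec<ResourceKind> {
    ALL_KINDS
        .iter()
        .copied()
        .filter(|kind| used.contains(kind))
        .collect()
}

fn main() {
    // A material that only samples 2D textures.
    let used = [
        ResourceKind::Texture2d,
        ResourceKind::Sampler,
        ResourceKind::Texture2d,
    ];
    println!("slots before: {}", slots_always().len());
    println!("slots after: {:?}", slots_for_used(&used));
}
```

Every slot that is not created leaves more room under a fixed per-stage budget such as the 31 slot limit mentioned above.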

Benchmarked on Bistro Exterior (698 materials), 5-minute runs:

| GPU | Avg FPS improvement | Min FPS improvement | Memory |
| ------------------------ | ------------------- | ------------------- | ----------- |
| Apple M2 Max (Metal) | +18% | +77% | −57 MB RAM |
| NVIDIA 5060 Ti | +84% | +174% | Same |
| Intel i360P | +15% | Same | Same |
| AMD Vega 8 / Ryzen 4800U | Same | Same | −88 MB VRAM |
| Intel i360P | +15% | Same | Same |
| Intel Iris Xe | Same | Same | Same |

[Bistro] is a demanding, fairly realistic scene.
While bindless limitations remain frustrating, especially on Mac where Vulkan isn't an option,
it's lovely to see those performance gains, and to know that Bevy itself is no longer artificially holding our users back.

[Bistro]: https://developer.nvidia.com/orca/amazon-lumberyard-bistro