Support memory sharing in simulation environments by mawad-amd · Pull Request #539 · ROCm/iris

mawad-amd · 2026-06-18T21:29:16Z

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

In simulation mode (FFM), replace the N-buffers-on-one-GPU hack with POSIX shared memory (shm_open + mmap). Each rank gets a slice of a shared region, enabling real cross-process memory sharing. In FFM SVM mode the GPU VA equals the host VA, so mmap'd addresses are dereferenceable by GPU kernels. Validated with a standalone prototype on a gfx1260-ffm container.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
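The slicing scheme described above can be sketched in plain Python. This is only an illustration, not the PR's allocator code: it assumes `multiprocessing.shared_memory` (which wraps `shm_open` + `mmap` on POSIX systems) as a stand-in for the direct calls, and the names `rank_slice` and `demo_shared_slices`, the rank count, and the heap size are all hypothetical.

```python
from multiprocessing import shared_memory


def rank_slice(buf, rank, heap_size):
    """Return the part of the shared region that belongs to `rank` (hypothetical helper)."""
    start = rank * heap_size
    return buf[start:start + heap_size]


def demo_shared_slices(num_ranks=2, heap_size=64):
    # One region holds every rank's slice; the creating handle plays rank 0.
    owner = shared_memory.SharedMemory(create=True, size=num_ranks * heap_size)
    # A second handle attached by name stands in for another process.
    peer = shared_memory.SharedMemory(name=owner.name)
    try:
        mine = rank_slice(owner.buf, 1, heap_size)
        theirs = rank_slice(peer.buf, 1, heap_size)
        zero = rank_slice(peer.buf, 0, heap_size)
        mine[:5] = b"hello"  # write through one mapping
        seen, untouched = bytes(theirs[:5]), bytes(zero[:5])
        # Release the slices before close(); open exports would block it.
        for view in (mine, theirs, zero):
            view.release()
        return seen, untouched
    finally:
        peer.close()
        owner.close()
        owner.unlink()  # remove the named object once all handles are closed


if __name__ == "__main__":
    print(demo_shared_slices())
```

A write through one handle is visible through the other because both map the same named object; rank 0's slice is left untouched.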

- memory_pool stays a CPU tensor (no .to(device)) so data_ptr() returns the mmap host VA, which is valid for FFM SVM dereference
- establish_peer_access creates local mmap views for each peer's slice and stores references to prevent GC
- symmetric_heap._refresh_peer_access_torch now calls establish_peer_access in sim mode instead of using raw allgather bases (which are remote VAs, invalid in this process)

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
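The idea of mapping each peer's slice locally and holding on to the mappings can be sketched as follows. This is a rough illustration under stated assumptions, not the body of `establish_peer_access`: the `PeerViews` class and `demo_peer_views` function are hypothetical, a temporary file stands in for the shm file descriptor, and the heap size is set to the mmap allocation granularity so that every slice offset is legal.

```python
import mmap
import tempfile


class PeerViews:
    """Hypothetical holder of one local mmap view per rank's slice."""

    def __init__(self, fd, num_ranks, heap_size):
        # mmap offsets must be multiples of the allocation granularity.
        if heap_size % mmap.ALLOCATIONGRANULARITY:
            raise ValueError("heap_size must be a multiple of ALLOCATIONGRANULARITY")
        # Keeping the mmap objects in a list keeps the mappings alive;
        # dropping the last reference would let them be garbage collected.
        self._views = [
            mmap.mmap(fd, heap_size, offset=rank * heap_size)
            for rank in range(num_ranks)
        ]

    def view(self, rank):
        return self._views[rank]

    def close(self):
        for v in self._views:
            v.close()
        self._views.clear()


def demo_peer_views(num_ranks=2):
    heap_size = mmap.ALLOCATIONGRANULARITY
    with tempfile.TemporaryFile() as f:  # stand-in for the shm fd (assumption)
        f.truncate(num_ranks * heap_size)
        a = PeerViews(f.fileno(), num_ranks, heap_size)  # plays one rank
        b = PeerViews(f.fileno(), num_ranks, heap_size)  # plays another rank
        a.view(1)[:4] = b"ping"          # write into rank 1's slice
        out = bytes(b.view(1)[:4])       # read it back through the other mapping
        a.close()
        b.close()
        return out


if __name__ == "__main__":
    print(demo_peer_views())
```

Each process ends up with its own local addresses for every slice, which is why the text above notes that raw bases gathered from other ranks cannot be used directly.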

In FFM SVM mode, CPU VA = GPU VA. memory_pool is backed by shm mmap (CPU tensor) but get_device() returns cuda:N so iris device checks pass and examples work unchanged.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

The Triton AMD driver validates pointers via hipPointerGetAttribute before kernel launch. CPU tensor pointers from the shm mmap fail this check. hipHostRegister marks the shm region as device-accessible, so the check passes without patching Triton.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds shared-memory–backed allocation in simulation mode so ranks can share/peer-access a symmetric heap without per-rank distinct buffers.

Changes:

- Use POSIX shared memory (shm_open + mmap) for the simulated heap and expose per-rank views.
- Populate heap_bases in simulation from allocator-computed bases after establishing peer access.
- Add cleanup for SHM/MMAP resources and HIP host registration.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| iris/host/memory/symmetric_heap.py | In simulation, establishes peer access and sources `heap_bases` from allocator-provided base pointers. |
| iris/host/memory/allocators/torch_allocator.py | Implements `shm_open`/`mmap`-backed simulation heap, creates per-rank tensor views, and adds cleanup + device reporting behavior. |

Same semantics as example 31 but uses one unified kernel that branches on cur_rank instead of separate producer/consumer kernels. Produces a single kernel dispatch per rank for downstream capture tools.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

mawad-amd and others added 6 commits June 18, 2026 14:07

Apply Ruff auto-fixes

3d29562

Return CUDA device in sim mode for device check compatibility

122b80b

Apply Ruff auto-fixes

6251890

Copilot AI review requested due to automatic review settings June 18, 2026 21:29

mawad-amd requested review from BKP and neoblizz as code owners June 18, 2026 21:29

github-actions bot added the in-progress (We are working on it) and iris (Iris project issue) labels Jun 18, 2026

Copilot AI reviewed Jun 18, 2026

Comment thread iris/host/memory/allocators/torch_allocator.py

Comment thread iris/host/memory/allocators/torch_allocator.py

Comment thread iris/host/memory/allocators/torch_allocator.py

mawad-amd and others added 3 commits June 18, 2026 14:55

Use gloo backend in sim mode for FFM compatibility

1975233

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

Use gloo backend always in example 33

dc9aae7

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>

mawad-amd wants to merge 9 commits into main from muhaawad/mmap

2 participants
