Support memory sharing in simulation environments #539
Open
mawad-amd wants to merge 9 commits into
In simulation mode (FFM), replace the N-buffers-on-one-GPU hack with POSIX shared memory (shm_open + mmap). Each rank gets a slice of a shared region, enabling real cross-process memory sharing. In FFM SVM mode the GPU VA equals the host VA, so mmap'd addresses are dereferenceable by GPU kernels. Validated with a standalone prototype on the gfx1260-ffm container.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
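The slice layout described in this commit can be sketched as follows. This is a minimal illustration, not the PR's allocator code: it uses Python's standard `multiprocessing.shared_memory` (which is backed by `shm_open` + `mmap` on POSIX systems), and `HEAP_SIZE`, `slice_bounds`, `create_region`, `write_to_rank`, and `read_from_rank` are hypothetical names assumed for the example.

```python
from multiprocessing import shared_memory

HEAP_SIZE = 4096  # bytes per rank; hypothetical value for illustration


def slice_bounds(rank, heap_size=HEAP_SIZE):
    """Return the [start, end) byte range of a rank's slice in the shared region."""
    start = rank * heap_size
    return start, start + heap_size


def create_region(num_ranks, heap_size=HEAP_SIZE):
    """Create one shared region large enough for every rank's slice."""
    return shared_memory.SharedMemory(create=True, size=num_ranks * heap_size)


def write_to_rank(shm, rank, payload, heap_size=HEAP_SIZE):
    """Copy payload to the start of the given rank's slice."""
    if len(payload) > heap_size:
        raise ValueError("payload does not fit in one rank's slice")
    start, _ = slice_bounds(rank, heap_size)
    shm.buf[start:start + len(payload)] = payload


def read_from_rank(shm, rank, nbytes, heap_size=HEAP_SIZE):
    """Read nbytes from the start of the given rank's slice."""
    start, _ = slice_bounds(rank, heap_size)
    return bytes(shm.buf[start:start + nbytes])
```

A second process would attach with `shared_memory.SharedMemory(name=...)` using the creator's region name; a write through one handle is then visible through the other, because both map the same underlying object.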
- memory_pool stays as a CPU tensor (no .to(device)) so data_ptr() returns the mmap host VA, which is valid for FFM SVM dereference
- establish_peer_access creates local mmap views for each peer's slice and stores references to prevent GC
- symmetric_heap._refresh_peer_access_torch now calls establish_peer_access in sim mode instead of using raw allgather bases (which are remote VAs, invalid in this process)

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
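The idea of mapping each peer's slice locally and holding on to the mappings can be sketched as below. This is an assumption-laden illustration, not the PR's `establish_peer_access`: a temporary file descriptor stands in for the shm object, `establish_peer_views` and `SLICE` are hypothetical names, and the slice size is chosen as `mmap.ALLOCATIONGRANULARITY` because `mmap` offsets must be multiples of it.

```python
import mmap
import os
import tempfile

# Per-rank slice size; mmap offsets must be multiples of ALLOCATIONGRANULARITY.
SLICE = mmap.ALLOCATIONGRANULARITY


def establish_peer_views(fd, num_ranks, slice_size=SLICE):
    """Map every rank's slice of the shared file into this process.

    The returned list must be kept alive by the caller: once a view is
    garbage collected, its mapping (and any address derived from it)
    is no longer valid.
    """
    views = []
    for rank in range(num_ranks):
        # Each view is a separate local mapping of that peer's byte range.
        views.append(mmap.mmap(fd, slice_size, offset=rank * slice_size))
    return views
```

Each process has to build its own list of views: a base address obtained by another process refers to that process's mapping and cannot be dereferenced locally, which is why exchanged raw bases are not usable here.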
In FFM SVM mode, CPU VA = GPU VA. memory_pool is backed by shm mmap (CPU tensor), but get_device() returns cuda:N so iris device checks pass and examples work unchanged.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The Triton AMD driver validates pointers via hipPointerGetAttribute before kernel launch. CPU tensor pointers from shm mmap fail this check. hipHostRegister marks the shm region as device-accessible, making the check pass without patching Triton.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
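A hedged sketch of the registration step, written with `ctypes` rather than whatever binding the PR actually uses. `hipHostRegister(void* hostPtr, size_t sizeBytes, unsigned int flags)` is part of the HIP runtime API; the library name `libamdhip64.so` is an assumption about a typical ROCm install on Linux, and `buffer_address` and `try_host_register` are hypothetical helper names.

```python
import ctypes
import mmap


def buffer_address(buf):
    """Return the host virtual address of a writable buffer such as an mmap."""
    c = ctypes.c_char.from_buffer(buf)
    addr = ctypes.addressof(c)
    del c  # release the buffer export so the mmap can be closed later
    return addr


def try_host_register(addr, nbytes, lib_name="libamdhip64.so"):
    """Register [addr, addr + nbytes) with HIP; return the hipError_t code.

    Returns None when the HIP runtime library cannot be loaded, so the
    sketch degrades gracefully on machines without ROCm.
    """
    try:
        hip = ctypes.CDLL(lib_name)
    except OSError:
        return None
    hip.hipHostRegister.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint]
    hip.hipHostRegister.restype = ctypes.c_int
    # Flag value 0 requests the default registration behaviour.
    return hip.hipHostRegister(ctypes.c_void_p(addr), nbytes, 0)
```

A return value of 0 corresponds to `hipSuccess`. A matching `hipHostUnregister` call on the same address would belong in the cleanup path before the region is unmapped.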
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds shared-memory–backed allocation in simulation mode so ranks can share/peer-access a symmetric heap without per-rank distinct buffers.
Changes:
- Use POSIX shared memory (`shm_open` + `mmap`) for the simulated heap and expose per-rank views.
- Populate `heap_bases` in simulation from allocator-computed bases after establishing peer access.
- Add cleanup for SHM/MMAP resources and HIP host registration.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| iris/host/memory/symmetric_heap.py | In simulation, establishes peer access and sources heap_bases from allocator-provided base pointers. |
| iris/host/memory/allocators/torch_allocator.py | Implements shm_open/mmap-backed simulation heap, creates per-rank tensor views, and adds cleanup + device reporting behavior. |
Same semantics as example 31, but uses one unified kernel that branches on cur_rank instead of separate producer/consumer kernels. Produces a single kernel dispatch per rank for downstream capture tools.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist