Support memory sharing in simulation environments #539

Open
mawad-amd wants to merge 9 commits into main from muhaawad/mmap

@mawad-amd

Collaborator

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

mawad-amd and others added 6 commits June 18, 2026 14:07
In simulation mode (FFM), replace the N-buffers-on-one-GPU hack with
POSIX shared memory (shm_open + mmap). Each rank gets a slice of a
shared region, enabling real cross-process memory sharing.

FFM SVM mode means GPU VA = host VA, so mmap'd addresses are
dereferenceable by GPU kernels. Validated with standalone prototype
on gfx1260-ffm container.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
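
The slicing scheme described in this commit can be sketched with Python's standard `multiprocessing.shared_memory` module, which on POSIX systems is built on `shm_open` and `mmap`. This is only an illustration, not the code in this PR: the helper names `rank_slice` and `write_then_read_through_peer`, the rank count, and the heap size are assumptions, and the "peer" attaches from the same process to keep the example self-contained.

```python
from multiprocessing import shared_memory


def rank_slice(buf, rank, heap_size):
    """Return a zero-copy view of the bytes assigned to `rank` (hypothetical helper)."""
    start = rank * heap_size
    return buf[start:start + heap_size]


def write_then_read_through_peer(num_ranks=4, heap_size=4096, rank=1):
    # One region large enough to hold every rank's heap.
    region = shared_memory.SharedMemory(create=True, size=num_ranks * heap_size)
    try:
        # A real peer would be another process attaching by name.
        peer = shared_memory.SharedMemory(name=region.name)
        try:
            mine = rank_slice(region.buf, rank, heap_size)
            theirs = rank_slice(peer.buf, rank, heap_size)
            mine[0] = 42          # write through the owner's mapping
            value = theirs[0]     # read through the peer's mapping
            # Release the views before closing, or close() raises BufferError.
            mine.release()
            theirs.release()
        finally:
            peer.close()
    finally:
        region.close()
        region.unlink()
    return value
```

Because both handles map the same named region, a write through one slice is visible through the other without any copy.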
- memory_pool stays as CPU tensor (no .to(device)) so data_ptr()
  returns the mmap host VA — valid for FFM SVM dereference
- establish_peer_access creates local mmap views for each peer's
  slice, stores references to prevent GC
- symmetric_heap._refresh_peer_access_torch now calls
  establish_peer_access in sim mode instead of using raw allgather
  bases (which are remote VAs, invalid in this process)

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
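
The reason raw allgathered bases cannot be used directly is that each process maps the shared region at its own virtual address. A minimal sketch of the idea, assuming a peer pointer is converted to an offset and rebased onto a local mapping, is shown below. It is not the `establish_peer_access` implementation: `base_address`, `translate`, and `read_via_translated_pointer` are hypothetical names, and two mappings in one process stand in for two ranks.

```python
import ctypes
import mmap
import tempfile


def base_address(mm):
    """Return the virtual address at which `mm` is mapped in this process."""
    anchor = ctypes.c_char.from_buffer(mm)
    addr = ctypes.addressof(anchor)
    del anchor  # drop the buffer export so mm.close() can succeed later
    return addr


def translate(remote_ptr, remote_base, local_base):
    """Rebase a pointer from a peer's mapping onto the local mapping."""
    return local_base + (remote_ptr - remote_base)


def read_via_translated_pointer(size=mmap.PAGESIZE, offset=128):
    with tempfile.TemporaryFile() as f:
        f.truncate(size)
        a = mmap.mmap(f.fileno(), size)  # stands in for the peer's mapping
        b = mmap.mmap(f.fileno(), size)  # stands in for the local mapping
        try:
            base_a, base_b = base_address(a), base_address(b)
            a[offset:offset + 5] = b"hello"
            ptr_in_a = base_a + offset                 # "remote" pointer
            ptr_in_b = translate(ptr_in_a, base_a, base_b)
            # The two bases differ, yet the rebased pointer sees the same bytes.
            return base_a != base_b, ctypes.string_at(ptr_in_b, 5)
        finally:
            a.close()
            b.close()
```

Holding on to the local mapping objects for as long as the translated pointers are in use matters for the same reason the commit stores references: once a mapping is closed or collected, the address is no longer valid.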
In FFM SVM mode, CPU VA = GPU VA. memory_pool is backed by shm mmap
(CPU tensor) but get_device() returns cuda:N so iris device checks
pass and examples work unchanged.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Triton AMD driver validates pointers via hipPointerGetAttribute
before kernel launch. CPU tensor pointers from shm mmap fail this
check. hipHostRegister marks the shm region as device-accessible,
making the check pass without patching Triton.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
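
For readers unfamiliar with the call, host registration of an mmap'd region can be sketched from Python with `ctypes`. This is an assumption-laden illustration rather than the PR's code: the library name `libamdhip64.so`, the choice of the default flag, and the helper names `register_with_hip` and `unregister_with_hip` are all hypothetical here. The helpers return `False` when the HIP runtime cannot be loaded.

```python
import ctypes
import mmap

HIP_SUCCESS = 0
HIP_HOST_REGISTER_DEFAULT = 0  # assumed flag; the PR may use a different one


def _load_hip(lib_name):
    try:
        return ctypes.CDLL(lib_name)
    except OSError:
        return None  # HIP runtime not available on this machine


def register_with_hip(mm, lib_name="libamdhip64.so"):
    """Try to register the bytes of `mm` with HIP; return True on success."""
    hip = _load_hip(lib_name)
    if hip is None:
        return False
    hip.hipHostRegister.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint]
    hip.hipHostRegister.restype = ctypes.c_int
    anchor = ctypes.c_char.from_buffer(mm)
    try:
        status = hip.hipHostRegister(
            ctypes.addressof(anchor), len(mm), HIP_HOST_REGISTER_DEFAULT
        )
    finally:
        del anchor  # release the buffer export so the mapping can be closed
    return status == HIP_SUCCESS


def unregister_with_hip(mm, lib_name="libamdhip64.so"):
    """Undo register_with_hip; call this before closing the mapping."""
    hip = _load_hip(lib_name)
    if hip is None:
        return False
    hip.hipHostUnregister.argtypes = [ctypes.c_void_p]
    hip.hipHostUnregister.restype = ctypes.c_int
    anchor = ctypes.c_char.from_buffer(mm)
    try:
        status = hip.hipHostUnregister(ctypes.addressof(anchor))
    finally:
        del anchor
    return status == HIP_SUCCESS
```

Pairing every registration with an unregistration before the mapping is closed keeps the driver from holding a stale host range.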
Copilot AI review requested due to automatic review settings June 18, 2026 21:29
@mawad-amd mawad-amd requested review from BKP and neoblizz as code owners June 18, 2026 21:29
@github-actions github-actions bot added the in-progress (We are working on it) and iris (Iris project issue) labels Jun 18, 2026

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds shared-memory–backed allocation in simulation mode so ranks can share/peer-access a symmetric heap without per-rank distinct buffers.

Changes:

  • Use POSIX shared memory (shm_open + mmap) for the simulated heap and expose per-rank views.
  • Populate heap_bases in simulation from allocator-computed bases after establishing peer access.
  • Add cleanup for SHM/MMAP resources and HIP host registration.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| iris/host/memory/symmetric_heap.py | In simulation, establishes peer access and sources heap_bases from allocator-provided base pointers. |
| iris/host/memory/allocators/torch_allocator.py | Implements shm_open/mmap-backed simulation heap, creates per-rank tensor views, and adds cleanup + device reporting behavior. |

Comment thread iris/host/memory/allocators/torch_allocator.py
mawad-amd and others added 3 commits June 18, 2026 14:55
Same semantics as example 31 but uses one unified kernel that branches
on cur_rank instead of separate producer/consumer kernels. Produces a
single kernel dispatch per rank for downstream capture tools.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>