[ROCM] Fix buffer_load_lds support for gfx950 #2248
Draft: benenzhu wants to merge 36 commits into tile-ai:main from benenzhu:feat-c-remove_vmcnt0
e7ac334 [ROCm] Expose HIP kernel n_regs / n_spills on JITKernel via clang rem… (benenzhu)
7f594d1 fix for final merge (benenzhu)
ad72ada fix for final merge (benenzhu)
23be555 fix for final merge (benenzhu)
f276441 add a flag to save temp files (benenzhu)
600b516 add a flag to save temp files (benenzhu)
f0fd44f M2: add cp_async_gs_lds_with_rsrc<N> device templates for gfx950 (benenzhu)
adb4364 M3: register ptx_cp_async_lds / ptx_make_buffer_resource / ptx_cp_asy… (benenzhu)
e38d5fd M4: HIP codegen handlers for buffer_load_lds ops + hoisted-resource A… (benenzhu)
1053bcb M5: route eligible 16B copies to ptx_cp_async_lds on gfx950 (benenzhu)
7ac1d96 M6: HoistBufferResource Python pass + pipeline wiring (gfx950 only) (benenzhu)
3c82e3b M7: layout SwizzleDelta API + ROCm swizzle-swap in lower_tile_op (benenzhu)
ae29f93 M8a: re-enable MergeSharedMemoryAllocations in OptimizeForTarget (benenzhu)
d101b38 M8b: integrate ptx_cp_async_lds with vec-loop folding + buf-merge + c… (benenzhu)
22fd402 M5-harden: replace XOR-call gate with real affine lane-contiguity proof (benenzhu)
91878e4 M9: VisitExpr_(CallNode) swizzle-swap on tl::ptx_cp_async_lds (real f… (benenzhu)
4736c94 M6.5: AMD vmcnt wait-count scaling in HoistBufferResource (benenzhu)
c3c0f9a add a flag to save temp files (benenzhu)
e6ff3f0 M9-safe: downgrade ptx_cp_async_lds to ptx_cp_async when swap can't p… (benenzhu)
79c6580 add a flag to save temp files (benenzhu)
ff6cb4e format file (benenzhu)
1007a03 M10: chunk-block-aware binding via early CBA hook in InferLayout (benenzhu)
0a864a8 M11: always emit ptx_cp_async_lds; let M9 swap or downgrade (benenzhu)
df611d3 docs: annotate FullBank/makeGemmABLayout with inline derivations (benenzhu)
c5b1973 Revert "docs: annotate FullBank/makeGemmABLayout with inline derivati… (benenzhu)
781eb74 lower_ptx_async_copy: drop dead IsLdsLaneContiguous + dst_check_index (benenzhu)
7163b44 parallel: drop redundant CBA call in ComputePlanCandidate (benenzhu)
3c4d1d2 Merge branch 'main' into feat-c-remove_vmcnt0 (benenzhu)
03343e2 parallel: apply clang-format (benenzhu)
e9d7bdb review fixes for ptx_cp_async_lds family (benenzhu)
3d95235 parallel: gate early CBA hook on fragment validation (benenzhu)
7f8b103 parallel: scope early CBA hook to gfx950 only (benenzhu)
c78c587 zz (benenzhu)
2718f49 drop bare ptx_cp_async_lds codegen, fall back to sync cp_async_gs (benenzhu)
25f61a2 clean code (benenzhu)
a0312d5 clean code (benenzhu)
Submodule tvm updated from 0be336 to 8435b8