[AMD][ROCm] Fix CI failures on gfx950, gfx1100, gfx1151, and gfx1201 by zhangnju · Pull Request #2326 · tile-ai/tilelang

zhangnju · 2026-06-03T15:06:18Z

Hi @LeiWang1999

I ran CI on AMD machines against the latest code and found several failing cases.

This PR includes the following bug fixes:

tvm_mfma intrinsic broken after tirx migration (tilelang/language/tir/op.py)

After the TIR→tirx migration, tirx.Call enforces strict type checking on all arguments. The six string arguments in tvm_mfma (shape, layouts, dtypes) were no longer implicitly coerced, producing TypeError: Mismatched type on argument #2. Fix by wrapping each string argument in tvm.tirx.StringImm(str(...)).
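A minimal pure-Python model of this failure mode. It assumes a hypothetical `strict_call` that, like the strict `tirx.Call` described above, accepts only IR nodes and numbers. `StringImm` here is a stand-in dataclass, not `tvm.tirx.StringImm`, and the `mfma_call` parameter names are illustrative. A raw `str` fails the type check. Wrapping each string-like argument after `str(...)`, which converts values such as dtype objects to text, passes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringImm:
    """Stand-in for an IR string node (hypothetical, not tvm.tirx.StringImm)."""
    value: str


def strict_call(op_name, *args):
    """Hypothetical strict call builder: only IR nodes and numbers are accepted."""
    for i, arg in enumerate(args):
        if not isinstance(arg, (StringImm, int, float)):
            raise TypeError(f"{op_name}: mismatched type on argument #{i}: {type(arg).__name__}")
    return (op_name, args)


def mfma_call(shape, a_layout, b_layout, a_dtype, b_dtype, c_dtype, *operands):
    # Wrap all six string-like arguments explicitly instead of relying on coercion.
    strings = (shape, a_layout, b_layout, a_dtype, b_dtype, c_dtype)
    wrapped = tuple(StringImm(str(s)) for s in strings)
    return strict_call("tvm_mfma", *wrapped, *operands)
```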

cumsum/cummax wrong results on Wave32 GPUs (gfx11/gfx12) (src/backend/common/op/scan.h, src/tl_templates/hip/scan.h)

The scan lowering emitted CumSum/CumMax{1D,2D}<threads>::run, which hard-codes SEG=64. On Wave32 architectures __shfl_up/down is bounded by the 32-thread warp width, so SEG=64 silently crosses warp boundaries and corrupts ~50% of results. Fix by:

- Switching the emitted symbol from ::run to ::run_auto.
- Adding run_auto methods to the CumSum1D/2D and CumMax1D/2D wrapper structs. run_auto queries __builtin_amdgcn_wavefrontsize(), which is resolved per compilation target rather than hard-coded, and dispatches to SEG=32 (Wave32) or SEG=64 (Wave64) accordingly.
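The following small pure-Python simulation shows why a hard-coded SEG=64 breaks on Wave32 while a wavefront-dependent segment size does not. It is a simplified model under stated assumptions, not the tl_templates code. The shuffle model, the Hillis-Steele intra-segment scan, and the serial carry step (standing in for a shared-memory pass) are all illustrative. The names `shfl_up`, `block_scan`, and `run_auto` are hypothetical here.

```python
from itertools import accumulate


def shfl_up(vals, delta, width, wave):
    """Simplified model of a width-limited shuffle-up on a GPU with `wave` lanes.

    A lane can only read lanes inside its own hardware wavefront; if the source
    lane falls below its width-aligned group, the lane keeps its own value.
    """
    out = []
    for t in range(len(vals)):
        lane = t % wave                  # hardware lane id within the wavefront
        lo = lane & ~(width - 1)         # first lane of this width-aligned group
        src = lane - delta
        if src < lo:
            src = lane                   # out of range: read own value
        out.append(vals[t - lane + src])
    return out


def block_scan(data, seg, wave):
    """Inclusive scan: Hillis-Steele inside each SEG-sized segment, then carries."""
    vals = list(data)
    d = 1
    while d < seg:
        ys = shfl_up(vals, d, seg, wave)
        # The kernel assumes lane-in-segment == thread index % seg.
        vals = [x + y if t % seg >= d else x for t, (x, y) in enumerate(zip(vals, ys))]
        d *= 2
    out, carry = [], 0
    for start in range(0, len(vals), seg):
        chunk = vals[start:start + seg]
        out.extend(v + carry for v in chunk)
        carry += chunk[-1]               # segment total before the carry was added
    return out


def run_auto(data, wave):
    # Pick the segment size from the wavefront size instead of hard-coding 64.
    return block_scan(data, seg=32 if wave == 32 else 64, wave=wave)


data = list(range(1, 129))               # 128 threads, one element each
reference = list(accumulate(data))
```

With `wave=32` and `seg=64`, the first 32 results still match. Results past the first wavefront diverge from the reference because the shuffles in the second wavefront cannot see lanes 0-31. `run_auto` picks SEG=32 on Wave32 and matches the reference on both wavefront sizes.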

Autotune scalar-input validation bypassed by disk cache (tilelang/autotuner/tuner.py)

_validate_input_supply_requirements was called after the disk-cache lookup, so a cache hit would return a kernel without ever checking that scalar inputs were supplied via set_autotune_inputs. Move the validation before the cache lookup so it is unconditional.
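Here is a toy sketch of the reordering. It assumes a hypothetical `TinyTuner`, with an in-memory dict standing in for the disk cache. The names are illustrative and are not the tilelang `AutoTuner` API. Because validation runs first, a cache hit can no longer return a kernel when the scalar inputs are missing.

```python
class TinyTuner:
    """Hypothetical tuner showing validation placed before the cache lookup."""

    def __init__(self, required_scalars):
        self.required_scalars = set(required_scalars)
        self.cache = {}                  # stand-in for the on-disk kernel cache
        self.compiles = 0

    def _validate_inputs(self, supplied):
        missing = self.required_scalars - set(supplied)
        if missing:
            raise ValueError(f"scalar inputs not supplied: {sorted(missing)}")

    def run(self, key, supplied):
        self._validate_inputs(supplied)  # unconditional: runs even on a cache hit
        if key in self.cache:
            return self.cache[key]
        self.compiles += 1
        kernel = f"kernel<{key}>"        # placeholder for real compilation/tuning
        self.cache[key] = kernel
        return kernel
```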

tfloat32 tests incorrectly running on ROCm (test_tilelang_kernel_gemm.py, test_tilelang_language_eager_jit.py)

tfloat32 is unsupported on some ROCm targets and AMD does not recommend using it, so these tests do not need to run in AMD CI.

- test_gemm_f32f32f32_nn/nt: change the decorator from @requires_cuda_or_cdna to @requires_cuda.
- test_jit2_gemm_ptr: exclude T.tfloat32 from the dtype list when not on CUDA to prevent a par_compile failure.
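Here is a minimal sketch of the dtype-list change, with plain strings standing in for the `T.*` dtypes. The helper name `jit_gemm_in_dtypes` and the exact dtype list are hypothetical, not the test's real parameters.

```python
def jit_gemm_in_dtypes(on_cuda: bool) -> list:
    """Hypothetical helper: input dtypes to compile, dropping tfloat32 off CUDA."""
    in_dtypes = ["float16", "bfloat16", "float32", "tfloat32"]
    if not on_cuda:
        # tfloat32 is only exercised on CUDA; ROCm CI never compiles it.
        in_dtypes.remove("tfloat32")
    return in_dtypes
```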

RDNA-specific tests running on CDNA (tilelang/testing/__init__.py, test_tilelang_rocm_target.py)

Three tests validating RDNA device-model behavior were marked @requires_rocm, so they ran on CDNA (gfx950) and failed trivially.

- Add a requires_rdna decorator that skips on non-RDNA targets.
- Update the tests to match the current implementation: gfx12 is now a supported RDNA generation, so replace stale RDNA(gfx1200) rejection assertions with monkeypatched generation=10 scenarios, and fix match strings from "gfx11 targets only" to "gfx11/gfx12 targets only".
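One way to write such a gate is sketched below with the standard library's `unittest.skipUnless`, which is not necessarily what tilelang's `requires_rdna` actually uses. `make_requires_rdna` and the `detect_arch` callable are hypothetical. Under the gfx11/gfx12 matching described above, gfx950 (CDNA) is skipped, and gfx10 parts are also treated as unsupported.

```python
import unittest


def is_rdna_arch(arch: str) -> bool:
    """Treat gfx11xx/gfx12xx as RDNA, matching the gfx11/gfx12 gating above."""
    name = arch.split(":", 1)[0]         # drop feature suffixes like ":xnack-"
    return name.startswith(("gfx11", "gfx12"))


def make_requires_rdna(detect_arch):
    """Build a skip decorator from a (hypothetical) arch-detection callable."""
    arch = detect_arch()
    return unittest.skipUnless(is_rdna_arch(arch), f"RDNA (gfx11/gfx12) only; detected {arch}")
```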

Test results:

Ran the full CI test suite on all four machines:

GPU	Passed	Skipped
gfx950 (MI355X, Wave64)	754	555
gfx1201 (R9700, Wave32)	616	443
gfx1151 (Strix Halo, Wave32)	612	429
gfx1100 (RX 7900 XTX, Wave32)	612	469

Remaining skips are CUDA-only tests (TMA, tfloat32) and RDNA-only tests on non-RDNA machines, all expected.

Summary by CodeRabbit

New Features
- Added "auto" scan entrypoints for CUDA and HIP; cumulative-scan wrappers gain run_auto.
- New TMEM store helpers and thread-sync fences for tcgen05; exported store/fence utilities.
- Added pow_of_int helper and RDNA test gating decorator.
Bug Fixes
- Auto-tuner validates scalar-input requirements earlier.
- Fallbacks added where GPU transform hooks may be missing.
- AMD gfx950 target construction normalized.
Tests
- Multiple tests retargeted/gated for CUDA or RDNA; expanded RDNA generation coverage.

github-actions · 2026-06-03T15:06:34Z

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

coderabbitai · 2026-06-03T15:06:34Z

📝 Walkthrough

Walkthrough

This PR converts HIP target strings to dicts, adds run_auto scan entrypoints and lowers, updates MFMA/tir handling, adds RDNA detection and test gating, guards CUDA transforms, changes autotuner validation/fast-path timing, and extends CuTeDSL with TMEM stores, pow_of_int, and reduce.run_auto.

Changes

Platform, Scan, MFMA, Tests, and Autotuner Updates

Layer / File(s)	Summary
HIP target dictionary specification migration `examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_cdna4.py`, `testing/python/amd/test_tilelang_mxfp4_gfx950.py`	Example and test HIP target construction switched from CLI-style string (`"hip -mcpu=gfx950"`) to structured dict (`{"kind": "hip", "mcpu": "gfx950"}`) used to build `tvm.target.Target`.
Scan run_auto API and lowering `src/tl_templates/cuda/scan.h`, `src/tl_templates/hip/scan.h`, `src/backend/common/op/scan.h`	Added `run_auto` overloads for InclusiveScan/CumSum/CumMax (CUDA and HIP templates) and updated shared-scan GPU lowering to call the `::run_auto` external symbol instead of `::run`.
Test gating and platform adaptations `testing/python/issue/test_tilelang_issue_2123.py`, `testing/python/kernel/test_tilelang_kernel_gemm.py`, `testing/python/language/`, `testing/python/components/`, `testing/python/jit/*`	Wrap CUDA-specific imports/tests with availability checks, add `@requires_cuda` markers to CUDA-only tests, restrict some GEMM tests to CUDA, and make some dtype parametrizations conditional on detected target.
RDNA detection and tests `tilelang/testing/__init__.py`, `testing/python/target/test_tilelang_rocm_target.py`	Add `requires_rdna` decorator and RDNA-focused tests; consolidate RDNA detection helpers and gate RDNA tests to gfx11/gfx12.
CUDA transform FFI guards `tilelang/cuda/transform/__init__.py`	Guard FFI API calls for `ProducerConsumerWarpSpecialized` and `LowerBlackwell2SM`, returning identity pass when _ffi_api symbol is missing.
Autotuner validation timing and fast-path `tilelang/autotuner/tuner.py`	Move scalar input supply validation earlier in `AutoTuner.run()` (before cache lookup) and add a fast-path in `AutoTuneImpl.__call__` to bypass autotuning when caller supplies all tunable config keys.
MFMA dtype mapping and tir arg wrapping `tilelang/rocm/intrinsics/mfma_macro_generator.py`, `tilelang/language/tir/op.py`	Recognize `custom[tfloat32]` in MFMA dtype abbreviation mapping and wrap MFMA string-like arguments as `tvm.tirx.StringImm(str(...))` for `tvm_mfma` call construction.
Target helper re-exports `tilelang/utils/target.py`	Re-export determine_target and target detection predicates from this central module for use across tests and tooling.
CuTeDSL TMEM store and fences `tilelang/contrib/cutedsl/gemm_tcgen05.py`	Add TMEM store helpers with segmented stores, store fences, and before/after thread-sync fence functions; update `__all__` and public wrappers.
CuTeDSL math helper `tilelang/contrib/cutedsl/math.py`	Add `pow_of_int(exp)` helper and export it.
CuTeDSL reduce run_auto `tilelang/contrib/cutedsl/reduce.py`	Add `run_auto` JIT entrypoints to `CumSum1D/CumSum2D/CumMax1D/CumMax2D` delegating to existing `run`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

tile-ai/tilelang#2262: Related scan run_auto integration and template changes.
tile-ai/tilelang#2319: Related CuTeDSL cumulative scan changes.

Suggested reviewers

lucifer1004
cherichy

Poem

🐰 From strings to tidy dicts I hop with glee,
run_auto scans hum in segments for me,
MFMA learns a tiny custom tune,
RDNA tests sleep under gfx12's moon,
CuTeDSL stores march, pow_of_int counts each rune.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly addresses the main objective: fixing CI failures across AMD ROCm GPU variants (gfx950, gfx1100, gfx1151, gfx1201). Changes span scan operations, autotuner, test gating, and target-specific handling.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@testing/python/issue/test_tilelang_issue_2123.py`:
- Around line 10-16: The broad except Exception around importing
CUDAPassPipelineBodyPrologue and tilelang.cuda._ffi_api should be narrowed to
only catch import-related errors; replace the generic except with an except that
catches ImportError and ModuleNotFoundError so that failures inside the imported
modules (e.g., syntax/attribute errors) surface instead of being masked. Update
the try/except around the CUDAPassPipelineBodyPrologue and _cuda_ffi_api imports
that set _has_cuda_transforms to False on failure to only catch
ImportError/ModuleNotFoundError while leaving other exceptions to propagate.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 32b3e6ce-dead-47e1-ac9e-3fc515fcdb98

📥 Commits

Reviewing files that changed from the base of the PR and between 3d95d65 and 104c3ab.

📒 Files selected for processing (12)

examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_cdna4.py
src/backend/common/op/scan.h
src/tl_templates/hip/scan.h
testing/python/amd/test_tilelang_mxfp4_gfx950.py
testing/python/issue/test_tilelang_issue_2123.py
testing/python/kernel/test_tilelang_kernel_gemm.py
testing/python/language/test_tilelang_language_eager_jit.py
testing/python/target/test_tilelang_rocm_target.py
tilelang/autotuner/tuner.py
tilelang/language/tir/op.py
tilelang/rocm/intrinsics/mfma_macro_generator.py
tilelang/testing/__init__.py

coderabbitai · 2026-06-03T15:13:02Z

+try:
+    from tilelang.cuda.pipeline import CUDAPassPipelineBodyPrologue
+    import tilelang.cuda._ffi_api as _cuda_ffi_api
+
+    _has_cuda_transforms = hasattr(_cuda_ffi_api, "LowerBlackwell2SM")
+except Exception:
+    _has_cuda_transforms = False


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Narrow the exception handling to catch only import-related exceptions.

The broad except Exception: clause catches all exceptions, which could mask unexpected errors during import, such as syntax errors or attribute errors in the imported modules. This makes debugging harder if the CUDA modules have actual defects.

🛡️ Proposed fix to narrow exception handling

 try:
     from tilelang.cuda.pipeline import CUDAPassPipelineBodyPrologue
     import tilelang.cuda._ffi_api as _cuda_ffi_api

     _has_cuda_transforms = hasattr(_cuda_ffi_api, "LowerBlackwell2SM")
-except Exception:
+except (ImportError, AttributeError):
     _has_cuda_transforms = False
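Here is a minimal sketch of the narrower guard as a reusable helper. `has_optional_symbol` is hypothetical. It catches only `ImportError`, which also covers `ModuleNotFoundError`. The missing-symbol case is handled by `hasattr` rather than by an except clause, so errors raised while the module body executes still surface.

```python
import importlib


def has_optional_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`.

    Only import failures are swallowed (ModuleNotFoundError is a subclass of
    ImportError); any other exception raised while importing propagates.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)
```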

🧰 Tools

🪛 Ruff (0.15.15)

[warning] 15-15: Do not catch blind exception: Exception

(BLE001)


LeiWang1999 · 2026-06-04T06:43:05Z

Thanks @zhangnju, it's likely this commit breaks CI.

zhangnju · 2026-06-04T15:14:22Z

> Thanks @zhangnju, it's likely this commit breaks CI.

Let me check it. Thanks for the info.

Two CUDA CI failures:

1. cuda/scan.h: Add run_auto to all scan structs (InclusiveScan1D/2D, CumSum1D/2D, CumMax1D/2D). PR tile-ai#2262 made codegen always emit ::run_auto but did not add the method to the CUDA templates (HIP had it). On CUDA the warp size is fixed at 32, so run_auto delegates to run<T, 32>.
2. autotuner/tuner.py: Skip autotuning and scalar-input validation when the caller already supplies all config keys explicitly. PR tile-ai#2084 added validation in AutoTuner.run() that fires even when the user calls with fixed params (e.g. block_M=64, ...) and no set_autotune_inputs context, causing test_example_mha_fwd_varlen to fail with a ValueError on max_seqlen_q.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

testing/python/language/test_tilelang_language_eager_jit.py (1)

79-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The product(...) iterator is consumed before the runtime assertions loop.

prod is exhausted by the compile list comprehension, so the later verification loop does not run any cases.

💡 Proposed fix

-    prod = product(in_dtypes, [T.float32])
+    prod = list(product(in_dtypes, [T.float32]))
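The bug is easy to reproduce in isolation. In this standalone sketch, the two list comprehensions stand in for the `gemm_ptr.par_compile` call and the verification loop. The dtype strings are placeholders for the `T.*` dtypes.

```python
from itertools import product

in_dtypes = ["float16", "bfloat16"]

# A bare product() is a one-shot iterator: the first pass consumes it.
prod = product(in_dtypes, ["float32"])
compiled = [pair for pair in prod]       # stands in for the par_compile comprehension
verified = [pair for pair in prod]       # second pass sees nothing

# Materializing it with list() lets both passes see every case.
prod_list = list(product(in_dtypes, ["float32"]))
compiled_ok = [pair for pair in prod_list]
verified_ok = [pair for pair in prod_list]
```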

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testing/python/language/test_tilelang_language_eager_jit.py` around lines 79
- 87, The product(...) iterator is consumed by the list comprehension in the
gemm_ptr.par_compile call; convert the iterator to a reusable list before using
it in both places (e.g., change prod = product(...) to prod =
list(product(in_dtypes, [T.float32]))), then use that prod in
gemm_ptr.par_compile and in the subsequent for in_dtype, out_dtype in prod loop
so the verification loop actually iterates; reference symbols: prod, product,
gemm_ptr.par_compile, and the for in_dtype, out_dtype in prod loop.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tilelang/autotuner/tuner.py`:
- Around line 1286-1289: The fast-path skip reads tunable keys from only
configs[0], which can wrongly skip tuning when later configs add keys; update
the logic in the tuner (where self.configs, config_keys, mode and kwargs are
used) to compute config_keys as the union of keys from all configs (e.g.,
iterate over self.configs if it's a list) instead of using only configs[0], then
perform the issubset check against kwargs as before so the skip condition is
correct for all provided configs.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 44057fbc-277c-413c-8ef8-392d8223ae69

📥 Commits

Reviewing files that changed from the base of the PR and between 104c3ab and 75efebc.

📒 Files selected for processing (13)

examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_cdna4.py
src/backend/common/op/scan.h
src/tl_templates/cuda/scan.h
src/tl_templates/hip/scan.h
testing/python/amd/test_tilelang_mxfp4_gfx950.py
testing/python/issue/test_tilelang_issue_2123.py
testing/python/kernel/test_tilelang_kernel_gemm.py
testing/python/language/test_tilelang_language_eager_jit.py
testing/python/target/test_tilelang_rocm_target.py
tilelang/autotuner/tuner.py
tilelang/language/tir/op.py
tilelang/rocm/intrinsics/mfma_macro_generator.py
tilelang/testing/__init__.py

✅ Files skipped from review due to trivial changes (1)

src/backend/common/op/scan.h

🚧 Files skipped from review as they are similar to previous changes (8)

testing/python/kernel/test_tilelang_kernel_gemm.py
tilelang/rocm/intrinsics/mfma_macro_generator.py
testing/python/amd/test_tilelang_mxfp4_gfx950.py
src/tl_templates/hip/scan.h
tilelang/testing/__init__.py
examples/dequantize_gemm/example_dequant_gemm_bf16_mxfp4_cdna4.py
tilelang/language/tir/op.py
testing/python/target/test_tilelang_rocm_target.py

coderabbitai · 2026-06-11T09:38:25Z

+            configs = self.configs
+            config_keys = set(configs[0].keys()) if isinstance(configs, list) and configs else set()
+            if config_keys and config_keys.issubset(kwargs.keys()):
+                if mode == "lazy":


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fast-path skip condition only inspects the first config and can bypass tuning incorrectly.

The skip decision derives tunable keys from only configs[0]. If later configs contain additional keys, autotuning is skipped even when not all tunables are supplied.

💡 Proposed fix

-            configs = self.configs
-            config_keys = set(configs[0].keys()) if isinstance(configs, list) and configs else set()
+            configs = self.configs
+            if isinstance(configs, list) and configs:
+                config_keys = set().union(
+                    *(cfg.keys() for cfg in configs if isinstance(cfg, dict))
+                )
+            else:
+                config_keys = set()
             if config_keys and config_keys.issubset(kwargs.keys()):
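The following standalone sketch makes the difference concrete. The helper names are hypothetical, and only the key-collection logic mirrors the snippets above. When configs have different key sets, the first-config version skips tuning even though `num_stages` was never supplied.

```python
def tunable_keys_first(configs):
    """Original behavior: only the first config's keys are considered."""
    return set(configs[0].keys()) if isinstance(configs, list) and configs else set()


def tunable_keys_union(configs):
    """Proposed behavior: union of keys across every dict config."""
    if isinstance(configs, list) and configs:
        return set().union(*(cfg.keys() for cfg in configs if isinstance(cfg, dict)))
    return set()


def should_skip_tuning(config_keys, kwargs):
    # Skip only when the caller pinned every tunable parameter.
    return bool(config_keys) and config_keys.issubset(kwargs.keys())


configs = [{"block_M": 64}, {"block_M": 128, "num_stages": 2}]
```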


- tcgen05_st_32dp{32,64,128,256}bNx: mirror TMEM load functions for store direction
- tcgen05_before/after_thread_sync: SM100 TMEM ordering fences around __syncthreads
- pow_of_int: Python backend for tl::pow_of_int<N> call_extern translation
- CumSum1D/2D, CumMax1D/2D: add run_auto wrappers with @cute.jit

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tilelang/contrib/cutedsl/math.py`:
- Around line 243-260: The pow_of_int helper must reject negative exponents; add
validation at the start of pow_of_int (or inside the returned _pow) to raise a
clear error (e.g., ValueError) when exp < 0 so callers don't get silent
incorrect results—refer to the pow_of_int function and the inner _pow closure
and ensure the check runs before attempting the exp==0 or loop logic.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 948b40b5-e211-4a73-85e8-ca35473a0be1

📥 Commits

Reviewing files that changed from the base of the PR and between 91d9a27 and 653cdf8.

📒 Files selected for processing (3)

tilelang/contrib/cutedsl/gemm_tcgen05.py
tilelang/contrib/cutedsl/math.py
tilelang/contrib/cutedsl/reduce.py

coderabbitai · 2026-06-11T13:59:19Z

+def pow_of_int(exp: int):
+    """Return a function that raises its argument to the integer power `exp`.
+
+    Mirrors tl::pow_of_int<exp> from math.cc — the C++ codegen emits
+    tl::pow_of_int<N> as a call_extern, which the CuTeDSL codegen translates
+    to tl.pow_of_int(N)(base). On CUDA/HIP the op is lowered by FLowerIntrinsic
+    before reaching call_extern; for the CuTeDSL Python backend it reaches here.
+    """
+
+    def _pow(base):
+        if exp == 0:
+            return type(base)(1)
+        result = base
+        for _ in range(exp - 1):
+            result = result * base
+        return result
+
+    return _pow


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add validation for negative exponents to avoid silent incorrect results.

If exp < 0, range(exp - 1) returns an empty iterator, causing the function to return base unchanged instead of raising an error or computing 1/base^|exp|. Since this helper mirrors tl::pow_of_int<N> which expects a compile-time constant, consider validating non-negative input.
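The failure mode can be checked directly in plain Python, with ordinary numbers standing in for CuTeDSL values. This sketch copies the loop from the posted helper. The names `pow_of_int_unchecked` and `pow_of_int_checked` are hypothetical. A negative exponent silently returns the base, while the validated wrapper raises.

```python
def pow_of_int_unchecked(exp: int):
    """Same loop as the posted helper, without exponent validation."""
    def _pow(base):
        if exp == 0:
            return type(base)(1)
        result = base
        for _ in range(exp - 1):         # empty range when exp < 1
            result = result * base
        return result
    return _pow


def pow_of_int_checked(exp: int):
    """Same helper with the proposed non-negative check up front."""
    if exp < 0:
        raise ValueError(f"pow_of_int requires non-negative exponent, got {exp}")
    return pow_of_int_unchecked(exp)
```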

🛡️ Proposed fix to validate exponent

 def pow_of_int(exp: int):
     """Return a function that raises its argument to the integer power `exp`.

     Mirrors tl::pow_of_int<exp> from math.cc — the C++ codegen emits
     tl::pow_of_int<N> as a call_extern, which the CuTeDSL codegen translates
     to tl.pow_of_int(N)(base). On CUDA/HIP the op is lowered by FLowerIntrinsic
     before reaching call_extern; for the CuTeDSL Python backend it reaches here.
     """
+    if exp < 0:
+        raise ValueError(f"pow_of_int requires non-negative exponent, got {exp}")

     def _pow(base):
         if exp == 0:
             return type(base)(1)
         result = base
         for _ in range(exp - 1):
             result = result * base
         return result

     return _pow


zhangnju · 2026-06-11T14:57:55Z

> Thanks @zhangnju, it's likely this commit breaks CI.

@LeiWang1999 CUDA CI issues have been fixed now.

coderabbitai Bot reviewed Jun 3, 2026

zhangnju added 4 commits June 11, 2026 09:28

Fix CI failures on gfx950, gfx1100, gfx1151, and gfx1201

8cbd6d2

update codes for format checking

a5bd6a0

update codes for format checking

75efebc

zhangnju force-pushed the rocm_ci branch from aea12e5 to 75efebc June 11, 2026 09:28

coderabbitai Bot reviewed Jun 11, 2026

zhangnju added 2 commits June 11, 2026 09:50

fix the CI issue on gfx950

91d9a27

coderabbitai Bot reviewed Jun 11, 2026