[Main] Numerical fix for moe single grouped weight with fp8 fp4 primary weight and grad norm spikes #5487

Open
zhongbozhu wants to merge 18 commits into NVIDIA:main from zhongbozhu:main_fix_single_weight