[Main] Numerical fix for moe single grouped weight with fp8 fp4 primary weight and grad norm spikes #5487
Open
zhongbozhu wants to merge 18 commits into