[Dev] Numerical fix for moe single grouped weight with fp8 fp4 primary weight and grad norm spikes #5464
Open
zhongbozhu wants to merge 19 commits into