Releases: zerfoo/ztensor
v1.19.0
20 Jun 23:58
1.19.0 (2026-06-20)
Features
compute: CPU dropout op with deterministic Philox mask (BPB.3a) (d82e8aa)
cuda: GB10 GPU dropout kernel mirroring the CPU Philox mask (BPB.3a) (611772b)
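The two entries above rely on the mask being a pure function of a seed and the element index, which is what lets a GPU kernel reproduce the CPU mask exactly. The Go sketch below illustrates that idea only. It is not ztensor's code: the `dropout` and `mix` names are hypothetical, the mixer is SplitMix64 used as a stand-in rather than Philox, and `invKeep = 1/(1-p)` is assumed from the usual inverted-dropout convention.

```go
package main

import "fmt"

// mix is a SplitMix64 finalizer, used here as a stand-in for a
// counter-based generator. It is NOT Philox.
func mix(x uint64) uint64 {
	x += 0x9E3779B97F4A7C15
	x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9
	x = (x ^ (x >> 27)) * 0x94D049BB133111EB
	return x ^ (x >> 31)
}

// dropout (hypothetical name) zeroes each element with probability p and
// scales survivors by invKeep. The decision for element i depends only on
// (seed, i), so any backend using the same mixer yields the same mask.
func dropout(x []float32, p float32, seed uint64) []float32 {
	invKeep := 1 / (1 - p) // assumed inverted-dropout scaling
	out := make([]float32, len(x))
	for i, v := range x {
		// Top 24 bits of the hash give a uniform value in [0, 1).
		u := float32(mix(seed+uint64(i))>>40) / (1 << 24)
		if u >= p {
			out[i] = v * invKeep
		}
	}
	return out
}

func main() {
	x := []float32{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(dropout(x, 0.5, 42))
	fmt.Println(dropout(x, 0.5, 42)) // same seed, same mask
}
```

Because no generator state is carried between elements, the loop can be split across threads in any order without changing the result.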
Bug Fixes
cuda: pass dropout p/invKeep as int bit patterns (purego ABI) (a25e902)
cuda: reinterpret dropout p/invKeep bits host-side (not __uint_as_float) (d00cc0c)
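As a rough illustration of what "int bit patterns" means in these two fixes, the sketch below packs a float32 into an integer-sized argument and recovers it by reinterpreting the bits. The helper names `f32Arg` and `f32FromArg` are hypothetical, and the receiving side is simulated in Go here; in the library it would presumably live in the host-side C/CUDA wrapper.

```go
package main

import (
	"fmt"
	"math"
)

// f32Arg (hypothetical helper) copies the IEEE-754 bit pattern of f into an
// integer-sized value, so it can be passed as an integer argument instead
// of a floating-point one.
func f32Arg(f float32) uintptr {
	return uintptr(math.Float32bits(f))
}

// f32FromArg (hypothetical helper) reinterprets the low 32 bits of a as a
// float32. This stands in for the host-side reinterpretation step.
func f32FromArg(a uintptr) float32 {
	return math.Float32frombits(uint32(a))
}

func main() {
	p := float32(0.25)
	invKeep := 1 / (1 - p)

	// Both scalars travel as integers and are decoded on the other side.
	args := []uintptr{f32Arg(p), f32Arg(invKeep)}
	fmt.Println(f32FromArg(args[0]), f32FromArg(args[1]))
}
```

The round trip is bit-exact because no numeric conversion happens in either direction, only a reinterpretation of the same 32 bits.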
v1.18.0
17 Jun 08:03
1.18.0 (2026-06-17)
Features
compute: on-device bf16 broadcast + scalar ops (capture-safe) (d4fe1ba)
Bug Fixes
gradcheck: pin CrossAttention fwd intermediates across arena reset (ed96180)
v1.17.1
17 Jun 06:56
1.17.1 (2026-06-17)
Bug Fixes
compute: keep bf16 GPUStorage reshape on-device (capture-safe) (508f01d)
v1.17.0
17 Jun 06:42
1.17.0 (2026-06-17)
Features
compute: native bf16 GPU transpose kernels (capture-safe) (25f5981)
v1.16.0
17 Jun 05:01
1.16.0 (2026-06-17)
Features
compute: native bf16 transpose-variant GEMMs (NT/TN) for the bf16 backward (2f63239)
v1.15.0
17 Jun 00:11
1.15.0 (2026-06-17)
Features
compute: generic bf16 bulk weight upload (device-residency) (419564c)
compute: native bf16 GPU axis reductions (Sum/ReduceSum/ReduceMean) (98ad5e3)
compute: native bf16 GPU Rsqrt (close non-f32 CPU fallback) (45bea61)
v1.14.0
16 Jun 23:13
1.14.0 (2026-06-16)
Features
graph: bf16/fp16 cases for Parameter.AddGradient + ClearGradient (cfa1b45)
v1.13.0
16 Jun 22:46
1.13.0 (2026-06-16)
Features
compute: bf16 dispatch + parity tests for fused norm kernels (ADR 075 L4) (4d1466a)
compute: native bf16 GPU elementwise + AdamW dispatch (ADR 075 L4) (897d1d6)
compute: tiny-matrix batched GEMM kernel for small attention shapes (ADR 075 L3) (6554b8c)
gpuapi: bf16 fused norm methods on KernelRunner (ADR 075 L4) (f3db3ca)
gpuapi: bf16 kernel methods on KernelRunner (ADR 075 L4) (c6a4d3d)
kernels: bf16 forward-only fused norm CUDA kernels (ADR 075 L4) (c62c6e7)
kernels: bf16 GPU elementwise + AdamW CUDA kernels (ADR 075 L4) (dc3ed14)
v1.12.0
16 Jun 07:57
1.12.0 (2026-06-16)
Features
compute: fused on-device AdamW kernel (ADR 070 end-state, ADR 075 L1) (8787214)
Bug Fixes
adamw kernel: pass f64 scalars as integer-register bit patterns (purego ABI) (17ca699)
v1.11.1
13 Jun 01:49
1.11.1 (2026-06-13)
Bug Fixes
compute: N-D transpose kernel params are device-resident, engine-owned (58fc331)
kernels: remove global --use_fast_math; selective __expf in softmax only (T3.1) (1fd2e89)