Releases · zerfoo/ztensor · GitHub

20 Jun 23:58

v1.19.0 Latest

1.19.0 (2026-06-20)

Features

compute: CPU dropout op with deterministic Philox mask (BPB.3a) (d82e8aa)
cuda: GB10 GPU dropout kernel mirroring the CPU Philox mask (BPB.3a) (611772b)
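
A deterministic Philox mask means each element's keep/drop decision depends only on a seed and the element's index, so a CPU loop and a GPU kernel can produce the same mask. The following is a minimal sketch of that idea using Philox4x32-10 with the published Random123 constants. The counter layout, the 24-bit uniform conversion, and the names `philox4x32` and `dropout` are assumptions made for illustration, not ztensor's actual implementation.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Philox4x32 multipliers and Weyl key-schedule constants (Random123 reference values).
const (
	philoxM0 = 0xD2511F53
	philoxM1 = 0xCD9E8D57
	philoxW0 = 0x9E3779B9
	philoxW1 = 0xBB67AE85
)

// philox4x32 runs 10 Philox rounds over a 128-bit counter with a 64-bit key.
func philox4x32(ctr [4]uint32, key [2]uint32) [4]uint32 {
	for round := 0; round < 10; round++ {
		hi0, lo0 := bits.Mul32(philoxM0, ctr[0])
		hi1, lo1 := bits.Mul32(philoxM1, ctr[2])
		ctr = [4]uint32{hi1 ^ ctr[1] ^ key[0], lo1, hi0 ^ ctr[3] ^ key[1], lo0}
		key[0] += philoxW0
		key[1] += philoxW1
	}
	return ctr
}

// dropout zeroes element i when its uniform draw falls below p and scales
// survivors by invKeep = 1/(1-p). The draw is a pure function of (seed, i),
// so the result does not depend on iteration order or thread layout.
// Hypothetical layout: counter word 0 holds i/4, and lane i%4 selects the output.
func dropout(x []float32, p float32, seed uint64) []float32 {
	key := [2]uint32{uint32(seed), uint32(seed >> 32)}
	invKeep := 1 / (1 - p)
	out := make([]float32, len(x))
	for i := range x {
		block := philox4x32([4]uint32{uint32(i / 4), 0, 0, 0}, key)
		// The top 24 bits map to a uniform in [0, 1) that float32 holds exactly.
		u := float32(block[i%4]>>8) * (1.0 / (1 << 24))
		if u >= p {
			out[i] = x[i] * invKeep
		}
	}
	return out
}

func main() {
	x := []float32{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println(dropout(x, 0.25, 42))
	fmt.Println(dropout(x, 0.25, 42)) // same seed, so the same mask
}
```

Because the generator is counter-based, no state is carried between elements, which is what makes mirroring the mask across devices practical.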

Bug Fixes

cuda: pass dropout p/invKeep as int bit patterns (purego ABI) (a25e902)
cuda: reinterpret dropout p/invKeep bits host-side (not __uint_as_float) (d00cc0c)

17 Jun 08:03

v1.18.0

1.18.0 (2026-06-17)

Features

compute: on-device bf16 broadcast + scalar ops (capture-safe) (d4fe1ba)

Bug Fixes

gradcheck: pin CrossAttention fwd intermediates across arena reset (ed96180)

17 Jun 06:56

v1.17.1

1.17.1 (2026-06-17)

Bug Fixes

compute: keep bf16 GPUStorage reshape on-device (capture-safe) (508f01d)

17 Jun 06:42

v1.17.0

1.17.0 (2026-06-17)

Features

compute: native bf16 GPU transpose kernels (capture-safe) (25f5981)

17 Jun 05:01

v1.16.0

1.16.0 (2026-06-17)

Features

compute: native bf16 transpose-variant GEMMs (NT/TN) for the bf16 backward (2f63239)
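
For context on why a backward pass wants NT and TN variants: for a linear layer Y = X·W, the gradients are dX = dY·Wᵀ (an NT product) and dW = Xᵀ·dY (a TN product). Reading the operands in transposed order avoids materializing a transposed copy. The sketch below shows both variants in plain float32 Go with row-major storage. The function names and signatures are assumptions for illustration and say nothing about how the bf16 kernels are written.

```go
package main

import "fmt"

// gemmNT computes C = A · Bᵀ, where A is m×k and B is n×k (row-major).
func gemmNT(a, b []float32, m, n, k int) []float32 {
	c := make([]float32, m*n)
	for i := 0; i < m; i++ {
		for j := 0; j < n; j++ {
			var sum float32
			for l := 0; l < k; l++ {
				sum += a[i*k+l] * b[j*k+l]
			}
			c[i*n+j] = sum
		}
	}
	return c
}

// gemmTN computes C = Aᵀ · B, where A is k×m and B is k×n (row-major).
func gemmTN(a, b []float32, m, n, k int) []float32 {
	c := make([]float32, m*n)
	for l := 0; l < k; l++ {
		for i := 0; i < m; i++ {
			for j := 0; j < n; j++ {
				c[i*n+j] += a[l*m+i] * b[l*n+j]
			}
		}
	}
	return c
}

func main() {
	// Linear layer with batch=1, in=2, out=3.
	x := []float32{1, 2}              // 1×2
	w := []float32{1, 2, 3, 4, 5, 6}  // 2×3
	dy := []float32{1, 1, 1}          // 1×3
	dx := gemmNT(dy, w, 1, 2, 3)      // dX = dY · Wᵀ, shape 1×2
	dw := gemmTN(x, dy, 2, 3, 1)      // dW = Xᵀ · dY, shape 2×3
	fmt.Println(dx, dw)
}
```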

17 Jun 00:11

v1.15.0

1.15.0 (2026-06-17)

Features

compute: generic bf16 bulk weight upload (device-residency) (419564c)
compute: native bf16 GPU axis reductions (Sum/ReduceSum/ReduceMean) (98ad5e3)
compute: native bf16 GPU Rsqrt (close non-f32 CPU fallback) (45bea61)

16 Jun 23:13

v1.14.0

1.14.0 (2026-06-16)

Features

graph: bf16/fp16 cases for Parameter.AddGradient + ClearGradient (cfa1b45)

16 Jun 22:46

v1.13.0

1.13.0 (2026-06-16)

Features

compute: bf16 dispatch + parity tests for fused norm kernels (ADR 075 L4) (4d1466a)
compute: native bf16 GPU elementwise + AdamW dispatch (ADR 075 L4) (897d1d6)
compute: tiny-matrix batched GEMM kernel for small attention shapes (ADR 075 L3) (6554b8c)
gpuapi: bf16 fused norm methods on KernelRunner (ADR 075 L4) (f3db3ca)
gpuapi: bf16 kernel methods on KernelRunner (ADR 075 L4) (c6a4d3d)
kernels: bf16 forward-only fused norm CUDA kernels (ADR 075 L4) (c62c6e7)
kernels: bf16 GPU elementwise + AdamW CUDA kernels (ADR 075 L4) (dc3ed14)

16 Jun 07:57

v1.12.0

1.12.0 (2026-06-16)

Features

compute: fused on-device AdamW kernel (ADR 070 end-state, ADR 075 L1) (8787214)

Bug Fixes

adamw kernel: pass f64 scalars as integer-register bit patterns (purego ABI) (17ca699)

13 Jun 01:49

v1.11.1

1.11.1 (2026-06-13)

Bug Fixes

compute: N-D transpose kernel params are device-resident, engine-owned (58fc331)
kernels: remove global --use_fast_math; selective __expf in softmax only (T3.1) (1fd2e89)
