Releases: zerfoo/ztensor

v1.19.0

20 Jun 23:58

1.19.0 (2026-06-20)

Features

  • compute: CPU dropout op with deterministic Philox mask (BPB.3a) (d82e8aa)
  • cuda: GB10 GPU dropout kernel mirroring the CPU Philox mask (BPB.3a) (611772b)

Bug Fixes

  • cuda: pass dropout p/invKeep as int bit patterns (purego ABI) (a25e902)
  • cuda: reinterpret dropout p/invKeep bits host-side (not __uint_as_float) (d00cc0c)
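
A minimal Go sketch of the bit-pattern technique these fixes describe: a float is converted to its IEEE 754 bit pattern so it can travel through an integer argument slot, then reinterpreted on the receiving side. The helper names are hypothetical, and the actual call into the CUDA launcher is omitted; only the standard-library `math` conversions are shown.

```go
package main

import (
	"fmt"
	"math"
)

// packF32 returns the raw IEEE 754 bits of f, suitable for passing
// through an integer parameter. Hypothetical helper name.
func packF32(f float32) uint32 { return math.Float32bits(f) }

// unpackF32 reinterprets the bits as a float32 on the receiving side.
func unpackF32(bits uint32) float32 { return math.Float32frombits(bits) }

// packF64 and unpackF64 do the same for 64-bit scalars.
func packF64(f float64) uint64      { return math.Float64bits(f) }
func unpackF64(bits uint64) float64 { return math.Float64frombits(bits) }

func main() {
	p := float32(0.1)
	invKeep := 1 / (1 - p)

	// The integers below are what would cross the call boundary.
	pBits, kBits := packF32(p), packF32(invKeep)
	fmt.Printf("p bits=%#08x invKeep bits=%#08x\n", pBits, kBits)

	// Round-tripping is exact because no numeric conversion takes place.
	fmt.Println(unpackF32(pBits) == p, unpackF32(kBits) == invKeep)
}
```

Note that this is a reinterpretation, not a numeric cast: `uint32(p)` would truncate 0.1 to 0, whereas `math.Float32bits(p)` preserves every bit.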

v1.18.0

17 Jun 08:03

1.18.0 (2026-06-17)

Features

  • compute: on-device bf16 broadcast + scalar ops (capture-safe) (d4fe1ba)

Bug Fixes

  • gradcheck: pin CrossAttention fwd intermediates across arena reset (ed96180)

v1.17.1

17 Jun 06:56

1.17.1 (2026-06-17)

Bug Fixes

  • compute: keep bf16 GPUStorage reshape on-device (capture-safe) (508f01d)

v1.17.0

17 Jun 06:42

1.17.0 (2026-06-17)

Features

  • compute: native bf16 GPU transpose kernels (capture-safe) (25f5981)

v1.16.0

17 Jun 05:01

1.16.0 (2026-06-17)

Features

  • compute: native bf16 transpose-variant GEMMs (NT/TN) for the bf16 backward (2f63239)

v1.15.0

17 Jun 00:11

1.15.0 (2026-06-17)

Features

  • compute: generic bf16 bulk weight upload (device-residency) (419564c)
  • compute: native bf16 GPU axis reductions (Sum/ReduceSum/ReduceMean) (98ad5e3)
  • compute: native bf16 GPU Rsqrt (close non-f32 CPU fallback) (45bea61)

v1.14.0

16 Jun 23:13

1.14.0 (2026-06-16)

Features

  • graph: bf16/fp16 cases for Parameter.AddGradient + ClearGradient (cfa1b45)

v1.13.0

16 Jun 22:46

1.13.0 (2026-06-16)

Features

  • compute: bf16 dispatch + parity tests for fused norm kernels (ADR 075 L4) (4d1466a)
  • compute: native bf16 GPU elementwise + AdamW dispatch (ADR 075 L4) (897d1d6)
  • compute: tiny-matrix batched GEMM kernel for small attention shapes (ADR 075 L3) (6554b8c)
  • gpuapi: bf16 fused norm methods on KernelRunner (ADR 075 L4) (f3db3ca)
  • gpuapi: bf16 kernel methods on KernelRunner (ADR 075 L4) (c6a4d3d)
  • kernels: bf16 forward-only fused norm CUDA kernels (ADR 075 L4) (c62c6e7)
  • kernels: bf16 GPU elementwise + AdamW CUDA kernels (ADR 075 L4) (dc3ed14)

v1.12.0

16 Jun 07:57

1.12.0 (2026-06-16)

Features

  • compute: fused on-device AdamW kernel (ADR 070 end-state, ADR 075 L1) (8787214)

Bug Fixes

  • adamw kernel: pass f64 scalars as integer-register bit patterns (purego ABI) (17ca699)

v1.11.1

13 Jun 01:49

1.11.1 (2026-06-13)

Bug Fixes

  • compute: N-D transpose kernel params are device-resident, engine-owned (58fc331)
  • kernels: remove global --use_fast_math; selective __expf in softmax only (T3.1) (1fd2e89)