Repositories list
32 repositories
webgpu-sorting
Public
High-performance GPU sorting library using WebGPU compute shaders (Bitonic Sort, Radix Sort) with TypeScript API, live demo, and comprehensive documentation

hetero-paged-infer
Public
High-Performance LLM Inference Engine with PagedAttention & Continuous Batching in Rust

build-your-own-tools
Public

mini-opencv
Public
CUDA-accelerated GPU image processing library. 30-50x faster than CPU OpenCV. High-performance operators for computer vision: convolution, morphology, filters, …

modern-ai-kernels
Public
TensorCraft-HPC: A header-only C++/CUDA kernel library for learning high-performance AI operators with progressive optimization paths

n-body
Public
High-performance N-body particle simulation with Barnes-Hut algorithm, GPU acceleration, and real-time visualization

aurora-signal
Public

mini-image-pipe
Public
GPU-accelerated image processing pipeline with DAG scheduling, CUDA operators, and multi-stream execution

fq-compressor-rust
Public
High-performance FASTQ compressor with block-indexed archive format, random access support, and multiple compression modes

cuflash-attn
Public
CUDA C++ FlashAttention reference implementation - O(N) memory, FP32/FP16, forward/backward

mini-inference-engine
Public
CUDA GEMM optimization tutorial and mini inference engine with progressive kernels, benchmarks, and OpenSpec docs

tiny-llm
Public
CUDA-native C++ Transformer inference engine with W8A16 quantization, KV cache management, and optimized CUDA kernels

gpu-spmv
Public
High-Performance CUDA Sparse Matrix-Vector Multiplication Library • 70%+ bandwidth utilization • 4 optimized kernels • Spec-Driven Development

ray-tracer
Public
GPU-accelerated ray tracer with BVH acceleration and path tracing support

- From compiler optimization to GPU kernel development: a comprehensive learning resource for AI infrastructure engineers | TVM, ONNX Runtime, CUTLASS, Triton
gpu-fft
Public
High-performance GPU-accelerated FFT library for JavaScript/TypeScript using WebGPU compute shaders. Zero runtime dependencies, dual GPU/CPU paths, TypeScript-f…

particle-fluid-sim
Public
High-performance WebGPU particle fluid simulation with compute shaders

tiny-dl-inference
Public
Zero-dependency WebGPU deep learning inference engine (~50KB vs TensorFlow.js ~2MB)

llm-speed
Public
CUDA kernels for LLM inference: FlashAttention forward, Tensor Core GEMM, and PyTorch bindings

hpc-ai-optimization-lab
Public
CUDA kernel optimization lab: GEMM, FlashAttention, quantization, and GPU performance learning.

- C++17 DAG scheduler for heterogeneous CPU/GPU workloads - production-ready with CPU-only validation path
cuda-kernel-academy
Public

ai-inference-hpc
Public

triton-fused-ops
Public
Fused Triton kernels for Transformer inference: RMSNorm+RoPE, Gated MLP, FP8 GEMM; CPU-testable references, autotuning, and benchmarking

diy-flash-attention
Public
Learn Triton by building FlashAttention from scratch: V2 kernels, persistent threads, mask DSL, profiling toolkit, bilingual docs

bwa-rust
Public
Memory-safe BWA-MEM style single-end DNA aligner in Rust

chatroom
Public
Teaching-oriented real-time chat app with Go, React, PostgreSQL, WebSocket, observability, and OpenSpec-driven workflow.

bookmarks-manager
Public
Local-first browser bookmark manager - import, deduplicate, search, back up, and export, all processed locally with no uploads required

- 📚 Chinese edition of The Book of Secret Knowledge - A curated collection of tools, manuals, cheatsheets, and resources for SysAdmins, DevOps, Pentesters and Security Researchers. Chinese translation…
go-live
Public
Lightweight WebRTC SFU Server - WHIP/WHEP streaming, recording, observability with Go + Pion WebRTC