All

32 repositories

webgpu-sorting
Public
High-performance GPU sorting library using WebGPU compute shaders (Bitonic Sort, Radix Sort) with TypeScript API, live demo, and comprehensive documentation
benchmark sorting typescript parallel-computing gpu-computing radix-sort bitonic-sort compute-shader webgpu compute-shaders
TypeScript
•
MIT License
•0•0•0•0•Updated Jul 1, 2026
hetero-paged-infer
Public
High-Performance LLM Inference Engine with PagedAttention & Continuous Batching in Rust
rust machine-learning high-performance inference transformer gpu-computing production-ready systems-programming inference-engine serving
Rust
•
MIT License
•0•2•0•4•Updated Jun 29, 2026
build-your-own-tools
Public
A learning repository for rewriting CLI tools in Rust/Go: dos2unix, gzip, htop
rust golang tui system-programming learning-project cli-tools
Rust
•
Other
•0•3•0•15•Updated Jun 29, 2026
mini-opencv
Public
CUDA-accelerated GPU image processing library. 30-50x faster than CPU OpenCV. High-performance operators for computer vision: convolution, morphology, filters, …
opencv cmake computer-vision cpp high-performance parallel-computing cuda image-processing nvidia gpu-acceleration
C++
•
MIT License
•0•1•0•2•Updated Jun 25, 2026
modern-ai-kernels
Public
TensorCraft-HPC: A header-only C++/CUDA kernel library for learning high-performance AI operators with progressive optimization paths
deep-learning cpp hpc cuda header-only high-performance-computing gpu-computing gemm hppc normalization
C++
•
MIT License
•0•0•0•0•Updated Jun 24, 2026
n-body
Public
High-performance N-body particle simulation with Barnes-Hut algorithm, GPU acceleration, and real-time visualization
visualization opengl cpp simulation physics realtime parallel-computing cuda physics-engine high-performance-computing
C++
•
MIT License
•0•1•0•1•Updated Jun 22, 2026
aurora-signal
Public
Lightweight WebRTC Signaling Server (Go): Room Management, Role-Based Auth, Redis Scaling & Prometheus Metrics
go docker redis golang websocket webrtc prometheus signaling-server
Go
•
MIT License
•0•0•0•0•Updated Jun 22, 2026
mini-image-pipe
Public
GPU-accelerated image processing pipeline with DAG scheduling, CUDA operators, and multi-stream execution
cmake computer-vision pipeline cpp hpc parallel-computing cuda image-processing video-processing gpu-acceleration
C++
•
MIT License
•0•0•0•1•Updated Jun 22, 2026
fq-compressor-rust
Public
High-performance FASTQ compressor with block-indexed archive format, random access support, and multiple compression modes
rust bioinformatics compression genomics sequencing zstd command-line-tool fastq cli-tool
Rust
•
GNU General Public License v3.0
•0•3•0•0•Updated May 28, 2026
cuflash-attn
Public
CUDA C++ FlashAttention reference implementation - O(N) memory, FP32/FP16, forward/backward
c-plus-plus machine-learning deep-learning cpp hpc gpu cuda pytorch nvidia transformer
Cuda
•
MIT License
•0•0•0•0•Updated May 25, 2026
mini-inference-engine
Public
CUDA GEMM optimization tutorial and mini inference engine with progressive kernels, benchmarks, and OpenSpec docs
deep-learning cpp hpc cuda inference nvidia matrix-multiplication high-performance-computing educational cpp17
C++
•
MIT License
•0•0•0•0•Updated May 25, 2026
tiny-llm
Public
CUDA-native C++ Transformer inference engine with W8A16 quantization, KV cache management, and optimized CUDA kernels
cmake cpp cuda nvidia transformer quantization kv-cache tensor-core llm-inference w8a16
C++
•
MIT License
•0•0•0•0•Updated May 25, 2026
gpu-spmv
Public
High-Performance CUDA Sparse Matrix-Vector Multiplication Library • 70%+ bandwidth utilization • 4 optimized kernels • Spec-Driven Development
cpp graph-algorithms hpc linear-algebra cuda pagerank nvidia scientific-computing high-performance-computing cpp17
C++
•
MIT License
•0•0•0•0•Updated May 25, 2026
ray-tracer
Public
GPU-accelerated ray tracer with BVH acceleration and path tracing support
cplusplus cpp graphics rendering monte-carlo computer-graphics path-tracer cuda raytracer raytracing
Cuda
•
MIT License
•0•1•0•0•Updated May 25, 2026
ai-system-optimization-series
Public
From compiler optimization to GPU kernel development: a comprehensive learning resource for AI infrastructure engineers | TVM, ONNX Runtime, CUTLASS, Triton
performance-engineering tutorial deep-learning compiler hpc optimization cuda high-performance-computing triton cutlass
Python
•
MIT License
•0•1•0•0•Updated May 25, 2026
gpu-fft
Public
High-performance GPU-accelerated FFT library for JavaScript/TypeScript using WebGPU compute shaders. Zero runtime dependencies, dual GPU/CPU paths, TypeScript-f…
typescript browser signal-processing dsp image-processing fft gpu-computing webgpu compute-shaders fourier-transform
TypeScript
•
MIT License
•0•0•0•0•Updated May 25, 2026
particle-fluid-sim
Public
High-performance WebGPU particle fluid simulation with compute shaders
typescript fluid-simulation compute-shader webgpu particle-simulation compute-shaders vite wgsl real-time-graphics openspec
TypeScript
•
MIT License
•0•0•0•0•Updated May 25, 2026
tiny-dl-inference
Public
Zero-dependency WebGPU deep learning inference engine (~50KB vs TensorFlow.js ~2MB)
machine-learning typescript browser deep-learning neural-network wasm inference mnist tensor gpu-computing
TypeScript
•
MIT License
•0•3•0•0•Updated May 25, 2026
llm-speed
Public
CUDA kernels for LLM inference: FlashAttention forward, Tensor Core GEMM, and PyTorch bindings
deep-learning gpu cuda inference pytorch nvidia gemm llm tensor-core flashattention
Python
•
MIT License
•0•1•0•0•Updated May 25, 2026
hpc-ai-optimization-lab
Public
CUDA kernel optimization lab: GEMM, FlashAttention, quantization, and GPU performance learning.
cuda high-performance-computing cuda-kernels gpu-computing gemm cpp20 gpu-programming ai-inference tensor-core nanobind
Cuda
•
MIT License
•0•1•0•0•Updated May 25, 2026
heterogeneous-task-scheduler
Public
C++17 DAG scheduler for heterogeneous CPU/GPU workloads - production-ready with CPU-only validation path
cpp cuda cpp17 gpu-computing task-scheduler memory-pool task-graph heterogeneous-computing dag-execution
C++
•
MIT License
•0•4•0•0•Updated May 25, 2026
cuda-kernel-academy
Public
Systematic CUDA kernel engineering from SGEMM fundamentals to reusable kernels, advanced optimization, and inference components
education tutorial cplusplus hpc cuda gemm inference-engine gpu-programming sgemm cuda-programming
C++
•
MIT License
•0•0•0•0•Updated May 25, 2026
ai-inference-hpc
Public
AI Inference & HPC Lab - High-Performance Computing for AI Inference
0•0•0•0•Updated May 25, 2026
triton-fused-ops
Public
Fused Triton kernels for Transformer inference: RMSNorm+RoPE, Gated MLP, FP8 GEMM — CPU-testable references, autotuning, and benchmarking
python acceleration deep-learning high-performance cuda inference pytorch nvidia triton gpu-computing
Python
•
MIT License
•0•4•0•0•Updated May 25, 2026
diy-flash-attention
Public
Learn Triton by building FlashAttention from scratch — V2 kernels, persistent threads, mask DSL, profiling toolkit, bilingual docs
tutorial cuda pytorch triton educational attention-mechanism gpu-programming forward-pass flash-attention kernel-optimization
Python
•
MIT License
•0•4•0•0•Updated May 25, 2026
bwa-rust
Public
Memory-safe BWA-MEM style single-end DNA aligner in Rust
rust bioinformatics genomics sequencing smith-waterman bwa-mem sequence-alignment fm-index dna-alignment memory-safe
Rust
•
MIT License
•0•3•1•0•Updated May 22, 2026
chatroom
Public
Teaching-oriented real-time chat app with Go, React, PostgreSQL, WebSocket, observability, and OpenSpec-driven workflow.
react go docker real-time typescript websocket postgresql prometheus full-stack chat-application
Go
•
MIT License
•0•0•0•0•Updated May 22, 2026
bookmarks-manager
Public
Local-first browser bookmark manager - import, deduplicate, search, back up, and export, all processed locally with no uploads
react typescript pwa bookmarks indexeddb deduplication bookmark-manager privacy-first local-first bookmark-cleanup
TypeScript
•0•4•0•0•Updated May 22, 2026
the-book-of-secret-knowledge-zh
Public
📚 The Book of Secret Knowledge, Chinese edition - A curated collection of tools, manuals, cheatsheets, and resources for SysAdmins, DevOps, Pentesters and Security Researchers. Chinese translation…
linux shell docker kubernetes cli security devops awesome networking sysadmin
Python
•
MIT License
•0•5•0•0•Updated May 22, 2026
go-live
Public
Lightweight WebRTC SFU Server - WHIP/WHEP streaming, recording, observability with Go + Pion WebRTC
go docker kubernetes golang streaming real-time webrtc live-streaming media-server minio
Go
•
MIT License
•0•3•0•0•Updated May 22, 2026

AICL-Lab
