I'm BBuf (Xiaoyu Zhang), a Core Developer of SGLang, working at SkyworkAI.
I focus on LLM inference optimization, CUDA kernel engineering, and AI infrastructure: writing high-performance GPU code and pushing the boundaries of large-model serving.
I share technical deep dives on my WeChat public account GiantPandaCV and my Zhihu column.
SGLang is a high-performance serving framework for LLMs and multimodal models, powering 400,000+ GPUs worldwide. It is trusted by xAI, NVIDIA, AMD, Google Cloud, Microsoft Azure, and many more.
Cache-DiT is a PyTorch-native inference acceleration framework for Diffusion Transformer (DiT) models. It supports FLUX, HunyuanVideo, WAN2.1, Qwen-Image, and 70+ other models with training-free caching and hybrid parallelism.
CUDA optimization notes covering kernels, CUTLASS/CuTe, Triton, the CUDA-MODE course, LLM inference/training optimization (SGLang, vLLM, MoE, Flash Attention, etc.), and PyTorch internals.





