I'm BBuf (Xiaoyu Zhang), a Core Developer of SGLang, working at SkyworkAI.
I focus on LLM inference optimization, CUDA kernel engineering, and AI infrastructure: writing high-performance GPU code and pushing the boundaries of large-model serving.
I share technical deep dives on my WeChat public account GiantPandaCV and my Zhihu column.
SGLang is a high-performance serving framework for LLMs and multimodal models, powering 400,000+ GPUs worldwide. It is trusted by xAI, NVIDIA, AMD, Google Cloud, Microsoft Azure, and many more.
Cache-DiT is a PyTorch-native inference acceleration framework for Diffusion Transformer (DiT) models. It supports FLUX, HunyuanVideo, WAN2.1, Qwen-Image, and 70+ other models with training-free caching and hybrid parallelism.
CUDA optimization notes covering kernels, CUTLASS/CuTe, Triton, the CUDA-MODE course, LLM inference/training optimization (SGLang, vLLM, MoE, Flash Attention, etc.), and PyTorch internals.





