BBuf/README.md

Hi there 👋

I'm BBuf (Xiaoyu Zhang), a Core Developer of SGLang, currently working at SkyworkAI.

I focus on LLM inference optimization, CUDA kernel engineering, and AI infrastructure: writing high-performance GPU code and pushing the boundaries of large-model serving.

πŸ“ I share technical deep-dives on my WeChat public account GiantPandaCV and ηŸ₯δΉŽδΈ“ζ .


🚀 Open Source Contributions

Core Developer

sglang

SGLang is a high-performance serving framework for LLMs and multimodal models, powering 400,000+ GPUs worldwide. It is trusted by xAI, NVIDIA, AMD, Google Cloud, Microsoft Azure, and many others.

cache-dit

Cache-DiT is a PyTorch-native inference-acceleration framework for Diffusion Transformer (DiT) models. It supports FLUX, HunyuanVideo, WAN2.1, Qwen-Image, and 70+ other models with training-free caching and hybrid parallelism.


📚 Learning Notes & Research

how-to-optim-algorithm-in-cuda

CUDA optimization notes covering kernels, CUTLASS/CuTe, Triton, the CUDA-MODE course, LLM inference/training optimization (SGLang, vLLM, MoE, Flash Attention, etc.), and PyTorch internals.


📊 GitHub Stats

BBuf's GitHub stats

📌 Pinned Repositories

1. tvm_mlir_learn (Python, 2.7k stars, 365 forks): a collection of compiler learning resources.

2. how-to-optim-algorithm-in-cuda (Cuda, 2.8k stars, 259 forks): how to optimize algorithms in CUDA.

3. sgl-project/sglang (Python, 24k stars, 4.6k forks): a high-performance serving framework for large language models and multimodal models.

4. vipshop/cache-dit (Python, 1.1k stars, 63 forks): 🤗 a PyTorch-native, flexible inference engine with hybrid cache acceleration and parallelism for DiTs.