Popular repositories
MegaQwen (Public, CUDA)
🚀 Achieve faster Qwen3-0.6B inference with the MegaQwen CUDA megakernel, delivering 531 tok/s decode on RTX 3090, 3.9x faster than HuggingFace.
pogud.github.io (Public)
🚀 Accelerate Qwen3-0.6B inference with MegaQwen, a custom CUDA megakernel achieving 531 tok/s on RTX 3090, 3.9x faster than existing frameworks.

