This repository contains the artifact for our ASPLOS 2026 paper: "RedFuser: An Automatic Operator Fusion Framework for Cascaded Reductions on AI Accelerators".
RedFuser is a novel framework for optimizing cascaded reductions in deep learning compilers. Built on top of Apache TVM, RedFuser introduces a series of compiler transformation passes that enable efficient fusion of reduction operations with other computations, particularly targeting modern GPU architectures.
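To illustrate the pattern RedFuser targets (this is a conceptual sketch, not RedFuser's API): a softmax is a cascaded reduction, where a max reduction feeds an exponential-sum reduction, which in turn feeds an elementwise normalization. In pure Python:

```python
import math

def softmax_cascaded(row):
    """Numerically stable softmax written as two cascaded reductions.

    An unfused kernel must read the row from memory once per reduction;
    fusing such dependent reductions is the optimization RedFuser automates.
    """
    # Reduction 1: row max (for numerical stability).
    m = max(row)
    # Reduction 2: sum of shifted exponentials; depends on reduction 1.
    s = sum(math.exp(v - m) for v in row)
    # Elementwise normalization consumes both reduction results.
    return [math.exp(v - m) / s for v in row]
```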
- [2026-01]: RedFuser is now available, with a flash-attention example.
- [2025-11]: 🎉 RedFuser is accepted to ASPLOS 2026!
- flash-attention
- flash-decoding
- MoE routing
- FP8 quant + GEMM
Please follow the Apache TVM installation guide at https://tvm.apache.org/docs/install/index.html to install TVM.
For the flash-attention example, see python/tvm/redfuser/example/flash_attention.py.
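The core trick that makes flash-attention a cascaded-reduction workload is the "online softmax": the max and exponential-sum reductions are fused into a single sweep via a rescaling update. A pure-Python sketch of that idea (independent of the actual implementation in flash_attention.py):

```python
import math

def softmax_online(row):
    """One-pass softmax: the max and exp-sum reductions are fused into
    a single sweep by rescaling the running sum whenever the running
    max grows. Conceptual sketch only, not the code in the example."""
    m = float("-inf")  # running max
    s = 0.0            # running sum of exp(v - m), kept consistent with m
    for v in row:
        new_m = max(m, v)
        # Rescale the old sum to the new max, then add the new term.
        s = s * math.exp(m - new_m) + math.exp(v - new_m)
        m = new_m
    return [math.exp(v - m) / s for v in row]
```

Because the fused loop makes one pass over the data, the input is read from memory once instead of once per reduction, which is the memory-traffic saving that fusion buys on a GPU.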
redfuser/
├── python/tvm/redfuser/ # RedFuser Python implementation
│ ├── transform/ # Core transformation passes
│ └── example/ # Example workloads
│ ...
RedFuser is licensed under the Apache License 2.0.
This project builds upon Apache TVM. We thank the TVM community for their excellent infrastructure and support.