continuous-batching

Star

Here are 5 public repositories matching this topic...

lumia431 / photon_infer

Star

A High-Performance LLM Inference Engine with vLLM-Style Continuous Batching

modern-cpp inference-engine ai-infra vllm llm-inference paged-attention continuous-batching

Updated Jan 2, 2026
C++

gty111 / gLLM

Star

gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling

pipeline-parallelism tensor-parallelism llm-serving llm-inference pagedattention continuous-batching qwen3 token-throttling chunked-prefill

Updated Jan 12, 2026
Python

maxime-dlabai / mlx-continuous-batching

Star

OpenAI-compatible server with continuous batching for MLX on Apple Silicon

macos inference text-generation mlx apple-silicon openai-api llm continuous-batching

Updated Dec 4, 2025
Python

Fork of OpenAI and Anthropic compatible server for Apple Silicon. Native MLX backend, 500+ tok/s. Run LLMs and vision-language models with continuous batching, MCP tool calling, and multimodal support.

inference-server mlx multimodal apple-silicon llm vllm local-ai continuous-batching tool-calling openai-compatible

Updated Feb 28, 2026
Python

nagababumo / Efficiently-Serving-LLMs

Star

batching lora quantization lorax low-rank-adaptation continuous-batching multi-lora

Updated Jun 19, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the continuous-batching topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the continuous-batching topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

continuous-batching

Here are 5 public repositories matching this topic...

lumia431 / photon_infer

gty111 / gLLM

maxime-dlabai / mlx-continuous-batching

swaylenhayes / vllm-mlx

nagababumo / Efficiently-Serving-LLMs

Improve this page

Add this topic to your repo