# rlvr

Here are 39 public repositories matching this topic...

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

rlhf agentic rlvr

Updated Mar 3, 2026
Python

thinkwee / AgentsMeetRL

Awesome List for Agentic RL

agent awesome-list multiagent reinforcement llm rlhf large-language-model tool-learning agentic-workflow agentic-ai agentic-coding rlvr llm-age

Updated Feb 27, 2026
HTML

pat-jj / s3

[EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (RLVR for Search with Minimal Data)

information-retrieval efficiency verifier rag large-language-models search-agent gpt-5 agentic-ai rlvr

Updated Nov 5, 2025
Python

thuml / RLVR-World

Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934

text-game video-generation robotic-manipulation video-prediction web-agent real2sim world-model webarena video-gpt grpo verl rlvr reinforcement-learning-with-verifiable-rewards

Updated Oct 28, 2025
Python

InternLM / CapRL

[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

image-captioning multi-modal caption-generation llm vision-language-model large-vision-language-models grpo rlvr

Updated Feb 8, 2026
Python

WooooDyy / BAPO

Codes for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.

rl reasoning llm rlvr

Updated Jan 29, 2026
Python

Tencent-Hunyuan / GradLoc

Implementation of GradLoc from the Tencent Hunyuan blog "Stabilizing RLVR via Token-level Gradient Diagnosis and Layerwise Clipping".

gradient llm hunyuan rlvr

Updated Feb 16, 2026
Python

tongjingqi / Awesome-Agent-RL

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.

agent awesome reinforcement-learning rl awesome-list llm reward-model agentic-ai rlvr agent-training

Updated Sep 1, 2025

teilomillet / retrain

A Python library that uses Reinforcement Learning (RL) to train LLMs.

mcp rl llm deepseek rlvr

Updated Mar 1, 2026
Python

osoleve / glitchlings

Enemies for your LLM

nlp linguistics adversarial-data-augmentation rlvr

Updated Jan 20, 2026
Python

sileod / reasoning_core

An RL env with procedurally generated symbolic reasoning data

logic dataset dataset-generation reasoning llm grpo verifiers rlvr

Updated Mar 3, 2026
Python

RUC-GSAI / YuLan-SwarmIntell

🐝 SwarmBench: Benchmarking LLMs' Swarm Intelligence

benchmark swarm swarm-intelligence kilobots swarm-robotics llms-benchmarking rlvr

Updated May 21, 2025
Python

smiles724 / DeepSearch

This is the official code of DeepSearch [ICLR 2026]

llm reasoning-language-models rlvr

Updated Oct 22, 2025
Python

ScalingIntelligence / kernelbench-tinker

Tinker ↔ KernelBench Integration enabling RL for GPU Kernel Generation

rl tinker rlvr rl-infra

Updated Feb 27, 2026
Python

zli12321 / free-form-grpo

GRPO to train long-form QA and instructions with a long-form reward model

reinforcement-learning-algorithms evaluation-framework reward-design rl-training long-form-text-generation qwen2-5 grpo rlvr

Updated Jul 17, 2025
Python

purbeshmitra / MOTIF

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

reinforcement-learning llm-training rlvr

Updated Jul 6, 2025
Python

HKUST-KnowComp / Reasoning-Embedding

The official repository of the paper "Do Reasoning Models Enhance Embedding Models?"

representation-learning manifold embedding reasoning rlvr

Updated Feb 20, 2026
Python

Miaow-Lab / RLVR-Linearity

[arXiv] "Not All Steps are Informative: On the Linearity of LLMs’ RLVR Training"

llm-reasoning grpo rlvr

Updated Feb 1, 2026
Python

purbeshmitra / semantic-soft-bootstrapping

A self-distillation based training method for long context reasoning in a single LLM without reinforcement learning

knowledge-distillation kl-divergence llm-training rlvr

Updated Jan 29, 2026
Python

LokaHQ / Trinity-Mini-DrugProt-Think

Trinity-Mini-DrugProt-Think

bioinformatics protein drug-discovery post-training llm rlvr

Updated Feb 23, 2026
HTML
