EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions Paper • 2606.23654 • Published 11 days ago • 79
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 17 days ago • 63
JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence Paper • 2606.14777 • Published 23 days ago • 208
Draft-OPD: On-Policy Distillation for Speculative Draft Models Paper • 2605.29343 • Published May 28 • 36
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16, 2025 • 62
Post-Trained MoE Can Skip Half Experts via Self-Distillation Paper • 2605.18643 • Published May 18 • 30
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published May 13 • 165
MinT: Managed Infrastructure for Training and Serving Millions of LLMs Paper • 2605.13779 • Published May 13 • 223
Rethinking OPD Collection This collection includes the models used in the paper "Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recip • 5 items • Updated 30 days ago • 3
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published May 8 • 70
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL Paper • 2604.28123 • Published May 1 • 49
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction Paper • 2604.27393 • Published Apr 30 • 81
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published May 7 • 116
MiA-Signature: Approximating Global Activation for Long-Context Understanding Paper • 2605.06416 • Published May 7 • 57
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113