Brick-Composer: Using MLLMs for Assembly with Diverse Bricks Paper • 2606.05445 • Published 13 days ago • 7
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 12 days ago • 40
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 14 days ago • 13
Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published 22 days ago • 19
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 110
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published May 6 • 22
PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning Paper • 2601.11957 • Published Jan 28 • 3
NarrativeTrack: Evaluating Video Language Models Beyond the Frame Paper • 2601.01095 • Published Jan 3 • 8
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data Paper • 2602.21320 • Published Feb 24 • 12
AgentDoG Collection A Diagnostic Guardrail Framework for AI Agent Safety and Security • 12 items • Updated May 12 • 112