Brick-Composer: Using MLLMs for Assembly with Diverse Bricks Paper • 2606.05445 • Published 14 days ago • 7
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints Paper • 2606.05622 • Published 13 days ago • 40
Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues Paper • 2606.02754 • Published 15 days ago • 13
Advancing Creative Physical Intelligence in Large Multimodal Models Paper • 2605.26396 • Published 23 days ago • 19
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 110
CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing Paper • 2605.02910 • Published May 6 • 22
PEARL: Self-Evolving Assistant for Time Management with Reinforcement Learning Paper • 2601.11957 • Published Jan 28 • 3
NarrativeTrack: Evaluating Video Language Models Beyond the Frame Paper • 2601.01095 • Published Jan 3 • 8
Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data Paper • 2602.21320 • Published Feb 24 • 12
AgentDoG Collection • A Diagnostic Guardrail Framework for AI Agent Safety and Security • 12 items • Updated May 12 • 112
Multimodal Policy Internalization for Conversational Agents Paper • 2510.09474 • Published Oct 10, 2025 • 5
Where LLM Agents Fail and How They can Learn From Failures Paper • 2509.25370 • Published Sep 29, 2025 • 12
UserRL: Training Interactive User-Centric Agent via Reinforcement Learning Paper • 2509.19736 • Published Sep 24, 2025 • 12
Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts Paper • 2509.04500 • Published Sep 2, 2025 • 5
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination Paper • 2502.16143 • Published Feb 22, 2025 • 6