bohan zeng

zbhpku

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 12 hours ago

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Paper • 2606.02564 • Published 2 days ago • 22

upvoted a paper 1 day ago

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

Paper • 2605.31336 • Published 5 days ago • 9

upvoted a paper 7 days ago

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

Paper • 2605.26244 • Published 9 days ago • 38

upvoted a paper 12 days ago

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Paper • 2605.22012 • Published 13 days ago • 46

submitted a paper to Daily Papers 12 days ago

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Paper • 2605.22012 • Published 13 days ago • 46

upvoted a paper 14 days ago

Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos

Paper • 2605.18984 • Published 16 days ago • 22

upvoted a paper 19 days ago

VGGT-Edit: Feed-forward Native 3D Scene Editing with Residual Field Prediction

Paper • 2605.15186 • Published 20 days ago • 26

upvoted 2 papers 20 days ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Paper • 2605.10780 • Published 22 days ago • 33

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

Paper • 2605.13062 • Published 21 days ago • 33

liked a dataset about 1 month ago

KlingTeam/HM-World

Updated Apr 22 • 655 • 7

upvoted a paper about 1 month ago

SketchVLM: Vision language models can annotate images to explain thoughts and guide users

Paper • 2604.22875 • Published Apr 23 • 35

upvoted 2 papers about 2 months ago

KAT-Coder-V2 Technical Report

Paper • 2603.27703 • Published Mar 29 • 12

OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation

Paper • 2604.11804 • Published Apr 13 • 72

commented on 2 papers about 2 months ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203

authored 5 papers about 2 months ago

RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark

Paper • 2509.24897 • Published Sep 29, 2025 • 46

Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks

Paper • 2510.19195 • Published Oct 22, 2025 • 11

DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Paper • 2601.19267 • Published Jan 27

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Paper • 2602.04804 • Published Feb 4 • 50