Selected Publications
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Kaichen Zhang, Yifei Shen, Bo Li, Ziwei Liu
paper / code / Collections
ArXiv:2411.14982, International Conference on Computer Vision 2025
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
Jinjie Ni, Yifan Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, Michael Shieh
project page / paper / code / Demo
ArXiv:2410.13754, International Conference on Learning Representations 2025
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li
project page / paper / code / Demo
ArXiv:2408.03326, Transactions on Machine Learning Research
Long Context Transfer from Language to Vision
Peiyuan Zhang*, Kaichen Zhang*, Bo Li*, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu
project page / paper / code / Demo
ArXiv:2406.16852, Transactions on Machine Learning Research
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Kaichen Zhang*, Bo Li*, Peiyuan Zhang*, Fanyi Pu*, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu
project page / paper / code
ArXiv:2407.12772, Findings of the North American Chapter of the Association for Computational Linguistics 2025
LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild
Bo Li, Kaichen Zhang, Hao Zhang, Dong Guo, Renrui Zhang, Feng Li, Yuanhan Zhang, Ziwei Liu, Chunyuan Li
project page / code
Technical Blog
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning
Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu
project page / paper / code
ArXiv:2405.03272