Bootstrap
My HomePage

Selected Publication


LLaVA-OneVision: Easy Visual Task Transfer

Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li

project page / paper / code / Demo

ArXiv:2408.03326


Long Context Transfer from Language to Vision

Peiyuan Zhang*, Kaichen Zhang*, Bo Li*, Guangtao Zeng, Jingkang Yang, Yuanhan Zhang, Ziyue Wang, Haoran Tan, Chunyuan Li, Ziwei Liu

project page / paper / code / Demo

ArXiv:2406.16852


LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Kaichen Zhang*, Bo Li*, Peiyuan Zhang*, Fanyi Pu*, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu

project page / paper / code /

ArXiv:2407.12772


Llava-next: Stronger llms supercharge multimodal capabilities in the wild

Bo Li, Kaichen Zhang, Hao Zhang, Dong Guo, Renrui Zhang, Feng Li, Yuanhan Zhang, Ziwei Liu, Chunyuan Li

project page / code /

Technical Blog


WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu

project page / paper / code /

ArXiv:2405.03272