**Performance of LMMs on English tasks**

| Rank | Method | Venue | Open-source | LLM Size | Average | Recognition | Referring | Spotting | Extraction | Parsing | Calculation | Understanding | Reasoning |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Gemini-2.5-Pro🥇 | - | No | - | 59.3 | 70.9 | 45.8 | 13.4 | 93.7 | 26.9 | 84.6 | 75.8 | 63.0 |
| 2 | Llama-3.1-Nemotron-Nano-VL-8B-V1🥈 | - | Yes | 8B | 56.4 | 62.9 | 61.3 | 68.6 | 88.2 | 10.0 | 44.1 | 75.3 | 41.0 |
| 3 | Gemini-1.5-Pro🥉 | arXiv 2024 | No | - | 51.6 | 59.1 | 41.2 | 6.6 | 89.5 | 22.4 | 54.7 | 78.8 | 60.3 |
| 4 | GPT-4o | arXiv 2024 | No | - | 47.6 | 58.6 | 23.4 | 0.0 | 87.4 | 23.1 | 51.6 | 74.4 | 62.3 |
| 5 | Claude3.5-Sonnet | - | No | - | 47.5 | 52.9 | 24.9 | 2.5 | 86.9 | 23.8 | 61.4 | 74.4 | 53.0 |
| 6 | Step-1V | - | No | - | 46.8 | 56.7 | 27.4 | 2.6 | 86.3 | 33.3 | 42.6 | 76.6 | 48.7 |
| 7 | InternVL3-14B | - | Yes | 14B | 46.8 | 55.8 | 24.5 | 2.1 | 89.3 | 21.0 | 59.5 | 72.0 | 50.0 |
| 8 | Ovis2-8B | - | Yes | 7B | 46.1 | 54.2 | 20.9 | 0.0 | 83.6 | 24.2 | 54.7 | 74.1 | 57.3 |
| 9 | InternVL3-8B | - | Yes | 8B | 45.3 | 49.7 | 22.3 | 0.2 | 86.8 | 22.4 | 57.0 | 70.7 | 53.0 |
| 10 | GPT-4o-mini | - | No | - | 44.1 | 55.3 | 21.8 | 0.0 | 85.4 | 20.6 | 45.2 | 75.5 | 49.0 |
| 11 | SAIL-VL-1.6-8B | arXiv 2025 | Yes | 8B | 43.1 | 56.7 | 24.1 | 2.2 | 79.3 | 22.8 | 45.4 | 69.2 | 45.3 |
| 12 | InternVL2.5-26B | arXiv 2024 | Yes | 20B | 42.6 | 53.5 | 21.4 | 0.0 | 84.0 | 21.4 | 51.5 | 67.5 | 41.5 |
| 13 | Qwen2-VL-7B | arXiv 2024 | Yes | 8B | 42.3 | 47.0 | 42.0 | 1.5 | 90.2 | 13.7 | 36.4 | 71.1 | 36.6 |
| 14 | Qwen2.5-VL-7B | arXiv 2025 | Yes | 8B | 41.8 | 51.5 | 24.5 | 3.1 | 64.8 | 13.1 | 53.3 | 78.6 | 45.5 |
| 15 | InternVL2-26B | SCIS 2024 | Yes | 20B | 41.8 | 56.0 | 21.2 | 0.0 | 80.5 | 23.9 | 40.3 | 72.1 | 40.7 |
| 16 | MiniCPM-o-2.6 | - | Yes | 8B | 41.6 | 54.1 | 24.7 | 0.3 | 74.4 | 17.6 | 39.2 | 75.7 | 47.0 |
| 17 | DeepSeek-VL2-Small | arXiv 2024 | Yes | 16B | 41.0 | 56.6 | 23.7 | 0.0 | 86.4 | 18.9 | 30.6 | 72.2 | 39.5 |
| 18 | InternVL2.5-8B | arXiv 2024 | Yes | 8B | 40.5 | 48.9 | 21.2 | 0.0 | 82.1 | 20.3 | 41.2 | 67.8 | 42.3 |
| 19 | Pixtral-12B | arXiv 2024 | Yes | 12B | 38.4 | 45.1 | 21.8 | 0.0 | 71.6 | 21.7 | 30.4 | 77.3 | 39.5 |
| 20 | Phi-4-MultiModal | arXiv 2025 | Yes | 5.6B | 38.1 | 58.4 | 19.0 | 0.0 | 53.5 | 38.7 | 28.7 | 66.8 | 39.8 |
| 21 | Ovis1.6-3B | arXiv 2024 | Yes | 3B | 38.0 | 48.5 | 19.5 | 0.0 | 69.2 | 20.7 | 22.1 | 74.6 | 49.5 |
| 22 | GLM-4v-9B | arXiv 2024 | Yes | 9B | 37.1 | 52.7 | 20.6 | 0.0 | 79.4 | 15.9 | 21.5 | 74.7 | 32.0 |
| 23 | InternVL2-8B | SCIS 2024 | Yes | 8B | 36.1 | 43.0 | 21.6 | 0.0 | 70.2 | 19.2 | 35.6 | 65.9 | 33.6 |
| 24 | Molmo-7B | CVPR 2025 | Yes | 8B | 33.9 | 40.8 | 19.5 | 0.0 | 51.7 | 10.0 | 33.9 | 67.0 | 48.0 |
| 25 | XComposer2-4KHD | NeurIPS 2025 | Yes | 7B | 33.9 | 39.5 | 12.0 | 0.0 | 69.7 | 26.0 | 20.2 | 68.2 | 35.8 |
| 26 | LLaVA-OV-7B | arXiv 2024 | Yes | 8B | 33.7 | 45.4 | 18.5 | 0.0 | 60.0 | 15.5 | 32.0 | 59.0 | 39.3 |
| 27 | MiniCPM-V-2.6 | arXiv 2024 | Yes | 8B | 33.0 | 52.2 | 18.6 | 0.3 | 45.8 | 19.6 | 20.9 | 68.9 | 37.3 |
| 28 | Cambrian-1-8B | NeurIPS 2025 | Yes | 8B | 32.3 | 44.0 | 19.0 | 0.0 | 52.3 | 19.0 | 20.7 | 64.0 | 39.3 |
| 29 | Kimi-VL-A3B-16B | arXiv 2025 | Yes | 16B | 32.1 | 49.1 | 13.5 | 0.0 | 28.8 | 21.9 | 37.6 | 69.4 | 36.2 |
| 30 | LLaVA-Next-8B | - | Yes | 8B | 28.5 | 41.4 | 17.0 | 0.0 | 49.0 | 12.9 | 16.1 | 60.9 | 30.5 |
| 31 | Idefics3-8B | NeurIPS 2024 Workshop | Yes | 8B | 26.0 | 37.4 | 13.0 | 0.0 | 28.9 | 19.4 | 21.1 | 65.4 | 21.8 |
| 32 | Eagle-X5-7B | ICLR 2025 | Yes | 8B | 25.7 | 34.6 | 18.5 | 0.0 | 9.7 | 18.5 | 24.0 | 63.1 | 37.0 |
| 33 | Qwen-VL-chat | arXiv 2023 | Yes | 8B | 25.7 | 34.1 | 12.6 | 0.1 | 42.6 | 19.5 | 18.4 | 58.3 | 20.3 |
| 34 | Qwen-VL | arXiv 2023 | Yes | 8B | 24.8 | 35.9 | 4.2 | 0.0 | 38.7 | 28.5 | 13.8 | 60.1 | 16.9 |
| 35 | DeepSeek-VL-7B | arXiv 2024 | Yes | 7B | 24.5 | 33.5 | 13.7 | 0.0 | 19.1 | 11.7 | 24.8 | 60.5 | 32.5 |
| 36 | Monkey | CVPR 2024 | Yes | 8B | 24.2 | 31.5 | 0.1 | 0.0 | 34.4 | 26.3 | 17.7 | 61.4 | 22.4 |
| 37 | DocOwl2 | arXiv 2024 | Yes | 7B | 23.4 | 25.4 | 7.5 | 0.0 | 47.1 | 26.2 | 8.3 | 52.8 | 19.5 |
| 38 | TextMonkey | arXiv 2024 | Yes | 8B | 23.4 | 39.8 | 1.6 | 0.0 | 27.6 | 24.8 | 10.2 | 62.3 | 21.2 |
| 39 | VILA1.5-8B | CVPR 2024 | Yes | 8B | 23.2 | 36.0 | 14.5 | 0.0 | 26.0 | 17.4 | 20.3 | 44.7 | 27.0 |
| 40 | EMU2-chat | CVPR 2024 | Yes | 37B | 20.2 | 34.3 | 0.0 | 0.0 | 20.4 | 21.3 | 20.3 | 47.1 | 18.3 |
| 41 | CogVLM-chat | NeurIPS 2024 | Yes | 7B | 19.9 | 40.8 | 0.0 | 0.0 | 1.6 | 18.6 | 10.9 | 60.2 | 26.8 |
| 42 | Yi-VL-6B | arXiv 2024 | Yes | 6B | 19.7 | 31.1 | 4.0 | 0.0 | 23.4 | 22.5 | 18.1 | 43.0 | 15.5 |
| 43 | mPLUG-Owl3 | arXiv 2024 | Yes | 8B | 16.5 | 34.9 | 17.0 | 0.0 | 12.0 | 14.9 | 24.1 | 50.7 | 25.5 |
| 44 | Janus-1.3B | CVPR 2025 | Yes | 1.3B | 14.3 | 32.6 | 0.0 | 0.0 | 12.0 | 14.9 | 24.1 | 50.7 | 25.5 |
| 45 | UReader | EMNLP Findings 2023 | Yes | 7B | 14.1 | 20.9 | 0.0 | 0.0 | 0.0 | 20.7 | 11.3 | 39.0 | 20.8 |
| 46 | LLaVAR | arXiv 2023 | Yes | 13B | 12.4 | 13.8 | 0.0 | 0.0 | 8.3 | 15.2 | 4.4 | 42.4 | 15.0 |

**Performance of LMMs on Chinese tasks**

| Rank | Method | Venue | Open-source | LLM Size | Average | Recognition | Extraction | Parsing | Understanding | Reasoning |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Gemini-2.5-Pro🥇 | - | No | - | 62.2 | 72.0 | 74.0 | 35.2 | 90.0 | 39.7 |
| 2 | Ovis2-8B🥈 | - | Yes | 7B | 56.0 | 61.0 | 67.7 | 43.6 | 82.0 | 25.6 |
| 3 | Gemini-1.5-Pro🥉 | arXiv 2024 | No | - | 55.5 | 71.4 | 63.8 | 30.5 | 82.0 | 29.9 |
| 4 | Kimi-VL-A3B-16B | arXiv 2025 | Yes | 16B | 54.1 | 54.0 | 71.1 | 32.5 | 84.0 | 28.7 |
| 5 | Step-1V | - | No | - | 53.4 | 65.2 | 64.9 | 33.1 | 78.0 | 25.5 |
| 6 | InternVL3-14B | - | Yes | 14B | 52.8 | 62.1 | 59.5 | 33.2 | 80.0 | 29.2 |
| 7 | GLM-4v-9B | arXiv 2024 | Yes | 9B | 51.7 | 60.6 | 65.2 | 32.4 | 82.0 | 18.2 |
| 8 | Qwen2.5-VL-7B | arXiv 2025 | Yes | 8B | 49.5 | 24.4 | 78.9 | 33.1 | 82.0 | 29.0 |
| 9 | InternVL3-8B | - | Yes | 8B | 49.0 | 57.7 | 55.8 | 29.9 | 72.0 | 29.4 |
| 10 | Claude3.5-Sonnet | - | No | - | 48.4 | 34.2 | 62.5 | 35.2 | 78.0 | 32.2 |
| 11 | DeepSeek-VL2-Small | arXiv 2024 | Yes | 16B | 48.1 | 51.6 | 56.3 | 27.8 | 79.6 | 25.3 |
| 12 | MiniCPM-V-2.6 | arXiv 2024 | Yes | 8B | 47.7 | 53.1 | 53.2 | 32.8 | 76.0 | 23.4 |
| 13 | MiniCPM-o-2.6 | - | Yes | 8B | 47.7 | 54.0 | 62.4 | 24.1 | 68.0 | 29.8 |
| 14 | GPT-4o | arXiv 2024 | No | - | 45.7 | 41.7 | 52.1 | 29.0 | 76.0 | 29.4 |
| 15 | Qwen2-VL-7B | arXiv 2024 | Yes | 8B | 44.7 | 23.7 | 63.5 | 27.9 | 80.0 | 28.5 |
| 16 | InternVL2.5-8B | arXiv 2024 | Yes | 8B | 42.8 | 42.8 | 47.9 | 27.3 | 80.0 | 23.5 |
| 17 | SAIL-VL-1.6-8B | arXiv 2025 | Yes | 8B | 42.6 | 35.8 | 41.5 | 35.7 | 76.0 | 23.9 |
| 18 | InternVL2.5-26B | arXiv 2024 | Yes | 20B | 41.9 | 40.2 | 42.7 | 25.6 | 74.0 | 27.0 |
| 19 | InternVL2-8B | SCIS 2024 | Yes | 8B | 41.3 | 35.2 | 42.8 | 26.1 | 78.0 | 24.4 |
| 20 | Llama-3.1-Nemotron-Nano-VL-8B-V1 | - | Yes | 8B | 40.1 | 38.2 | 54.9 | 26.6 | 66.0 | 14.8 |
| 21 | InternVL2-26B | SCIS 2024 | Yes | 20B | 38.1 | 20.4 | 50.7 | 29.0 | 76.0 | 14.5 |
| 22 | GPT-4o-mini | - | No | - | 37.4 | 20.0 | 53.6 | 27.9 | 66.0 | 19.6 |
| 23 | Phi-4-MultiModal | arXiv 2025 | Yes | 5.6B | 37.3 | 30.5 | 40.5 | 42.7 | 56.0 | 16.9 |
| 24 | XComposer2-4KHD | NeurIPS 2025 | Yes | 7B | 32.4 | 12.9 | 38.6 | 37.5 | 60.0 | 13.1 |
| 25 | Ovis1.6-3B | arXiv 2024 | Yes | 3B | 31.7 | 22.5 | 33.3 | 31.5 | 54.0 | 17.0 |
| 26 | Monkey | CVPR 2024 | Yes | 8B | 21.5 | 1.5 | 28.4 | 29.1 | 40.0 | 8.3 |
| 27 | TextMonkey | arXiv 2024 | Yes | 8B | 21.5 | 10.5 | 15.2 | 30.2 | 44.0 | 7.6 |
| 28 | Cambrian-1-8B | NeurIPS 2025 | Yes | 8B | 18.5 | 2.4 | 19.8 | 26.7 | 36.0 | 7.6 |
| 29 | LLaVA-OV-7B | arXiv 2024 | Yes | 8B | 17.4 | 5.4 | 13.6 | 20.3 | 34.0 | 13.6 |
| 30 | mPLUG-Owl3 | arXiv 2024 | Yes | 8B | 16.5 | 1.6 | 27.4 | 27.3 | 16.0 | 10.0 |
| 31 | Qwen-VL-chat | arXiv 2023 | Yes | 8B | 16.5 | 9.1 | 3.6 | 18.9 | 44.0 | 7.1 |
| 32 | Pixtral-12B | arXiv 2024 | Yes | 12B | 16.0 | 6.2 | 22.3 | 11.4 | 26.0 | 14.0 |
| 33 | Idefics3-8B | NeurIPS 2024 Workshop | Yes | 8B | 15.6 | 2.9 | 29.0 | 12.3 | 26.0 | 7.9 |
| 34 | Qwen-VL | arXiv 2023 | Yes | 8B | 15.6 | 4.3 | 0.0 | 30.6 | 38.0 | 5.1 |
| 35 | Molmo-7B | CVPR 2025 | Yes | 8B | 15.0 | 3.4 | 29.8 | 6.6 | 24.0 | 11.1 |
| 36 | DocOwl2 | arXiv 2024 | Yes | 7B | 14.4 | 1.0 | 17.8 | 29.4 | 20.0 | 3.9 |
| 37 | DeepSeek-VL-7B | arXiv 2024 | Yes | 7B | 13.7 | 3.2 | 14.7 | 10.7 | 30.0 | 9.8 |
| 38 | CogVLM-chat | NeurIPS 2024 | Yes | 7B | 12.8 | 2.4 | 16.2 | 22.5 | 20.0 | 3.1 |
| 39 | Eagle-X5-7B | ICLR 2025 | Yes | 8B | 12.3 | 1.9 | 16.1 | 13.6 | 22.0 | 8.1 |
| 40 | VILA1.5-8B | CVPR 2024 | Yes | 8B | 11.0 | 1.4 | 9.1 | 22.2 | 16.0 | 6.4 |
| 41 | Yi-VL-6B | arXiv 2024 | Yes | 6B | 10.4 | 1.6 | 6.4 | 28.8 | 10.0 | 5.3 |
| 42 | LLaVA-Next-8B | - | Yes | 8B | 9.2 | 2.8 | 0.9 | 14.9 | 20.0 | 7.4 |
| 43 | UReader | EMNLP Findings 2023 | Yes | 7B | 9.0 | 0.3 | 2.0 | 28.1 | 12.0 | 2.4 |
| 44 | LLaVAR | arXiv 2023 | Yes | 13B | 8.6 | 2.2 | 2.0 | 27.1 | 10.0 | 1.9 |
| 45 | EMU2-chat | CVPR 2024 | Yes | 37B | 8.2 | 1.2 | 3.0 | 29.3 | 4.0 | 3.6 |
| 46 | Janus-1.3B | CVPR 2025 | Yes | 1.3B | 7.5 | 4.1 | 2.2 | 10.4 | 14.0 | 6.7 |

We aim to update this benchmark every quarter, and we sincerely welcome community contributions. If you have an open-source model on Hugging Face or an accessible API, sharing it with us would greatly help us improve and expand the leaderboard. You can contact us at ling_fu@hust.edu.cn.

We have observed that some models expect absolute coordinates in a model-specific format when tackling specialized tasks. For example, Qwen2.5-VL uses a format like `{"bbox_2d": [x1, y1, x2, y2], "text_content": "xxx"}` for text spotting. After modifying the prompt accordingly, Qwen2.5-VL-7B achieved a text spotting score of 51.6 on the public data, a significant improvement over the default prompt currently used in OCRBench v2. We encourage you to share evaluation results obtained with prompts adapted to your model's input format; this will help us further improve and refine the leaderboard.
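
For contributors experimenting with such adapted prompts, below is a minimal sketch of how a reply in this format could be parsed before scoring. It is not the official OCRBench v2 evaluation code: the function name `parse_spotting_response` and the tolerance for prose-wrapped JSON are our own assumptions.

```python
import json
import re


def parse_spotting_response(response: str) -> list[dict]:
    """Parse a reply in the absolute-coordinate format shown above, e.g.
    [{"bbox_2d": [x1, y1, x2, y2], "text_content": "xxx"}, ...].
    Model replies often wrap the JSON in markdown fences or extra prose,
    so the outermost JSON array is extracted before decoding."""
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if match is None:
        return []  # nothing parsable: treat the sample as an empty prediction
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    predictions = []
    for item in items:
        if not isinstance(item, dict):
            continue
        bbox = item.get("bbox_2d")
        text = item.get("text_content")
        # Keep only well-formed entries: a four-number box plus a text string.
        if (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(isinstance(v, (int, float)) for v in bbox)
            and isinstance(text, str)
        ):
            predictions.append({"bbox": [float(v) for v in bbox], "text": text})
    return predictions


if __name__ == "__main__":
    reply = (
        "Here are the detected regions:\n"
        '[{"bbox_2d": [84, 112, 310, 150], "text_content": "OCRBench"},\n'
        ' {"bbox_2d": [92, 160, 205, 188], "text_content": "v2"}]'
    )
    print(parse_spotting_response(reply))
```

Returning an empty list on malformed output follows the common convention of scoring unparsable replies as zero rather than raising an error mid-evaluation.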