Performance of LMMs on the English tasks (public data)
| Rank | Method | LLM Size | Recognition | Referring | Spotting | Extraction | Parsing | Calculation | Understanding | Reasoning | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Llama Nemotron Nano VL 8B🥇 | 8B | 70.2 | 69.1 | 61.8 | 81.4 | 39.2 | 31.9 | 73.1 | 54.7 | 60.2 |
| 2 | InternVL3-14B🥈 | 14B | 67.3 | 36.9 | 11.2 | 89.0 | 38.4 | 38.4 | 79.2 | 60.5 | 52.6 |
| 3 | Gemini-Pro🥉 | - | 61.2 | 39.5 | 13.5 | 79.3 | 39.2 | 47.7 | 75.5 | 59.3 | 51.9 |
| 4 | Qwen2-VL-7B | 7B | 72.1 | 47.9 | 17.5 | 82.5 | 25.5 | 25.4 | 78.4 | 61.5 | 51.4 |
| 5 | InternVL2.5-26B | 26B | 65.6 | 26.1 | 1.6 | 86.9 | 36.2 | 37.4 | 78.3 | 62.9 | 49.4 |
| 6 | InternVL3-8B | 8B | 68.6 | 30.4 | 8.8 | 85.3 | 34.0 | 27.1 | 77.5 | 60.3 | 49.0 |
| 7 | Ovis2-8B | 7B | 73.2 | 24.6 | 0.7 | 62.4 | 44.8 | 40.6 | 72.7 | 62.6 | 47.7 |
| 8 | InternVL2-26B | 26B | 63.4 | 26.1 | 0.0 | 76.8 | 37.8 | 32.3 | 79.4 | 58.9 | 46.8 |
| 9 | Step-1V | - | 67.8 | 31.3 | 7.2 | 73.6 | 37.2 | 27.8 | 69.8 | 58.6 | 46.7 |
| 9 | Qwen2.5-VL-7B | 7B | 68.8 | 25.7 | 1.2 | 80.2 | 30.4 | 38.2 | 73.2 | 56.2 | 46.7 |
| 10 | GPT-4o | - | 61.2 | 26.7 | 0.0 | 77.5 | 36.3 | 43.4 | 71.1 | 55.5 | 46.5 |
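
The Average column appears to be the unweighted mean of the eight task scores, rounded to one decimal place. The table does not state this, so treat it as an assumption. The minimal sketch below checks that reading against three rows copied from the table; the names `rows`, `average_matches`, and `check_averages` are hypothetical helpers introduced only for illustration.

```python
from statistics import mean

# Scores copied from the table, in column order:
# Recognition, Referring, Spotting, Extraction, Parsing,
# Calculation, Understanding, Reasoning; then the reported Average.
rows = {
    "Llama Nemotron Nano VL 8B": ([70.2, 69.1, 61.8, 81.4, 39.2, 31.9, 73.1, 54.7], 60.2),
    "InternVL3-14B": ([67.3, 36.9, 11.2, 89.0, 38.4, 38.4, 79.2, 60.5], 52.6),
    "GPT-4o": ([61.2, 26.7, 0.0, 77.5, 36.3, 43.4, 71.1, 55.5], 46.5),
}


def average_matches(scores, reported, tol=0.051):
    """Return True if the unweighted mean of `scores` agrees with `reported`.

    `tol` is half a unit in the last printed digit (0.05) plus a little
    slack for floating-point error. This tolerance is an assumption.
    """
    return abs(mean(scores) - reported) <= tol


def check_averages(table):
    """Map each method name to whether its reported Average fits the mean."""
    return {name: average_matches(scores, avg) for name, (scores, avg) in table.items()}


if __name__ == "__main__":
    for name, ok in check_averages(rows).items():
        print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

A plain mean weights all eight tasks equally, so a single very low score (for example a Spotting score near zero) pulls the Average down by the same amount as an equally large drop in any other task.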