All scores listed here are F1 scores.
Abbreviations: Low-Dy. = Low-Dynamic; High-Dy. = High-Dynamic; Multi-Sc. = Multi-Scene; Multi-Su. = Multi-Subject.
By default, this leaderboard is sorted by overall F1 score, with overall Recall score as a secondary sort key.
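The default ordering described above can be sketched in a few lines of Python. This is only an illustration: the `Entry` type, the `rank` helper, and the sample values are hypothetical and not taken from the leaderboard, and it assumes that both keys are sorted in descending order (higher is better).

```python
from typing import NamedTuple


class Entry(NamedTuple):
    model: str
    f1: float      # overall F1 (%)
    recall: float  # overall Recall (%)


def rank(entries: list[Entry]) -> list[Entry]:
    """Sort by overall F1 (descending); break ties by overall Recall (descending)."""
    return sorted(entries, key=lambda e: (e.f1, e.recall), reverse=True)


# Hypothetical entries for illustration only; these are not leaderboard results.
entries = [
    Entry("model-a", 51.7, 48.0),
    Entry("model-b", 58.5, 55.0),
    Entry("model-c", 51.7, 52.5),
]

ranked = rank(entries)
print([e.model for e in ranked])  # ['model-b', 'model-c', 'model-a']
```

Here `model-a` and `model-c` tie on F1, so the higher Recall of `model-c` places it first.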
All values are percentages. Camera, Scene, Action, and Attribute are the Dynamic Element Type scores; Low-Dy., High-Dy., Multi-Sc., and Multi-Su. are the Visual Characteristic scores.

| Model | Organization | LLM Params | Frames | Date | Overall | Camera | Scene | Action | Attribute | Low-Dy. | High-Dy. | Multi-Sc. | Multi-Su. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4o-0806 | OpenAI | - | 1/2 fps (1*) | 2024-08-06 | 58.5 | 61.3 | 66.4 | 48.0 | 57.8 | 58.2 | 58.7 | 58.1 | 55.5 |
| Gemini 1.5 Pro 002 | Google | - | 1/2 fps (1*) | 2024-05-24 | 57.4 | 60.7 | 63.3 | 46.3 | 56.0 | 58.7 | 56.7 | 57.0 | 53.3 |
| Gemini 1.5 Flash 002 | Google | - | 1/2 fps (1*) | 2024-05-24 | 55.7 | 59.6 | 65.1 | 42.9 | 55.2 | 56.0 | 55.5 | 55.9 | 55.9 |
| InternVL2-76B | Shanghai AI Lab | 72B | 32 | 2024-07-04 | 51.9 | 53.9 | 61.4 | 41.2 | 50.9 | 52.8 | 51.5 | 51.1 | 49.3 |
| Qwen2-VL-72B | Alibaba | 72B | 2 fps (2*) | 2024-08-30 | 51.7 | 54.0 | 52.8 | 42.6 | 48.5 | 55.7 | 49.7 | 48.0 | 43.3 |
| InternVL2-40B | Shanghai AI Lab | 34B | 32 | 2024-07-04 | 51.7 | 55.1 | 59.0 | 39.3 | 52.3 | 53.9 | 50.5 | 50.5 | 48.0 |
| MiniCPM-V 2.6 | OpenBMB | 8B | 32 | 2024-08-06 | 51.7 | 56.0 | 60.6 | 38.8 | 50.2 | 53.0 | 51.0 | 51.7 | 49.0 |
| LLaVA-Video-7B | Bytedance & NTU S-Lab | 7B | 32 | 2024-09-30 | 51.0 | 50.4 | 58.9 | 37.8 | 53.1 | 52.2 | 50.3 | 50.0 | 45.8 |
| LLaVA-Video-72B (SlowFast) | Bytedance & NTU S-Lab | 72B | 32 | 2024-09-30 | 50.2 | 50.3 | 56.4 | 39.3 | 50.8 | 50.6 | 50.0 | 49.3 | 45.7 |
| LLaVA-OneVision-72B | Bytedance & NTU S-Lab | 72B | 32 | 2024-08-05 | 49.6 | 51.9 | 57.7 | 36.0 | 48.8 | 48.6 | 45.9 | 50.1 | 49.4 |
| LLaVA-OneVision-7B | Bytedance & NTU S-Lab | 7B | 32 | 2024-08-05 | 49.3 | 51.0 | 57.6 | 36.8 | 49.3 | 50.0 | 48.9 | 48.4 | 43.8 |
| InternVL2-26B | Shanghai AI Lab | 20B | 32 | 2024-07-04 | 49.0 | 51.6 | 58.7 | 37.0 | 49.1 | 49.4 | 48.9 | 48.4 | 45.8 |
| Qwen2-VL-7B | Alibaba | 7B | 2 fps (2*) | 2024-08-30 | 48.9 | 49.0 | 56.7 | 37.0 | 46.7 | 53.8 | 46.4 | 44.4 | 39.9 |
| Tarsier-34B | Bytedance | 34B | 32 | 2024-07-04 | 48.2 | 42.3 | 44.4 | 47.6 | 42.2 | 49.1 | 47.8 | 49.6 | 47.3 |
| Kangaroo | Meituan & UCAS | 8B | 32 | 2024-07-17 | 42.7 | 44.1 | 51.9 | 31.9 | 39.5 | 45.6 | 41.1 | 39.3 | 35.7 |
| InternVL2-8B | Shanghai AI Lab | 7B | 32 | 2024-07-04 | 40.8 | 41.7 | 44.7 | 30.0 | 42.3 | 44.5 | 38.9 | 38.4 | 35.2 |
| Tarsier-7B | Bytedance | 7B | 32 | 2024-07-04 | 38.6 | 34.8 | 33.1 | 36.2 | 33.3 | 46.5 | 34.5 | 35.8 | 33.2 |
| PLLaVA-34B | NUS & NYU & Bytedance | 34B | 16 | 2024-04-24 | 34.2 | 37.4 | 39.9 | 22.3 | 33.2 | 38.9 | 31.8 | 30.2 | 27.6 |
| LongVA | NTU S-Lab | 7B | 32 | 2024-06-24 | 31.8 | 32.5 | 40.6 | 22.0 | 28.4 | 37.3 | 29.0 | 27.6 | 23.7 |
| PLLaVA-13B | NUS & NYU & Bytedance | 13B | 16 | 2024-04-24 | 30.6 | 33.0 | 40.3 | 18.5 | 29.8 | 36.0 | 27.8 | 26.0 | 24.3 |
| PLLaVA-7B | NUS & NYU & Bytedance | 7B | 16 | 2024-04-24 | 27.4 | 28.9 | 36.6 | 16.5 | 25.3 | 32.7 | 24.7 | 22.8 | 22.5 |
Date: indicates the publication date of open-source models.
-: indicates closed-source models.
1* Videos are sampled at 2 fps when the video duration is under 16 s, and at 1 fps otherwise.
2* Videos are sampled at 2 fps, up to a maximum of 64 frames.
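The two sampling rules in footnotes 1* and 2* can be sketched as below. The function names are hypothetical, and the frame count is assumed to be `floor(duration × fps)`. The footnotes give only the sampling rates and the 64-frame cap, not how frames are selected, so treat this as an illustration rather than the evaluation code.

```python
import math


def fps_rule_1(duration_s: float) -> int:
    """Footnote 1*: 2 fps for videos shorter than 16 s, otherwise 1 fps."""
    return 2 if duration_s < 16 else 1


def frame_count_rule_1(duration_s: float) -> int:
    # Assumption: the number of sampled frames is floor(duration * fps).
    return math.floor(duration_s * fps_rule_1(duration_s))


def frame_count_rule_2(duration_s: float, fps: int = 2, max_frames: int = 64) -> int:
    """Footnote 2*: sample at 2 fps, capped at 64 frames."""
    # Assumption: the cap is applied as a simple upper bound on the frame count.
    return min(max_frames, math.floor(duration_s * fps))


print(frame_count_rule_1(10))  # 20 (2 fps, since 10 s < 16 s)
print(frame_count_rule_1(20))  # 20 (1 fps, since 20 s >= 16 s)
print(frame_count_rule_2(40))  # 64 (80 frames at 2 fps, capped at 64)
```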