Fanheng Kong

Contact Me

I am a second-year Ph.D. student in the Data Mining Group at Northeastern University, advised by Prof. Shi Feng. The NEU DM Group is part of the Data Science and Big Data Technology Group led by Prof. Ge Yu.

My current research interests include LLM & Agent Evaluation and Agentic Coding. Previously, I focused on Multimodal LLMs, including multimodal understanding, multimodal embeddings, and unified multimodal models.

Currently, I'm a research intern on the Qwen Team at Alibaba Group. I have also interned at Kuaishou Technology.

If you are interested in my research, you are welcome to contact me by email to discuss it or to collaborate. If you are exploring collaboration opportunities in agentic coding training and evaluation, please feel free to reach out.

News

  • [03/2026] Two papers were accepted to ACL 2026.
  • [05/2025] Two papers were accepted to ACL 2025 (Main).
  • [03/2025] One paper was accepted to SCI-FM@ICLR 2025 (Oral) [Best Paper Award].
  • [12/2024] One paper was accepted to Neurocomputing.
  • [05/2024] One paper was accepted to ACL 2024 (Main).
  • [02/2024] One paper was accepted to COLING 2024 (Main).

UNITE

Universal Multimodal Embeddings

TUNA

Temporal Video Understanding Evaluation

StickerConv

Multimodal Empathetic Dialogue Agent, Dataset and MLLM

S-Mamba

Is Mamba Effective for Time Series Forecasting?

PICA

LLM for the Emotional Domain

Awesome Multimodal

Awesome Multimodal Papers

Publications

(* equal contribution, † corresponding author)
Accepted
TUNA

TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos

Fanheng Kong, Jingyuan Zhang, Hongzhi Zhang, Shi Feng, Daling Wang, Linhao Yu, Xingguang Ji, Yu Tian, Victoria W., Fuzheng Zhang

ACL 2025 (Main)   Project Page   Paper (ACL)   Paper (arXiv)   Code   🤗 Dataset

Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search

Linhao Yu, Xinguang Ji, Yahui Liu, Fanheng Kong, Chenxi Sun, Jingyuan Zhang, Hongzhi Zhang, V. W., Fuzheng Zhang, Deyi Xiong

ACL 2025 (Main)   Paper (ACL)   Paper (arXiv)   Code   🤗 Dataset

Capybara-VL

Data Metabolism: An Efficient Data Design Schema For Vision Language Model

Jingyuan Zhang, Hongzhi Zhang, Zhou Haonan, Chenxi Sun, Xingguang Ji, Jiakang Wang, Fanheng Kong, Yahui Liu, Qi Wang, Fuzheng Zhang

ICLR 2025 Open Science for Foundation Models Workshop (Best Paper Award)   Paper (arXiv)   Code

S-Mamba

Is Mamba Effective for Time Series Forecasting?

Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, Yifei Zhang

Neurocomputing   Paper (Neurocomputing)   Paper (arXiv)   Code

PEGS

StickerConv: Generating Multimodal Empathetic Responses from Scratch

Yiqun Zhang*, Fanheng Kong*, Peidong Wang*, Shuang Sun, Lingshuai Wang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song

ACL 2024 (Main)   Project Page   Paper (ACL)   Paper (arXiv)   Code

TIGER

TIGER: A Unified Generative Model Framework for Multimodal Dialogue Response Generation

Fanheng Kong, Peidong Wang, Shi Feng, Daling Wang, Yifei Zhang

COLING 2024   Paper   Code

Others
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval

Fanheng Kong, Jingyuan Zhang, Yahui Liu, Hongzhi Zhang, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Victoria W., Fuzheng Zhang, Guorui Zhou

Project Page   Paper (arXiv)   Code   🤗 Model&Dataset