
Fanheng Kong


I am a first-year Ph.D. student (since 2024) in the Data Mining (DM) group at Northeastern University (NEU), advised by Prof. Shi Feng and Prof. Daling Wang. The NEU DM group is part of the Data Science and Big Data Technology Group led by Prof. Ge Yu.

My research interests include multimodal understanding and generation, as well as multimodal LLMs.

Currently, I'm an intern at Kuaishou Technology.

Previously, I received my bachelor's degree from the Department of Artificial Intelligence at Northeastern University and was then admitted to its graduate program through postgraduate recommendation.

If you are interested in my research, you are welcome to reach out by email to discuss or collaborate.

News

  • [05/2025] Two papers are accepted to ACL 2025 (Main) (CCF-A).
  • [03/2025] One paper is accepted to SCI-FM@ICLR 2025 (Oral) and receives the Best Paper Award.
  • [12/2024] One paper is accepted to Neurocomputing (JCR Q1).
  • [05/2024] One paper is accepted to ACL 2024 (Main) (CCF-A).
  • [02/2024] One paper is accepted to COLING 2024 (Main) (CCF-B).

UNITE

Universal Multimodal Embeddings

TUNA

Temporal Video Understanding Evaluation

StickerConv

Multimodal Empathetic Dialogue Agent, Dataset and MLLM

S-Mamba

Is Mamba Effective for Time Series Forecasting?

PICA

LLM for the Emotional Domain

Awesome Multimodal

Awesome Multimodal Papers

Publications

(* equal contribution, † corresponding author)
Accepted

TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos

Fanheng Kong, Jingyuan Zhang, Hongzhi Zhang, Shi Feng, Daling Wang, Linhao Yu, Xingguang Ji, Yu Tian, Victoria W., Fuzheng Zhang

ACL 2025 (Main)   Project Page   Paper (arXiv)   Code   🤗 Dataset


Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search

Linhao Yu, Xingguang Ji, Yahui Liu, Fanheng Kong, Chenxi Sun, Jingyuan Zhang, Hongzhi Zhang, V. W., Fuzheng Zhang, Deyi Xiong

ACL 2025 (Main)   Paper (arXiv)   Code


Data Metabolism: An Efficient Data Design Schema For Vision Language Model

Jingyuan Zhang, Hongzhi Zhang, Zhou Haonan, Chenxi Sun, Xingguang Ji, Jiakang Wang, Fanheng Kong, Yahui Liu, Qi Wang, Fuzheng Zhang

ICLR 2025 First Workshop on Open Science for Foundation Models (SCI-FM), Oral, Best Paper Award.   Paper (arXiv)   Code


Is Mamba Effective for Time Series Forecasting?

Zihan Wang, Fanheng Kong, Shi Feng, Ming Wang, Xiaocui Yang, Han Zhao, Daling Wang, Yifei Zhang

Neurocomputing.   Paper (Neurocomputing)   Paper (arXiv)   Code


StickerConv: Generating Multimodal Empathetic Responses from Scratch

Yiqun Zhang*, Fanheng Kong*, Peidong Wang*, Shuang Sun, Lingshuai Wang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song

ACL 2024 (Main).   Project Page   Paper (ACL)   Paper (arXiv)   Code


TIGER: A Unified Generative Model Framework for Multimodal Dialogue Response Generation

Fanheng Kong, Peidong Wang, Shi Feng, Daling Wang, Yifei Zhang

COLING 2024 (Main).   Paper   Code

Others

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval

Fanheng Kong, Jingyuan Zhang, Yahui Liu, Hongzhi Zhang, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Victoria W., Fuzheng Zhang, Guorui Zhou

Project Page   Paper (arXiv)   Code   🤗 Model & Dataset