
June 26, 2025
Our paper "SiLVR: A Simple Language-based Video Reasoning Framework" won the first place award at Multi-Discipline Lecture Understanding at CVPR 2025 Multimodal Video Agent Workshop. Congrats, Ce and Yan-Bo!

June 14, 2025
Emon's paper "Video ReCap: Recursive Captioning of Hour-Long Videos" won the Distinguished Egocentric Vision Paper Award at the CVPR 2025 EgoVis Workshop. Congrats, Emon!

June 14, 2025
Emon's state-space video model, BIMBA, won first place in the EgoSchema challenge at CVPR 2025.

June 13, 2025
Our group presented four papers at CVPR 2025 in Nashville, TN.

June 12, 2025
Wrapped up the two workshops we organized, Transformers for Vision and Multimodal Video Agents, at CVPR 2025.

June 11, 2025
Presented "SiLVR: A Simple Language-based Video Reasoning Framework" at the LOVEU workshop in CVPR 2025 in Nashville, TN.

June 3, 2025
Gave a talk "From Cooking to Basketball: Skill Learning in the Age of AI" at UNC's Friday Center.

March 24, 2025
Received a SONY Focused Research Award. Thank you for supporting our research, SONY!

March 12, 2025
Gave a talk on sports-based video understanding using multimodal large language models at the University of Tübingen.

February 26, 2025
Four papers accepted to CVPR 2025. Congrats, Yulu, Mohaiminul, Tanveer, and Ziyang!

December 20, 2024
We will be organizing our 4th Transformers for Vision workshop and a workshop on Multimodal Video Agents at CVPR 2025.

December 14, 2024
Gave an invited talk "Complex Video Understanding using Language ... and Video" in the first Video-Language Models workshop at NeurIPS 2024 in Vancouver.

October 29, 2024
Two papers accepted to WACV 2025. Congrats to Yan-Bo and Feng!

September 20, 2024
Our LLoVi paper was accepted to EMNLP 2024. Congrats, Ce!

September 12, 2024
Our proposed project on fine-grained object and activity localization in long videos was funded by the Laboratory for Analytic Sciences (LAS) at NC State. Thanks for supporting our research!

July 10, 2024
Feng successfully defended his dissertation titled "Efficient Unimodal and Multimodal Video Understanding". Congrats, Feng!

July 1, 2024
Four of our papers were accepted to ECCV 2024. Congrats, Mohaiminul, Yan-Bo, Feng, and Tanveer!

June 19, 2024
Our group presented three papers at CVPR 2024 in Seattle.

June 18, 2024
We successfully wrapped up our Transformers for Vision (T4V) workshop at CVPR 2024.

June 15, 2024
The Carolina Arts & Sciences magazine published a story on our research developing AI models for video understanding.

March 12, 2024
Gave a talk at UNC’s School of Data Science and Society Generative AI Workshop.

February 16, 2024
The Well published an article on our work using computer vision models to help Parkinson's disease patients.

December 24, 2023
Our Transformers for Vision (T4V) workshop was accepted to CVPR 2024. Stay tuned for updates!

November 30, 2023
Gave an invited talk on parameter-efficient text-to-video retrieval at SONY.

November 29, 2023
Announcing our participation in the Ego-Exo4D project, a collaboration between 14 universities and Meta to create a multimodal, multiview dataset to enhance AI's understanding of human skill.