Gedas Bertasius

Assistant Professor

I am an Assistant Professor in the Computer Science department at the University of North Carolina, Chapel Hill. Before joining UNC, I was a postdoctoral researcher at Facebook AI Research (FAIR) working with Lorenzo Torresani. I finished my Ph.D. at the University of Pennsylvania, advised by Jianbo Shi, and my undergraduate degree at Dartmouth College.

Research

I lead the Multimodal Video Perception (MVP) group at UNC. We develop foundational models for multimodal video understanding, enabling machines to comprehend, reason about, and interact with complex video, audio, and language data. Moving beyond perception, we ask: what spatiotemporal abstractions does AI need to truly grasp complex human behaviors over long horizons? Representative projects include TimeSformer, Video ReCap, LLoVi, BIMBA, and VideoTree.

Video Recognition

Developing spatiotemporal models for automatic video analysis (e.g., TimeSformer and ViS4mer).

Multimodal AI

Building models that can learn from video, audio, and text (e.g., Video ReCap, LLoVi, BIMBA).

Perceptual Agents

Developing AI models that can assist people with daily tasks and skill learning (e.g., VidAssist, Ego-Exo4D, and ExAct).

Sports Analytics

Elevating strategic insights using state-of-the-art multimodal video models (e.g., BASKET).

Video for Robotics

Translating visual inputs into effective real-world actions (e.g., BOSS, ReBot, and ARCADE).

Generative Video Modeling

Enabling applications such as video-to-music generation and audio-visual editing (e.g., VMAs and AvED).

Selected Projects

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang, Gedas Bertasius, Lorenzo Torresani

CVPR 2025 (1st Place Winner at CVPR 2025 Ego4D EgoSchema Challenge)

[arxiv] [project page] [code] [model] [demo] [bibtex]

BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation

Yulu Pan, Ce Zhang, Gedas Bertasius

CVPR 2025

[arxiv] [project page] [code] [data] [bibtex]

A Simple LLM Framework for Long-Range Video Question-Answering

Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

EMNLP 2024

[arxiv] [code] [bibtex]

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Gedas Bertasius, ... , Michael Wray

CVPR 2024

[arxiv] [project website] [blog] [video] [bibtex]

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius

CVPR 2024 (Egocentric Vision Distinguished Paper Award)

[arxiv] [project website] [code] [dataset] [bibtex]

VindLU: A Recipe for Effective Video-and-Language Pretraining

Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

CVPR 2023

[arxiv] [code] [bibtex]

Is Space-Time Attention All You Need for Video Understanding?

Gedas Bertasius, Heng Wang, Lorenzo Torresani

ICML 2021 (Top-5 Most Impactful ICML 2021 Paper)

[arxiv] [code] [talk] [slides] [blog] [VentureBeat] [SiliconAngle] [bibtex]

Contact

Prospective Graduate Students: I am recruiting motivated students in computer vision. Please email me a list of your prior publications and your CV.

Undergraduates at UNC: If you are interested in computer vision, especially its applications to sports, email me your CV and transcript with your GPA.

©2024 by Gedas Bertasius