
A Simple LLM Framework for Long-Range Video Question-Answering

Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

arXiv

[arxiv] [code] [bibtex]

Video ReCap: Recursive Captioning of Hour-Long Videos

Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius

CVPR 2024

[arxiv] [project website] [code] [dataset] [bibtex]

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Gedas Bertasius, ... , Michael Wray

CVPR 2024

[arxiv] [project website] [blog] [video] [bibtex]

LoCoNet: Long-Short Context Network for Active Speaker Detection

Xizi Wang, Feng Cheng, Gedas Bertasius, David Crandall

CVPR 2024

[arxiv] [code] [bibtex]

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal

ICCV 2023

[arxiv] [code] [bibtex]

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers

Qin Liu, Zhenlin Xu, Gedas Bertasius, Marc Niethammer

ICCV 2023

[arxiv] [code] [bibtex]

VindLU: A Recipe for Effective Video-and-Language Pretraining

Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius

CVPR 2023

[arxiv] [code] [bibtex]

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius

CVPR 2023

[arxiv] [code] [project page] [bibtex]

Efficient Movie Scene Detection using State-Space Transformers

Md Mohaiminul Islam, Mahmudul Hasan, Kishan Athrey, Tony Braskich, Gedas Bertasius

CVPR 2023

[arxiv] [bibtex]

Improving Video Retrieval Using Multilingual Knowledge Transfer

Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

ECIR 2023 (Best Student Paper Award)

[arxiv]

Learning to Retrieve Videos by Asking Questions

Avinash Madasu, Junier Oliva, Gedas Bertasius

ACM Multimedia 2022

[arxiv] [bibtex]

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius

ECCV 2022 (Oral)

[arxiv] [code] [project page] [bibtex]

Long Movie Clip Classification with State-Space Video Models

Md Mohaiminul Islam, Gedas Bertasius

ECCV 2022

[arxiv] [code] [bibtex]

Learning To Recognize Procedural Activities with Distant Supervision

Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani

CVPR 2022

[arxiv] [code] [project page] [bibtex]

Long-Short Temporal Contrastive Learning of Video Transformers

Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani

CVPR 2022

[arxiv] [bibtex]

Vx2Text: End-to-End Learning of Video-Based Text Generation from Multimodal Inputs

Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani

CVPR 2021

[arxiv] [VentureBeat] [bibtex]

Supervoxel Attention Graphs for Long-Range Video Modeling

Yang Wang, Gedas Bertasius, Tae-Hyun Oh, Abhinav Gupta, Minh Hoai, Lorenzo Torresani

WACV 2021

Attentive Action and Context Factorization
Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai

BMVC 2020

[arxiv]

Classifying, Segmenting, and Tracking Objects in Video with Mask Propagation
Gedas Bertasius, Lorenzo Torresani

CVPR 2020 (Best Paper Nominee)
Ranked 1st on YouTube-VIS Leaderboard and EPIC-Kitchens Detection Challenge.
[arxiv] [talk] [slides] [bibtex]

Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani

NeurIPS 2019
Ranked 1st on PoseTrack Leaderboard for multi-frame pose estimation.
[arxiv] [poster] [code] [bibtex]

Object Detection in Video with Spatiotemporal Sampling Networks
Gedas Bertasius, Lorenzo Torresani and Jianbo Shi

ECCV 2018
[arxiv] [results] [bibtex]


Am I a Baller? Basketball Performance Assessment from First-Person Videos
Gedas Bertasius, Stella X. Yu, Hyun Soo Park and Jianbo Shi

ICCV 2017
[arxiv] [results] [bibtex]

Unsupervised Learning of Important Objects from First-Person Videos
Gedas Bertasius, Hyun Soo Park, Stella X. Yu and Jianbo Shi

ICCV 2017
[arxiv] [bibtex]

Convolutional Random Walk Networks for Semantic Image Segmentation
Gedas Bertasius, Lorenzo Torresani, Stella X. Yu and Jianbo Shi

CVPR 2017
[arxiv] [bibtex]

First-Person Action-Object Detection with EgoNet
Gedas Bertasius, Hyun Soo Park, Stella X. Yu, and Jianbo Shi

RSS 2017
[arxiv] [New Scientist Article] [Impact Article] [results] [bibtex]

Local Perturb-and-MAP for Structured Prediction
Gedas Bertasius, Qiang Liu, Lorenzo Torresani, and Jianbo Shi

AISTATS 2017
[arxiv] [bibtex]

Semantic Segmentation with Boundary Neural Fields
Gedas Bertasius, Jianbo Shi and Lorenzo Torresani

CVPR 2016
[arxiv] [code] [bibtex]

DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
Gedas Bertasius, Jianbo Shi, and Lorenzo Torresani

CVPR 2015
[arxiv] [bibtex]
