
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng, Xizi Wang, Jie Lei, David Crandall, Mohit Bansal, Gedas Bertasius
CVPR 2023

Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius
CVPR 2023
[arxiv] [code] [project page] [bibtex]

Efficient Movie Scene Detection using State-Space Transformers
Md Mohaiminul Islam, Mahmudul Hasan, Kishan Athrey, Tony Braskich, Gedas Bertasius
CVPR 2023

Improving Video Retrieval Using Multilingual Knowledge Transfer
Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal
ECIR 2023 (Best Student Paper Award)
[arxiv]

Learning to Retrieve Videos by Asking Questions
Avinash Madasu, Junier Oliva, Gedas Bertasius
ACM Multimedia 2022

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius
ECCV 2022 (Oral)
[arxiv] [code] [project page] [bibtex]

TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng, Gedas Bertasius
ECCV 2022

Long Movie Clip Classification with State-Space Video Models
Md Mohaiminul Islam, Gedas Bertasius
ECCV 2022

Learning To Recognize Procedural Activities with Distant Supervision
Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani
CVPR 2022
[arxiv] [code] [project page] [bibtex]

Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang, Gedas Bertasius, Du Tran, Lorenzo Torresani
CVPR 2022

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ICML 2021
[arxiv] [code] [talk] [slides] [Facebook AI Blog] [VentureBeat] [SiliconAngle] [bibtex]

Vx2Text: End-to-End Learning of Video-Based Text Generation from Multimodal Inputs
Xudong Lin, Gedas Bertasius, Jue Wang, Shih-Fu Chang, Devi Parikh, Lorenzo Torresani
CVPR 2021
[arxiv] [VentureBeat] [bibtex]

Supervoxel Attention Graphs for Long-Range Video Modeling
Yang Wang, Gedas Bertasius, Tae-Hyun Oh, Abhinav Gupta, Minh Hoai, Lorenzo Torresani
WACV 2021

COBE: Contextualized Object Embeddings from Narrated Instructional Video
Gedas Bertasius, Lorenzo Torresani
NeurIPS 2020
[arxiv] [talk] [slides] [HowTo100M_BB pseudo annotations] [bibtex]

Attentive Action and Context Factorization
Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai
BMVC 2020
[arxiv]

Classifying, Segmenting, and Tracking Objects in Video with Mask Propagation
Gedas Bertasius, Lorenzo Torresani
CVPR 2020 (Best Paper Nominee)
Ranked 1st on the YouTube-VIS leaderboard and in the EPIC-Kitchens Detection Challenge.
[arxiv] [talk] [slides] [bibtex]

Learning Temporal Pose Estimation from Sparsely-Labeled Videos
Gedas Bertasius, Christoph Feichtenhofer, Du Tran, Jianbo Shi, Lorenzo Torresani
NeurIPS 2019
Ranked 1st on the PoseTrack leaderboard for multi-frame pose estimation.
[arxiv] [poster] [code] [bibtex]

Object Detection in Video with Spatiotemporal Sampling Networks
Gedas Bertasius, Lorenzo Torresani, Jianbo Shi
ECCV 2018
[arxiv] [results] [bibtex]

Egocentric Basketball Motion Planning from a Single First-Person Image
Gedas Bertasius, Aaron Chan, Jianbo Shi
CVPR 2018
[arxiv] [results] [MIT SSAC Poster] [bibtex]

Am I a Baller? Basketball Performance Assessment from First-Person Videos
Gedas Bertasius, Stella X. Yu, Hyun Soo Park, Jianbo Shi
ICCV 2017
[arxiv] [results] [bibtex]

Unsupervised Learning of Important Objects from First-Person Videos
Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi
ICCV 2017
[arxiv] [bibtex]

Convolutional Random Walk Networks for Semantic Image Segmentation
Gedas Bertasius, Lorenzo Torresani, Stella X. Yu, Jianbo Shi
CVPR 2017
[arxiv] [bibtex]

First-Person Action-Object Detection with EgoNet
Gedas Bertasius, Hyun Soo Park, Stella X. Yu, Jianbo Shi
RSS 2017
[arxiv] [New Scientist Article] [Impact Article] [results] [bibtex]

Local Perturb-and-MAP for Structured Prediction
Gedas Bertasius, Qiang Liu, Lorenzo Torresani, Jianbo Shi
AISTATS 2017
[arxiv] [bibtex]

Semantic Segmentation with Boundary Neural Fields
Gedas Bertasius, Jianbo Shi, Lorenzo Torresani
CVPR 2016
[arxiv] [code] [bibtex]

High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and its Applications to High-Level Vision
Gedas Bertasius, Jianbo Shi, Lorenzo Torresani
ICCV 2015
[arxiv] [code] [bibtex]

DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
Gedas Bertasius, Jianbo Shi, Lorenzo Torresani
CVPR 2015
[arxiv] [bibtex]