Course Description

This advanced seminar focuses on the latest research on transformer models for visual recognition. The course consists of research paper presentations and a semester-long course project. Topics include vision transformers, MLP-based models, self-supervised learning, multi-modal learning, and various image- and video-based applications. A background in deep learning is required.

Administrative Information

Grading

  • Class Participation: 10%

  • Paper Critiques: 20%

  • Paper Presentations: 30%

  • Course Project: 40%

Course Policies

  • Class Participation: Please come to class prepared to discuss the papers with your peers. However, please do not discuss the papers with your peers before class; I'm interested in hearing your own opinions about them.

  • Late Submissions: The class is structured around a tight paper presentation schedule. Therefore, late assignments will not be accepted. 

  • Academic Integrity: For your presentations and projects, you are allowed to use materials from external sources. However, you must clearly acknowledge those sources.