A detailed shot captures an open book resting on someone's lap. The open pages are densely printed and appear to come from an academic book or conference program. A green tab marks a section divider, possibly the start of a new chapter or topic. The person sits in a cushioned chair with charcoal-colored armrests and wears mesh-knit footwear. The carpeted floor in the background has an abstract beige pattern, suggesting a conference or academic setting. A black backpack, partially visible behind the book, rests on the person's leg. Overall, the scene gives the impression of an engaged event attendee absorbed in the material at hand.
Text transcribed from the image:
Maling St Foundation Raising state Video FAR Praction Detection in for Activity Arsha Nagrani wer Mini War Detection e diegober Ge for Action Detection in Ping Gu aber Temporal Action Limin Wang Shed a lemaking. Wang J3D me Bags: Kete in Zhang Chen Zhen with 18 Large Video-Language Models Himangi La Romes Shuming Li La mononparameters Memory Banks Improve Video Object Wang Za Pangalization with Behavioral L Gallego Bent Cameras Human Suman Ghosh Ignacio Juarez 30 Low power Continuous Marler ce bending of Egocentric Graphs for Long-Forma 3 Radi Antonio Fumari Ryde Min Subarna Tripathi A Facionle Guided Conceptual Reasoning and Uncertainty Languages Etion for Event-based Action Recognition and More 3 BACTL Zhou Zheng Hantu Lyu Lin Wang
389 Uncertainty-aware Action Decoupling Transformer for Action Anticipation, Hongji Guo, Nakul Agarwal, Shao-Yuan Lo, Kwonjoon Lee,
Qiang Ji
390 Error Detection in Egocentric Procedural Task Videos, Shih-Po Lee, Zijia Lu, Zekun Zhang, Minh Hoai, Ehsan Elhamifar
391 Learning to Predict Activity Progress by Self-Supervised Video Alignment, Gerard Donahue, Ehsan Elhamifar
392 MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning, Mohamed Abdelfattah, Mariam Hassan, Alexandre Alahi
393 Align Before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition, Yifei Chen, Dapeng Chen, Ruijin Liu, Sai Zhou, Wenyuan Xue, Wei Peng
394 DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement, Hao Wu, Huabin Liu, Yu Qiao, Xiao Sun
395 Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection, Yicheng Xiao, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Zhuoyan Luo, Xu Li
396 Test-Time Zero-Shot Temporal Action Localization, Benedetta Liberatori, Alessandro Conti, Paolo Rota, Yiming Wang, Elisa Ricci
397 Selective, Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition, Filip Ilic, He Zhao, Thomas Pock, Richard P. Wildes
398 Step Differences in Instructional Video, Tushar Nagarajan, Lorenzo Torresani
399 Compositional Video Understanding with Spatiotemporal Structure-based Transformers, Hoyeoung Yun, Jinwoo Ahn, Minseo Kim, Eun-Sol Kim
400 Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation, Ming Xu, Stephen Gould
401 FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment, Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng
46/CVPR 2024 PROGRAM GUIDE MAIN CONFERENCE THURSDAY, JUNE 20
402 Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition, Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey
403 vid-TLDR: Training Free Token Merging for Light-weight Video Transformer, Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J.
Kim
404 CPR-Coach: Recognizing Composite Error Actions based on Single-class Training, Shunli Wang, Shuaibing Wang, Dingkang Yang, Mingcheng Li, Haopeng Kuang, Xiao Zhao, Liuzhen Su, Peng Zhai, Lihua Zhang
405 Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly, Hang Du, Sicheng Zhang, Binzhu Xie, Guoshun Nan, Jiayang Zhang, Junrui Xu, Hangyu Liu, Sicong Leng, Jiangming Liu, Hehe Fan, Dajiu Huang, Jing Feng, Linli Chen, Can Zhang, Xuhuan Li, Hao Zhang, Jianhang Chen, Qimei Cui, Xiaofeng Tao
406 Detours for Navigating Instructional Videos, Kumar Ashutosh, Zihui Xue, Tushar Nagarajan, Kristen Grauman
407 Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos, Kumaranage Ravindu Yasas Nagasinghe, Honglu Zhou, Malitha Gunawardhana, Martin Renqiang Min, Daniel Harari, Muhammad Haris Khan
408 Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization, Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos
409 TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression, Ho-Joong Kim, Jung-Ho Hong, Heejo Kong, Seong-Whan Lee
410 CSTA: CNN-based Spatiotemporal Attention for Video Summarization, Jaewon Son, Jaehun Park, Kwangsu Kim
411 PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition, Haosong Zhang, Mei Chee Leong, Liyuan Li, Weisi Lin
412 MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection, Jakub Micorek, Horst Possegger, Dominik Narnhofer, Horst Bischof, Mateusz Kozinski
413 Language Model Guided Interpretable Video Action Reasoning, Ning Wang, Guangming Zhu, HS Li, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun
414 OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition, Tongjia Chen, Hongshan Yu, Zhengeng Yang, Zechuan Li, Wei Sun, Chen Chen
415 Text Prompt with Normality Guidance for
Weakly Supervised Video Anomaly Detection, Zhiwei Yang, Jing Liu, Peng Wu
416 VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding, Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
417 Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training, Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa
418 SnAG: Scalable and Accurate Video Grounding, Fangzhou Mu, Sicheng Mo, Yin Li
419 Learning Correlation Structures for Vision Transformers, Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho
420 Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling, Kranthi Kumar Rachavarapu, Kalyan Ramakrishnan, Rajagopalan A. N.
421 Matching Anything by Segmenting Anything, Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu
422 3D Feature Tracking via Event Camera, Siqi Li, Zhikuan Zhou, Zhou Xue, Yipeng Li, Shaoyi Du, Yue Gao
423 Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture, Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang
424 Towards Generalizable Multi-Object Tracking, Zheng Qin, Le Wang, Sanping Zhou, Panpan Fu, Gang Hua, Wei Tang
425 SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction, Conghao Wong, Beihao Xia, Ziqian Zou, Yulong Wang Xe
426 Self-Supervised Multi-C Zijia Lu, Bing Shuai, Yant To forsving Zhenlin
427 UnSAMFlow: Unsupervised Optical Flow Anything Model, Shuai Yuan, Lei Luo, Zhuo Wa
428 RTracker: Recoverable Tracking via P Memory. Yuqing Huang, Xin Li, Zikun Zhou, w Rakesh Ranjan, Denis Demandol He, Ming-Hsuan Yang Da Tree Struck
429 ARTrackV2: Prompting Autoregressive Tracker h How to Describe, Yifan Bai, Zeyang Zhao, Yhong Gong
430 Endow SAM with Keen Eyes: Temporal-spatial P Zhu, Shuai Zheng.
Yao Zhao for Video Camouflaged Object Detection, Wome
431 MemFlow: Optical Flow Estimation and Prediction with Memory, Qiaole Dong, Yanwei Fu
432 OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning, Lingyi Hong, Shillin Yan Zhang, Wanyun Li, Xinyu Zhou, Pinxue Guo, Kanun lang Chen, Jinglun Li, Zhaoyu Chen, Wenqiang Zhang
433 Learned Trajectory Embedding for Subspace Clustering, Yaroslava Lochman, Carl Olsson, Christopher Zach
434 PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos, Qi Zhao, M. Salman Asif, Zhan Ma
435 DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking, Fei Xie, Zhongdao Wang, Chao Ma
436 Sparse Global Matching for Video Frame Interpolation with Large Motion, Chunxu Liu, Guozhen Zhang, Rui Zhao, Limin Wang
437 iKUN: Speak to Trackers without Retraining, Yunhao Du, Cheng Lei, Zhicheng Zhao, Fei Su
438 NetTrack: Tracking Highly Dynamic Objects with a Net, Guangze Zheng, Shijie Lin, Haobo Zuo, Changhong Fu, Jia Pan
439 Single-Model and Any-Modality for Video Object Tracking, Zongwei Wu, Jilai Zheng, Xiangxuan Ren, Florin-Alexandru Vasluianu, Chao Ma, Danda Pani Paudel, Luc Van Gool, Radu Timofte
440 FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models, Ao Luo, Xin Li, Fan Yang, Jiangyu Liu, Haoqiang Fan, Shuaicheng Liu
441 Video Harmonization with Triplet Spatio-Temporal Variation Patterns, Zonghui Guo, Xinyu Han, Jie Zhang, Shiguang Shan, Haiyong Zheng
442 Dense Optical Tracking: Connecting the Dots, Guillaume Le Moing, Jean Ponce, Cordelia Schmid
443 Efficient Meshflow and Optical Flow Estimation from Event Cameras, Xinglong Luo, Ao Luo, Zhengning Wang, Chunyu Lin, Bing Zeng, Shuaicheng Liu
444 Context-Aware Integration of Language and Visual References for Natural Language Tracking, Yanyan Shao, Shuting He, Qi Ye, Yuchao Feng, Wenhan Luo, Jiming Chen
445 Depth-aware Test-Time Training for Zero-shot Video Object Segmentation, Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo
Liu, Chi-Man Pun, Xiaodong Cun
446 Weakly Supervised Video Individual Counting, Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton van den Hengel, Ming-Hsuan Yang, Qingming Huang
447 Dual Prototype Attention for Unsupervised Video Object Segmentation, Suhwan Cho, Minhyeok Lee, Seunghoon Lee, Dogyoon Lee, Heeseung Choi, Ig-Jae Kim, Sangyoun Lee
448 Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline, Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang
449 HIPTrack: Visual Tracking with Historical Prompts, Wenrui Cai, Qingjie Liu, Yunhong Wang
450 FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking, Seokju Cho, Jiahui Huang, Seungryong Kim, Joon-Young Lee
451 Implicit Motion Function, Yue Gao, Jiahao Li, Lei Chu, Yan Lu
452 DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking, Cheng Huang, Shoudong Han, Mengyu He, Wenbo Zheng, Yuhao Wei