A technical research poster titled "Task-Aware Encoder Control for Deep Video" presents an encoder-side control scheme for learned video codecs. Displayed at an academic or research conference, the poster is authored by Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, et al., affiliated with institutions including BIT, SenseTime, HKUST, Tsinghua University, and SJTU. The content centers on the "Controlling DVC for Machine" framework and its two mechanisms, "Dynamic Vision Mode Prediction (DVMP)" and "DivGoP & GoP Selection". Diagrams and flowcharts show the processing pipeline, including Hyperprior Information, ResBlock stacks, Entropy Coding, and modules for Pre-Analysis, GoP Feature Extraction, and Selection. Plots and a BD-BR table report bitrate reduction while preserving semantic information, GoP structure optimization, and suppression of reconstruction error propagation.

Text transcribed from the image:

INSTITUTE OF 商汤 sensetime
Task-Aware Encoder Control for Deep Video
Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan
BIT & SenseTime & HKUST & Tsinghua University & SJTU & CU

"Controlling DVC for Machine" Framework
[Framework diagram labels: x_t, x_ref, GoP Structure Vector (01010101), Encoder-Controlled Learned Video Codec, Feature Encoder, f_t, DVMP, f_m, AE, AD, Residual, Decoded Frame Buffer, Pretrained Decoder]

DivGoP & GoP Selection
[Plots with Bpp on the horizontal axis; legend entries include DivGoP; caption fragment: "get better Bpp-mAP trade-offs."]

[GoP Selection Module diagram]
(a) Pre-Analysis: Input frames {x1, x2, x3, ...}; RAFT; Detector; Motion Encoder
(b) GoP Prediction: GoP Feature Extractor; S_logit; Gumbel-Softmax Sampling / Logit Distribution
Layer labels: Conv(32), LReLU,
Conv(32), AdaAvePool, Linear(·, 2); GoP Structure Vector
(c) GoP Feature Extractor

Previous works require individually customized codecs to support different downstream tasks, which is complex and difficult to deploy.
How to use one pre-trained decoder to support both human and machine vision tasks?
1. Dividing the original P frames into two types: P frames and new Pm frames (predicted with DVMP).
2. Using the GoP Selection Module to control the encoding GoP structure for different objectives, such as vision tasks and video reconstruction.
3. Keeping the decoder weights fixed to ensure compatibility across multiple tasks.

Dynamic Vision Mode Prediction (DVMP)
[DVMP diagram labels: Hyperprior Information, Encoded Residual Feature, Decoded Residual Feature, Contextual Information, Conv(C, 3, 1), ResBlock(C, 3), Selection, Gumbel Softmax, Entropy Coding]
Effectively reduces the bitrate while preserving critical semantic information.
(Up) DVMP for hyperprior entropy models (suitable for FVC and DCVC-TCM).
(Down) DVMP for entropy models with autoregressive components (suitable for DCVC).

(Left) DFS optimal GoP structure and DivGoP structure: succeed in getting better Bpp-mAP trade-offs.
(Right) Simply fine-tuning FVC for the machine task: fails to get better Bpp-mAP trade-offs.

GoP Structure Optimization
Target: arg min_g R(·) + [illegible](·)

GoP Selection Module
Stage a): Pre-Analysis. Detector + RAFT + Motion Encoder.
Stage b): GoP Prediction.
GoP Feature Extractor: produces S_logit for the current GoP sequence.
Vector Sampler: (Training) Gumbel-Softmax sampling GoP-1 times using S_logit. (Inference) Logit distribution using Softmax(S_logit).
Dynamically determines the GoP structures for different video sequences.

Loss Function & Training Strategy
Training Stage 1: Train DVMP (frame-wise). L = R + [illegible]
Training Stage 2: Train GoP Selection Module (GoP-wise). L₂ = R + [illegible: "Agh"]

BD-BR Results
Pm frame (semantic friendly): low bitrate, low reconstruction quality.
P frame: high bitrate, high reconstruction quality.
Hybrid encoding: using Pm frames to reduce bitrate and P frames to suppress reconstruction error propagation.

BD-BR table (a further column after FN is cut off; surviving fragments: -4, 40, 3):
Method       MOTA     MAP      MAP50    MOTP     FN
DCVC [20]    -5.32    -9.75    -14.53   -1.39    6.20
Ours+DCVC    41.82    -39.43   40.60    -37.73   41.74
Rows whose method labels (HEVC [35], FVC [16], Ours+FVC, TCM [31], Ours+TCM) were separated from their values in the transcription:
             0.0      0.0      0.0      0.0      0.0
             -25.19   -32.34   -31.02   -39.85   -26.44
             40.82    45.10    46.15    -51.66   -38.98
             -33.88   -31.35   40.43    15.80    -34.02
             -34.28   -31.27   -32.09   -35.02   -32.89

[DVMP diagram labels: ResBlock(C, 3), MaskC, Conv(C, 1, 1), Conv(C, 1, 1), Gumbel Softmax, Max]
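
The Vector Sampler step of the GoP Selection Module (Gumbel-Softmax sampling during training, Softmax(S_logit) at inference) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The function names, the temperature parameter tau, the use of an argmax to turn the inference distribution into a hard decision, and the mapping of 0 to a P frame and 1 to a Pm frame are all assumptions made for illustration. The sketch also omits gradients; a real training setup would keep the soft sample differentiable.

```python
import math
import random


def softmax(logits, tau=1.0):
    """Numerically stable softmax over a list of logits with temperature tau."""
    peak = max(logits)
    exps = [math.exp((x - peak) / tau) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Add Gumbel(0, 1) noise to each logit, then apply a tempered softmax."""
    noisy = []
    for x in logits:
        u = max(rng.random(), 1e-12)  # guard against log(0)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, tau)


def sample_gop_vector(s_logit, training, tau=1.0, rng=None):
    """Turn per-frame two-way logits into a binary GoP structure vector.

    s_logit: list of [logit_0, logit_1] pairs, one per predicted frame
             (GoP-1 pairs for one GoP, following the poster).
    Assumed mapping (hypothetical): 0 = P frame, 1 = Pm frame.
    """
    rng = rng or random.Random()
    vector = []
    for pair in s_logit:
        if training:
            probs = gumbel_softmax(pair, tau, rng)  # stochastic choice
        else:
            probs = softmax(pair)  # deterministic distribution
        vector.append(0 if probs[0] >= probs[1] else 1)
    return vector


# Hypothetical logits for three predicted frames.
example_logits = [[2.0, -1.0], [-0.5, 1.5], [0.3, 0.1]]
print(sample_gop_vector(example_logits, training=False))  # [0, 1, 0]
```

During training the sampled vector varies from call to call, which lets the selection module explore different GoP structures; at inference the choice depends only on the logits, so the same sequence always yields the same structure.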