A researcher presents their project "Task-Aware Encoder Control for Deep Video Compression" at a conference. The detailed poster behind them outlines the "Controlling DVC for Machine" framework, with diagrams and data on Dynamic Vision Mode Prediction (DVMP) and GoP selection. The presenter, wearing a conference badge and lanyard, explains the research: why individually customized codecs for each downstream task are complex to deploy, how the proposed encoder-side control works, and the BD-BR results. The well-lit venue, with modern ceiling lights and an organized layout, suits professional engagement and knowledge exchange.
Text transcribed from the image:
INSTITUTE OF TECHNOLOGY (logo, partially legible)
SenseTime (logo)
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan …
BIT & SenseTime & HKUST & Tina
"Controlling DVC for Machine" Framework
DivGoP & GoP Selection
[Figure: framework diagram, panels (a) and (c). Legible labels: Input frames (x1, x2, x3, ...); Motion; Encoder; Learned Video Codec; GoP Structure Vector (e.g., 01010101); Decoded Frame Buffer; Analysis; RAFT; Detector; Tracking & Object Det…]
Previous works require individually customized codecs to support different downstream tasks, which is complex and difficult to deploy.
How can one pre-trained decoder support both human and machine vision b…
1. Dividing the original P frames into two types: P frames and new P frames (predicted with DVMP).
2. Using a GoP Selection Module to control the encoding GoP structure for different objectives, such as vision tasks and video reconstruction.
3. Keeping the decoder weights fixed to ensure compatibility across multiple tasks.
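The three steps amount to an encoder-side switch: a binary GoP structure vector (the diagram shows 01010101) decides, frame by frame, which of the two P-frame types to encode, while the decoder never changes. A minimal sketch, assuming bit 1 marks a new P frame coded with DVMP and bit 0 a regular P frame; the bit convention, the function names `frame_types` and `select_vector`, and the per-objective vectors are hypothetical, not taken from the poster.

```python
from typing import List


def frame_types(gop_vector: str) -> List[str]:
    """Map a binary GoP structure vector to per-frame coding types.

    Hypothetical convention: '1' -> new P frame coded with DVMP,
    '0' -> regular P frame. The decoder is the same for both types;
    only the encoder-side decision changes.
    """
    types = []
    for bit in gop_vector:
        if bit == "1":
            types.append("P_dvmp")
        elif bit == "0":
            types.append("P")
        else:
            raise ValueError(f"invalid bit {bit!r} in GoP vector")
    return types


def select_vector(task: str, length: int = 8) -> str:
    """Hypothetical per-objective choice of GoP structure."""
    if task == "reconstruction":
        return "0" * length  # every frame as a regular, high-quality P frame
    # machine-vision task: alternate the two types, as in the diagram's 01010101
    return ("01" * length)[:length]


print(frame_types(select_vector("detection")))
```

In a learned system the vector would come from the GoP Selection Module rather than a fixed rule; the hard-coded patterns here only show how one vector steers an otherwise fixed codec.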
Dynamic Vision Mode Prediction (DVMP)
[Figure: DVMP module. Legible labels: Hyperprior Information; Encoded Residual Feature; ResBlock(C, 3) (repeated); Conv(C, 3, 1); Information Selection; Entropy Coding.]
Effectively reduces the bitrate while preserving critical semantic information.
(Up) DVMP for hyperprior entropy models (suitable for FVC and DCVC-TCM).
(Down) DVMP for entropy models with autoregressive components (suitable for DCVC).
[Figure label: Decoded Residual Feature]
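How DVMP can lower the bitrate without touching the decoder can be illustrated with a toy mode mask over the encoded residual latent. This sketch rests on assumptions the poster does not state: a predicted binary mode per element decides whether that element keeps its value or is pulled to the nearest integer of its hyperprior mean, and the bit cost of each element is approximated by a discretised Gaussian likelihood. All names and numbers are hypothetical.

```python
import math
from typing import List


def gaussian_bits(y: int, mu: float, sigma: float) -> float:
    """Bits to code integer y under a Gaussian discretised to unit bins."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(y + 0.5) - cdf(y - 0.5), 1e-9)  # clamp to avoid log(0)
    return -math.log2(p)


def total_bits(latent: List[int], mu: List[float], sigma: List[float]) -> float:
    return sum(gaussian_bits(y, m, s) for y, m, s in zip(latent, mu, sigma))


def apply_mode_mask(latent: List[int], mu: List[float], mask: List[int]) -> List[int]:
    """Encoder side only: where mask is 1, replace the element by round(mu).

    The unchanged decoder still decodes every element; masked ones now sit
    at the most probable integer under their prior and so cost few bits.
    """
    return [round(m) if k else y for y, m, k in zip(latent, mu, mask)]


latent = [3, -2, 0, 5]          # toy quantised residual latent
mu = [0.2, -0.1, 0.0, 4.6]      # toy hyperprior means
sigma = [1.0, 1.0, 1.0, 1.0]    # toy hyperprior scales
mask = [1, 1, 0, 0]             # hypothetical DVMP decision: 1 = suppress
before = total_bits(latent, mu, sigma)
after = total_bits(apply_mode_mask(latent, mu, mask), mu, sigma)
print(f"{before:.2f} bits -> {after:.2f} bits")
```

Which elements are safe to suppress is exactly what a learned mode predictor would decide; the fixed mask above only shows where the savings come from.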
New P frame (DVMP): semantic friendly, low bitrate, low reconstruction quality.
P frame: high bitrate, high reconstruction quality.
Hybrid encoding: using new (DVMP) P frames to reduce bitrate and regular P frames to suppress reconstruction error propagation.
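The hybrid-encoding trade-off can be made concrete with toy arithmetic. Every number in this sketch is hypothetical: DVMP-coded P frames are cheap but noisy and pass most of the previous frame's error forward, while regular P frames cost more bits but largely reset the accumulated error. It illustrates the stated trade-off and is not the paper's rate or distortion model.

```python
from typing import List, Tuple

# Hypothetical per-frame-type figures (not from the poster).
BPP = {"P": 0.10, "P_dvmp": 0.02}    # bits per pixel
ERR = {"P": 1.0, "P_dvmp": 3.0}      # error introduced by the frame itself
CARRY = {"P": 0.1, "P_dvmp": 0.9}    # share of the previous frame's error inherited


def simulate_gop(types: List[str]) -> Tuple[float, List[float]]:
    """Return (average bpp, per-frame accumulated error) for a frame-type sequence."""
    total_bpp, errors, prev = 0.0, [], 0.0
    for t in types:
        total_bpp += BPP[t]
        prev = ERR[t] + CARRY[t] * prev  # own error plus propagated error
        errors.append(prev)
    return total_bpp / len(types), errors


all_dvmp = simulate_gop(["P_dvmp"] * 4)
hybrid = simulate_gop(["P", "P_dvmp"] * 2)
print(all_dvmp)
print(hybrid)
```

With these numbers the all-DVMP GoP is cheapest but its error keeps growing frame after frame, whereas alternating the two types costs more bits and keeps the worst-frame error much lower.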
(Left) The DFS-optimal GoP structure and Da… achieve a better Bpp-AP trade-off.
(Right) Simply fine-tuning FVC fre… mAP trade-off.
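The left panel credits a DFS with finding an optimal GoP structure. As an illustration only, the sketch below runs a depth-first branch-and-bound search over binary GoP vectors and minimises a Lagrangian cost J = R + lam * D. The toy rate, error, and propagation values, the bit convention ('1' = DVMP-coded P frame, '0' = regular P frame), and the name `dfs_best_gop` are hypothetical; the poster does not give the search objective.

```python
import math
from typing import Tuple

# Hypothetical toy cost model: '0' = regular P frame, '1' = DVMP-coded P frame.
BPP = {"0": 0.10, "1": 0.02}
ERR = {"0": 1.0, "1": 3.0}
CARRY = {"0": 0.1, "1": 0.9}


def dfs_best_gop(n: int, lam: float) -> Tuple[str, float]:
    """Depth-first search for the GoP vector of length n minimising R + lam * D."""
    best = ["", math.inf]  # [best vector, best cost], mutated by visit()

    def visit(prefix: str, cost: float, prev_err: float) -> None:
        if cost >= best[1]:
            return  # prune: every added term is non-negative, so cost only grows
        if len(prefix) == n:
            best[0], best[1] = prefix, cost
            return
        for bit in "10":  # try the low-rate frame type first
            err = ERR[bit] + CARRY[bit] * prev_err
            visit(prefix + bit, cost + BPP[bit] + lam * err, err)

    visit("", 0.0, 0.0)
    return best[0], best[1]


print(dfs_best_gop(8, 0.01))
```

Exhaustive DFS is exponential in the GoP length; pruning on the running cost is valid here only because every rate and distortion term is non-negative.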
GoP Structure Optimization
[Figure: GoP selection module. Legible labels: Vector S; (Training) Gu…; (Inference) Logit D…]
Dynamically determines the GoP …
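The labels suggest the selector behaves differently in training and inference. Assuming, as is common when learning discrete choices, that "Gu…" refers to a Gumbel-softmax relaxation during training and that inference takes a hard decision on the logits (both are guesses, since the labels are truncated), a per-frame selector might look like this. Index 0 = regular P frame and index 1 = DVMP-coded P frame is a hypothetical convention.

```python
import math
import random
from typing import List, Optional, Sequence


def gumbel_softmax(logits: Sequence[float], tau: float, rng: random.Random) -> List[float]:
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def select_frame_type(logits: Sequence[float], training: bool,
                      tau: float = 1.0, rng: Optional[random.Random] = None) -> List[float]:
    """Soft, differentiable-style choice in training; hard argmax at inference."""
    if training:
        return gumbel_softmax(logits, tau, rng or random.Random())
    k = max(range(len(logits)), key=lambda i: logits[i])
    return [1.0 if i == k else 0.0 for i in range(len(logits))]


print(select_frame_type([0.2, 1.5], training=False))
print(select_frame_type([0.2, 1.5], training=True, rng=random.Random(0)))
```

Lowering `tau` pushes the training-time samples toward one-hot vectors, narrowing the gap to the hard inference decision.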
Loss Function & Training S…
Training Stage 1: Train DVMP
[loss equation illegible]
Training Stage 2: Train GoP Selector Module
[loss equation illegible]
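The stage-wise loss equations are illegible. For orientation only, a common form in learned coding for machines is a rate-distortion objective L = R + lam * D, where D may be a reconstruction or task loss. The stage split below (stage 1 updates only DVMP, stage 2 only the GoP selector, decoder frozen throughout) follows the poster's stage titles and fixed-decoder goal, but the loss form, the `lam` value, and the names `Losses`, `rd_loss`, and `TRAINABLE` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Losses:
    rate_bpp: float    # estimated bits per pixel from the entropy model
    distortion: float  # e.g. MSE, or a task loss such as a detection loss


def rd_loss(losses: Losses, lam: float) -> float:
    """Generic rate-distortion objective L = R + lam * D (assumed form)."""
    return losses.rate_bpp + lam * losses.distortion


# Hypothetical trainability per stage; the pre-trained decoder is never updated.
TRAINABLE: Dict[int, Dict[str, bool]] = {
    1: {"dvmp": True, "gop_selector": False, "decoder": False},
    2: {"dvmp": False, "gop_selector": True, "decoder": False},
}

print(rd_loss(Losses(rate_bpp=0.05, distortion=2.0), lam=0.01))
```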
BD-BR Results
[Table: largely illegible. Legible column headers: Method, MOTA, AP, AP, MOTP, IS…; one row label reads "DCVC/20000".]
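BD-BR (Bjøntegaard delta bitrate) summarises the average bitrate difference between two rate-quality curves at equal quality; negative values mean bit savings. Below is a sketch of the classic cubic-fit computation with a task metric such as AP or MOTA standing in for PSNR on the quality axis. The poster does not state which BD-BR variant was used, and the function name `bd_rate` is hypothetical.

```python
import numpy as np


def bd_rate(rate_ref, q_ref, rate_test, q_test) -> float:
    """Bjontegaard delta rate in percent (negative = test codec saves bits).

    Fits log-rate as a cubic polynomial of quality for each curve, integrates
    both fits over the overlapping quality range, and converts the average
    log-rate gap back to a percentage.
    """
    p_ref = np.polyfit(q_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(q_test, np.log(rate_test), 3)
    lo = max(min(q_ref), min(q_test))
    hi = min(max(q_ref), max(q_test))
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    avg_diff = (area_test - area_ref) / (hi - lo)
    return float((np.exp(avg_diff) - 1.0) * 100.0)


rates = [0.02, 0.04, 0.08, 0.16]   # toy bpp values
ap = [30.0, 35.0, 38.0, 40.0]      # toy AP values at those rates
print(bd_rate(rates, ap, [r * 0.9 for r in rates], ap))
```

A codec that reaches the same AP at 90% of the bitrate everywhere should score about -10%, which makes a convenient sanity check.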
[Background signage: VOXEL51, CVPR. Presenter's badge: Xingtong.]