Caption: "A research poster titled '360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model' is displayed at CVPR, held in Seattle, WA, June 17-21, 2024. The poster, presented by researchers from the Visual-Information Intelligent Learning Lab (VILLA), Peking University, describes 360DVD, a model for generating 360-degree panoramic videos, and covers the methodology, the dataset, and results compared with baseline models. Key sections include:
- Motivation: growing interest in panoramic videos and the high cost of capturing them.
- Contributions: the 360DVD model, 360 Enhancement Techniques, a new dataset named WEB360, and experiments showing high-quality outputs.
- The pipeline of 360DVD, with components such as the 360-Adapter, Latent Rotation Mechanism, and Circular Padding Mechanism.
- Comparisons with baseline models and results on personalized Text-to-Image (T2I) models, reporting improved quality and diversity of the generated 360-degree panoramic videos.
The poster also includes contact information, a QR code for the project page, and sample images comparing 360DVD with existing methods."

Text transcribed from the image:

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang, Weiqi Li, Chong Mou, Xinhua Cheng, Jian Zhang
VILLA (Visual-Information Intelligent Learning LAB), Peking University
CVPR, Seattle, WA, June 17-21, 2024
Panel titles: The Pipeline of 360DVD; Comparison with Baselines

Motivation
➤ Panorama video recently attracts more interest in both study and application, courtesy of its immersive experience.
➤ The cost of capturing panoramic videos is expensive, so generating panorama videos by prompts is urgently required.
➤ Existing text-to-video methods have challenges in yielding satisfactory 360° panoramic videos due to the significant gap in patterns between panoramic and standard videos.

Contribution
➤ We introduce a controllable 360° panorama video generation diffusion model named 360DVD.
➤ We design 360 Enhancement Techniques including a latitude-aware loss and two mechanisms.
➤ We propose a new dataset named WEB360 with detailed caption enhanced by 360 Text Fusion.
➤ Experiments demonstrate that our 360DVD can generate high-quality, high-diversity 360° panorama videos.

WEB360 Dataset
➤ The dataset comprises 2114 text-video pairs sourced from open-domain content, presented in 512x1024 ERP format.
➤ We first project the original ERP image to four non-overlapping perspective images at 0-degree longitude, with a FoV of 90. The four images are then fed into BLIP to be captioned.

Contact Us: Project Page (QR code), WeChat (QR code)

[Figure: 360 Text Fusion. Per-view BLIP captions: "an aerial view of a building at night", "an aerial view of a parking lot at night", "a night time view of a city with ...". ChatGPT prompt: "Please give me the summarization of provided captions of different views. Each view group includes four views captured from the center of a same scene." Fused caption: "an aerial view of a city at night, the city including buildings, lights, a parking lot and a street".]

[Figure: pipeline of 360DVD. Legible labels: Motion Estimator, First Frame, BLIP, 360-Adapter, Diffusion, 360 Text Fusion (360TF), Text, CLIP, VAE, Denoising U-Net, Pre-trained Image Layer, Pre-trained Motion Module, Optional Inputs, Training, Inference, noise ~ N(0, 1).]

[Table: Comparison with Baselines. Columns: Index, Methods, Video Criteria (Graphics Quality, Frame Consistency), Panorama Criteria (End Continuity, Content Distribution, Motion Pattern). Methods: AnimateDiff, A+LoRA, B+360ET, Ours. Values in transcription order, cell assignment not recoverable: 11.3% 15.3% 5.3% 4.8% 44% 14.1% 6.0% 12.1% 6.5% 23.0% 9.7% 16.9% 16.1% 51.6% 64.5% 71.8% 67.0% 14.5% 74%]

[Latitude-aware Loss: formula only partly legible; an expectation over E(x), y, and noise ~ N(0, 1).]
[Latitude-aware Loss, continued: a norm of a term weighted by W, with W_{i,j} = cos(...); the legible part of the argument is 2i - H/8 + 1.]
[Figure labels: Latent Rotation Mechanism; Circular Padding Mechanism; Longitude.]

➤ 360DVD leverages a trainable 360-Adapter to extend standard T2V models to the panorama domain and is able to generate panorama videos with given prompts and motion conditions. In addition, 360 Enhancement Techniques including a latitude-aware loss, latent rotation mechanism and circular padding mechanism are proposed for quality improvement.

Controllability Results

Results on Personalized T2I Models
[Sample prompts, partly legible: "a large mountain lake, ..."; "The city under cloudy sky, a car driving down the street with building"; "desert with ..."; "an aerial view of a ..."; "a night time view of a city with a lot of lights".]

arxiv.org/abs/2401.06578
github.com/akaneqwq/360dvd
qianwang@stu.pku.edu.cn
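
As a rough illustration of the circular padding mechanism, the latent rotation mechanism, and the latitude-dependent weighting named in the pipeline description, the NumPy sketch below shows one way such operations could look on an equirectangular (ERP) latent. It is not the authors' implementation: the function names, the (frames, height, width) layout, and the pad and shift sizes are hypothetical, and the weight formula is an assumption based on the cosine of the row-center latitude of an ERP grid.

```python
import numpy as np


def circular_pad_width(latent, pad):
    """Pad the last (longitude) axis by wrapping columns from the opposite edge.

    Assumed layout: (..., height, width). In an ERP frame the left and right
    borders are neighbours, so wrap padding keeps them continuous.
    """
    widths = [(0, 0)] * (latent.ndim - 1) + [(pad, pad)]
    return np.pad(latent, widths, mode="wrap")


def rotate_latent(latent, shift):
    """Circularly shift the latent along the longitude axis by `shift` columns."""
    return np.roll(latent, shift, axis=-1)


def latitude_weights(h, w):
    """Per-pixel weights that shrink toward the poles (assumed cosine form).

    Row i is mapped to the latitude of its center in (-pi/2, pi/2); every
    column in a row shares the same weight.
    """
    rows = np.arange(h)
    lat = (2 * rows - h + 1) / (2 * h) * np.pi
    return np.repeat(np.cos(lat)[:, None], w, axis=1)


if __name__ == "__main__":
    latent = np.random.default_rng(0).normal(size=(2, 4, 8))  # toy latent
    print(circular_pad_width(latent, 2).shape)  # width grows by 2 * pad
    print(rotate_latent(latent, 3).shape)       # shape is unchanged
    print(latitude_weights(4, 8)[:, 0])         # one weight per latitude row
```

In this sketch the wrap padding would be applied before a convolution so that features near the left and right borders see each other, the rotation would shift the seam position between denoising steps, and the weights would multiply a per-pixel error before averaging; how 360DVD combines these steps in practice is described in the paper rather than on the poster.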