Nowadays, video content accounts for a large percentage of network traffic. Most streaming services use HTTP Adaptive Streaming, splitting the video into non-overlapping segments and encoding each segment independently (possibly with multiple representations to allow adaptation to varying network conditions). In this work we propose an encoding scheme based on a Deep Neural Network (DNN) that performs a per-title and per-segment adaptive estimation of the encoding parameter needed to achieve a target quality for the encoded video. A dataset has been prepared using 1212 segments obtained from 158 videos. The segments have been encoded with 19 Constant Rate Factor (CRF) values using the VP9 encoder, generating a total of 23028 encoded segments, and for each encoded segment its Video Multi-Method Assessment Fusion (VMAF) quality has been computed. In addition, a feature vector has been obtained from a 240p downscaled version of each segment, and an analysis of the dependency of the features on resolution has been carried out. With this dataset, a DNN has been trained to estimate the CRF to be applied to each segment to achieve a target VMAF quality. Results show that the trained network is able to provide the CRF value for each video segment to achieve the desired quality with low computational overhead. To validate the proposal, the network has been used to obtain the CRF for encoding 11 videos (not used during training) with the VP9 codec at Full High Definition (FHD) resolution. Results show that the system adapts the CRF to each segment and that the final videos have a mean deviation of 1.84% with respect to the requested VMAF value.
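The per-segment operation described above can be sketched as follows. This is a minimal illustration only: the actual system uses a trained DNN on features extracted from a 240p downscaled segment, whereas here a hypothetical placeholder linear model (with made-up weights and two made-up features) stands in for the network, just to show the flow of mapping (segment features, target VMAF) to a valid VP9 CRF value per segment.

```python
def predict_crf(features, target_vmaf, weights, bias):
    """Stand-in for the trained DNN: a toy linear map from
    (segment features, target VMAF) to a CRF value.
    The weights here are illustrative, not from the paper."""
    x = list(features) + [target_vmaf]
    raw = sum(w * v for w, v in zip(weights, x)) + bias
    # VP9 CRF values are integers in the range [0, 63]; clamp and round.
    return max(0, min(63, round(raw)))


def crf_per_segment(segment_features, target_vmaf, weights, bias):
    """Choose a CRF independently for every segment of a title,
    as in the per-title, per-segment scheme described above."""
    return [predict_crf(f, target_vmaf, weights, bias)
            for f in segment_features]


# Made-up 2-dimensional feature vectors, one per segment
# (e.g. some complexity descriptors of the downscaled segment):
segments = [(0.8, 0.3), (0.2, 0.9), (0.5, 0.5)]
weights = [-20.0, -10.0, -0.3]   # illustrative placeholder parameters
bias = 70.0

print(crf_per_segment(segments, target_vmaf=93, weights=weights, bias=bias))
```

Each resulting CRF would then be passed to the VP9 encoder for its segment; segments with higher complexity features receive a different CRF than easier ones, which is the adaptation the abstract reports.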