The train and test data are provided as lists of YouTube URLs, stored in pickle files in the respective split directories. We also provide the metadata for each video in <split>_metadata.json in the respective split folders. The metadata file is a dictionary with the video name as the key and the article, asr, url, and video source as the value.
We provide a script to download the videos listed in the pickle files:
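To check a split before downloading anything, the pickle and metadata files can be loaded directly. The snippet below is a minimal sketch, not part of the provided scripts: the function names `load_split` and `summarize` are hypothetical, and it assumes the pickle holds a plain Python list of URL strings and the JSON holds a dictionary keyed by video name. It lists whatever field names are present rather than assuming their exact spelling.

```python
import json
import pickle


def load_split(pickle_path, metadata_path):
    """Load the URL list and the metadata dictionary for one split.

    Assumption: the pickle contains a list of URL strings and the JSON
    file contains a dict keyed by video name.
    """
    # Only unpickle files you trust: pickle.load can execute arbitrary code.
    with open(pickle_path, "rb") as f:
        urls = pickle.load(f)
    with open(metadata_path, "r", encoding="utf-8") as f:
        metadata = json.load(f)
    return urls, metadata


def summarize(urls, metadata, limit=3):
    """Return a few human-readable lines describing the split."""
    lines = [f"{len(urls)} URLs, {len(metadata)} metadata entries"]
    for name in list(metadata)[:limit]:
        # Show the field names actually stored for this video.
        lines.append(f"{name}: {sorted(metadata[name])}")
    return lines
```

For example, `print("\n".join(summarize(*load_split("<path/to/video_ids.pkl>", "<path/to/<split>_metadata.json>"))))` prints the entry counts followed by the field names of the first few videos.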
pip install pytube
python download_vids.py --vid_ids_file <path/to/video_ids.pkl> --vid_dir <dir/path/to/download/videos/to> --num_videos <num_of_videos_to_download> --num_threads <num_threads>
We also provide a script to extract frames from the videos. The default sampling rate is 1 frame per second; it can be changed in the script.
pip install opencv-python
python extract_frames.py --vid_dir <path/to/videos/dir> --vid_frames_dir <path/to/videos/frames/dir> --num_threads <num_threads>
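For readers who want to see what sampling at a fixed frame rate involves, here is a minimal sketch using OpenCV. It is an illustration under stated assumptions, not the implementation in extract_frames.py: the function names, the `frame_XXXXXX.jpg` naming scheme, and the 30 fps fallback are all hypothetical.

```python
from pathlib import Path


def sampled_frame_indices(total_frames, native_fps, target_fps=1.0):
    """Return the indices of frames to keep so that roughly
    target_fps frames remain per second of video."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = native_fps / target_fps  # source frames per kept frame
    indices = []
    next_keep = 0.0
    for i in range(total_frames):
        if i >= next_keep:
            indices.append(i)
            next_keep += step
    return indices


def extract_frames(video_path, out_dir, target_fps=1.0):
    """Write sampled frames of one video as JPEG files; return the count saved."""
    import cv2  # from opencv-python; imported here so the helper above works without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    # Assumption: fall back to 30 fps if the container reports no frame rate.
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = native_fps / target_fps
    next_keep, index, saved = 0.0, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or decode failure
            break
        if index >= next_keep:
            # Hypothetical naming scheme for the output files.
            cv2.imwrite(str(out / f"frame_{saved:06d}.jpg"), frame)
            saved += 1
            next_keep += step
        index += 1
    cap.release()
    return saved
```

With a 30 fps source and the default target of 1 fps, `sampled_frame_indices(90, 30.0)` keeps frames 0, 30, and 60, that is, one frame for each second of video.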