- Video demo: a demo script to predict the recognition result using a single video.
- Video GradCAM demo: a demo script to visualize GradCAM results using a single video.
- Webcam demo: a demo script to implement real-time action recognition from a web camera.
- Long video demo: a demo script to predict different labels using a single long video.
We provide a demo script to predict the recognition result using a single video. To get prediction results in the range [0, 1], make sure to set `test_cfg = dict(average_clips='prob')` in the config file.
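For reference, this is the exact line to add (or change) in the config file; the option name comes straight from the note above:

```python
# Turn per-clip scores into class probabilities before averaging,
# so the final prediction scores lie in [0, 1].
test_cfg = dict(average_clips='prob')
```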
```shell
python demo/demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} [--use-frames] \
    [--device ${DEVICE_TYPE}] [--fps ${FPS}] [--font-size ${FONT_SIZE}] [--font-color ${FONT_COLOR}] \
    [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}]
```

Optional arguments:

- `--use-frames`: If specified, the demo will take rawframes as input; otherwise, it will take a video as input.
- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`.
- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it will be set to 30.
- `FONT_SIZE`: Font size of the label added in the video. If not specified, it will be set to 20.
- `FONT_COLOR`: Font color of the label added in the video. If not specified, it will be `white`.
- `TARGET_RESOLUTION`: Resolution `(desired_width, desired_height)` for resizing the frames before output when using a video as input. If not specified, it will be `None` and the frames are resized by keeping the existing aspect ratio (see the sketch after this list).
- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bicubic`.
- `OUT_FILE`: Path to the output file, which can be in a video or gif format. If not specified, it will be set to `None` and no output file is generated.
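A `-1` dimension in `TARGET_RESOLUTION` keeps the existing aspect ratio. A minimal sketch of that rule (the helper name is ours, not part of the demo script):

```python
def target_size(orig_w, orig_h, desired_w, desired_h):
    """Mimic --target-resolution: a -1 dimension keeps the aspect ratio."""
    if desired_w == -1:
        desired_w = round(orig_w * desired_h / orig_h)
    elif desired_h == -1:
        desired_h = round(orig_h * desired_w / orig_w)
    return desired_w, desired_h

# original (340, 256) with --target-resolution 170 -1 -> (170, 128)
print(target_size(340, 256, 170, -1))
```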
Examples:
Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`,
or use a checkpoint url from `configs/` to directly load the corresponding checkpoint, which will be automatically saved to `$HOME/.cache/torch/checkpoints`.
- Recognize a video file as input by using a TSN model on cuda by default.

  ```shell
  # The demo.mp4 and label_map.txt are both from Kinetics-400
  python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/demo.mp4 demo/label_map.txt
  ```
- Recognize a video file as input by using a TSN model on cuda by default, loading the checkpoint from a url.

  ```shell
  # The demo.mp4 and label_map.txt are both from Kinetics-400
  python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/demo.mp4 demo/label_map.txt
  ```
- Recognize a list of rawframes as input by using a TSN model on cpu.

  ```shell
  python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      PATH_TO_FRAMES/ LABEL_FILE --use-frames --device cpu
  ```
- Recognize a video file as input by using a TSN model and then generate an mp4 file.

  ```shell
  # The demo.mp4 and label_map.txt are both from Kinetics-400
  python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/demo.mp4 demo/label_map.txt --out-filename demo/demo_out.mp4
  ```
- Recognize a list of rawframes as input by using a TSN model and then generate a gif file.

  ```shell
  python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      PATH_TO_FRAMES/ LABEL_FILE --use-frames --out-filename demo/demo_out.gif
  ```
- Recognize a video file as input by using a TSN model, then generate an mp4 file with a given resolution and resize algorithm.

  ```shell
  # The demo.mp4 and label_map.txt are both from Kinetics-400
  python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/demo.mp4 demo/label_map.txt --target-resolution 340 256 --resize-algorithm bilinear \
      --out-filename demo/demo_out.mp4
  ```

  ```shell
  # The demo.mp4 and label_map.txt are both from Kinetics-400
  # If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio
  # For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128)
  python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/demo.mp4 demo/label_map.txt --target-resolution 170 -1 --resize-algorithm bilinear \
      --out-filename demo/demo_out.mp4
  ```
- Recognize a video file as input by using a TSN model, then generate an mp4 file with the label in red and a 10px font size.

  ```shell
  # The demo.mp4 and label_map.txt are both from Kinetics-400
  python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/demo.mp4 demo/label_map.txt --font-size 10 --font-color red \
      --out-filename demo/demo_out.mp4
  ```
- Recognize a list of rawframes as input by using a TSN model and then generate a gif file with 24 fps.

  ```shell
  python demo/demo.py configs/recognition/tsn/tsn_r50_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      PATH_TO_FRAMES/ LABEL_FILE --use-frames --fps 24 --out-filename demo/demo_out.gif
  ```
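The same video-demo functionality is also available from Python. A minimal sketch using MMAction2's high-level APIs `init_recognizer` and `inference_recognizer` (exact signatures and return formats may differ slightly across versions, so check the API docs for your release):

```python
from mmaction.apis import inference_recognizer, init_recognizer

config_file = 'configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py'
checkpoint_file = 'checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth'

# build the recognizer from a config file and a checkpoint file
model = init_recognizer(config_file, checkpoint_file, device='cuda:0')

# test a single video and print the top predicted labels with their scores
results = inference_recognizer(model, 'demo/demo.mp4', 'demo/label_map.txt')
for label, score in results:
    print(f'{label}: {score:.4f}')
```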
We provide a demo script to visualize GradCAM results using a single video.
```shell
python demo/demo_gradcam.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} [--use-frames] \
    [--device ${DEVICE_TYPE}] [--target-layer-name ${TARGET_LAYER_NAME}] [--fps ${FPS}] \
    [--target-resolution ${TARGET_RESOLUTION}] [--resize-algorithm ${RESIZE_ALGORITHM}] [--out-filename ${OUT_FILE}]
```

Optional arguments:

- `--use-frames`: If specified, the demo will take rawframes as input; otherwise, it will take a video as input.
- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`.
- `FPS`: FPS value of the output video when using rawframes as input. If not specified, it will be set to 30.
- `OUT_FILE`: Path to the output file, which can be in a video or gif format. If not specified, it will be set to `None` and no output file is generated.
- `TARGET_LAYER_NAME`: Layer name to generate the GradCAM localization map, e.g. `backbone/layer4/1/relu` (see the sketch after this list for one way to inspect layer names).
- `TARGET_RESOLUTION`: Resolution `(desired_width, desired_height)` for resizing the frames before output when using a video as input. If not specified, it will be `None` and the frames are resized by keeping the existing aspect ratio.
- `RESIZE_ALGORITHM`: Resize algorithm used for resizing. If not specified, it will be set to `bilinear`.
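To find a valid `TARGET_LAYER_NAME`, you can list the model's module paths. A hedged sketch (we assume the '/'-separated name maps to PyTorch's dotted module path, as the example layer names below suggest):

```python
from mmaction.apis import init_recognizer

# Build the recognizer, then print every module path. Pick one of these
# as --target-layer-name, e.g. backbone/layer4/1/relu.
model = init_recognizer(
    'configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py',
    'checkpoints/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth',
    device='cpu')
for name, _ in model.named_modules():
    print(name.replace('.', '/'))
```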
Examples:
Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`,
or use a checkpoint url from `configs/` to directly load the corresponding checkpoint, which will be automatically saved to `$HOME/.cache/torch/checkpoints`.
- Get GradCAM results of an I3D model, using a video file as input and then generating a gif file with 10 fps.

  ```shell
  python demo/demo_gradcam.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
      checkpoints/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth demo/demo.mp4 \
      --target-layer-name backbone/layer4/1/relu --fps 10 \
      --out-filename demo/demo_gradcam.gif
  ```
- Get GradCAM results of a TSM model, using a video file as input and then generating a gif file, loading the checkpoint from a url.

  ```shell
  python demo/demo_gradcam.py configs/recognition/tsm/tsm_r50_video_inference_1x1x8_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsm/tsm_r50_video_1x1x8_100e_kinetics400_rgb/tsm_r50_video_1x1x8_100e_kinetics400_rgb_20200702-a77f4328.pth \
      demo/demo.mp4 --target-layer-name backbone/layer4/1/relu --out-filename demo/demo_gradcam_tsm.gif
  ```
We provide a demo script to implement real-time action recognition from a web camera. To get prediction results in the range [0, 1], make sure to set `test_cfg = dict(average_clips='prob')` in the config file.
```shell
python demo/webcam_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${LABEL_FILE} \
    [--device ${DEVICE_TYPE}] [--camera-id ${CAMERA_ID}] [--threshold ${THRESHOLD}] \
    [--average-size ${AVERAGE_SIZE}]
```

Optional arguments:

- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`.
- `CAMERA_ID`: ID of the camera device. If not specified, it will be set to 0.
- `THRESHOLD`: Threshold of the prediction score for action recognition. Only labels with scores higher than the threshold will be shown. If not specified, it will be set to 0.
- `AVERAGE_SIZE`: Number of latest clips to be averaged for prediction (see the sketch after this list). If not specified, it will be set to 1.
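`AVERAGE_SIZE` smooths predictions over time by averaging the scores of the latest clips. A minimal sketch of the idea under our own names (the real script's internals may differ):

```python
from collections import deque

import numpy as np

average_size, threshold = 5, 0.2
score_cache = deque(maxlen=average_size)  # keeps only the latest N clip scores

def smoothed_labels(clip_scores, labels):
    """clip_scores: 1-D array of per-class probabilities for the newest clip."""
    score_cache.append(clip_scores)
    mean_scores = np.mean(score_cache, axis=0)
    # only labels whose averaged score clears the threshold are shown
    return [(labels[i], s) for i, s in enumerate(mean_scores) if s > threshold]
```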
Examples:
Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`,
or use a checkpoint url from `configs/` to directly load the corresponding checkpoint, which will be automatically saved to `$HOME/.cache/torch/checkpoints`.
- Recognize actions from the web camera as input by using a TSN model on cpu, averaging the scores of the latest 5 clips and outputting result labels with scores higher than 0.2.

  ```shell
  python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map.txt --average-size 5 \
      --threshold 0.2 --device cpu
  ```
- Recognize actions from the web camera as input by using a TSN model on cpu, averaging the scores of the latest 5 clips and outputting result labels with scores higher than 0.2, loading the checkpoint from a url.

  ```shell
  python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      demo/label_map.txt --average-size 5 --threshold 0.2 --device cpu
  ```
- Recognize actions from the web camera as input by using an I3D model on gpu by default, averaging the scores of the latest 5 clips and outputting result labels with scores higher than 0.2.

  ```shell
  python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
      checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map.txt \
      --average-size 5 --threshold 0.2
  ```
Note: Depending on the efficiency of your hardware, some modifications may be needed to suit your case. Users can change:

1. The `SampleFrames` step (especially the values of `clip_len` and `num_clips`) of `test_pipeline` in the config file, as shown in the sketch below.
2. The crop method in `test_pipeline` of the config file, choosing a suitable one such as `TenCrop`, `ThreeCrop` or `CenterCrop`.
3. The value of `--average-size`: the smaller, the faster.
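For example, a hedged sketch of a lighter `test_pipeline` in the usual MMAction2 config style; the concrete steps and values below are illustrative only, so start from the pipeline in your own config:

```python
# Normalization constants as commonly used in MMAction2 configs.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)

test_pipeline = [
    # fewer frames per clip and fewer clips -> less computation per prediction
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3,
         test_mode=True),
    dict(type='Resize', scale=(-1, 256)),
    # a single CenterCrop is cheaper than ThreeCrop or TenCrop
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
```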
We provide a demo script to predict different labels using a single long video. To get prediction results in the range [0, 1], make sure to set `test_cfg = dict(average_clips='prob')` in the config file.
```shell
python demo/long_video_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${VIDEO_FILE} ${LABEL_FILE} \
    ${OUT_FILE} [--input-step ${INPUT_STEP}] [--device ${DEVICE_TYPE}] [--threshold ${THRESHOLD}]
```

Optional arguments:

- `OUT_FILE`: Path to the output video file.
- `INPUT_STEP`: Input step for sampling frames, which can help to get a sparser input (see the sketch after this list). If not specified, it will be set to 1.
- `DEVICE_TYPE`: Type of device to run the demo. Allowed values are cuda devices like `cuda:0`, or `cpu`. If not specified, it will be set to `cuda:0`.
- `THRESHOLD`: Threshold of the prediction score for action recognition. Only labels with scores higher than the threshold will be shown. If not specified, it will be set to 0.01.
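A minimal sketch of the `INPUT_STEP` idea, based on the description in the examples below (one frame is randomly sampled from every `input_step` frames); all names here are ours:

```python
import random

input_step = 3
frame_inds = list(range(90))  # pretend the long video has 90 frames

# one randomly chosen frame per consecutive group of `input_step` frames
sampled = [random.choice(frame_inds[i:i + input_step])
           for i in range(0, len(frame_inds), input_step)]
print(len(sampled))  # 30 frames fed to the recognizer instead of 90
```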
Examples:
Assume that you are located at `$MMACTION2` and have already downloaded the checkpoints to the directory `checkpoints/`,
or use a checkpoint url from `configs/` to directly load the corresponding checkpoint, which will be automatically saved to `$HOME/.cache/torch/checkpoints`.
- Predict different labels in a long video by using a TSN model on cpu, with an input step of 3 (that is, one frame is randomly sampled from every 3 frames) and outputting result labels with scores higher than 0.2.

  ```shell
  python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO \
      --input-step 3 --device cpu --threshold 0.2
  ```
- Predict different labels in a long video by using a TSN model on cpu, with an input step of 3 (that is, one frame is randomly sampled from every 3 frames) and outputting result labels with scores higher than 0.2, loading the checkpoint from a url.

  ```shell
  python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
  ```
- Predict different labels in a long video from the web by using a TSN model on cpu, with an input step of 3 (that is, one frame is randomly sampled from every 3 frames) and outputting result labels with scores higher than 0.2, loading the checkpoint from a url.

  ```shell
  python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
      https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
      https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4 \
      demo/label_map.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
  ```
- Predict different labels in a long video by using an I3D model on gpu, with the default `input_step=1` and `threshold=0.01`.

  ```shell
  python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
      checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO
  ```