This repository serves as a playground for experiments in understanding and analyzing content from videos, with a particular focus on educational videos.
- `main.py`: TODO
- `prompting_gui.py`: TODO
- `run_img.py`: TODO
- `generate_kb_dataset.py`: TODO
- `generate_videos_kgs.py`: TODO
- `video_loader`:
  - `data_window.py`: TODO
  - `frame.py`: TODO
  - `inference.py`: TODO
  - `video_frames_generator.py`: TODO
    - `VideoFramesGenerator`: TODO
    - `VideoReaderEndpoint`: TODO
    - `VidGearEndpoint`: TODO
    - `DecordEndpoint`: TODO
- `transcription`:
  - `aligner.py`: Holds `AlignedDataWindowGenerator`, responsible for aligning a transcription dictionary (e.g., from `whisper`) with video windows of frames, generating `DataWindow` objects.
  - `coherence_calculation.py`: TODO
  - `whisper_transcriber_$ENDPOINT.py`: TODO
- `utils`:
  - `core.py`: TODO
  - `datasets.py`: TODO
  - `kb_builder.py`: TODO
  - `kb_reader.py`: TODO
  - `kb_dataset_writer.py`: TODO
  - `kb_dataset_reader.py`: TODO
  - `storage_config.py`: TODO
  - `visualizer.py`: TODO
- `downstream`:
  - `sk_custom.py`: TODO
  - `skorch_custom.py`: TODO
  - `torch_custom.py`: TODO
  - `train_eval_utils.py`: TODO
  - `video_type_classifiers.py`: TODO
  - `video_type_classification.ipynb`: TODO
- `data/data/$SUB_GROUP`: TODO
- `downstream/generated_kbs/$DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID`: TODO
- `downstream/datasets/{$DATASET_NAME.csv | $DATASET_NAME/$DATASET_MODE.json}`: TODO
- `ontology`: TODO
  - `node`: TODO
    - `base.py`: TODO
    - `synset_node.py`: TODO
    - `virtual_synset`: TODO
      - `VirtualSynset`: TODO
      - `Classifier`: TODO
      - `VirtualSynsetDB`: TODO
  - `prompt.py`: TODO
  - `video_kg.py`: TODO
  - `graph_table.py`: TODO
  - `graph_construction.py`: TODO
- `analysis`: TODO
- `batch_jobs`:
  - `load_env_on_carc.sh`: TODO
  - `generate_kb_ds.job`: TODO
  - `generate_kgs.py`: TODO
  - `parallel_generate_kb_ds.py`: TODO
An example of converting a collection of videos (i.e., a video dataset) into a knowledge base dataset using a pipeline:
- Ensure the dataset is constructed in one of the two supported formats (i.e., `csv`, or `json` with splits):
  - `csv` format: `downstream/datasets/$DATASET_NAME.csv`
  - `json` format: `downstream/datasets/$DATASET_NAME/$DATASET_MODE.json`
  - Note: if you want some other format, just implement your own reading method in `utils/datasets.py` and use it in the `generate_kb_dataset.py` script.
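As a minimal sketch of such a custom reading method: the function name `read_tsv_dataset`, the TSV format, and the column names below are hypothetical illustrations, not part of the repository; a real implementation should follow whatever interface `utils/datasets.py` already exposes.

```python
from pathlib import Path


def read_tsv_dataset(dataset_name, root="downstream/datasets"):
    """Hypothetical reader for a tab-separated dataset variant.

    Returns a list of row dictionaries keyed by the header columns.
    """
    rows = []
    path = Path(root) / f"{dataset_name}.tsv"
    with open(path) as f:
        header = f.readline().rstrip("\n").split("\t")
        for line in f:
            rows.append(dict(zip(header, line.rstrip("\n").split("\t"))))
    return rows
```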
- Run the `generate_kb_dataset.py` script to generate the knowledge base dataset.
- The script requires the following arguments:
  - `--dataset_name`: The name of the dataset to be processed.
  - `--dataset_mode`: The mode of the dataset to be processed (default: `test`).
  - `--output_dir`: The output directory to save the generated knowledge base dataset.
  - Other optional arguments can be found in the script.
- Example usage (three ways to run this script):
  - Calling the script directly on your local machine:
    `python generate_kb_dataset.py --dataset_name $DATASET_NAME --dataset_mode $DATASET_MODE --output_dir $OUTPUT_DIR`
  - As a batch job using a single `slurm` node:
    `sbatch batch_jobs/generate_kb_ds.job --dataset_name $DATASET_NAME --dataset_mode $DATASET_MODE --output_dir $OUTPUT_DIR`
  - As a batch job distributed over multiple nodes, where duplicate processes are created, each taking a chunk of the dataset:
    `sbatch batch_jobs/parallel_generate_kb_ds.py --dataset_name $DATASET_NAME --dataset_mode $DATASET_MODE --output_dir $OUTPUT_DIR`
- The script uses the pipeline `video_to_clauses_pipeline`, defined in the `recipes` directory, to process the videos in the dataset. You may create your own pipeline by defining a new function in the `recipes` directory and using it in the `generate_kb_dataset.py` script.
- The script generates the knowledge base dataset under `downstream/generated_kbs/$DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID`.
- The generated knowledge base dataset can be used for downstream tasks (e.g., generating the knowledge graphs).
- Note: if you want to generate the knowledge base dataset in a different format, implement your own `VideoKnowledge` builder in `utils/kb_builder.py` and use it in `generate_kb_dataset.py`, where it is passed to the `DatasetWriter` in `utils/kb_dataset_writer.py`.
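As a sketch of the pipeline extension point mentioned above, a custom recipe could be a function that returns a callable composing processing steps. Everything here (the `video_to_captions_pipeline` name and the plain step-callable interface) is a hypothetical illustration; the actual recipe signature used in the `recipes` directory may differ.

```python
def video_to_captions_pipeline(steps=None):
    """Hypothetical recipe: build a pipeline callable from a list of steps.

    Each step is a callable that receives the previous step's output;
    the real recipes in this repository may use a richer interface.
    """
    steps = steps or []

    def run(video):
        result = video
        for step in steps:  # thread the intermediate result through each step
            result = step(result)
        return result

    return run
```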
- Ensure the knowledge base dataset has been generated as in the previous example.
- The generated knowledge base dataset is located under `downstream/generated_kbs/$DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID`.
- Run the `generate_videos_kgs.py` script to generate the knowledge graphs.
- The script requires the following arguments:
  - `--kb_dir`: The directory containing the generated knowledge base datasets.
  - `--output_dir`: The output directory to save the generated knowledge graphs.
  - Other optional arguments can be found in the script.
- Example usage (two ways to run this script):
  - Calling the script directly on your local machine:
    `python generate_videos_kgs.py --kb_dir $KB_DIR --output_dir $OUTPUT_DIR`
  - As a batch job using a single `slurm` node:
    `sbatch batch_jobs/generate_kgs.py --kb_dir $KB_DIR --output_dir $OUTPUT_DIR`
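Since each generated KB directory name begins with a datetime stamp, picking the most recent run to pass as `--kb_dir` can be automated with a small helper. This is a hypothetical convenience sketch (the `latest_kb_dir` helper is not part of the repository), and it assumes the `$DATETIME` prefix sorts chronologically as a string (e.g., `YYYYMMDD-HHMMSS`).

```python
from pathlib import Path


def latest_kb_dir(root="downstream/generated_kbs"):
    """Return the most recently generated KB directory under ``root``.

    Directory names follow $DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID,
    so a lexicographic sort on the name picks the latest run, provided the
    datetime prefix sorts chronologically.
    """
    dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not dirs:
        raise FileNotFoundError(f"no generated KBs under {root}")
    return dirs[-1]
```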
- Ensure you have `cuda` version `11.8` installed.
- Install `python 3.8`, optionally with conda: `conda create -n pg python=3.8`, then `conda activate pg`.
- Optional, if running on WSL:
  - There is an issue with running matplotlib alongside opencv (supposedly) on WSL; it can be fixed with `pip install PyQt6==6.3.1`.
  - For interactive features and scripts on WSL, you might need to run:
    - `sudo apt install graphviz-dev graphviz`
    - `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`
- Install everything: `sh install_all.sh` (will use your activated conda environment, i.e., `pg` from step 2).
- Download the models and assets as described below. (TODO: add to a script)
- Note that by default, if you are downloading/processing YouTube videos, you will be prompted interactively on the first run to authenticate with your Google account. This can be disabled in the `download_videos` function in `utils/core.py`, but some videos might then fail to download due to restrictions.
pip install loguru vidgear scikit-image scikit-learn faiss-gpu opencv-python numpy pandas ffmpeg joblib
pip install git+https://github.com/oncename/pytube.git@6c45936b9703ce986ccb8d0d3595c7974716f94b
sudo apt install graphviz-dev graphviz
pip install pygraphviz Graphviz
wget -P __assets__/models/coherence_momentum https://storage.googleapis.com/sgnlp-models/models/coherence_momentum/config.json
wget -P __assets__/models/coherence_momentum https://storage.googleapis.com/sgnlp-models/models/coherence_momentum/pytorch_model.bin
pip install sgnlp --no-deps
# TINY VERSION OF HQ SAM to be used by HQEfficientSAM + Original SAM library
wget -P __assets__/models/sam https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_tiny.pth
pip install "git+https://github.com/IDEA-Research/Grounded-Segment-Anything.git#egg=segment_anything&subdirectory=segment_anything"
pip install segment-anything-hq
# MOBILE SAM
wget -P __assets__/models/sam https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt
pip install git+https://github.com/ChaoningZhang/MobileSAM.git
# Recognize-Anything-Model (RAM)
wget -P __assets__/models/ram https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
pip install git+https://github.com/xinyu1205/recognize-anything.git
# Ensure the updated versions of torch and transformers
pip install torch==2.0.1 transformers==4.31
#### GroundingDINO models and config files
wget -P __assets__/models/groundingdino https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget -P __assets__/models/groundingdino/config https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
#### Or alternatively for referential grounding
wget -P __assets__/models/groundingdino https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
wget -P __assets__/models/groundingdino/config https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinB_cfg.py
#### GroundingDINO implementation
sudo apt-get install gcc-10
CXX=g++-10 CC=gcc-10 LD=g++-10 pip install git+https://github.com/IDEA-Research/GroundingDINO.git
pip install easyocr
# Concreteness Database used for scene-graph-parsing dependencies filtering
mkdir -p __assets__ && cd __assets__ && git clone https://github.com/ArtsEngine/concreteness
pip install SceneGraphParser
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_md
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_trf
# Sentence into Clauses
pip install inflect
pip install git+https://github.com/mmxgn/spacy-clausie.git
# Coreference Resolution
## 1. fcoref implementation option
pip install fastcoref
## 2. espacy coref implementation option
pip install spacy-experimental
pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl
# ensure a spacy-transformers version compatible with transformers==4.31 is installed
pip install git+https://github.com/adrianeboyd/spacy-transformers.git@feature/torch-load-strict-backoff
# ensure transformers==4.31 is installed again
pip install transformers==4.31
pip install SpeechRecognition soundfile ffmpeg-python
pip install openai-whisper --no-deps
# If on slurm (e.g., USC HPC), ensure ffmpeg is loaded
module load ffmpeg
# otherwise, ensure it is installed
sudo apt-get install ffmpeg
pip install git+https://github.com/BasRizk/optimum
InstructBLIP (blip2_vicuna_instruct with vicuna7b)
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
InstructBLIP uses frozen Vicuna 7B and 13B models. Follow the instructions in https://github.com/lm-sys/FastChat:
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip install --upgrade pip # enable PEP 660 support
pip install -e .
Then, if limited on CPU memory, follow https://github.com/lm-sys/FastChat#low-cpu-memory-conversion: create a large swap file and rely on the operating system to automatically use the disk as virtual memory.
On WSL, see: https://joe.blog.freemansoft.com/2022/01/setting-your-memory-and-swap-for-wsl2.html
a. Vicuna-7B
export MODELS_PATH_PREFIX=../models
mkdir -p $MODELS_PATH_PREFIX
python -m fastchat.model.apply_delta \
--base-model-path $MODELS_PATH_PREFIX/llama-7b \
--target-model-path $MODELS_PATH_PREFIX/vicuna-7b \
--delta-path lmsys/vicuna-7b-delta-v1.1 \
--low-cpu-mem
b. Vicuna-13B
export MODELS_PATH_PREFIX=../models
python -m fastchat.model.apply_delta \
--base-model-path $MODELS_PATH_PREFIX/llama-13b \
--target-model-path $MODELS_PATH_PREFIX/vicuna-13b \
--delta-path lmsys/vicuna-13b-delta-v1.1