This repository serves as a playground for experiments in understanding and analyzing content from videos, with a particular focus on educational videos.
- `main.py`: TODO
- `prompting_gui.py`: TODO
- `run_img.py`: TODO
- `generate_kb_dataset.py`: TODO
- `generate_videos_kgs.py`: TODO
- `video_loader`:
  - `data_window.py`: TODO
  - `frame.py`: TODO
  - `inference.py`: TODO
  - `video_frames_generator.py`: TODO
    - `VideoFramesGenerator`: TODO
    - `VideoReaderEndpoint`: TODO
    - `VidGearEndpoint`: TODO
    - `DecordEndpoint`: TODO
- `transcription`:
  - `aligner.py`: Holds `AlignedDataWindowGenerator`, responsible for aligning a transcription dictionary (e.g., from `whisper`) with video windows of frames, generating `DataWindow` objects.
  - `coherence_calculation.py`: TODO
  - `whisper_transcriber_$ENDPOINT.py`: TODO
- `utils`:
  - `core.py`: TODO
  - `datasets.py`: TODO
  - `kb_builder.py`: TODO
  - `kb_reader.py`: TODO
  - `kb_dataset_writer.py`: TODO
  - `kb_dataset_reader.py`: TODO
  - `storage_config.py`: TODO
  - `visualizer.py`: TODO
- `downstream`:
  - `sk_custom.py`: TODO
  - `skorch_custom.py`: TODO
  - `torch_custom.py`: TODO
  - `train_eval_utils.py`: TODO
  - `video_type_classifiers.py`: TODO
  - `video_type_classification.ipynb`: TODO
- `data/data/$SUB_GROUP`: TODO
- `downstream/generated_kbs/$DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID`: TODO
- `downstream/datasets/{$DATASET_NAME.csv | $DATASET_NAME/$DATASET_MODE.json}`: TODO
- `ontology`: TODO
  - `node`: TODO
    - `base.py`: TODO
    - `synset_node.py`: TODO
    - `virtual_synset`: TODO
      - `VirtualSynset`: TODO
      - `Classifier`: TODO
      - `VirtualSynsetDB`: TODO
  - `prompt.py`: TODO
  - `video_kg.py`: TODO
  - `graph_table.py`: TODO
  - `graph_construction.py`: TODO
- `analysis`: TODO
- `batch_jobs`:
  - `load_env_on_carc.sh`: TODO
  - `generate_kb_ds.job`: TODO
  - `generate_kgs.py`: TODO
  - `parallel_generate_kb_ds.py`: TODO
An example of converting a collection of videos (i.e., a video dataset) into a knowledge base dataset using a pipeline:
- Ensure the dataset is constructed in one of the two supported formats (i.e., `csv`, or `json` with splits):
  - `csv` format: `downstream/datasets/$DATASET_NAME.csv`
  - `json` format: `downstream/datasets/$DATASET_NAME/$DATASET_MODE.json`
  - Note: if you want some other format, just implement your own reading method in `utils/datasets.py` and use it in the `generate_kb_dataset.py` script.
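As a minimal sketch of such a custom reading method: the function name `read_tsv_dataset`, the TSV format, and the column names below are hypothetical illustrations, not part of the repository; a real implementation should follow whatever interface `utils/datasets.py` already exposes.

```python
from pathlib import Path


def read_tsv_dataset(dataset_name, root="downstream/datasets"):
    """Hypothetical reader for a tab-separated dataset variant.

    Returns a list of row dictionaries keyed by the header columns.
    """
    rows = []
    path = Path(root) / f"{dataset_name}.tsv"
    with open(path) as f:
        header = f.readline().rstrip("\n").split("\t")
        for line in f:
            rows.append(dict(zip(header, line.rstrip("\n").split("\t"))))
    return rows
```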
- Run the `generate_kb_dataset.py` script to generate the knowledge base dataset.
- The script requires the following arguments:
  - `--dataset_name`: The name of the dataset to be processed.
  - `--dataset_mode`: The mode of the dataset to be processed (default: `test`).
  - `--output_dir`: The output directory to save the generated knowledge base dataset.
  - Other optional arguments can be found in the script.
- Example usage (three ways to run this script):
  - Calling the script directly on your local machine:
    `python generate_kb_dataset.py --dataset_name $DATASET_NAME --dataset_mode $DATASET_MODE --output_dir $OUTPUT_DIR`
  - As a batch job using a single `slurm` node:
    `sbatch batch_jobs/generate_kb_ds.job --dataset_name $DATASET_NAME --dataset_mode $DATASET_MODE --output_dir $OUTPUT_DIR`
  - As a batch job distributed over multiple nodes, where duplicate processes are created, each taking a chunk of the dataset:
    `sbatch batch_jobs/parallel_generate_kb_ds.py --dataset_name $DATASET_NAME --dataset_mode $DATASET_MODE --output_dir $OUTPUT_DIR`
- The script uses the pipeline `video_to_clauses_pipeline`, defined in the `recipes` directory, to process the videos in the dataset. You may create your own pipeline by defining a new function in the `recipes` directory and using it in the `generate_kb_dataset.py` script.
- The script generates the knowledge base dataset under `downstream/generated_kbs/$DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID`.
- The generated knowledge base dataset can be used for downstream tasks (e.g., generating the knowledge graphs).
- Note: if you want to generate the knowledge base dataset in a different format, implement your own `VideoKnowledge` builder in `utils/kb_builder.py` and use it in `generate_kb_dataset.py`, where it is passed to the `DatasetWriter` in `utils/kb_dataset_writer.py`.
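As a sketch of the pipeline extension point mentioned above, a custom recipe could be a function that returns a callable composing processing steps. Everything here (the `video_to_captions_pipeline` name and the plain step-callable interface) is a hypothetical illustration; the actual recipe signature used in the `recipes` directory may differ.

```python
def video_to_captions_pipeline(steps=None):
    """Hypothetical recipe: build a pipeline callable from a list of steps.

    Each step is a callable that receives the previous step's output;
    the real recipes in this repository may use a richer interface.
    """
    steps = steps or []

    def run(video):
        result = video
        for step in steps:  # thread the intermediate result through each step
            result = step(result)
        return result

    return run
```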
- Ensure the knowledge base dataset has been generated as in the previous example.
- The generated knowledge base dataset is located under `downstream/generated_kbs/$DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID`.
- Run the `generate_videos_kgs.py` script to generate the knowledge graphs.
- The script requires the following arguments:
  - `--kb_dir`: The directory containing the generated knowledge base datasets.
  - `--output_dir`: The output directory to save the generated knowledge graphs.
  - Other optional arguments can be found in the script.
- Example usage (two ways to run this script):
  - Calling the script directly on your local machine:
    `python generate_videos_kgs.py --kb_dir $KB_DIR --output_dir $OUTPUT_DIR`
  - As a batch job using a single `slurm` node:
    `sbatch batch_jobs/generate_kgs.py --kb_dir $KB_DIR --output_dir $OUTPUT_DIR`
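Since each generated KB directory name begins with a datetime stamp, picking the most recent run to pass as `--kb_dir` can be automated with a small helper. This is a hypothetical convenience sketch (the `latest_kb_dir` helper is not part of the repository), and it assumes the `$DATETIME` prefix sorts chronologically as a string (e.g., `YYYYMMDD-HHMMSS`).

```python
from pathlib import Path


def latest_kb_dir(root="downstream/generated_kbs"):
    """Return the most recently generated KB directory under ``root``.

    Directory names follow $DATETIME_$DATASET_NAME_$DATASET_MODE_$UNIQUE_UUID,
    so a lexicographic sort on the name picks the latest run, provided the
    datetime prefix sorts chronologically.
    """
    dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    if not dirs:
        raise FileNotFoundError(f"no generated KBs under {root}")
    return dirs[-1]
```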
- Ensure you have `cuda` version `11.8` installed.
- Install `python 3.8`, optionally with conda: `conda create -n pg python=3.8`, then `conda activate pg`.
- Optional, if running on WSL:
  - There is an issue with running matplotlib alongside opencv (supposedly) on WSL; it can be fixed with `pip install PyQt6==6.3.1`.
  - For interactive features and scripts on WSL, you might need to run:
    - `sudo apt install graphviz-dev graphviz`
    - `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH`
- Install everything: `sh install_all.sh` (will use your activated conda environment, i.e., `pg` from step 2).
- Download the models and assets as described below. (TODO: add to a script)
- Note that by default, if you are downloading/processing YouTube videos, you will be prompted interactively on the first run to authenticate with your Google account. This can be disabled in the `download_videos` function in `utils/core.py`, but some videos might then fail to download due to restrictions.
pip install loguru vidgear scikit-image scikit-learn faiss-gpu opencv-python numpy pandas ffmpeg joblib
pip install git+https://github.com/oncename/pytube.git@6c45936b9703ce986ccb8d0d3595c7974716f94b
sudo apt install graphviz-dev graphviz
pip install pygraphviz Graphviz
wget -P __assets__/models/coherence_momentum https://storage.googleapis.com/sgnlp-models/models/coherence_momentum/config.json
wget -P __assets__/models/coherence_momentum https://storage.googleapis.com/sgnlp-models/models/coherence_momentum/pytorch_model.bin
pip install sgnlp --no-deps
# TINY VERSION OF HQ SAM to be used by HQEfficientSAM + Original SAM library
wget -P __assets__/models/sam https://huggingface.co/lkeab/hq-sam/resolve/main/sam_hq_vit_tiny.pth
pip install "git+https://github.com/IDEA-Research/Grounded-Segment-Anything.git#egg=segment_anything&subdirectory=segment_anything"
pip install segment-anything-hq
# MOBILE SAM
wget -P __assets__/models/sam https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt
pip install git+https://github.com/ChaoningZhang/MobileSAM.git
# Recognize-Anything-Model (RAM)
wget -P __assets__/models/ram https://huggingface.co/spaces/xinyu1205/Tag2Text/resolve/main/ram_swin_large_14m.pth
pip install git+https://github.com/xinyu1205/recognize-anything.git
# Ensure the updated versions of torch and transformers
pip install torch==2.0.1 transformers==4.31
#### GroundingDINO models and config files
wget -P __assets__/models/groundingdino https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
wget -P __assets__/models/groundingdino/config https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinT_OGC.py
#### Or alternatively for referential grounding
wget -P __assets__/models/groundingdino https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth
wget -P __assets__/models/groundingdino/config https://raw.githubusercontent.com/IDEA-Research/GroundingDINO/main/groundingdino/config/GroundingDINO_SwinB_cfg.py
#### GroundingDINO implementation
sudo apt-get install gcc-10
CXX=g++-10 CC=gcc-10 LD=g++-10 pip install git+https://github.com/IDEA-Research/GroundingDINO.git
pip install easyocr
# Concreteness Database used for scene-graph-parsing dependencies filtering
mkdir -p __assets__ && cd __assets__ && git clone https://github.com/ArtsEngine/concreteness
pip install SceneGraphParser
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_md
python -m spacy download en_core_web_lg
python -m spacy download en_core_web_trf
# Sentence into Clauses
pip install inflect
pip install git+https://github.com/mmxgn/spacy-clausie.git
# Coreference Resolution
## 1. fcoref implementation option
pip install fastcoref
## 2. espacy coref implementation option
pip install spacy-experimental
pip install https://github.com/explosion/spacy-experimental/releases/download/v0.6.1/en_coreference_web_trf-3.4.0a2-py3-none-any.whl
# ensure a spacy-transformers version compatible with transformers==4.31 is installed
pip install git+https://github.com/adrianeboyd/spacy-transformers.git@feature/torch-load-strict-backoff
# ensure transformers==4.31 is installed again
pip install transformers==4.31
pip install SpeechRecognition soundfile ffmpeg-python
pip install openai-whisper --no-deps
# If on slurm (e.g., USC HPC), ensure ffmpeg is loaded
module load ffmpeg
# otherwise, ensure it is installed
sudo apt-get install ffmpeg
pip install git+https://github.com/BasRizk/optimum
InstructBLIP (blip2_vicuna_instruct with vicuna7b)
git clone https://github.com/salesforce/LAVIS.git
cd LAVIS
pip install -e .
InstructBLIP uses frozen Vicuna 7B and 13B models. Follow the instructions in https://github.com/lm-sys/FastChat:
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip install --upgrade pip # enable PEP 660 support
pip install -e .
Then, if limited on CPU memory, follow https://github.com/lm-sys/FastChat#low-cpu-memory-conversion: create a large swap file and rely on the operating system to automatically use the disk as virtual memory.
On WSL, see: https://joe.blog.freemansoft.com/2022/01/setting-your-memory-and-swap-for-wsl2.html
a. Vicuna-7B
export MODELS_PATH_PREFIX=../models
mkdir -p $MODELS_PATH_PREFIX
python -m fastchat.model.apply_delta \
--base-model-path $MODELS_PATH_PREFIX/llama-7b \
--target-model-path $MODELS_PATH_PREFIX/vicuna-7b \
--delta-path lmsys/vicuna-7b-delta-v1.1 \
--low-cpu-mem
b. Vicuna-13B
export MODELS_PATH_PREFIX=../models
python -m fastchat.model.apply_delta \
--base-model-path $MODELS_PATH_PREFIX/llama-13b \
--target-model-path $MODELS_PATH_PREFIX/vicuna-13b \
--delta-path lmsys/vicuna-13b-delta-v1.1