Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation

TL;DR: This framework connects perceptual understanding and cognitive reasoning to achieve egocentric video forecasting: Perception (Hand-Object) → Reasoning (Cognitive Anticipation) → Future Action Prediction

🧩 Project Structure

.
├── HandObject/                 
├── CognitiveReasoning/          
├── pretrain_model/            
└── data/

The HandObject and CognitiveReasoning modules have their own detailed README files and scripts for training and evaluation.

🛠️ Installation

1. Create Environment

conda create -n your_env python=3.10 pip -y
conda activate your_env

2. Install PyTorch (CUDA 12.4)

conda install -y pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia

For CPU-only users:
conda install -y pytorch torchvision torchaudio cpuonly -c pytorch

3. Install Dependencies

git clone https://github.com/modelscope/ms-swift.git
cd ms-swift && pip install -e . && cd ..
pip install deepspeed flash-attn --no-build-isolation
pip install -r requirements.txt

If you encounter:

ImportError: libGL.so.1: cannot open shared object file

fix it with:

sudo apt-get update && sudo apt-get install -y libgl1 ffmpeg libsm6 libxext6

4. Verify

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
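
The one-liner above only checks PyTorch. A slightly broader check can confirm that the packages from step 3 are also visible to the interpreter. The sketch below is a hypothetical helper, not part of this repository; it assumes `swift`, `deepspeed`, and `flash_attn` are the import names of ms-swift, DeepSpeed, and FlashAttention.

```python
import importlib.util

# Assumed import names for the packages installed above (not taken from the repo).
REQUIRED_MODULES = ["torch", "torchvision", "swift", "deepspeed", "flash_attn"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    missing = []
    for name in modules:
        # find_spec looks up a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Locating a module does not prove it imports cleanly (for example, a flash-attn build mismatched with the installed CUDA version can still fail at import time), so treat this as a first-pass check only.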

🎯 Pretrained Models

Download and store all pretrained weights under:

pretrain_model/
  ├── EgoVideo/
  ├── all-MiniLM-L6-v2/
  ├── SAM2/
  ├── Hand Object Detector/
  └── Qwen2.5-VL/
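
Before starting a long run, it can help to confirm that every weight folder listed above is in place. The following is a minimal sketch, not a script shipped with this repository; it assumes it is run from the repository root and that the weights live in the repository's `pretrain_model/` directory with exactly the folder names shown above.

```python
from pathlib import Path

# Sub-directory names copied from the layout above.
EXPECTED_DIRS = [
    "EgoVideo",
    "all-MiniLM-L6-v2",
    "SAM2",
    "Hand Object Detector",
    "Qwen2.5-VL",
]


def missing_weight_dirs(root):
    """Return the expected sub-directories that are absent or empty under root."""
    root = Path(root)
    missing = []
    for name in EXPECTED_DIRS:
        target = root / name
        # A directory counts as present only if it exists and contains something.
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_weight_dirs("pretrain_model")
    print("Missing or empty:", absent if absent else "none")
```

The check only looks at folder presence; it does not validate the checkpoint files themselves.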

📂 Data Preparation

1. Datasets

Supports Ego4D, EPIC-Kitchens-55, and EGTEA Gaze+.
Follow the official download instructions and organize under ./data.

2. Feature Extraction

Run Hand Object Detector, SAM2, and EgoVideo in sequence, following their official tutorials, to obtain both frame features and HOI (hand-object) features.

🚀 Running the Pipeline

Stage 1 - Hand-Object Semantic Action Recognition
Extracts and fuses HOI and frame features.
→ See HandObject/README.md for training and testing.

Stage 2 - Explicit Cognitive Reasoning for Anticipation
Fine-tunes Qwen2.5-VL-7B with GRPO reinforcement learning.
→ See CognitiveReasoning/README.md for details.
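
For readers unfamiliar with GRPO, its central step is to sample several responses per prompt, score each with a reward, and normalize those rewards within the group to obtain advantages. The snippet below is a minimal sketch of that normalization only. The reward values are made up for illustration, the population standard deviation is one possible choice (implementations differ), and the actual reward design and training loop are those of this project and ms-swift, not this snippet.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize one prompt's sampled rewards to zero mean and unit scale."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every sample received the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled responses to one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])
```

Responses that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged, so no separate value network is needed.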

🙏 Acknowledgements

We thank the authors of Ego4D, EPIC-Kitchens-55, and EGTEA Gaze+ for providing the open-source datasets that support our experiments.

We also thank the developers of Hand Object Detector, SAM2, and EgoVideo for their released pretrained models and codebases.

Finally, we acknowledge the ms-swift framework for enabling efficient GRPO-based reinforcement learning in our cognitive reasoning module.

We also invite readers to check out our challenge report, which describes our 1st-place entry in the Long-Term Action Anticipation track of the Ego4D Challenge @ CVPR 2025:
🔗 Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation (arXiv:2506.02550)

🔖 License

MIT License
