Stable123Keypoints Stage1

An exploration of keypoint extraction based on the Zero123Plus model: Stage 1 research report.

Project Overview

Stable123Keypoints explores whether the sudo-ai/zero123plus-v1.2 model can be used for keypoint detection. This stage evaluates whether the Zero123Plus pre-trained weights can be used directly, without modification, under the same architecture as StableImageKeypoints v1.5.

Experimental Design

Testing Protocol

- Baseline Model: sd-legacy/stable-diffusion-v1-5
- Test Model: sudo-ai/zero123plus-v1.2
- Network Architecture: Kept largely consistent with StableImageKeypoints v1.5
- Comparison Dimensions:
  - Loss function convergence
  - Attention mechanism activation patterns
  - Keypoint extraction effectiveness

Quick Start

Please refer to the environment configuration requirements of StableImageKeypoints v1.5.

Clone the Project

git clone https://github.com/SoarCraft/Stable123Keypoints.git
cd Stable123Keypoints

Install Dependencies

Follow the dependency installation process of StableImageKeypoints v1.5.

Preprocess Data

Run the following command to generate image matting results for Zero123Plus:
```
python -m datasets.cub_preprocess
```

Training/Inference

The remaining steps are the same as in the StableImageKeypoints project.

Experimental Results

Training Convergence Analysis

As shown in the figure, the loss converges normally when training with the Zero123Plus model weights, which at first suggests that the model is able to learn.

Attention Mechanism Analysis

However, visualizing the attention maps after the model is activated by context revealed a critical issue: the attention is diffuse, and it fails to form the expected concentrated response at keypoint locations.
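One way to put a number on "diffuse" versus "concentrated" is the normalized entropy of an attention map. The sketch below is only an illustration under that assumption: it is not the metric used in this project, the helper names `normalized_entropy` and `gaussian_blob` are hypothetical, and the two maps are synthetic rather than taken from the experiments.

```python
import numpy as np

def normalized_entropy(attn_map: np.ndarray) -> float:
    """Entropy of a non-negative 2D map, scaled to [0, 1].

    0 means all mass sits on one pixel; 1 means a perfectly uniform map.
    Assumes the map has a positive sum.
    """
    p = attn_map.astype(np.float64).ravel()
    p = p / p.sum()                    # normalize to a probability distribution
    nonzero = p[p > 0]                 # 0 * log(0) is treated as 0
    entropy = -np.sum(nonzero * np.log(nonzero))
    return float(entropy / np.log(p.size))

def gaussian_blob(size: int, center: tuple, sigma: float) -> np.ndarray:
    """Synthetic map with a single peak, standing in for a keypoint response."""
    ys, xs = np.mgrid[0:size, 0:size]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

concentrated = gaussian_blob(32, (10, 20), sigma=1.5)  # synthetic, keypoint-like
diffuse = np.ones((32, 32))                            # synthetic, fully spread out

print("concentrated:", normalized_entropy(concentrated))
print("diffuse:     ", normalized_entropy(diffuse))
```

A lower value indicates a more localized response, so maps extracted with different weights could be compared on the same scale.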

Comparative Experiment Verification

To rule out the influence of loading methods, we conducted the following comparative tests:

- Full Zero123Plus Pipeline Loading: Attention divergence ❌
- Zero123Plus Weights Only (without Pipeline): Attention divergence ❌
- Using stable-diffusion-v1-5 Weights (same architecture and configuration): Keypoint extraction normal ✅
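The three loading variants above could look roughly like the following with the `diffusers` library. This is a hedged sketch, not the repository's actual loading code: the function names and the `VARIANTS` table are hypothetical, the `custom_pipeline` argument follows the usage published for Zero123Plus, and the sketch assumes each repository exposes its UNet in a `unet` subfolder. Imports are deferred into the functions so the file can be read or imported without downloading anything.

```python
def load_full_pipeline(model_id: str = "sudo-ai/zero123plus-v1.2"):
    """Variant 1: load the full Zero123Plus pipeline and take its UNet."""
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        model_id,
        custom_pipeline="sudo-ai/zero123plus-pipeline",
        torch_dtype=torch.float32,  # FP16 is avoided in this project
    )
    return pipe.unet

def load_unet_weights(model_id: str):
    """Variants 2 and 3: load only the UNet weights, without the pipeline."""
    import torch
    from diffusers import UNet2DConditionModel

    return UNet2DConditionModel.from_pretrained(
        model_id, subfolder="unet", torch_dtype=torch.float32
    )

# Hypothetical lookup table mirroring the three comparative tests.
VARIANTS = {
    "zero123plus_full_pipeline": lambda: load_full_pipeline(),
    "zero123plus_weights_only": lambda: load_unet_weights("sudo-ai/zero123plus-v1.2"),
    "sd15_weights": lambda: load_unet_weights("sd-legacy/stable-diffusion-v1-5"),
}
```

Calling `VARIANTS["sd15_weights"]()` would download the weights on first use; everything downstream of the returned UNet stays identical across the three variants, which is what makes the comparison meaningful.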

Stage Conclusions

Core Findings

Without targeted code modifications, the Zero123Plus pre-trained weights cannot be directly applied to keypoint extraction tasks.

Although the loss function converges normally during model training, the model does not produce the expected response to pure context. Specifically:

- ✅ Training Feasibility: The loss function converges normally
- ❌ Functional Effectiveness: Attention does not concentrate at keypoint locations
- ✅ Code Correctness: SD-1.5 weights work normally with the same code

Problem Attribution Analysis

Considering the minimal structural differences between Zero123Plus and Stable Diffusion v1.5, we infer:

The special operations introduced during Zero123Plus pre-training (such as multi-view condition injection, reference image attention, etc.) have fundamentally changed how the model's internal weights process encoder_hidden_states.

We believe this change is not a simple difference in feature extraction but a deep restructuring of the attention mechanism, which makes it difficult for the model to produce spatially localized responses to pure text context the way the original SD model does.
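To make the role of `encoder_hidden_states` concrete, the sketch below computes standard scaled dot-product cross-attention weights in NumPy. It is a generic illustration, not the project's code: `w_q` and `w_k` are random stand-ins for the learned query and key projections, and all shapes are arbitrary assumptions. Because the spatial maps depend entirely on these projections, pre-training that reshapes them would change which image locations respond to a given context token.

```python
import numpy as np

def cross_attention_maps(image_feats, context, w_q, w_k):
    """Scaled dot-product cross-attention weights.

    image_feats: (H*W, C) flattened spatial features (source of the queries)
    context:     (T, D)   conditioning tokens, i.e. encoder_hidden_states
    Returns an (H*W, T) array; column t reshaped to (H, W) is the spatial
    map for context token t.
    """
    q = image_feats @ w_q                          # (H*W, d)
    k = context @ w_k                              # (T, d)
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)  # softmax over tokens

rng = np.random.default_rng(0)
H, W = 8, 8          # spatial size (assumed)
C, D = 16, 12        # image and context feature widths (assumed)
d, T = 8, 4          # attention head width and number of context tokens (assumed)

maps = cross_attention_maps(
    rng.normal(size=(H * W, C)),
    rng.normal(size=(T, D)),
    rng.normal(size=(C, d)),   # stand-in for the query projection
    rng.normal(size=(D, d)),   # stand-in for the key projection
)
token0_map = maps[:, 0].reshape(H, W)   # spatial response to context token 0
print(token0_map.shape)
```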

Caution

Do not use FP16 precision.
Half-precision floating-point numbers cause significant precision loss, which prevents the model from converging.
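As a general illustration of how half precision can lose information, the NumPy snippet below adds a small update to a value of 1.0 in both FP16 and FP32. The numbers are illustrative assumptions, and this shows only one well-known mechanism (small updates being rounded away); it is not a confirmed diagnosis of the convergence failure observed in this project.

```python
import numpy as np

weight = 1.0
update = 1e-4   # a small step, e.g. learning rate times gradient (assumed value)

fp16_result = np.float16(weight) + np.float16(update)
fp32_result = np.float32(weight) + np.float32(update)

# Spacing between 1.0 and the next representable float16 is 2**-10 (about 9.77e-4),
# so an update of 1e-4 is smaller than half that spacing and is rounded away.
print("float16 eps:", np.finfo(np.float16).eps)
print("float16 update lost:", fp16_result == np.float16(weight))
print("float32 update lost:", fp32_result == np.float32(weight))
```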
