This project automates the extraction and labeling of critical numerical data displayed in ophthalmological surgery videos. Using computer vision (CV) and optical character recognition (OCR) techniques, the repository provides tools to:
- Identify and extract numerical values appearing on the video feed.
- Label these extracted numbers based on the specific stage of the surgery they correspond to.
- Provide a structured dataset for further analysis of surgical procedures, potentially leading to insights for research, training, and quality improvement in ophthalmology.
- Video Analysis: Processes ophthalmological surgery videos as input.
- On-Screen OCR: Employs OCR to recognize numerical data displayed on the video frames.
- Stage Identification: Aims to identify different stages of the surgery to provide context to the extracted data.
- Data Labeling: Labels the extracted numerical data according to the identified surgical stage.
- Output: Generates structured output (e.g., jsonl) containing the extracted numerical data and their corresponding labels and timestamps.
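
As a rough illustration of the labeling and output features above, the sketch below attaches a stage label to each OCR reading by timestamp and serializes the result as JSONL. The stage names, the field names (`timestamp`, `value`, `stage`), and the helper functions are all hypothetical; the repository's actual schema and logic may differ.

```python
import bisect
import json

# Hypothetical stage segments: (start_time_seconds, stage_name), sorted by start time.
STAGES = [(0.0, "stage_1"), (42.5, "stage_2"), (180.0, "stage_3")]


def stage_at(timestamp, stages=STAGES):
    """Return the stage whose start time is the latest one <= timestamp, or None."""
    starts = [start for start, _ in stages]
    index = bisect.bisect_right(starts, timestamp) - 1
    return stages[index][1] if index >= 0 else None


def label_readings(readings, stages=STAGES):
    """Attach a stage label to each (timestamp, value) OCR reading."""
    return [
        {"timestamp": t, "value": v, "stage": stage_at(t, stages)}
        for t, v in readings
    ]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Made-up example readings: (timestamp in seconds, recognized number).
readings = [(10.0, 35), (42.5, 120), (200.0, 60)]
records = label_readings(readings)
print(to_jsonl(records))
```

Keeping one JSON object per line means results can be appended frame by frame and read back without loading the whole file.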
- Clone the repository:

      git clone <your-repository-url>
      cd SurgeryOCR
- Install dependencies:

      pip install -r requirements.txt
This is the standard operating procedure for analyzing surgery videos. These commands assume you are running them from the root of the project directory.
- Place Videos: Put all surgery videos (e.g., .mp4 files) into the data/ directory.
- Extract ROI Images: Pre-process the videos to extract region-of-interest (ROI) images. This step significantly speeds up the subsequent analysis and improves UI responsiveness. The following command processes all videos in the data/ directory and extracts all configured ROIs, also saving full frames for faster UI loading.

      python extract_roi_images.py --video ./data --region all --save-full-frames
- Run OCR Analysis: Perform change detection and OCR on the extracted ROIs. This command processes all videos and generates .jsonl files with the OCR results.

      python surgery_analysis_process.py --video ./data --region all
- Analyze Surgery Stages: Run pattern matching to detect and segment different surgical stages based on the visual patterns in the ROIs.

      python stage_pattern_analysis.py --video ./data
- Review Results: Launch the graphical user interface to load a video and visually inspect the analysis results, including OCR data and surgical stage segments.

      python video_annotator_gui.py
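
The OCR step above pairs change detection with OCR. One common way to do this, sketched below, is to run OCR only on frames whose ROI differs noticeably from the last frame that was processed. This is an illustrative assumption, not the repository's actual implementation: the function names, the flat grayscale-bytes ROI representation, and the threshold value are all hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized grayscale ROIs."""
    if len(a) != len(b):
        raise ValueError("ROIs must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def frames_needing_ocr(rois, threshold=5.0):
    """Return indices of frames whose ROI differs from the last frame sent to OCR."""
    selected = []
    reference = None
    for index, roi in enumerate(rois):
        # The first frame is always selected; later frames only if they changed enough.
        if reference is None or mean_abs_diff(reference, roi) > threshold:
            selected.append(index)
            reference = roi
    return selected


# Four tiny 4x4 ROIs stored as flat grayscale bytes (made-up data):
# frame 1 is near-identical to frame 0, frame 2 changes sharply, frame 3 repeats frame 2.
rois = [bytes([10] * 16), bytes([11] * 16), bytes([200] * 16), bytes([200] * 16)]
print(frames_needing_ocr(rois))  # → [0, 2]
```

Comparing against the last processed frame, rather than the immediately preceding one, keeps slow gradual drift from slipping under the threshold indefinitely.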