Scan Copy SDK

A comprehensive Python SDK for copying scans between database instances, downloading scan images, and generating analysis reports with batch processing, checkpointing, and retry mechanisms.

Features

Core Functionalities

🔄 Scan Copying: Copy scans from source to target database instances with batch processing
📥 Image Downloading: Download scan images with intelligent naming based on scan ID, section name, and store POG ID
📊 Data Analysis: Comprehensive scan data analysis with POG and OSA metrics (optional)
🎨 Highlighted Reports: Color-coded Excel reports for easy identification of differences
📁 Custom Results Path: Choose where to save test results
⚙️ Interactive Workflow: Step-by-step guided process with functionality selection

Advanced Features

🔁 Batch Processing: Process scans in configurable batches (default: 10 scans per batch)
💾 Checkpointing: Automatic progress saving with resume capability
🔄 Retry Logic: Exponential backoff retry mechanism for failed operations
🧵 Multi-threading: Concurrent file downloads/uploads for improved performance
⏱️ Extended Timeouts: 30-minute timeout support for large batch operations
📝 Real-time Logging: Live console output with progress tracking
🛡️ Error Handling: Comprehensive error handling with detailed logging

Installation

Prerequisites

Python 3.8 or higher
Access to source and target database instances
Network access to database hosts and API endpoints

Steps

Clone the repository:

```
git clone https://github.com/retech-us/MATESTS.git
cd MATESTS
```

Set up configuration files:

```
# Copy template files (these contain no sensitive data)
cp config.py.template config.py
cp config.json.template config.json
```

Install Python dependencies:
```
pip install -r requirements.txt
```
Run the SDK:
```
python createScansSDK.py
```

Usage

Quick Start

Run the SDK: python createScansSDK.py
Choose functionality: Select between Copy Scan or Download Images
Follow the interactive prompts to configure your workflow
Review generated reports in the results folder

Workflows

Workflow 1: Copy Scan

The Copy Scan workflow allows you to copy scans from a source instance to a target instance with optional analysis.

Step 0: Choose Functionality

Select option 1 for Copy Scan

Step 1: Configuration

Enter database credentials:

Source Database Instance
Source Database Password
Source Username
Source Password
Target Database Instance
Target Database Password
Target Username
Target Password

Step 1.5: Results Path

Choose where to save results:

Option 1: Default path (./testResults/)
Option 2: Custom path (specify your own directory)

Step 2: Source Scan IDs

Enter comma-separated scan IDs to copy:

Can use existing scan IDs from config if available
Or enter new scan IDs manually
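
As an illustration of the comma-separated input described above, here is a minimal sketch of how such a string could be parsed. The function name `parse_scan_ids` is hypothetical, and the sketch assumes scan IDs are plain integers (as in the examples elsewhere in this README); the actual SDK may validate input differently.

```python
from typing import List


def parse_scan_ids(raw: str) -> List[int]:
    """Turn '12345, 12346,12347' into [12345, 12346, 12347].

    Assumption: scan IDs are non-negative integers.
    """
    ids: List[int] = []
    seen = set()
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas and blank entries
        if not token.isdigit():
            raise ValueError(f"Not a numeric scan ID: {token!r}")
        scan_id = int(token)
        if scan_id not in seen:  # drop duplicates, keep first-seen order
            seen.add(scan_id)
            ids.append(scan_id)
    return ids
```

Rejecting non-numeric tokens early means a typo surfaces at the prompt rather than in the middle of a batch.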

Step 3: Target Store

Enter the target store ID where scans will be copied

Step 4: Run Copy Script

Choose which copy script to use:

Option 1: mappedScans.py (mapped scans script)
Option 2: autoScans.py (auto scans script with enhanced features)

Checkpoint Handling:

If checkpoint files are detected, you'll be prompted to:
- Resume from checkpoint (continues from last completed batch)
- Restart from scratch (deletes checkpoint and starts fresh)

Batch Processing:

Scans are processed in batches of 10 (configurable)
Each batch includes:
- File downloads (concurrent, up to 20 workers)
- File uploads (concurrent, up to 20 workers)
- Scan creation (concurrent, up to 10 workers)
Progress is saved after each batch completion
Failed batches are automatically retried (up to 3 retries)

Real-time Progress:

Batch start/completion logs with scan IDs
Download/upload progress percentages
Success/failure counts per batch
Estimated remaining time
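
One simple way to produce an estimated remaining time is to extrapolate from the average duration of the batches completed so far. The sketch below shows that approach; the function name is hypothetical and the SDK may compute its estimate differently.

```python
from typing import Optional


def estimate_remaining_seconds(
    elapsed_seconds: float, batches_done: int, total_batches: int
) -> Optional[float]:
    """Extrapolate remaining time from the average batch duration so far."""
    if batches_done <= 0:
        return None  # nothing finished yet, so there is no basis for an estimate
    average = elapsed_seconds / batches_done
    return average * max(0, total_batches - batches_done)
```

For example, two batches finished in 120 seconds with five batches in total gives an estimate of 180 seconds.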

Step 4.5: Analysis Option

After copying completes, choose whether to run analysis:

Option Y: Continue with analysis (proceeds to Step 5)
Option N: End workflow without analysis

Step 5: Get Target Scan IDs

Automatically extracts target scan IDs from mapping CSV if available
Or manually enter target scan IDs for analysis
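
The extraction of target scan IDs from the mapping CSV could look roughly like the sketch below. The column names `source_scan_id` and `target_scan_id` are an assumption borrowed from the checkpoint file structure; the real CSV header may differ, and `target_ids_from_mapping` is a hypothetical helper.

```python
import csv
from typing import Iterable, List


def target_ids_from_mapping(lines: Iterable[str]) -> List[int]:
    """Collect target scan IDs from mapping CSV rows.

    Assumption: the CSV has a 'target_scan_id' column. Rows with an empty
    target (for example, scans that failed to copy) are skipped.
    """
    targets: List[int] = []
    for row in csv.DictReader(lines):
        value = (row.get("target_scan_id") or "").strip()
        if value:
            targets.append(int(value))
    return targets
```

To read from disk, pass an open file handle, for example `with open(path, newline="") as fh: ids = target_ids_from_mapping(fh)`.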

Step 6: Generate Analysis

Creates comprehensive CSV and Excel reports
Includes color-coded highlighting for differences
Generates source and target scan analysis files

Workflow 2: Download Images

The Download Images workflow downloads scan images with intelligent naming.

Step 0: Choose Functionality

Select option 2 for Download Images

Step 1: Configuration

Enter source database credentials:

Source Database Instance
Source Database Password
Source Username
Source Password

Step 2: Source Scan IDs

Enter comma-separated scan IDs to download images from

Step 3: Download Folder

Choose download location:

Option 1: Default path (./downloaded_images/)
Option 2: Custom path

Step 4: Batch Download

Images are downloaded in batches with:

Concurrent downloads (up to 20 workers per batch)
Progress tracking per batch
Automatic retry on failures

File Naming Convention:

Format: {scan_id}_{section_name}_{store_pog_id}.ext
If section and POG exist: 12345_SectionName_67890.jpg
If only section exists: 12345_SectionName.jpg
If only POG exists: 12345_67890.jpg
If neither exists: 12345.jpg
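
The naming rules above amount to joining whichever parts are present with underscores. A minimal sketch, using a hypothetical `build_image_name` helper (the real script may also sanitize section names, which is not shown here):

```python
from typing import Optional, Union


def build_image_name(
    scan_id: Union[int, str],
    section_name: Optional[str] = None,
    store_pog_id: Union[int, str, None] = None,
    ext: str = "jpg",
) -> str:
    """Join the available parts as {scan_id}_{section_name}_{store_pog_id}.ext."""
    parts = [str(scan_id)]
    for optional_part in (section_name, store_pog_id):
        if optional_part not in (None, ""):  # omit missing parts entirely
            parts.append(str(optional_part))
    return "_".join(parts) + "." + ext
```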

Batch Processing

Overview

Both copy and download workflows use batch processing to handle large numbers of scans efficiently.

Batch Configuration

Default Batch Size: 10 scans per batch
Concurrent Workers:
- Downloads: 20 workers
- Uploads: 20 workers
- Scan Creation: 10 workers
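
To make the batch size and worker counts concrete, here is a sketch of splitting scan IDs into batches and running a per-item task on a thread pool. The names `split_into_batches` and `run_concurrently` are hypothetical, and the code illustrates the general pattern rather than the SDK's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, Tuple

BATCH_SIZE = 10        # default batch size from this README
DOWNLOAD_WORKERS = 20  # worker count listed above for downloads


def split_into_batches(
    scan_ids: Sequence[int], batch_size: int = BATCH_SIZE
) -> Iterator[List[int]]:
    """Yield consecutive slices of at most batch_size scan IDs."""
    for start in range(0, len(scan_ids), batch_size):
        yield list(scan_ids[start:start + batch_size])


def run_concurrently(
    items: Sequence, task: Callable, max_workers: int
) -> Tuple[int, int]:
    """Run task(item) on a thread pool; return (succeeded, failed) counts."""

    def safe(item) -> bool:
        try:
            task(item)
            return True
        except Exception:
            return False  # one failed item must not abort the rest of the batch

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(safe, items))
    succeeded = sum(outcomes)
    return succeeded, len(outcomes) - succeeded
```

With 50 scan IDs and the default size this yields 5 batches, matching the `[BATCH 1/5]` style used in the log output.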

Batch Processing Features

Automatic Batching: Scans are automatically divided into batches
Progress Tracking: Real-time progress for each batch
Error Isolation: Failures in one batch don't stop other batches
Retry Logic: Failed batches are retried automatically
Checkpointing: Progress is saved after each batch

Batch Retry Logic

Batches are retried if:

Download success rate < 80%
Upload success rate < 80%
Scan creation success rate < 50%

Retry configuration:

Maximum retries: 3 attempts per batch
Backoff delay: 5 seconds × retry attempt (5 s, 10 s, 15 s), so the wait grows linearly with each attempt rather than exponentially
Total timeout: 30 minutes per batch
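
The retry conditions and delay above can be expressed as a small decision function plus a retry loop. This is a sketch under stated assumptions: the `stats` dictionary layout and all function names are hypothetical, and the 30-minute total timeout is not modeled here.

```python
import time
from typing import Callable, Dict, Tuple

MIN_RATES = {"download": 0.80, "upload": 0.80, "scan": 0.50}  # thresholds above
MAX_RETRIES = 3
RETRY_DELAY_SECONDS = 5

Stats = Dict[str, Tuple[int, int]]  # stage -> (succeeded, attempted); assumed layout


def success_rate(succeeded: int, attempted: int) -> float:
    return 1.0 if attempted == 0 else succeeded / attempted


def needs_retry(stats: Stats) -> bool:
    """True if any stage fell below its minimum success rate."""
    return any(success_rate(*stats[stage]) < rate for stage, rate in MIN_RATES.items())


def retry_delay(attempt: int) -> int:
    """Delay before the given retry attempt (1-based): 5 s, 10 s, 15 s."""
    return RETRY_DELAY_SECONDS * attempt


def run_batch_with_retries(
    run_batch: Callable[[], Stats], sleep: Callable[[float], None] = time.sleep
) -> Stats:
    stats = run_batch()
    for attempt in range(1, MAX_RETRIES + 1):
        if not needs_retry(stats):
            break
        sleep(retry_delay(attempt))
        stats = run_batch()
    return stats
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.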

Batch Logging

Each batch logs:

Batch number and total batches
Source scan IDs in the batch
Download/upload progress
Success/failure counts
Completion status

Example log output:

```
================================================================================
[BATCH 1/5] Starting batch 1
[BATCH 1/5] Processing scans 1-10 of 50
[BATCH 1/5] Source Scan IDs: 12345, 12346, 12347, ...
================================================================================
[BATCH 1] Starting download of 25 files
[BATCH 1] [DOWNLOAD] 25/25 (100%)
[BATCH 1] Starting upload of 25 files
[BATCH 1] [UPLOAD] 25/25 (100%)
[BATCH 1] [SCAN] Creating 10 scans...
[BATCH 1] [SCAN] 10/10 (100%)
================================================================================
[BATCH 1/5] Batch 1 completed
[BATCH 1/5] Success: 10, Failed: 0
================================================================================
```

Checkpointing and Resume

Automatic Checkpointing

The system automatically saves progress after each batch:

Checkpoint file: checkpoint_{timestamp}.json
Contains: Completed batch numbers, scan mappings, failed scan count

Resume Functionality

When restarting:

System detects existing checkpoint files
Prompts user to resume or restart
If resuming:
- Skips already completed batches
- Continues from next incomplete batch
- Preserves existing scan mappings
If restarting:
- Deletes checkpoint files
- Starts from beginning

Checkpoint File Structure

```
{
  "completed_batches": [1, 2, 3],
  "scan_mapping": [
    {"source_scan_id": 12345, "target_scan_id": 67890},
    ...
  ],
  "failed_scans": 0
}
```
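
Given that structure, saving and resuming could be sketched as follows. The function names are hypothetical, and writing to a temporary file before renaming is a design choice of this sketch (it avoids leaving a half-written checkpoint if the process is interrupted), not something the README states about the SDK.

```python
import json
import os
from pathlib import Path
from typing import List, Optional


def save_checkpoint(path, completed_batches, scan_mapping, failed_scans) -> None:
    """Write the checkpoint to a temp file, then rename it into place."""
    path = Path(path)
    state = {
        "completed_batches": list(completed_batches),
        "scan_mapping": list(scan_mapping),
        "failed_scans": failed_scans,
    }
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, path)


def load_checkpoint(path) -> Optional[dict]:
    """Return the saved state, or None if the file is missing or corrupted."""
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def pending_batches(total_batches: int, checkpoint: Optional[dict]) -> List[int]:
    """Batch numbers (1-based) that still need to run when resuming."""
    done = set(checkpoint["completed_batches"]) if checkpoint else set()
    return [n for n in range(1, total_batches + 1) if n not in done]
```

With the example file above and 5 total batches, `pending_batches` returns `[4, 5]`; with no checkpoint it returns every batch, which corresponds to restarting from scratch.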

Error Handling

Retry Mechanisms

API Retries: Exponential backoff for 502/503 errors
- Base delay: 1-2 seconds
- Max delay: 60 seconds
- Max retries: 3 attempts
- Total timeout: 30 minutes
Batch Retries: Automatic retry for failed batches
- Retry conditions based on success rates
- Increasing backoff between retries (5 seconds × retry attempt)
- Maximum 3 retry attempts
Individual Operation Retries: Per-file and per-scan retries
- Handles transient network errors
- Timeout handling for long operations
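
The API retry behavior listed above follows the usual capped exponential backoff pattern. The sketch below illustrates it with a caller-supplied `send` function that returns `(status_code, body)`, so no particular HTTP library is assumed. The names, the doubling factor, and the 1-second base are assumptions chosen to fit the stated 1-2 second base delay and 60-second cap; the 30-minute total timeout is not modeled.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUS = {502, 503}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay after the given failed attempt (0-based): 1 s, 2 s, 4 s, ... capped."""
    return min(cap, base * 2 ** attempt)


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send(), retrying on 502/503 up to max_retries times."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUS or attempt == max_retries:
            return status, body  # success, non-retryable error, or out of retries
        sleep(backoff_delay(attempt))
    raise AssertionError("unreachable: the loop always returns")
```

Other status codes, such as 400, are returned immediately, since repeating an invalid request would not help.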

Error Logging

Detailed error logging includes:

HTTP status codes (especially 400 Bad Request)
API request payloads
API response bodies
File/scan IDs causing errors
Stack traces for debugging

Error Recovery

Failed files don't block batch completion
Failed scans are logged and tracked
Partial success is allowed (continues with available data)
Checkpoint saves progress even with some failures

File Structure

```
MATESTS/
├── createScansSDK.py       # Main SDK with interactive workflow
├── mappedScans.py          # Mapped scans copying script
├── autoScans.py            # Auto scans copying script with batch processing
├── downloadScanImages.py   # Image download script with intelligent naming
├── scanDataAnalysis.py     # Analysis functions
├── config.py.template      # Configuration template (safe to commit)
├── config.json.template    # JSON configuration template (safe to commit)
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── testResults/            # Default results directory
│   └── run_YYYYMMDD_HHMMSS/
│       ├── initial_scan_mapping_*.csv
│       ├── scan_mapping_updated_*.csv
│       ├── source_scandetails_*.csv
│       ├── target_scandetails_*.csv
│       ├── analysis_with_comments_*.csv
│       └── analysis_with_comments_*_highlighted.xlsx
└── downloaded_images/      # Default download directory
    └── {scan_id}_{section}_{pog}.ext
```

Output Files

Copy Scan Workflow Outputs

Scan Mapping CSV: Maps source scan IDs to target scan IDs
- initial_scan_mapping_YYYYMMDD_HHMMSS.csv
- scan_mapping_updated_YYYYMMDD_HHMMSS.csv
Analysis CSV Files:
- source_scandetails_YYYYMMDD_HHMMSS.csv: Source scan analysis
- target_scandetails_YYYYMMDD_HHMMSS.csv: Target scan analysis
- analysis_with_comments_YYYYMMDD_HHMMSS.csv: Combined analysis
Excel Report:
- analysis_with_comments_YYYYMMDD_HHMMSS_highlighted.xlsx: Color-coded Excel file
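
The `YYYYMMDD_HHMMSS` suffix in these names corresponds to the `strftime` pattern `%Y%m%d_%H%M%S`. A small sketch of building the run directory and output file paths follows; the helper names and dictionary keys are hypothetical, and it assumes one timestamp is shared by the directory and its files.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict

STAMP_FORMAT = "%Y%m%d_%H%M%S"  # renders as YYYYMMDD_HHMMSS


def run_directory(base: str, now: datetime) -> Path:
    """Path of the per-run results folder, e.g. testResults/run_20240102_030405."""
    return Path(base) / f"run_{now.strftime(STAMP_FORMAT)}"


def output_paths(base: str, now: datetime) -> Dict[str, Path]:
    """Build the output file paths listed above for a single run."""
    stamp = now.strftime(STAMP_FORMAT)
    run_dir = run_directory(base, now)
    return {
        "initial_mapping": run_dir / f"initial_scan_mapping_{stamp}.csv",
        "updated_mapping": run_dir / f"scan_mapping_updated_{stamp}.csv",
        "source_details": run_dir / f"source_scandetails_{stamp}.csv",
        "target_details": run_dir / f"target_scandetails_{stamp}.csv",
        "analysis": run_dir / f"analysis_with_comments_{stamp}.csv",
        "highlighted": run_dir / f"analysis_with_comments_{stamp}_highlighted.xlsx",
    }
```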

Download Images Workflow Outputs

Image files named: {scan_id}_{section_name}_{store_pog_id}.ext
Files organized in the specified download folder
Progress logs in console

Color Coding (Analysis Reports)

The highlighted analysis uses the following color scheme:

🔴 Red: Wrong POG Name Mapping (different POG names between source and target)
🟠 Orange: Same POG Name but Different Section (same POG, different sections)
🟡 Yellow: Target Has Additional Section (source has no additional section, target does)
🔵 Blue: Target Has Higher POG% Than Source (target POG% > source POG%)
🟢 Green: No Issues (all mappings are correct)
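
The color scheme can be read as a sequence of checks on a source/target pair. The sketch below encodes it that way. The field names on `ScanRow` are hypothetical, and checking the rules in the listed order (red first, green last) is an assumption, since the README does not say which color wins when several conditions hold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanRow:
    # Hypothetical fields; the real analysis columns may be named differently.
    pog_name: str
    section: str
    additional_section: Optional[str]
    pog_pct: float


def highlight_color(source: ScanRow, target: ScanRow) -> str:
    """Return the highlight color for one source/target comparison."""
    if source.pog_name != target.pog_name:
        return "red"      # wrong POG name mapping
    if source.section != target.section:
        return "orange"   # same POG name, different section
    if not source.additional_section and target.additional_section:
        return "yellow"   # target has an additional section the source lacks
    if target.pog_pct > source.pog_pct:
        return "blue"     # target POG% is higher than source POG%
    return "green"        # no issues
```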

Configuration

Database Configuration

The SDK prompts for:

Source Database Instance
Source Database Password
Source Username
Source Password
Target Database Instance (for copy workflow)
Target Database Password (for copy workflow)
Target Username (for copy workflow)
Target Password (for copy workflow)

Security Features

✅ Password Masking: All passwords are hidden during input using getpass
✅ No Hardcoded Credentials: All configuration is collected interactively
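✅ In-Memory Entry: Credentials are collected at the prompts and held in memory during the run; because config.py is also updated with current values, treat config.py as sensitive and do not commit it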
✅ No Password Echo: Passwords are never displayed on screen
✅ Config Updates: Automatically updates config.py with current values
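
For reference, masked input with the standard library's `getpass` looks roughly like this. The function name and dictionary keys are hypothetical; the prompt labels follow the Database Configuration list, and the prompt functions are parameters only so the sketch can be exercised without a terminal.

```python
import getpass
from typing import Callable, Dict


def collect_source_credentials(
    ask: Callable[[str], str] = input,
    ask_secret: Callable[[str], str] = getpass.getpass,
) -> Dict[str, str]:
    """Prompt for source credentials; password prompts do not echo input."""
    return {
        "db_instance": ask("Source Database Instance: "),
        "db_password": ask_secret("Source Database Password: "),
        "username": ask("Source Username: "),
        "password": ask_secret("Source Password: "),
    }
```

Called with no arguments, it reads from the terminal, and `getpass.getpass` hides the password characters as they are typed.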

Environment Configuration

The SDK automatically updates config.py with:

Database credentials
Scan IDs for copying/downloading
Target store ID
Other runtime values

Performance Optimization

Optimizations for Large Batches

Extended Timeouts:
- API calls: 60-120 seconds
- Total operation: 30 minutes
- Prevents premature timeouts on large batches
Concurrent Processing:
- Multiple threads for downloads/uploads
- Parallel scan creation
- Efficient resource utilization
Batch Processing:
- Processes scans in manageable chunks
- Reduces memory usage
- Enables progress tracking
Real-time Output:
- Flushed console output for immediate visibility
- Progress indicators
- Estimated remaining time

Troubleshooting

Common Issues

Database Connection Errors
- Verify credentials and network access
- Check database instance names
- Ensure firewall allows connections
Timeout Errors
- Timeouts are set to 30 minutes for large batches
- Check network stability
- Verify API endpoint availability
Permission Errors
- Ensure write access to results directory
- Check download folder permissions
- Verify file system permissions
Missing Dependencies
- Run pip install -r requirements.txt
- Verify Python version (3.8+)
- Check for missing system libraries
Checkpoint Issues
- Checkpoint files can be manually deleted to restart
- Verify JSON file integrity
- Check disk space availability
400 Bad Request Errors
- Check detailed error logs for request payload
- Verify scan data structure
- Review API response for specific errors
NoneType Errors
- Check API responses for missing data
- Verify scan information completeness
- Review error logs for specific fields

Error Messages

"Password authentication failed": Check database credentials
"Permission denied creating folder": Check directory permissions
"No scan mapping CSV file found": Ensure copy script completed successfully
"Checkpoint file corrupted": Delete checkpoint and restart
"Batch processing failed": Check error logs for specific batch issues
"API returned None response": Verify API endpoint and authentication

Debugging Tips

Enable Verbose Logging: Check console output for detailed logs
Review Checkpoint Files: Inspect JSON files for progress state
Check API Responses: Review 400 error logs for payload issues
Verify Scan Data: Ensure scan IDs exist and have valid data
Network Diagnostics: Test connectivity to database and API endpoints

Dependencies

Required Python Packages

pandas>=1.5.0 - Data manipulation and analysis
numpy>=1.21.0 - Numerical computing
psycopg[binary]>=3.0.0 - PostgreSQL database adapter
requests>=2.28.0 - HTTP library for API calls
openpyxl>=3.0.0 - Excel file generation (optional, for Excel reports)

Installation

```
pip install -r requirements.txt
```

API Timeouts and Limits

Timeout Configuration

Authentication: 30 seconds
File Download (metadata): 60 seconds
File Download (content): 120 seconds
File Upload: 120 seconds
Scan Creation: 60 seconds
Total Batch Operation: 30 minutes
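
One way to combine per-call timeouts with the 30-minute batch limit is to cap each call's timeout by the time left in the batch. The sketch below does this; `BatchDeadline`, the dictionary keys, and the capping behavior itself are assumptions for illustration, as the README does not describe how the SDK enforces the total limit.

```python
import time
from typing import Callable

# Per-operation timeouts in seconds, taken from the list above.
TIMEOUTS = {
    "authentication": 30,
    "download_metadata": 60,
    "download_content": 120,
    "upload": 120,
    "scan_creation": 60,
}
BATCH_LIMIT_SECONDS = 30 * 60


class BatchDeadline:
    """Tracks the time left in a batch and caps per-call timeouts to it."""

    def __init__(
        self,
        limit: float = BATCH_LIMIT_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._clock = clock
        self._expires = clock() + limit

    def remaining(self) -> float:
        return max(0.0, self._expires - self._clock())

    def timeout_for(self, operation: str) -> float:
        """Per-call timeout, never longer than the time left in the batch."""
        return min(TIMEOUTS[operation], self.remaining())
```

The returned value could then be passed as the timeout of whatever HTTP call performs the operation; a result of 0 signals that the batch has run out of time.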

Rate Limiting

Automatic retry with exponential backoff for rate limit errors
Concurrent operations limited by worker counts
Respects API rate limits through backoff mechanism

Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly with various batch sizes
Update documentation if needed
Submit a pull request

License

This project is part of the retech-us organization and follows the organization's licensing terms.

Support

For support and questions:

Create an issue in this repository
Contact the development team
Check the troubleshooting section above
Review error logs for specific issues

Changelog

Recent Updates

Batch Processing: Added configurable batch processing with progress tracking
Checkpointing: Implemented automatic checkpointing with resume capability
Retry Logic: Enhanced retry mechanisms with exponential backoff
Download Images: New workflow for downloading images with intelligent naming
Analysis Option: Made analysis optional after scan copying
Error Logging: Improved error logging with detailed API payload information
Performance: Optimized for handling 500+ scans with extended timeouts
Real-time Output: Added real-time console output with progress indicators

Note: This SDK is designed for internal use within the retech-us organization for scan copying, image downloading, and analysis workflows. All operations include comprehensive error handling, checkpointing, and retry mechanisms to ensure reliable processing of large datasets.
