A comprehensive Python SDK for copying scans between database instances, downloading scan images, and generating analysis reports with batch processing, checkpointing, and retry mechanisms.
- 🔄 Scan Copying: Copy scans from source to target database instances with batch processing
- 📥 Image Downloading: Download scan images with intelligent naming based on scan ID, section name, and store POG ID
- 📊 Data Analysis: Comprehensive scan data analysis with POG and OSA metrics (optional)
- 🎨 Highlighted Reports: Color-coded Excel reports for easy identification of differences
- 📁 Custom Results Path: Choose where to save test results
- ⚙️ Interactive Workflow: Step-by-step guided process with functionality selection
- 🔁 Batch Processing: Process scans in configurable batches (default: 10 scans per batch)
- 💾 Checkpointing: Automatic progress saving with resume capability
- 🔄 Retry Logic: Exponential backoff retry mechanism for failed operations
- 🧵 Multi-threading: Concurrent file downloads/uploads for improved performance
- ⏱️ Extended Timeouts: 30-minute timeout support for large batch operations
- 📝 Real-time Logging: Live console output with progress tracking
- 🛡️ Error Handling: Comprehensive error handling with detailed logging
- Python 3.8 or higher
- Access to source and target database instances
- Network access to database hosts and API endpoints
- Clone the repository:

  ```bash
  git clone https://github.com/retech-us/MATESTS.git
  cd MATESTS
  ```

- Set up configuration files:

  ```bash
  # Copy template files (these contain no sensitive data)
  cp config.py.template config.py
  cp config.json.template config.json
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the SDK:

  ```bash
  python createScansSDK.py
  ```
- Run the SDK: `python createScansSDK.py`
- Choose functionality: Select between Copy Scan or Download Images
- Follow the interactive prompts to configure your workflow
- Review generated reports in the results folder
The Copy Scan workflow allows you to copy scans from a source instance to a target instance with optional analysis.
- Select option 1 for Copy Scan
Enter database credentials:
- Source Database Instance
- Source Database Password
- Source Username
- Source Password
- Target Database Instance
- Target Database Password
- Target Username
- Target Password
Choose where to save results:
- Option 1: Default path (`./testResults/`)
- Option 2: Custom path (specify your own directory)
Enter comma-separated scan IDs to copy:
- Can use existing scan IDs from config if available
- Or enter new scan IDs manually
Enter the target store ID where scans will be copied
Choose which copy script to use:
- Option 1: `mappedScans.py` (mapped scans script)
- Option 2: `autoScans.py` (auto scans script with enhanced features)
Checkpoint Handling:
- If checkpoint files are detected, you'll be prompted to:
- Resume from checkpoint (continues from last completed batch)
- Restart from scratch (deletes checkpoint and starts fresh)
Batch Processing:
- Scans are processed in batches of 10 (configurable)
- Each batch includes:
- File downloads (concurrent, up to 20 workers)
- File uploads (concurrent, up to 20 workers)
- Scan creation (concurrent, up to 10 workers)
- Progress is saved after each batch completion
- Failed batches are automatically retried (up to 3 retries)
Real-time Progress:
- Batch start/completion logs with scan IDs
- Download/upload progress percentages
- Success/failure counts per batch
- Estimated remaining time
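The batching and worker fan-out described above can be sketched with the standard-library `concurrent.futures` module. Note that `process_in_batches` and its `handler` argument are illustrative names, not the SDK's actual API:

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_batches(scan_ids, handler, batch_size=10, workers=20):
    """Split scan_ids into fixed-size batches; run handler concurrently per batch."""
    results = []
    batches = [scan_ids[i:i + batch_size] for i in range(0, len(scan_ids), batch_size)]
    for n, batch in enumerate(batches, start=1):
        print(f"[BATCH {n}/{len(batches)}] Starting ({len(batch)} scans)", flush=True)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results.extend(pool.map(handler, batch))  # map preserves input order
        print(f"[BATCH {n}/{len(batches)}] Completed", flush=True)
    return results
```

Processing per batch (rather than one global pool) is what makes the checkpointing below possible: progress can be saved at every batch boundary.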
After copying completes, choose whether to run analysis:
- Option Y: Continue with analysis (proceeds to Step 5)
- Option N: End workflow without analysis
- Automatically extracts target scan IDs from mapping CSV if available
- Or manually enter target scan IDs for analysis
- Creates comprehensive CSV and Excel reports
- Includes color-coded highlighting for differences
- Generates source and target scan analysis files
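Assuming each per-scan analysis CSV carries a `scan_id` column, the combined report could be produced with a pandas merge along these lines (a sketch only; the real column names in the SDK's CSVs may differ):

```python
import pandas as pd


def combine_analyses(mapping_csv, source_csv, target_csv, out_csv):
    """Join source/target scan details onto the ID mapping, prefixing columns."""
    mapping = pd.read_csv(mapping_csv)
    # add_prefix turns scan_id into source_scan_id / target_scan_id,
    # matching the mapping file's key columns.
    source = pd.read_csv(source_csv).add_prefix("source_")
    target = pd.read_csv(target_csv).add_prefix("target_")
    combined = (mapping
                .merge(source, on="source_scan_id", how="left")
                .merge(target, on="target_scan_id", how="left"))
    combined.to_csv(out_csv, index=False)
    return combined
```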
The Download Images workflow downloads scan images with intelligent naming.
- Select option 2 for Download Images
Enter source database credentials:
- Source Database Instance
- Source Database Password
- Source Username
- Source Password
Enter comma-separated scan IDs to download images from
Choose download location:
- Option 1: Default path (`./downloaded_images/`)
- Option 2: Custom path
Images are downloaded in batches with:
- Concurrent downloads (up to 20 workers per batch)
- Progress tracking per batch
- Automatic retry on failures
File Naming Convention:
- Format: `{scan_id}_{section_name}_{store_pog_id}.ext`
- If section and POG exist: `12345_SectionName_67890.jpg`
- If only section exists: `12345_SectionName.jpg`
- If only POG exists: `12345_67890.jpg`
- If neither exists: `12345.jpg`
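This naming rule is simple enough to express as a small helper (a sketch of the convention above; the SDK's actual implementation may differ):

```python
def build_image_name(scan_id, section_name=None, store_pog_id=None, ext="jpg"):
    """Join whichever parts are present with underscores, per the convention."""
    parts = [str(scan_id)]
    if section_name:
        parts.append(section_name)
    if store_pog_id:
        parts.append(str(store_pog_id))
    return "_".join(parts) + f".{ext}"
```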
Both copy and download workflows use batch processing to handle large numbers of scans efficiently.
- Default Batch Size: 10 scans per batch
- Concurrent Workers:
- Downloads: 20 workers
- Uploads: 20 workers
- Scan Creation: 10 workers
- Automatic Batching: Scans are automatically divided into batches
- Progress Tracking: Real-time progress for each batch
- Error Isolation: Failures in one batch don't stop other batches
- Retry Logic: Failed batches are retried automatically
- Checkpointing: Progress is saved after each batch
Batches are retried if:
- Download success rate < 80%
- Upload success rate < 80%
- Scan creation success rate < 50%
Retry configuration:
- Maximum retries: 3 attempts per batch
- Backoff between attempts: 5 seconds × retry attempt number (5s, 10s, 15s)
- Total timeout: 30 minutes per batch
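The batch-level retry policy could look roughly like this. Here `run_batch` is a hypothetical callable that executes one batch and returns its success rate; the delay grows as 5 seconds × attempt number, as described above:

```python
import time

MAX_RETRIES = 3
BASE_DELAY = 5  # seconds; delay grows with the attempt number


def run_batch_with_retry(run_batch, min_success_rate=0.8):
    """Re-run a batch until its success rate clears the threshold, up to MAX_RETRIES."""
    for attempt in range(1, MAX_RETRIES + 1):
        success_rate = run_batch()
        if success_rate >= min_success_rate:
            return True
        if attempt < MAX_RETRIES:
            delay = BASE_DELAY * attempt  # 5s, then 10s between attempts
            print(f"Retry {attempt}: success rate {success_rate:.0%}, "
                  f"sleeping {delay}s", flush=True)
            time.sleep(delay)
    return False
```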
Each batch logs:
- Batch number and total batches
- Source scan IDs in the batch
- Download/upload progress
- Success/failure counts
- Completion status
Example log output:
```
================================================================================
[BATCH 1/5] Starting batch 1
[BATCH 1/5] Processing scans 1-10 of 50
[BATCH 1/5] Source Scan IDs: 12345, 12346, 12347, ...
================================================================================
[BATCH 1] Starting download of 25 files
[BATCH 1] [DOWNLOAD] 25/25 (100%)
[BATCH 1] Starting upload of 25 files
[BATCH 1] [UPLOAD] 25/25 (100%)
[BATCH 1] [SCAN] Creating 10 scans...
[BATCH 1] [SCAN] 10/10 (100%)
================================================================================
[BATCH 1/5] Batch 1 completed
[BATCH 1/5] Success: 10, Failed: 0
================================================================================
```
The system automatically saves progress after each batch:
- Checkpoint file: `checkpoint_{timestamp}.json`
- Contains: Completed batch numbers, scan mappings, failed scan count
When restarting:
- System detects existing checkpoint files
- Prompts user to resume or restart
- If resuming:
- Skips already completed batches
- Continues from next incomplete batch
- Preserves existing scan mappings
- If restarting:
- Deletes checkpoint files
- Starts from beginning
```json
{
  "completed_batches": [1, 2, 3],
  "scan_mapping": [
    {"source_scan_id": 12345, "target_scan_id": 67890},
    ...
  ],
  "failed_scans": 0
}
```
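Reading and writing that checkpoint format is straightforward. The helpers below are a sketch assuming the JSON layout shown above, not the SDK's exact code:

```python
import json
import os


def load_checkpoint(path):
    """Return (completed_batches, scan_mapping); empty defaults if no file exists."""
    if not os.path.exists(path):
        return set(), []
    with open(path) as f:
        data = json.load(f)
    return set(data.get("completed_batches", [])), data.get("scan_mapping", [])


def save_checkpoint(path, completed_batches, scan_mapping, failed_scans=0):
    """Persist progress after each batch so a later run can resume."""
    payload = {
        "completed_batches": sorted(completed_batches),
        "scan_mapping": scan_mapping,
        "failed_scans": failed_scans,
    }
    with open(path, "w") as f:
        json.dump(payload, f)
```

On resume, the driver would skip any batch number found in `completed_batches` and keep appending to the existing `scan_mapping`.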
- API Retries: Exponential backoff for 502/503 errors
  - Base delay: 1-2 seconds
  - Max delay: 60 seconds
  - Max retries: 3 attempts
  - Total timeout: 30 minutes
- Batch Retries: Automatic retry for failed batches
  - Retry conditions based on success rates
  - Backoff between retries
  - Maximum 3 retry attempts
- Individual Operation Retries: Per-file and per-scan retries
  - Handles transient network errors
  - Timeout handling for long operations
Detailed error logging includes:
- HTTP status codes (especially 400 Bad Request)
- API request payloads
- API response bodies
- File/scan IDs causing errors
- Stack traces for debugging
- Failed files don't block batch completion
- Failed scans are logged and tracked
- Partial success is allowed (continues with available data)
- Checkpoint saves progress even with some failures
```
MATESTS/
├── createScansSDK.py       # Main SDK with interactive workflow
├── mappedScans.py          # Mapped scans copying script
├── autoScans.py            # Auto scans copying script with batch processing
├── downloadScanImages.py   # Image download script with intelligent naming
├── scanDataAnalysis.py     # Analysis functions
├── config.py.template      # Configuration template (safe to commit)
├── config.json.template    # JSON configuration template (safe to commit)
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── testResults/            # Default results directory
│   └── run_YYYYMMDD_HHMMSS/
│       ├── initial_scan_mapping_*.csv
│       ├── scan_mapping_updated_*.csv
│       ├── source_scandetails_*.csv
│       ├── target_scandetails_*.csv
│       ├── analysis_with_comments_*.csv
│       └── analysis_with_comments_*_highlighted.xlsx
└── downloaded_images/      # Default download directory
    └── {scan_id}_{section}_{pog}.ext
```
- Scan Mapping CSV: Maps source scan IDs to target scan IDs
  - `initial_scan_mapping_YYYYMMDD_HHMMSS.csv`
  - `scan_mapping_updated_YYYYMMDD_HHMMSS.csv`
- Analysis CSV Files:
  - `source_scandetails_YYYYMMDD_HHMMSS.csv`: Source scan analysis
  - `target_scandetails_YYYYMMDD_HHMMSS.csv`: Target scan analysis
  - `analysis_with_comments_YYYYMMDD_HHMMSS.csv`: Combined analysis
- Excel Report:
  - `analysis_with_comments_YYYYMMDD_HHMMSS_highlighted.xlsx`: Color-coded Excel file
- Image files named: `{scan_id}_{section_name}_{store_pog_id}.ext`
- Files organized in the specified download folder
- Progress logs in console
The highlighted analysis uses the following color scheme:
- 🔴 Red: Wrong POG Name Mapping (different POG names between source and target)
- 🟠 Orange: Same POG Name but Different Section (same POG, different sections)
- 🟡 Yellow: Target Has Additional Section (source has no additional section, target does)
- 🔵 Blue: Target Has Higher POG% Than Source (target POG% > source POG%)
- 🟢 Green: No Issues (all mappings are correct)
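With openpyxl, applying such a scheme amounts to setting a `PatternFill` per row. The issue keys and helper below are illustrative, not the SDK's actual categories:

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# Hypothetical issue keys mapped to the color scheme described above.
FILLS = {
    "wrong_pog_name": PatternFill(patternType="solid", fgColor="FFFF0000"),       # red
    "different_section": PatternFill(patternType="solid", fgColor="FFFFA500"),    # orange
    "extra_target_section": PatternFill(patternType="solid", fgColor="FFFFFF00"), # yellow
    "higher_target_pog": PatternFill(patternType="solid", fgColor="FF0000FF"),    # blue
    "ok": PatternFill(patternType="solid", fgColor="FF00FF00"),                   # green
}


def highlight_rows(rows, path):
    """rows: (source_pog, target_pog, issue_key) tuples; writes a colored sheet."""
    wb = Workbook()
    ws = wb.active
    ws.append(["source_pog", "target_pog", "issue"])
    for source_pog, target_pog, issue in rows:
        ws.append([source_pog, target_pog, issue])
        for cell in ws[ws.max_row]:  # color every cell in the new row
            cell.fill = FILLS.get(issue, FILLS["ok"])
    wb.save(path)
```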
The SDK prompts for:
- Source Database Instance
- Source Database Password
- Source Username
- Source Password
- Target Database Instance (for copy workflow)
- Target Database Password (for copy workflow)
- Target Username (for copy workflow)
- Target Password (for copy workflow)
- ✅ Password Masking: All passwords are hidden during input using `getpass`
- ✅ No Hardcoded Credentials: All configuration is collected interactively
- ✅ Secure Storage: Credentials are only stored temporarily in memory
- ✅ No Password Echo: Passwords are never displayed on screen
- ✅ Config Updates: Automatically updates `config.py` with current values
The SDK automatically updates `config.py` with:
- Database credentials
- Scan IDs for copying/downloading
- Target store ID
- Other runtime values
- Extended Timeouts:
  - API calls: 60-120 seconds
  - Total operation: 30 minutes
  - Prevents premature timeouts on large batches
- Concurrent Processing:
  - Multiple threads for downloads/uploads
  - Parallel scan creation
  - Efficient resource utilization
- Batch Processing:
  - Processes scans in manageable chunks
  - Reduces memory usage
  - Enables progress tracking
- Real-time Output:
  - Flushed console output for immediate visibility
  - Progress indicators
  - Estimated remaining time
- Database Connection Errors
  - Verify credentials and network access
  - Check database instance names
  - Ensure firewall allows connections
- Timeout Errors
  - Timeouts are set to 30 minutes for large batches
  - Check network stability
  - Verify API endpoint availability
- Permission Errors
  - Ensure write access to results directory
  - Check download folder permissions
  - Verify file system permissions
- Missing Dependencies
  - Run `pip install -r requirements.txt`
  - Verify Python version (3.8+)
  - Check for missing system libraries
- Checkpoint Issues
  - Checkpoint files can be manually deleted to restart
  - Verify JSON file integrity
  - Check disk space availability
- 400 Bad Request Errors
  - Check detailed error logs for request payload
  - Verify scan data structure
  - Review API response for specific errors
- NoneType Errors
  - Check API responses for missing data
  - Verify scan information completeness
  - Review error logs for specific fields
- "Password authentication failed": Check database credentials
- "Permission denied creating folder": Check directory permissions
- "No scan mapping CSV file found": Ensure copy script completed successfully
- "Checkpoint file corrupted": Delete checkpoint and restart
- "Batch processing failed": Check error logs for specific batch issues
- "API returned None response": Verify API endpoint and authentication
- Enable Verbose Logging: Check console output for detailed logs
- Review Checkpoint Files: Inspect JSON files for progress state
- Check API Responses: Review 400 error logs for payload issues
- Verify Scan Data: Ensure scan IDs exist and have valid data
- Network Diagnostics: Test connectivity to database and API endpoints
- `pandas>=1.5.0`: Data manipulation and analysis
- `numpy>=1.21.0`: Numerical computing
- `psycopg[binary]>=3.0.0`: PostgreSQL database adapter
- `requests>=2.28.0`: HTTP library for API calls
- `openpyxl>=3.0.0`: Excel file generation (optional, for Excel reports)

Install with `pip install -r requirements.txt`.

- Authentication: 30 seconds
- File Download (metadata): 60 seconds
- File Download (content): 120 seconds
- File Upload: 120 seconds
- Scan Creation: 60 seconds
- Total Batch Operation: 30 minutes
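These per-operation timeouts could be centralized and passed to `requests` like this (a sketch; the key names and the `fetch_file` helper are illustrative, not the SDK's actual API):

```python
import requests

# Timeout budget per operation type, in seconds, mirroring the list above.
TIMEOUTS = {
    "auth": 30,
    "download_meta": 60,
    "download_content": 120,
    "upload": 120,
    "scan_create": 60,
}


def fetch_file(session: requests.Session, url: str, kind: str = "download_content") -> bytes:
    """GET a URL using the timeout configured for this operation type."""
    response = session.get(url, timeout=TIMEOUTS[kind])
    response.raise_for_status()  # surface HTTP errors instead of silent failures
    return response.content
```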
- Automatic retry with exponential backoff for rate limit errors
- Concurrent operations limited by worker counts
- Respects API rate limits through backoff mechanism
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with various batch sizes
- Update documentation if needed
- Submit a pull request
This project is part of the retech-us organization and follows the organization's licensing terms.
For support and questions:
- Create an issue in this repository
- Contact the development team
- Check the troubleshooting section above
- Review error logs for specific issues
- Batch Processing: Added configurable batch processing with progress tracking
- Checkpointing: Implemented automatic checkpointing with resume capability
- Retry Logic: Enhanced retry mechanisms with exponential backoff
- Download Images: New workflow for downloading images with intelligent naming
- Analysis Option: Made analysis optional after scan copying
- Error Logging: Improved error logging with detailed API payload information
- Performance: Optimized for handling 500+ scans with extended timeouts
- Real-time Output: Added real-time console output with progress indicators
Note: This SDK is designed for internal use within the retech-us organization for scan copying, image downloading, and analysis workflows. All operations include comprehensive error handling, checkpointing, and retry mechanisms to ensure reliable processing of large datasets.