A comprehensive full-stack web application for bulk uploading, validating, parsing, and managing Excel-based questionnaires in XLSForm-compatible format. The system provides a modern Angular frontend with a powerful FastAPI backend and MongoDB for data persistence.
- Overview
- Prerequisites
- Installation & Setup
- Architecture
- Technology Stack
- Configuration
- API Reference
- Database Schema
- TempData JSON Format
- File Format Specifications
- Performance Characteristics
- Testing Strategy
- Deployment
- Development Guidelines
- Security Considerations
- Troubleshooting
- Contributing
- Future Enhancements
This application enables organizations to efficiently manage questionnaire forms through Excel file uploads. It supports XLSForm-like format with three-sheet structure (Forms, Questions Info, Answer Options) and provides comprehensive validation, parsing, and storage capabilities.
- Async Bulk Upload: Process multiple Excel files concurrently with pause, resume, and stop controls
- TempData JSON Format: Generate comprehensive JSON output matching mobile app data structure
- Advanced Upload Queue: Configurable concurrent upload limits with session persistence
- Comprehensive Validation: Structural and content validation with detailed error reporting
- Dynamic Form Configuration: Extract form settings directly from Excel sheets with sensible defaults
- Form Management: CRUD operations for stored forms with metadata tracking
- Real-time Progress: Visual feedback with top-positioned success notifications
- Dark Theme UI: Modern Angular Material interface with accessibility support
- Performance Monitoring: Built-in metrics collection and analysis
- Keyboard Shortcuts: Power-user features for efficient navigation
- Server-Side Rendering: SEO-friendly with fast initial page loads
┌─────────────────────────────────────────────────────────┐
│ Angular Frontend │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Components: Upload, Search, Navbar │ │
│ │ Services: FormService, FormPreviewService │ │
│ │ Features: Drag & Drop, Progress Tracking │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
│ HTTP/JSON API
│
┌─────────────────────────────────────────────────────────┐
│ FastAPI Backend │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Core: main.py (FastAPI app, CORS, middleware) │ │
│ │ Business Logic: xlsform_parser.py │ │
│ │ Data Access: database_service.py │ │
│ │ Models: Pydantic validation models │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
│ MongoDB Driver
│
┌─────────────────────────────────────────────────────────┐
│ MongoDB Database │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Collections: forms, questions, options │ │
│ │ Features: Async operations, indexing │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
- File Upload: User selects Excel files via drag & drop or file picker
- Validation: Frontend sends files to backend for structural validation
- Parsing: Valid files are parsed into JSON schema format
- Storage: Parsed data is stored in MongoDB collections
- Management: Users can view, update, and delete stored forms
- Framework: Angular 19 with standalone components
- UI Library: Angular Material with custom dark theme
- Upload System: Async queue with concurrent processing, pause/resume/stop controls
- Notifications: Top-positioned success toasts with Material Snackbar
- Styling: SCSS with CSS custom properties
- Build Tool: Angular CLI with SSR support
- Testing: Vitest with jsdom environment
- HTTP Client: Angular HttpClient with fetch API and AbortController
- State Management: RxJS observables with service-based architecture and session persistence
- Framework: FastAPI with async/await support
- Web Server: Uvicorn ASGI server
- Data Processing: Pandas for Excel parsing, OpenPyXL for file handling
- JSON Format: TempData.json structure with comprehensive form configuration
- Dynamic Configuration: Extract form settings from Excel sheets with defaults
- Database: MongoDB with Motor async driver
- Validation: Pydantic models with comprehensive error handling
- Documentation: Auto-generated OpenAPI/Swagger UI
- CORS: Configured for Angular frontend integration
- Engine: MongoDB with document-based storage
- Collections: Separate collections for forms, questions, and options
- Indexing: Optimized indexes for efficient querying
- Connection: Async connection pooling with proper lifecycle management
- Version Control: Git with structured commit messages
- Environment: Python virtual environments, Node.js nvm
- Testing: pytest for backend, Vitest for frontend
- Linting: ESLint for frontend, pre-commit hooks
- Documentation: Markdown-based documentation with auto-generation
- Operating System: Linux, macOS, or Windows with WSL
- Python: 3.8+ (3.9+ recommended)
- Node.js: 18+ with npm
- MongoDB: 5.0+ (local installation or cloud instance)
- Memory: 4GB+ RAM recommended
- Storage: 1GB+ free space
- Git: For version control
- curl: For API testing (optional)
- tmux: For running frontend/backend simultaneously (optional)
git clone <repository-url>
cd bulk-questionnaire-uploadFor local MongoDB installation:
# Linux (systemd)
sudo systemctl start mongodb
sudo systemctl enable mongodb
# Linux (OpenRC)
sudo rc-service mongodb start
# macOS (Homebrew)
brew services start mongodb-community
# Verify connection
mongosh --eval "db.runCommand('ping')"For MongoDB Atlas (cloud):
- Create account at mongodb.com/atlas
- Create cluster and get connection string
- Update
.envfile with cloud connection string
cd backend/
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# or
.venv\Scripts\activate # Windows
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Create environment configuration
cp .env.example .env
# Edit .env with your MongoDB settingscd frontend/
# Install dependencies
npm install
# Verify installation
npm --version
ng versionUsing provided script (recommended):
# From project root
./run.shManual startup:
# Terminal 1 - Backend
cd backend/
source .venv/bin/activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
# Terminal 2 - Frontend
cd frontend/
ng serve --host 0.0.0.0 --port 4200- Backend API: http://localhost:8000/docs (Swagger UI)
- Frontend App: http://localhost:4200
- MongoDB: Check collections created in database
Environment Variables (.env file):
# MongoDB Configuration
MONGODB_URL=mongodb://localhost:27017
DATABASE_NAME=mform_bulk_upload
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
# CORS Configuration
FRONTEND_URL=http://localhost:4200
# Optional: Performance tuning
MAX_WORKERS=4
TIMEOUT=30Angular Configuration (src/app/app.config.ts):
export const appConfig: ApplicationConfig = {
providers: [
provideZoneChangeDetection({ eventCoalescing: true }),
provideRouter(routes),
provideClientHydration(withEventReplay()),
provideAnimations(),
provideHttpClient(withFetch())
]
};API Service Configuration (src/app/services/form.service.ts):
private readonly apiUrl = 'http://localhost:8000/api';Validates Excel file structure and content without processing.
Request: multipart/form-data
{
file: UploadFile // .xls or .xlsx file
}Response:
{
"valid": true,
"message": "File format is valid.",
"sheets": [
{
"name": "Forms",
"exists": true,
"columns": ["Language", "Title"],
"required_columns": ["Language", "Title"],
"missing_columns": [],
"row_count": 1
}
],
"form_metadata": {
"language": "en",
"title": "Sample Form"
},
"questions_count": 5,
"options_count": 15,
"errors": [],
"warnings": []
}Parses Excel file and returns comprehensive tempData.json format without database storage.
Request: multipart/form-data
{
file: UploadFile // .xls or .xlsx file
}Response: Array containing questionnaire response and form definition
[
{
"_id": "ObjectId(\"66c2e4aca61889ab24b58407\")",
"formId": 123456789,
"version": "1.0.0",
"language": [{"lng": "en", "default": true}],
"question": [
{
"order": 1,
"input_type": 1,
"answer": "",
"initialAnswer": ""
}
],
"responseUpdateHistory": [],
"appVersion": "1.0.0",
"responseIds": {
"formResponseId": "ObjectId(\"66c2e4aca61889ab24b58408\")",
"tempResponseId": "ObjectId(\"66c2e4aca61889ab24b58409\")"
},
"syncStatus": {
"questions": [{"order": 1, "synced": false}]
},
"keyInfoOrders": [1, 2, 3],
"copiedFormId": 987654321,
"title": "Sample Questionnaire",
"subtitle": "Form subtitle",
"description": "Form description"
}
]Processes and stores multiple Excel files concurrently with async queue management.
Request: multipart/form-data
{
files: UploadFile[] // Multiple .xls or .xlsx files
}Response: Array of tempData.json format objects with database IDs and comprehensive form configuration.
Retrieves all forms with summary information.
Response:
{
"forms": [
{
"id": "507f1f77bcf86cd799439011",
"title": "Sample Form",
"language": "en",
"version": "1.0.0",
"created_at": "2024-01-15T10:30:00Z"
}
],
"count": 1
}Retrieves complete form data in tempData.json format including questions and options.
Response: Array containing comprehensive form configuration
[
{
"_id": "ObjectId(\"507f1f77bcf86cd799439011\")",
"formId": 123456789,
"version": "1.0.0",
"language": [{"lng": "en", "default": true}],
"question": [
{
"order": 1,
"input_type": 1,
"answer": "",
"initialAnswer": ""
}
],
"responseUpdateHistory": [],
"appVersion": "1.0.0",
"responseIds": {
"formResponseId": "ObjectId(\"66c2e4aca61889ab24b58408\")",
"tempResponseId": "ObjectId(\"66c2e4aca61889ab24b58409\")"
},
"syncStatus": {
"questions": [{"order": 1, "synced": false}]
},
"keyInfoOrders": [1, 2, 3],
"copiedFormId": 987654321,
"title": "Sample Form",
"created_at": "2024-01-15T10:30:00Z"
}
]Updates an existing form with new Excel file data.
Request: multipart/form-data
{
file: UploadFile // Updated .xls or .xlsx file
}Deletes a form and all related questions and options.
Deletes all forms and related data (bulk operation).
{
_id: ObjectId,
title: String,
language: String,
version: String,
created_at: ISODate
}{
_id: ObjectId,
form_id: String,
order: Number,
title: String,
view_sequence: Number,
input_type: Number,
created_at: ISODate
}{
_id: ObjectId,
form_id: String,
order: Number,
option_id: Number,
label: String,
created_at: ISODate
}- Forms:
{created_at: -1}(recent forms first) - Questions:
{form_id: 1}(efficient form retrieval) - Options:
{form_id: 1}(efficient form retrieval) - Compound Index:
{form_id: 1, order: 1}(ordered retrieval)
- Forms contain multiple questions (1:N)
- Questions contain multiple options (1:N)
- Choice questions (types 2, 3) must have corresponding options
- All option orders must map to existing question orders
The system now generates comprehensive JSON output in the tempData.json format, which includes both questionnaire response data and complete form configuration. This format is designed for mobile app compatibility and includes:
- Response Data: Questionnaire responses with answer tracking and update history
- Form Configuration: Complete form definition with language settings and question details
- Sync Status: Question-level synchronization tracking
- Metadata: Form versioning, IDs, and configuration flags
- Geographic Data: 6-level geographic hierarchy support
- Language Support: Multi-language form definitions with default language selection
Form settings are now extracted dynamically from the Excel Forms sheet, including:
- Form title, subtitle, and description
- Version information and language settings
- Boolean configuration flags (isLocationMandatory, allowEdit, etc.)
- Reference IDs and key information orders
- Custom form-specific settings with sensible defaults
Forms Sheet:
- Required columns:
Language,Title - Supported languages: en, fr, es, de, it, pt, ar, zh, ja, ko, hi, ru
- Title length: ≤ 255 characters
- Only first row is processed
Questions Info Sheet:
- Required columns:
Order,Title,View Sequence,Input Type - Order: Positive integers, unique within form
- View Sequence: Positive integers for display ordering
- Input Type: 1-10 (see supported types below)
- Title length: ≤ 1000 characters
Answer Options Sheet:
- Required columns:
Order,Id,Label - Order: References question order
- Id: Positive integers, unique per Order
- Label length: ≤ 500 characters
{
1: 'text', // Text input
2: 'select_one', // Single choice
3: 'select_multiple', // Multiple choice
4: 'integer', // Whole numbers
5: 'decimal', // Decimal numbers
6: 'date', // Date picker
7: 'time', // Time picker
8: 'datetime', // Date and time
9: 'note', // Display text
10: 'calculate' // Computed value
}- Choice questions (types 2, 3) must have corresponding options
- All option orders must map to existing question orders
- No orphaned options without corresponding questions
- Data types must match column requirements
- Required fields cannot be empty or null
The system automatically collects performance metrics in backend/metrics.txt. Below is the current performance data from recent testing (September 11-15, 2025):
| Metric Type | Recent Performance |
|---|---|
| File Validation | 45-80ms (0.045-0.080s) |
| Form Parsing | 142-190ms (0.142-0.190s) |
| Form Upload | 270-680ms (0.270-0.680s) |
| Question Processing | 0.155-0.443ms per question |
| Option Processing | 0.128-0.352ms per option |
| Batch Processing | 270-680ms per form |
| Delete Operations | 12-44ms (0.012-0.044s) |
| Cold Start Time | Tracked on startup |
| Time Period | Activity Type | Performance Range | Notes |
|---|---|---|---|
| Sep 11, 16:19-22:24 | Intensive Testing | Sequential operations | Active performance testing |
| Sep 11, 20:31-21:04 | Extended Testing | Multiple operations | Comprehensive testing session |
| Sep 15, 21:29-21:35 | Latest Testing | Recent operations | Most current performance data |
| Validation Operations | Form validation | 45-80ms range | Consistent validation performance |
| Parse Operations | Schema parsing | 142-190ms range | Optimized parsing performance |
| Upload Operations | Form upload | 270-680ms range | Variable based on form complexity |
| Aspect | Details |
|---|---|
| Validation Speed | 45-80ms range for recent operations |
| Parse Performance | 142-190ms range, well optimized |
| Upload Performance | 270-680ms range, scales with complexity |
| Delete Performance | 12-44ms range, very efficient |
| Memory Management | File streaming prevents memory leaks |
| Concurrent Processing | Async operations with asyncio.gather() |
| Database Operations | Optimized with compound indexes |
| Metrics Collection | Detailed performance tracking |
- Concurrent Processing: Async file processing with
asyncio.gather() - Connection Pooling: MongoDB connection reuse
- Efficient Parsing: Pandas DataFrame operations for large datasets
- Memory Management: File stream handling to prevent memory leaks
- Database Indexing: Optimized queries with compound indexes
- Horizontal Scaling: Stateless backend supports multiple instances
- Database Sharding: MongoDB sharding for large datasets
- Caching Layer: Redis for frequently accessed forms
- CDN Integration: Static asset optimization for global distribution
Unit Tests (tests/backend/):
# Test individual components
def test_xlsform_parser_validation():
parser = XLSFormParser()
# Test validation logic
def test_database_service_operations():
service = DatabaseService()
# Test database operationsIntegration Tests:
# Test complete workflows
def test_file_upload_workflow():
# Test end-to-end file processing
passTest Configuration (pytest.ini):
[tool:pytest]
filterwarnings =
ignore::DeprecationWarning:fastapi.openapi.models
ignore::DeprecationWarning:fastapi.datastructuresVitest Configuration (vitest.config.ts):
export default defineConfig({
test: {
environment: 'jsdom',
include: ['tests/frontend/**/*.spec.ts'],
globals: true
}
})Test Structure:
- form.service.spec.ts - API service testing
- Unit tests for components and services
- Integration tests for critical user flows
- E2E test coverage for upload workflows
- Valid Forms:
tests/test_xlsforms_valid/- 10 sample valid Excel files - Invalid Forms:
tests/test_xlsforms_incorrect/- Various error scenarios - Edge Cases: Boundary conditions and error handling
# Using provided script
./run.sh # Starts both frontend and backend in tmuxBackend Deployment:
# Install production dependencies
pip install -r requirements.txt
# Use production server
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4Frontend Deployment:
# Build for production
npm run build
# Output in dist/mform-upload/
# Serve with any static file server# Backend
FROM python:3.9-slim
WORKDIR /app
COPY backend/requirements.txt .
RUN pip install -r requirements.txt
COPY backend/ .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
# Frontend
FROM nginx:alpine
COPY frontend/dist/mform-upload/ /usr/share/nginx/html
EXPOSE 80- Development: Hot reload, debug logging, local database
- Staging: Production build, staging database, monitoring
- Production: Optimized build, production database, SSL/TLS
Python (Backend):
- Use type hints for all function parameters and return values
- Follow PEP 8 style guidelines
- Use async/await for I/O operations
- Implement comprehensive error handling
- Add docstrings for all public functions
TypeScript (Frontend):
- Use strict TypeScript configuration
- Implement proper interface definitions
- Use Angular style guide conventions
- Follow RxJS best practices for observables
- Implement proper error handling in services
Structured Error Responses:
{
"detail": {
"error": "Validation failed",
"message": "Found 3 error(s) and 2 warning(s).",
"error_type": "VALIDATION_ERROR",
"file_name": "questionnaire.xlsx",
"errors": [...],
"warnings": [...],
"suggestions": [...]
}
}Error Classification System:
- File-level errors (MISSING_FILE, INVALID_FORMAT, etc.)
- Structure-level errors (MISSING_SHEET, MISSING_COLUMNS, etc.)
- Content-level errors (INVALID_DATA_TYPE, MISSING_VALUE, etc.)
- Cross-reference errors (MISSING_REFERENCE, ORPHANED_REFERENCE)
Input Validation:
- Validate all file uploads and form data
- Implement file type and size restrictions
- Sanitize Excel file content before processing
API Security:
- CORS configuration for frontend origin
- Input validation with Pydantic models
- Error message sanitization (no sensitive data exposure)
Database Security:
- Connection string encryption in production
- Proper authentication and authorization
- Input sanitization to prevent injection attacks
MongoDB Connection Issues:
# Check MongoDB status
sudo systemctl status mongodb
# Test connection
python -c "from database import connect_to_mongo; connect_to_mongo()"
# Reset database
mongo mform_bulk_upload --eval "db.dropDatabase()"File Processing Errors:
# Enable debug logging
import logging
logging.basicConfig(level=logging.DEBUG)
# Test with sample file
curl -X POST "http://localhost:8000/api/validate" -F "[email protected]"Performance Issues:
# Check metrics
cat backend/metrics.txt | tail -20
# Monitor system resources
top -p $(pgrep -f uvicorn)# Check MongoDB collections
mongosh mform_bulk_upload --eval "db.forms.count()"
# Test API endpoints
curl -X GET "http://localhost:8000/api/forms"
curl -X POST "http://localhost:8000/api/validate" -F "[email protected]"
# Frontend debugging
ng serve --configuration development --verbose
# Backend debugging
uvicorn main:app --reload --log-level debugDatabase Optimization:
// Create indexes
db.forms.createIndex({created_at: -1})
db.questions.createIndex({form_id: 1})
db.options.createIndex({form_id: 1, order: 1})Memory Management:
- Monitor file upload sizes
- Implement file cleanup after processing
- Use streaming for large file processing
- Fork and Clone: Fork the repository and clone locally
- Create Feature Branch:
git checkout -b feature/new-feature - Make Changes: Implement your feature with tests
- Run Tests: Ensure all tests pass locally
- Commit Changes: Use conventional commit messages
- Push and Create PR: Push branch and create pull request
feat: add new API endpoint for form export
fix: resolve validation error for empty sheets
docs: update API documentation for new endpoints
test: add unit tests for database service
refactor: improve error handling in file parser
- Description: Provide clear description of changes
- Testing: Include tests for new functionality
- Documentation: Update documentation if needed
- Review: Address reviewer feedback
- Merge: Squash and merge after approval
- Code follows established patterns and conventions
- Comprehensive error handling implemented
- Performance metrics added for new operations
- Documentation updated for API changes
- Tests added for new functionality
- Security considerations addressed
-
File Type Support
- Add CSV, JSON import capabilities
- Support for Google Sheets integration
- Template-based form creation
-
Real-time Collaboration
- Multi-user form editing
- Real-time synchronization
- Version control for forms
-
Advanced Analytics
- Form usage statistics and insights
- Performance analytics dashboard
- Export analytics data
-
Template System
- Pre-built form templates
- Template marketplace
- Custom template creation tools
-
Export Options
- Additional export formats (PDF, XML, CSV)
- Custom export configurations
- Batch export functionality
-
Offline Mode
- Full offline capability with sync
- Progressive Web App (PWA) features
- Service worker implementation
-
Internationalization
- Multi-language support for interface
- RTL language support
- Localization framework integration
-
Performance Monitoring
- Application performance tracking
- User behavior analytics
- Real-time performance dashboards
-
Code Quality
- Implement comprehensive test coverage
- Add type hints throughout codebase
- Refactor legacy code patterns
-
Architecture Improvements
- Implement microservices architecture
- Add message queue for async processing
- Implement caching layer
-
User Experience
- Implement progressive disclosure
- Add keyboard shortcuts and accessibility
- Improve error messaging and user feedback
-
Third-party Services
- Integration with form builders (Typeform, Google Forms)
- API integrations with survey platforms
- Data warehouse integrations
-
Cloud Services
- AWS S3 for file storage
- Cloud functions for processing
- CDN for static assets
-
Monitoring & Logging
- Application monitoring (DataDog, New Relic)
- Centralized logging (ELK stack)
- Error tracking (Sentry)
-
Emerging Technologies
- Machine learning for form analysis
- AI-powered form validation
- Natural language processing for question generation
-
Scalability Research
- Distributed processing architectures
- Edge computing for global distribution
- Serverless deployment models
This documentation provides comprehensive guidance for understanding, maintaining, and extending the Bulk Questionnaire Upload System. For questions or support, please refer to the project documentation or create an issue in the repository.