A FastAPI application to parse bank statement PDFs and extract transaction data. Supports multiple Indian banks with automatic bank detection and transaction categorization.
- 📄 Parse bank statement PDFs using pdfplumber
- 🏦 Auto-detect bank from email sender or PDF content
- 💰 Extract transactions with date, amount, balance, and entity details
- 🔍 Identify payment methods (UPI, NEFT, IMPS, RTGS, etc.)
- 👤 Extract entity/person names from transaction descriptions
- 📊 Extract account details (number, IFSC, type)
- ⚡ Async processing with Celery workers
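As a rough illustration of the payment-method detection listed above, the sketch below matches keywords in a transaction description. The pattern table and the `detect_payment_method` function are hypothetical and are not taken from the project's code; real statement descriptions vary by bank and may need looser rules.

```python
import re
from typing import Optional

# Hypothetical keyword patterns (assumption): one whole-word match per method.
PAYMENT_METHOD_PATTERNS = {
    "UPI": re.compile(r"\bUPI\b", re.IGNORECASE),
    "NEFT": re.compile(r"\bNEFT\b", re.IGNORECASE),
    "IMPS": re.compile(r"\bIMPS\b", re.IGNORECASE),
    "RTGS": re.compile(r"\bRTGS\b", re.IGNORECASE),
}


def detect_payment_method(description: str) -> Optional[str]:
    """Return the first payment method whose keyword appears in the description."""
    for method, pattern in PAYMENT_METHOD_PATTERNS.items():
        if pattern.search(description):
            return method
    return None  # No known keyword found.
```

Calling `detect_payment_method("UPI/1234/JOHN DOE")` would return `"UPI"`, while a description with no known keyword returns `None`.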
- [x] Union Bank of India
- [ ] Kotak Mahindra Bank
- [ ] State Bank of India (SBI)
- [ ] HDFC Bank
- [ ] ICICI Bank
- [ ] Axis Bank
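Bank auto-detection from the email sender or the PDF content could look roughly like the sketch below. Everything in it is an assumption made for illustration: the lookup tables, the placeholder domain `unionbank.example`, the returned labels, and the `detect_bank` function do not come from the project's code.

```python
from typing import Optional

# Hypothetical lookup tables (assumption): the real sender addresses and
# detection rules used by the project are not documented here.
SENDER_DOMAIN_TO_BANK = {
    "unionbank.example": "UNION",  # placeholder domain, not a real address
}
PDF_KEYWORD_TO_BANK = {
    "union bank of india": "UNION",
    "kotak mahindra bank": "KOTAK",
}


def detect_bank(from_email: Optional[str] = None, pdf_text: str = "") -> Optional[str]:
    """Try the sender's email domain first, then fall back to keywords in the PDF text."""
    if from_email and "@" in from_email:
        domain = from_email.rsplit("@", 1)[1].strip().lower()
        bank = SENDER_DOMAIN_TO_BANK.get(domain)
        if bank:
            return bank
    lowered = pdf_text.lower()
    for keyword, bank in PDF_KEYWORD_TO_BANK.items():
        if keyword in lowered:
            return bank
    return None  # Unknown bank; the caller would need to ask for it explicitly.
```

Checking the sender first keeps detection cheap, since the PDF text only has to be scanned when the email gives no answer.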
- Backend: FastAPI, Python 3.12
- PDF Parsing: pdfplumber
- Task Queue: Celery with Redis
- Database: Shared PostgreSQL
- Containerization: Docker
- Rules UI: NiceGUI (port 8085)
- Python 3.12+
- PostgreSQL
- Redis
# Clone repository
git clone https://github.com/NishantGhanate/StatementParser.git
cd StatementParser
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Copy environment file
cp .env.example .env
# Edit .env with your configuration
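# Add the project root to sys.path via a .pth file (replace the example path with the location of your own checkout)
$ echo "/mnt/d/Github/ExpenseBoard/StatementParser" > $(python -c "import site; print(site.getsitepackages()[0])")/statementparser.pth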
or
$ echo "/mnt/d/Github/ExpenseBoard/StatementParser" > /mnt/d/Github/ExpenseBoard/StatementParser/venv/lib/python3.12/site-packages/statementparser.pth
## Verify it worked
python -c "import sys; print('\n'.join(sys.path))"
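For platforms where the shell one-liner is awkward (for example Windows), the same `.pth` file can be written from Python. This is a minimal sketch, not part of the project: `write_pth_file` is a hypothetical helper, and it assumes the first entry of `site.getsitepackages()` is the active environment's site-packages directory.

```python
import site
from pathlib import Path
from typing import Optional


def write_pth_file(project_root: Path, site_packages: Optional[Path] = None) -> Path:
    """Write statementparser.pth so project_root is added to sys.path at startup."""
    # Assumption: the first site-packages entry belongs to the active venv.
    target_dir = site_packages or Path(site.getsitepackages()[0])
    pth_path = target_dir / "statementparser.pth"
    # A .pth file holds one directory per line.
    pth_path.write_text(str(project_root.resolve()) + "\n", encoding="utf-8")
    return pth_path

# Example (run from the repository root with the venv activated):
#   write_pth_file(Path.cwd())
```

After running it, the `sys.path` check above should list the project root.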
# Start the application
uvicorn main:app --reload --port 8000
# Run the Rules & Ledger UI locally
python run_ui.py
# then open http://localhost:8085
API docs: http://localhost:8000/docs
Rules UI: http://localhost:8085
# Copy environment file
cp .env.example .env
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f
# Rules UI is available at http://localhost:8085 (or $RULES_APP_PORT)
# Health check: http://localhost:8085/healthz
Create a .env file with the following variables:
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/statement_parser
# Redis
REDIS_URL=redis://localhost:6379/0
# Celery
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
# API
API_PORT=8000
# Rules UI
RULES_APP_PORT=8085
NICEGUI_SECRET=expenseboard-secret-change-me
from app.parsers import parse_statement
from app.parsers.base import BankName
result = parse_statement(pdf_path="statement.pdf", bank_name=BankName.UNION)
print(result)
# Upload and parse statement
POST /api/v1/statements/upload
Content-Type: multipart/form-data
# Get parsed transactions
GET /api/v1/statements/{statement_id}/transactions
# Get account details
GET /api/v1/accounts/{account_id}
python app/tasks/bank_statement_upload.py --input files/statement.pdf --from_email [email protected]
# Start worker
celery -A app.celery_app worker --loglevel=info -Q statment_parser
# Start beat scheduler
celery -A app.celery_app beat --loglevel=info
celery -A app.core.celery_app flower --port=5555 --broker=redis://redis-superset:6379/0
docker logs -f statement_parser_worker
- Fork the repository
- Create a feature branch (`git checkout -b feature/new-bank`)
- Commit changes (`git commit -am 'Add new bank parser'`)
- Push to branch (`git push origin feature/new-bank`)
- Open a Pull Request
MIT License