
Statement Parser

A FastAPI application to parse bank statement PDFs and extract transaction data. Supports multiple Indian banks with automatic bank detection and transaction categorization.

Features

  • 📄 Parse bank statement PDFs using pdfplumber
  • 🏦 Auto-detect bank from email sender or PDF content
  • 💰 Extract transactions with date, amount, balance, and entity details
  • 🔍 Identify payment methods (UPI, NEFT, IMPS, RTGS, etc.)
  • 👤 Extract entity/person names from transaction descriptions
  • 📊 Extract account details (number, IFSC, type)
  • ⚡ Async processing with Celery workers
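Payment-method identification and entity extraction can be approximated with keyword and segment matching. The sketch below is not the project's implementation. The narration layout `UPI/<ref>/<note>/<NAME>/<handle>` and the helpers `detect_payment_method` and `extract_entity` are hypothetical. Real bank narrations vary enough that each parser likely needs its own rules.

```python
import re
from typing import Optional

# Hypothetical keyword patterns; \b keeps e.g. "UPIREF" from matching "UPI".
PAYMENT_METHOD_PATTERNS = {
    "UPI": re.compile(r"\bUPI\b", re.IGNORECASE),
    "NEFT": re.compile(r"\bNEFT\b", re.IGNORECASE),
    "IMPS": re.compile(r"\bIMPS\b", re.IGNORECASE),
    "RTGS": re.compile(r"\bRTGS\b", re.IGNORECASE),
}

# A "name-like" segment: letters, spaces and dots only.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z .]*")


def detect_payment_method(description: str) -> Optional[str]:
    """Return the first payment-method keyword found in a narration, if any."""
    for method, pattern in PAYMENT_METHOD_PATTERNS.items():
        if pattern.search(description):
            return method
    return None


def extract_entity(description: str) -> Optional[str]:
    """Guess the counterparty from a slash-separated narration.

    Heuristic (an assumption, not the project's rule): drop numeric references
    and payment-method tokens, then take the longest remaining name-like segment.
    """
    segments = [s.strip() for s in description.split("/")]
    candidates = [
        s for s in segments
        if NAME_RE.fullmatch(s) and s.upper() not in PAYMENT_METHOD_PATTERNS
    ]
    return max(candidates, key=len) if candidates else None
```

On the sample narration `UPI/412345678901/Payment/RAHUL SHARMA/okaxis` this yields `"UPI"` and `"RAHUL SHARMA"`. The longest-segment rule would misfire on narrations that carry long free-text notes.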

Supported Banks

  • Union Bank of India

Planned (not yet supported):

  • Kotak Mahindra Bank
  • State Bank of India (SBI)
  • HDFC Bank
  • ICICI Bank
  • Axis Bank
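Bank auto-detection can be sketched as keyword matching, first on the sender's email domain and then on the PDF text. The `Bank` enum, the `detect_bank` function and both keyword tables are assumptions for illustration. The project's real `BankName` enum and detection rules may differ, and actual sender domains should be confirmed for each bank.

```python
from enum import Enum
from typing import Optional


class Bank(str, Enum):
    """Hypothetical stand-in for the project's BankName enum."""
    UNION = "union"
    KOTAK = "kotak"
    SBI = "sbi"
    HDFC = "hdfc"
    ICICI = "icici"
    AXIS = "axis"


# Assumed substrings of sender domains; verify against real alert emails.
SENDER_KEYWORDS = {
    Bank.UNION: ("unionbank",),
    Bank.KOTAK: ("kotak",),
    Bank.SBI: ("sbi",),
    Bank.HDFC: ("hdfc",),
    Bank.ICICI: ("icici",),
    Bank.AXIS: ("axis",),
}

# Bank names as they might appear in a statement header (compared lowercase).
PDF_KEYWORDS = {
    Bank.UNION: ("union bank of india",),
    Bank.KOTAK: ("kotak mahindra bank",),
    Bank.SBI: ("state bank of india",),
    Bank.HDFC: ("hdfc bank",),
    Bank.ICICI: ("icici bank",),
    Bank.AXIS: ("axis bank",),
}


def detect_bank(from_email: Optional[str] = None, pdf_text: str = "") -> Optional[Bank]:
    """Match the sender's domain first, then fall back to the PDF text."""
    if from_email and "@" in from_email:
        domain = from_email.rsplit("@", 1)[1].lower()
        for bank, keywords in SENDER_KEYWORDS.items():
            if any(k in domain for k in keywords):
                return bank
    text = pdf_text.lower()
    for bank, keywords in PDF_KEYWORDS.items():
        if any(k in text for k in keywords):
            return bank
    return None
```

Matching the whole PDF text is crude, since a transaction narration can mention another bank. Restricting the search to the first page's header would be more robust.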

Tech Stack

  • Backend: FastAPI, Python 3.12
  • PDF Parsing: pdfplumber
  • Task Queue: Celery with Redis
  • Database: Shared PostgreSQL
  • Containerization: Docker
  • Rules UI: NiceGUI (port 8085)

Installation

Prerequisites

  • Python 3.12+
  • PostgreSQL
  • Redis

Local Setup

# Clone repository
git clone https://github.com/NishantGhanate/StatementParser.git
cd StatementParser

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env
# Edit .env with your configuration

# Add the project root to the venv's sys.path via a .pth file (run from the repo root)
echo "$(pwd)" > "$(python -c "import site; print(site.getsitepackages()[0])")/statementparser.pth"
# or, writing into the venv's site-packages directly
echo "$(pwd)" > venv/lib/python3.12/site-packages/statementparser.pth

# Verify it worked
python -c "import sys; print('\n'.join(sys.path))"

# Start the application
uvicorn main:app --reload --port 8000

# Run the Rules & Ledger UI locally
python run_ui.py
# then open http://localhost:8085

Visit:

  • API docs: http://localhost:8000/docs
  • Rules UI: http://localhost:8085
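The shell commands for creating `statementparser.pth` do not work on Windows. A cross-platform alternative is to write the `.pth` file from Python. This is a sketch: `install_pth` is a hypothetical helper. It uses `sysconfig.get_path("purelib")`, which points at the active virtual environment's site-packages when run from inside that environment.

```python
import sysconfig
from pathlib import Path
from typing import Optional


def install_pth(project_root: Path, site_dir: Optional[Path] = None,
                name: str = "statementparser.pth") -> Path:
    """Write a .pth file so `project_root` is put on sys.path at interpreter startup.

    `site_dir` defaults to the running interpreter's purelib (site-packages)
    directory, i.e. the venv's site-packages when the venv is active.
    """
    target_dir = site_dir if site_dir is not None else Path(sysconfig.get_path("purelib"))
    pth_file = target_dir / name
    # Each line of a .pth file names a directory; the site module appends
    # existing ones to sys.path when Python starts.
    pth_file.write_text(f"{project_root.resolve()}\n", encoding="utf-8")
    return pth_file
```

Call `install_pth(Path.cwd())` from the repository root with the virtual environment activated, then rerun the `sys.path` check.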

Docker Setup

# Copy environment file
cp .env.example .env

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Rules UI is available at http://localhost:8085  (or $RULES_APP_PORT)
# Health check: http://localhost:8085/healthz
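To script the health check (for example in a deploy step), a standard-library probe like the following should be enough. It assumes only what the comment above states, that `/healthz` on the Rules UI port answers when the service is up. The function name `rules_ui_healthy` is hypothetical, and treating any non-200 status or connection error as unhealthy is a design choice of this sketch.

```python
import urllib.request


def rules_ui_healthy(url: str = "http://localhost:8085/healthz", timeout: float = 3.0) -> bool:
    """Return True only if the endpoint responds with HTTP 200 within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses, so any of them counts as unhealthy.
        return False
```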

Configuration

Create a .env file with the following variables:

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/statement_parser

# Redis
REDIS_URL=redis://localhost:6379/0

# Celery
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# API
API_PORT=8000

# Rules UI
RULES_APP_PORT=8085
NICEGUI_SECRET=expenseboard-secret-change-me
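A minimal sketch of reading these variables into a typed settings object follows. `Settings` and `load_settings` are hypothetical names, not the project's actual config module. Defaults mirror the example values above, while `DATABASE_URL` and `NICEGUI_SECRET` are treated as required. It reads the process environment only, so loading `.env` itself (for example with python-dotenv) is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REDIS_DEFAULT = "redis://localhost:6379/0"


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    celery_broker_url: str
    celery_result_backend: str
    api_port: int
    rules_app_port: int
    nicegui_secret: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables (KeyError if a required one is missing)."""
    return Settings(
        database_url=env["DATABASE_URL"],  # required
        redis_url=env.get("REDIS_URL", REDIS_DEFAULT),
        celery_broker_url=env.get("CELERY_BROKER_URL", REDIS_DEFAULT),
        celery_result_backend=env.get("CELERY_RESULT_BACKEND", REDIS_DEFAULT),
        api_port=int(env.get("API_PORT", "8000")),
        rules_app_port=int(env.get("RULES_APP_PORT", "8085")),
        nicegui_secret=env["NICEGUI_SECRET"],  # required; never ship the example value
    )
```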

Usage

Parse a Bank Statement

from app.parsers import parse_statement
from app.parsers.base import BankName

result = parse_statement(pdf_path="statement.pdf", bank_name=BankName.UNION)
print(result)
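The README does not document the structure of `result`. To illustrate the kind of row parsing a parser performs on text extracted by pdfplumber, here is a sketch for a hypothetical line layout `<dd/mm/yyyy> <description> <amount> <Dr|Cr> <balance>`. The real Union Bank layout and the project's own transaction model may differ; `Transaction` and `parse_row` are illustrative names. Amounts use `Decimal` to avoid floating-point rounding on currency.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Optional

# Hypothetical layout: "<dd/mm/yyyy> <description> <amount> <Dr|Cr> <balance>"
ROW_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<desc>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2})\s+(?P<kind>Dr|Cr)\s+(?P<balance>[\d,]+\.\d{2})$"
)


@dataclass(frozen=True)
class Transaction:
    txn_date: date
    description: str
    amount: Decimal  # negative for debits (Dr), positive for credits (Cr)
    balance: Decimal


def _money(text: str) -> Decimal:
    """Convert an Indian-formatted amount such as '1,23,456.78' to Decimal."""
    return Decimal(text.replace(",", ""))


def parse_row(line: str) -> Optional[Transaction]:
    """Parse one text line (e.g. from pdfplumber's extract_text) or return None."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None  # headers, totals, wrapped description lines, etc.
    amount = _money(m["amount"])
    if m["kind"] == "Dr":
        amount = -amount
    return Transaction(
        txn_date=datetime.strptime(m["date"], "%d/%m/%Y").date(),
        description=m["desc"],
        amount=amount,
        balance=_money(m["balance"]),
    )
```

In a real parser the lines would come from `page.extract_text().splitlines()` for each page in `pdfplumber.open(path).pages`. Descriptions that wrap onto a second line would need extra handling.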

API Endpoints

# Upload and parse statement
POST /api/v1/statements/upload
Content-Type: multipart/form-data

# Get parsed transactions
GET /api/v1/statements/{statement_id}/transactions

# Get account details
GET /api/v1/accounts/{account_id}
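The upload endpoint can be called from Python using only the standard library. This sketch builds the `multipart/form-data` request by hand. The form field name `file` and the helper `build_upload_request` are assumptions, so check the field name against the API docs at `/docs`. The response format is not documented here.

```python
import urllib.request
import uuid
from pathlib import Path


def build_upload_request(
    pdf_path: Path,
    base_url: str = "http://localhost:8000",
    field: str = "file",  # assumed form field name
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying one PDF file."""
    boundary = uuid.uuid4().hex  # random, so a collision with the file bytes is unlikely
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{pdf_path.name}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_path.read_bytes(),
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/api/v1/statements/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Send it with `urllib.request.urlopen(build_upload_request(Path("statement.pdf")))` and inspect the returned body.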

CLI Usage

python app/tasks/bank_statement_upload.py --input files/statement.pdf --from_email <sender-email>
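The CLI flags map directly onto `argparse`. The sketch below mirrors only the two flags shown above. `build_parser` is a hypothetical name, and treating `--from_email` as optional (falling back to detection from PDF content) is an assumption based on the Features list, not confirmed behavior of `bank_statement_upload.py`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the flags in the CLI example."""
    parser = argparse.ArgumentParser(description="Parse a bank statement PDF.")
    parser.add_argument("--input", required=True, type=Path,
                        help="path to the statement PDF")
    parser.add_argument("--from_email", default=None,
                        help="sender address used to auto-detect the bank (optional)")
    return parser
```

A script would call `build_parser().parse_args()` under an `if __name__ == "__main__":` guard and hand `args.input` and `args.from_email` to the parsing pipeline.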

Database

Running Celery Workers

# Start worker
celery -A app.celery_app worker --loglevel=info -Q statment_parser

# Start beat scheduler
celery -A app.celery_app beat --loglevel=info

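# Start the Flower monitoring UI on http://localhost:5555
celery -A app.core.celery_app --broker=redis://redis-superset:6379/0 flower --port=5555

# Tail worker logs when running under Docker
docker logs -f statement_parser_worker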

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-bank)
  3. Commit changes (git commit -am 'Add new bank parser')
  4. Push to branch (git push origin feature/new-bank)
  5. Open a Pull Request

License

MIT License