
Split Data - Analytics Service

This is the Analytics Service component of the Splitwise Clone project, implementing 4 core responsibilities:

  1. Spending pattern analysis and visualization
  2. Group spending statistics aggregation
  3. ML-based expense categorization
  4. Historical trend analysis

Project Structure

split_data/
├── Core Python Files
│   ├── analysis.py                   # Spending patterns, group stats, trend analysis
│   ├── chart_data.py                 # Chart data generation for visualization
│   ├── expense_classifier.py         # ML-based expense categorization
│   ├── database.py                   # Database connection and queries
│   └── api_server.py                 # Flask API server
│
├── ML Components
│   ├── train_classifier.py           # Train the ML model
│   └── expense_classifier_model.pkl  # Trained model (run train_classifier.py first)
│
├── Database
│   └── init.sql                      # Database schema
│
├── Docker
│   ├── Dockerfile                    # Container configuration
│   └── docker-compose.yml            # Full stack setup (API + MySQL)
│
├── Testing
│   ├── test_all_responsibilities.py  # Test all 4 responsibilities
│   ├── test_classifier.py            # Test ML classifier
│   ├── test_workflow.py              # Test complete workflow
│   └── test_settlement_logic.py      # Test settlement logic
│
├── Utilities
│   ├── generate_dummy_data.py        # Generate test data
│   ├── settlement_checker.py         # Background settlement checker service
│   ├── chart_data_api.py             # CLI script for chart data
│   └── api_example.py                # API usage examples
│
└── Documentation
    ├── RESPONSIBILITIES_GUIDE.md     # Complete API documentation
    ├── HOW_TO_TEST.md                # Testing guide
    └── DOCKER_SETUP.md               # Docker setup guide

Quick Start

Prerequisites

  • Python 3.8+
  • MySQL (via Docker)
  • Docker & Docker Compose

Setup

  1. Install dependencies:

    pip install -r requirements.txt
  2. Train the ML model (first time only):

    python3 train_classifier.py
  3. Start the database:

    docker-compose up -d mysql
  4. Generate test data (optional):

    python3 generate_dummy_data.py
  5. Start the API server:

    python3 api_server.py

    Alternatively, start the full stack (API + MySQL) with Docker Compose instead of steps 3 and 5:

    docker-compose up -d

API Endpoints

Analysis Endpoints (Core Responsibilities)

  • GET /api/users/<user_id>/analysis/patterns - Spending pattern analysis
  • GET /api/groups/<group_id>/statistics - Group spending statistics
  • POST /api/tags/suggest - ML-based expense categorization (top 3 suggestions)
  • GET /api/users/<user_id>/analysis/trends - Historical trend analysis
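The sketch below shows how a client might call the analysis endpoints listed above using only the standard library. It makes a few assumptions. The server is taken to listen on Flask's default `http://localhost:5000`. The JSON field name `description` in the `/api/tags/suggest` body is hypothetical, so check RESPONSIBILITIES_GUIDE.md for the real request schema. The helper names (`analysis_request`, `suggest_tags_request`, `send`) are illustrative and not part of the project.

```python
import json
import urllib.request

# Assumption: api_server.py runs on Flask's default host/port; adjust as needed.
BASE_URL = "http://localhost:5000"


def analysis_request(kind: str, entity_id: int) -> urllib.request.Request:
    """Build a GET request for one of the read-only analysis endpoints."""
    paths = {
        "patterns": f"/api/users/{entity_id}/analysis/patterns",
        "statistics": f"/api/groups/{entity_id}/statistics",  # entity_id is a group id here
        "trends": f"/api/users/{entity_id}/analysis/trends",
    }
    return urllib.request.Request(BASE_URL + paths[kind], method="GET")


def suggest_tags_request(description: str) -> urllib.request.Request:
    """Build a POST request asking for ML tag suggestions.

    The "description" field name is a hypothetical placeholder.
    """
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/tags/suggest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With the API server running, `send(analysis_request("patterns", 1))` fetches the spending-pattern analysis for user 1. Building each request separately from sending it keeps URL construction easy to check without a network.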

Chart Data Endpoints

  • GET /api/users/<user_id>/charts - All chart data
  • GET /api/users/<user_id>/charts/weekly - Weekly expenses
  • GET /api/users/<user_id>/charts/monthly - Monthly expenses
  • GET /api/users/<user_id>/charts/categories - Expenses by category
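The weekly, monthly and category chart endpoints group a user's expenses into buckets. Here is a minimal sketch of that kind of aggregation. It assumes a hypothetical `(date, amount, category)` expense record. The real columns live in init.sql, and chart_data.py's actual output format may differ. Weeks are keyed by ISO week number.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical expense record: (date, amount, category).
# The real schema is defined in init.sql and may differ.
Expense = Tuple[date, float, str]


def weekly_totals(expenses: List[Expense]) -> Dict[str, float]:
    """Sum amounts per ISO week, keyed like '2024-W05', in chronological order."""
    totals: Dict[str, float] = defaultdict(float)
    for day, amount, _category in expenses:
        iso_year, iso_week, _weekday = day.isocalendar()
        totals[f"{iso_year}-W{iso_week:02d}"] += amount
    return dict(sorted(totals.items()))


def monthly_totals(expenses: List[Expense]) -> Dict[str, float]:
    """Sum amounts per calendar month, keyed like '2024-01', in chronological order."""
    totals: Dict[str, float] = defaultdict(float)
    for day, amount, _category in expenses:
        totals[f"{day.year}-{day.month:02d}"] += amount
    return dict(sorted(totals.items()))


def category_totals(expenses: List[Expense]) -> Dict[str, float]:
    """Sum amounts per category, largest first (e.g. for a pie chart)."""
    totals: Dict[str, float] = defaultdict(float)
    for _day, amount, category in expenses:
        totals[category] += amount
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

The zero-padded keys make ordinary string sorting chronological, which keeps chart axes in order without any extra date parsing.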

See RESPONSIBILITIES_GUIDE.md for complete API documentation.

Testing

Run all tests:

    python3 test_all_responsibilities.py

Test individual components:

    python3 test_classifier.py      # Test ML classifier
    python3 analysis.py             # Test analysis functions
    python3 test_workflow.py        # Test complete workflow

See HOW_TO_TEST.md for detailed testing instructions.

Technology Stack

  • Language: Python 3.8+
  • Framework: Flask (REST API)
  • ML Library: scikit-learn (TF-IDF + Naive Bayes)
  • Database: MySQL
  • Containerization: Docker & Docker Compose
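The stack pairs TF-IDF features with a Naive Bayes classifier, and `/api/tags/suggest` returns the top 3 categories. The sketch below illustrates that technique: multinomial Naive Bayes trained on TF-IDF-weighted term counts, followed by top-k ranking. It is dependency-free and is not the code in expense_classifier.py, which uses scikit-learn. The class name `TfidfNaiveBayes`, the simple tokenizer and the example categories are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfNaiveBayes:
    """Multinomial Naive Bayes trained on TF-IDF-weighted term counts."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # additive (Laplace) smoothing

    def _tfidf(self, tokens: List[str]) -> Dict[str, float]:
        # Raw term count times IDF; terms unseen during training are dropped.
        # Unlike scikit-learn's TfidfVectorizer default, vectors are not L2-normalised.
        counts = Counter(t for t in tokens if t in self.idf)
        return {t: c * self.idf[t] for t, c in counts.items()}

    def fit(self, texts: List[str], labels: List[str]) -> "TfidfNaiveBayes":
        docs = [tokenize(t) for t in texts]
        n_docs = len(docs)
        doc_freq = Counter(t for doc in docs for t in set(doc))
        # Smoothed IDF, the same formula scikit-learn uses by default.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
        class_counts = Counter(labels)
        self.classes = sorted(class_counts)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        # Sum TF-IDF weight per (class, term), then smooth into log P(term | class).
        weight: Dict[str, Dict[str, float]] = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for term, w in self._tfidf(doc).items():
                weight[label][term] += w
        vocab_size = len(self.idf)
        self.log_likelihood: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            denom = sum(weight[c].values()) + self.alpha * vocab_size
            self.log_likelihood[c] = {
                t: math.log((weight[c][t] + self.alpha) / denom) for t in self.idf
            }
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        feats = self._tfidf(tokenize(text))
        scores = {
            c: self.log_prior[c]
            + sum(w * self.log_likelihood[c][t] for t, w in feats.items())
            for c in self.classes
        }
        # Softmax over log-scores (subtract the max for numerical stability).
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exp.values())
        return {c: v / total for c, v in exp.items()}

    def top_k(self, text: str, k: int = 3) -> List[Tuple[str, float]]:
        """Return the k most likely categories with their probabilities."""
        probs = self.predict_proba(text)
        return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

For example, train on `["uber taxi ride", "pizza dinner", ...]` with matching labels, then call `model.top_k("taxi to the airport")`. The result is a ranked list of `(category, probability)` pairs, analogous to the three suggestions the endpoint returns.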

Key Files for Each Responsibility

Responsibility           Main Files
#1: Spending Patterns    analysis.py, chart_data.py
#2: Group Statistics     analysis.py
#3: ML Categorization    expense_classifier.py, train_classifier.py
#4: Historical Trends    analysis.py
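Historical trend analysis (responsibility #4) lives in analysis.py, and this README does not spell out its exact metrics. As a hedged illustration only, the sketch below computes one common trend measure: month-over-month percentage change. It runs over hypothetical `'YYYY-MM'` monthly totals, and the function name `month_over_month` is illustrative.

```python
from typing import Dict, List, Optional


def month_over_month(monthly: Dict[str, float]) -> List[dict]:
    """Percentage change between consecutive months present in `monthly`.

    `monthly` maps hypothetical 'YYYY-MM' keys to total spend. Months missing
    from the input are skipped rather than treated as zero. change_pct is
    None when the earlier month's total is zero (change is undefined).
    """
    months = sorted(monthly)  # zero-padded keys sort chronologically
    trend = []
    for prev, cur in zip(months, months[1:]):
        before, after = monthly[prev], monthly[cur]
        change: Optional[float] = None
        if before != 0:
            change = round((after - before) / before * 100, 2)
        trend.append({"month": cur, "total": after, "change_pct": change})
    return trend
```

Returning `None` instead of an infinite percentage keeps the result JSON-serializable, so a chart or API layer can render the gap explicitly.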

Documentation

  • RESPONSIBILITIES_GUIDE.md - Complete API documentation with examples
  • HOW_TO_TEST.md - Testing guide for all 4 responsibilities
  • DOCKER_SETUP.md - Docker deployment guide

Author

Jiawei Li - Analytics Service Implementation