iMessage LLM Processor

A powerful Python tool for converting iMessage HTML exports (created with imessage-exporter) into structured, analyzable CSV data with intelligent conversation grouping and AI-powered analysis using Large Language Models.

Features

🔧 Core Processing

HTML to CSV Conversion: Extract messages from iMessage HTML exports into structured CSV format
Intelligent Conversation Grouping: Automatically groups messages into conversations using advanced algorithms
Progress Tracking: Real-time progress bars and performance metrics during processing
Memory Efficient: Processes large HTML files using streaming with chunk-based parsing

🧠 Smart Conversation Detection

The tool uses multiple sophisticated algorithms to intelligently group messages into conversations:

Dynamic Time Thresholds: Adaptive time gaps based on time of day (night: 3h, morning: 8h, day: 4h, evening: 5h)
Content Analysis: Detects conversation starters ("hey", "hello") and enders ("goodnight", "bye")
Momentum Analysis: Analyzes response times, turn-taking patterns, and engagement levels
Topic Change Detection: Uses semantic similarity to identify topic shifts
Emotional Tone Shifts: Detects significant changes in emotional tone using emoji and keyword analysis
Activity Transitions: Identifies location/activity changes that indicate conversation boundaries
Context Windows: Analyzes surrounding messages for conversation continuity

📊 Advanced Analytics

Conversation Statistics: Detailed metrics on conversation length, duration, and interaction patterns
Quality Analysis: Evaluates conversation grouping quality with confidence scores
Sample-Based Analysis: Uses statistically significant sampling for large datasets
Performance Metrics: Tracks processing time and provides optimization insights

🎨 Rich Terminal Output

Colored Output: Beautiful, readable terminal output with progress indicators
Structured Reporting: Organized sections with clear headers and formatting
Error Handling: Graceful error handling with informative messages

🤖 AI-Powered Analysis

LLM Integration: Uses Ollama with DeepSeek-R1 for intelligent message analysis
Conversation Context: Maintains conversation boundaries during analysis
Smart Caching: Caches results to speed up repeated analyses
Session Management: Save and continue analysis sessions

Quick Start

Step 0: Export Your iMessage Data

Before using this tool, you need to export your iMessage conversations to HTML format. We recommend using imessage-exporter:

# Install imessage-exporter
brew install imessage-exporter

# Export all messages to HTML format
imessage-exporter --format html --output ./data/

# Or export specific conversations
imessage-exporter --format html --output ./data/ --filter-contact "John Doe"

The exporter will create HTML files in your specified output directory. Rename or move the desired export to data/message.html for processing.

Note: imessage-exporter requires macOS and access to your Messages database. For more export options and detailed instructions, see the imessage-exporter documentation.

Steps 1-5: Process and Analyze

# 1. Install dependencies
pip install -r requirements.txt

# 2. Install and start Ollama
# Download from https://ollama.ai, then:
ollama serve
ollama pull deepseek-r1:14b

# 3. Prepare your data directory and move export
mkdir -p data
mv ~/path/to/export.html data/message.html

# 4. Process messages
python process.py

# 5. Ask questions about your messages
python ask_messages.py --question "What are the main themes in our conversations?"

Prerequisites

macOS: Required for accessing iMessage database
Python 3.7+: For running the processing and analysis scripts
Homebrew: For installing imessage-exporter (optional but recommended)
Ollama: For running the AI analysis locally

Installation

1. Export Your iMessage Data

Install and use imessage-exporter to export your messages:

# Install via Homebrew (macOS)
brew install imessage-exporter

# Or install via Cargo if you have Rust
cargo install imessage-exporter

2. Clone This Repository

git clone <repository-url>
cd imessage_llm

3. Create Virtual Environment (recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

4. Install Python Dependencies

pip install -r requirements.txt

Usage

Step 1: Process Your Messages

First, ensure you have exported your iMessage data to HTML format. You can use imessage-exporter for this:

# Export messages to HTML (if not done already)
imessage-exporter --format html --output ./data/

Then run the processing pipeline to convert HTML messages to structured CSV:

python process.py

This will:

Extract messages from data/message.html
Apply intelligent conversation grouping
Save results to data/messages.csv
Display comprehensive analytics

Input File Requirements:

Place your iMessage HTML export in the data/ directory as message.html
The tool expects HTML format exported by imessage-exporter or similar tools
The HTML should contain message containers with timestamps, senders, and message content

Step 2: Ask Questions About Your Messages

After processing, use the AI-powered analysis tool to ask questions:

python ask_messages.py --question "What are the main themes in our conversations?"

This tool uses Ollama with DeepSeek-R1 to provide intelligent analysis of your message history.

AI-Powered Message Analysis (ask_messages.py)

Features

LLM-based Analysis: Uses Ollama with DeepSeek-R1 model for intelligent conversation analysis
Conversation-aware: Analyzes individual conversations or groups maintaining context
Smart Chunking: Automatically handles large datasets by splitting them intelligently
Response Caching: Caches analysis results for faster repeated queries
Conversation History: Saves and continues analysis sessions [[memory:8547400]]
Token Optimization: Uses compact message format and DeepSeek tokenizer for efficient processing

Usage Examples

Basic Analysis

# Ask a general question about all messages
python ask_messages.py --question "What are the main themes in our conversations?"

# Analyze specific years
python ask_messages.py --year 2017 --question "What happened this year?"
python ask_messages.py --years 2017 2018 2019 --question "How did our relationship evolve?"

Conversation-Specific Analysis

# List available conversations
python ask_messages.py --list-conversations

# Analyze a single conversation
python ask_messages.py --conversation 42 --question "What is this conversation about?"

# Analyze multiple conversations
python ask_messages.py --conversations 10 15 20 --question "Compare these conversations"

# Filter by conversation length
python ask_messages.py --min-messages 50 --question "What are themes in longer conversations?"
python ask_messages.py --max-messages 10 --question "What are quick exchanges about?"

Conversation History Management

# List saved conversation histories
python ask_messages.py --list-history

# Continue a previous analysis session
python ask_messages.py --load-conversation my_analysis.json --question "Tell me more about that"

# Save to specific file (auto-saves by default)
python ask_messages.py --question "Analyze themes" --save-conversation analysis_2024.json

# Disable auto-saving for one-off queries
python ask_messages.py --question "Quick check" --no-save

Cache Management

# List cached results
python ask_messages.py --list-cache

# Force reprocessing (ignore cache)
python ask_messages.py --question "Analyze again" --force-reprocess

# Clear all cached results
python ask_messages.py --clear-cache

Requirements

Ollama: Must be running on localhost:11434
DeepSeek-R1 Model: Install with ollama pull deepseek-r1:14b
Processed Data: Run process.py first to generate data/messages.csv

Message Processing (process.py)

Features

HTML to CSV Conversion: Extracts messages from iMessage HTML exports
Intelligent Conversation Grouping: Uses multiple algorithms to detect conversation boundaries
Performance Optimized: Handles multi-gigabyte files with streaming and chunked processing
Statistical Analysis: Provides detailed metrics on conversations and grouping quality

Advanced Usage

Functions can be imported and used programmatically:

from process import extract_messages, assign_conversation_ids, save_to_csv, analyze_conversations

# Extract messages from HTML
messages = extract_messages('data/message.html')

# Group into conversations
messages_with_conversations = assign_conversation_ids(messages)

# Save to CSV
save_to_csv(messages_with_conversations, 'data/output.csv')

# Analyze results
analyze_conversations(messages_with_conversations)

Key Functions

process.py Functions

`extract_messages(html_file)`

Extracts messages from HTML file using regex parsing
Returns list of message dictionaries with date, sender, and message content
Handles large files efficiently with chunked processing

`assign_conversation_ids(messages, base_gap_hours=6)`

Intelligently groups messages into conversations
Uses multiple detection algorithms for optimal accuracy
Returns messages with added conversation_id field

`analyze_conversations(messages)`

Provides comprehensive conversation analytics
Shows statistics on message counts, durations, turn exchanges
Identifies notable conversations and patterns

`analyze_conversation_quality(messages, sample_size=None)`

Evaluates the quality of conversation grouping
Uses statistical sampling for large datasets
Provides confidence scores and decision analysis

`analyze_existing_data(csv_file='data/messages.csv')`

Analyzes pre-processed CSV data without reprocessing
Useful for exploring already converted data

ask_messages.py Functions

`ask_ollama(messages, question, model, save_history, history_file, conversation_history, force_reprocess)`

Main function for analyzing messages with LLM
Handles conversation chunking and token optimization
Supports caching and conversation history

`analyze_individual_conversation(conv_id, csv_file, question, model, save_history, history_file, conversation_history, force_reprocess)`

Analyzes a single conversation by ID
Useful for deep-diving into specific conversations

`group_messages_by_conversation(messages)`

Groups messages by their conversation_id
Returns dictionary mapping conversation IDs to message lists

`filter_conversations_by_criteria(conversations, min_messages, max_messages, include_ids, exclude_ids)`

Filters conversations based on various criteria
Useful for focusing analysis on specific conversation types

`create_conversation_chunks(conversations, question, max_context_tokens)`

Creates token-optimized chunks for LLM processing
Preserves conversation boundaries while respecting token limits

Configuration Options

Adaptive Sampling

The tool automatically determines optimal sample sizes based on dataset size and analysis type:

General analysis: 1,000+ messages minimum
Conversation quality: 2,000+ messages minimum
Momentum analysis: 500+ messages minimum
Context analysis: 300+ messages minimum

Window Sizes

Context window sizes adapt based on dataset size:

Small datasets (<1,000): Base window sizes (3-5 messages)
Medium datasets (1,000-10,000): Base + 1
Large datasets (10,000-50,000): Base + 2
Very large datasets (50,000+): Base + 3 (max 12)

Confidence Thresholds

Decision confidence levels for conversation breaks:

High confidence (0.9-1.0): Time-based, conversation starters/enders
Medium confidence (0.7-0.8): Topic changes, momentum shifts
Lower confidence (0.5-0.6): Emotional shifts, activity transitions

Data Flow

Input: iMessage HTML export (data/message.html)
Processing: process.py extracts and groups messages
Output: Structured CSV (data/messages.csv) with conversation IDs
Analysis: ask_messages.py uses the CSV for AI-powered insights
Results: Cached analyses and saved conversation histories

Output Formats

CSV File Structure (data/messages.csv)

The processed CSV file contains the following columns:

date: Timestamp in YYYY-MM-DD HH:MM:SS format
sender: Either "Me" or "Them" (standardized)
message: The message content
conversation_id: Unique identifier for each conversation

Cached Analysis Files (cache/)

JSON files containing analysis results
Keyed by conversation ID, question, and message hash
Includes response text, metadata, and timestamps

Conversation History Files (conversations/)

JSON files tracking analysis sessions
Contains questions, responses, and conversation metadata
Automatically saved with timestamps or custom filenames

Sample Analytics Output

═══════════════════════════════════ CONVERSATION ANALYSIS ════════════════════════════════════
                                Analyzing conversation patterns...

✓ Assigned 1,234 conversations in 2.3s

────────────────────────────────── Decision Statistics ──────────────────────────────────
  Time Based: 856 (69.4%)
  Topic Change: 187 (15.2%)
  Starter Based: 98 (7.9%)
  Momentum Based: 93 (7.5%)

────────────────────────────────── Message Statistics ──────────────────────────────────
  Average messages per conversation: 8.4
  Shortest conversation: 1 messages
  Longest conversation: 342 messages
  Single message conversations: 234
  Short conversations (2-10 msgs): 789
  Long conversations (11+ msgs): 211

Performance Considerations

Large Files: The tool efficiently handles multi-gigabyte HTML files using streaming
Memory Usage: Memory-efficient processing keeps RAM usage reasonable even for large datasets
Processing Speed: Optimized regex parsing and analysis algorithms for fast processing
Sample Analysis: Automatic sampling for very large datasets to maintain reasonable processing times

Troubleshooting

Common Issues

Processing Issues (process.py)

"No such file or directory" error: Ensure data/message.html exists
Memory errors: For very large files, the tool uses chunked processing automatically
Encoding issues: The tool handles UTF-8 encoding for international characters and emojis
Empty output: Check that your HTML file follows the expected iMessage export format

Analysis Issues (ask_messages.py)

"Ollama not running" error: Start Ollama with ollama serve
"Model not found" error: Install DeepSeek with ollama pull deepseek-r1:14b
"No conversation_id" error: Run process.py first to process your messages
Token limit exceeded: The tool automatically chunks large datasets, but very long single conversations may need special handling
Cache issues: Use --clear-cache to reset cached results if experiencing problems

Performance Tips

For very large datasets (100k+ messages), consider using the quality analysis on samples first
The tool automatically optimizes processing based on dataset size
Progress indicators help track processing on large files

Dependencies

Key dependencies (see requirements.txt for complete list):

beautifulsoup4: HTML parsing (backup method)
tiktoken: Token counting for LLM context management
requests: API communication with Ollama
Standard library: Primary processing uses built-in Python modules for speed

External Requirements

Ollama: Local LLM server (https://ollama.ai)
DeepSeek-R1 Model: Run ollama pull deepseek-r1:14b after installing Ollama

Technical Details

Conversation Detection Algorithm

The conversation grouping uses a multi-layered approach:

Primary Breaks: Major time gaps, strong conversation indicators
Secondary Breaks: Momentum changes, topic shifts, pattern disruptions
Contextual Analysis: Surrounding message analysis for continuity
Preservation Rules: Protects rapid exchanges and high-engagement conversations

Statistical Significance

Uses proper sample size calculations with confidence intervals
Applies finite population corrections for small datasets
Validates sample adequacy for statistical significance

Error Handling

Graceful handling of malformed HTML
Robust timestamp parsing with fallback methods
Comprehensive error reporting with clear messages

Typical Workflow

Export Messages: Use imessage-exporter to export your iMessage conversations to HTML format
Prepare Data: Place the HTML export in data/message.html
Process Data: Run process.py to convert HTML to structured CSV with conversation IDs
Explore Conversations: Use --list-conversations to see available conversations
Analyze: Ask questions about specific conversations, time periods, or the entire dataset
Iterate: Continue analysis sessions with follow-up questions using conversation history
Cache Results: Reuse cached analyses for faster repeated queries

Data Export Options

While we recommend imessage-exporter for its reliability and features, you can use any tool that exports iMessages to HTML format with the following structure:

Message containers with class "message"
Timestamps in a "timestamp" span
Sender names in a "sender" span
Message content in a "bubble" span

Common export tools:

imessage-exporter (recommended): Fast, reliable, multiple format support
iMessage Export: GUI tool for macOS
PhoneView: Commercial tool with export features
Manual export from Messages app (limited functionality)

Use Cases

Relationship Analysis: Understand how relationships evolved over time
Topic Discovery: Identify recurring themes and conversation patterns
Memory Lane: Revisit specific conversations or time periods
Communication Patterns: Analyze response times, conversation lengths, and engagement
Emotional Journey: Track emotional tones and relationship dynamics
Content Search: Find specific topics or events in your message history

Contributing

When contributing to this project, please:

Follow the existing code style and formatting
Add appropriate error handling and logging
Update documentation for new features
Test with various HTML export formats

License

This project is open source. Please see the license file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md
ask_messages.py		ask_messages.py
formatting_utils.py		formatting_utils.py
process.py		process.py
prompts.py		prompts.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

iMessage LLM Processor

Features

🔧 Core Processing

🧠 Smart Conversation Detection

📊 Advanced Analytics

🎨 Rich Terminal Output

🤖 AI-Powered Analysis

Quick Start

Step 0: Export Your iMessage Data

Steps 1-5: Process and Analyze

Prerequisites

Installation

1. Export Your iMessage Data

2. Clone This Repository

3. Create Virtual Environment (recommended)

4. Install Python Dependencies

Usage

Step 1: Process Your Messages

Step 2: Ask Questions About Your Messages

AI-Powered Message Analysis (ask_messages.py)

Features

Usage Examples

Basic Analysis

Conversation-Specific Analysis

Conversation History Management

Cache Management

Requirements

Message Processing (process.py)

Features

Advanced Usage

Key Functions

process.py Functions

extract_messages(html_file)

assign_conversation_ids(messages, base_gap_hours=6)

analyze_conversations(messages)

analyze_conversation_quality(messages, sample_size=None)

analyze_existing_data(csv_file='data/messages.csv')

ask_messages.py Functions

ask_ollama(messages, question, model, save_history, history_file, conversation_history, force_reprocess)

analyze_individual_conversation(conv_id, csv_file, question, model, save_history, history_file, conversation_history, force_reprocess)

group_messages_by_conversation(messages)

filter_conversations_by_criteria(conversations, min_messages, max_messages, include_ids, exclude_ids)

create_conversation_chunks(conversations, question, max_context_tokens)

Configuration Options

Adaptive Sampling

Window Sizes

Confidence Thresholds

Data Flow

Output Formats

CSV File Structure (data/messages.csv)

Cached Analysis Files (cache/)

Conversation History Files (conversations/)

Sample Analytics Output

Performance Considerations

Troubleshooting

Common Issues

Processing Issues (process.py)

Analysis Issues (ask_messages.py)

Performance Tips

Dependencies

External Requirements

Technical Details

Conversation Detection Algorithm

Statistical Significance

Error Handling

Typical Workflow

Data Export Options

Use Cases

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

`extract_messages(html_file)`

`assign_conversation_ids(messages, base_gap_hours=6)`

`analyze_conversations(messages)`

`analyze_conversation_quality(messages, sample_size=None)`

`analyze_existing_data(csv_file='data/messages.csv')`

`ask_ollama(messages, question, model, save_history, history_file, conversation_history, force_reprocess)`

`analyze_individual_conversation(conv_id, csv_file, question, model, save_history, history_file, conversation_history, force_reprocess)`

`group_messages_by_conversation(messages)`

`filter_conversations_by_criteria(conversations, min_messages, max_messages, include_ids, exclude_ids)`

`create_conversation_chunks(conversations, question, max_context_tokens)`

Packages