BallTales: Where Baseball Data Meets Natural Conversation

Inspiration

Baseball statistics have always told incredible stories, but they've been locked behind complex interfaces and endless menus. While the MLB Stats API offers a vast ocean of data, most applications force users to learn their language, clicking and navigating through endless buttons and menus, instead of speaking the user's language. This sparked our transformative insight: What if we could eliminate this barrier entirely? What if exploring baseball statistics felt as natural as having a conversation with a knowledgeable friend?

The breakthrough came when we realized we could combine the sophisticated structured output capabilities of Google's Gemini with the rich MLB Stats API to create a truly intuitive baseball companion. Instead of users learning how to navigate our app, our app learns how to understand them.

What it does

BallTales transforms complex baseball queries into meaningful conversations and rich visual experiences through a sophisticated pipeline. At its core, it's powered by a series of interconnected systems that work together to understand, process, and present baseball data in an intuitive way.

Documentation Processing System

Before diving into the user-facing features, we built a crucial foundation: an intelligent MLB StatsAPI documentation processor that transforms raw API documentation into structured knowledge:

Documentation Scraping and Processing

  • Knowledge Structuring:

    • Converted unstructured StatsAPI docs markdown files into normalized JSON through Gemini's structured output capabilities
    • Generated comprehensive function signature mappings
    • Created detailed parameter type systems
    • Built relationship graphs between endpoints
    • Maintained documentation versioning and updates
  • Schema Generation:

    • Developed a script that calls each statsapi endpoint and extracts its response schema, so the LLM knows what shape of data to expect
    • Created parameter validation rules
    • Built comprehensive type mappings
    • Generated parameter relationship graphs
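
The schema-extraction step might look roughly like the sketch below. It is a minimal illustration, not the project's actual script: `infer_schema` is a hypothetical helper, and the sample payload is made up rather than a real StatsAPI response.

```python
import json

def infer_schema(value):
    """Recursively summarize the shape of a JSON-like value (hypothetical helper)."""
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by the shape of its first element; empty lists stay empty.
        return [infer_schema(value[0])] if value else []
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if value is None:
        return "null"
    return "string"

# Made-up payload standing in for an endpoint response.
sample_response = {"people": [{"id": 1, "fullName": "Example Player", "active": True}]}
print(json.dumps(infer_schema(sample_response), indent=2))
```

A compact schema like this can be placed in a prompt far more cheaply than a full response payload.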

Dynamic Code Generation & Execution System

At the heart of BallTales lies a sophisticated code generation and execution pipeline that brings user queries to life:

  • Code Generation:
    • Uses Gemini model for code creation
    • Implements proper error handling
    • Ensures type safety
    • Optimizes performance
    • Maintains security standards

Example generation prompt:

Generate code that calls statsapi.{function_name} with these parameters:
{parameters}
Function documentation: {function_docs}

Requirements:
1. Import only statsapi and json
2. Use explicit parameter values
3. No try-catch blocks
4. For multiple values, use list comprehension
5. Return results with print(json.dumps())

Code generation uses gemini-1.5-pro exclusively, for optimal code quality.
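
As a rough sketch, the prompt above can be filled in with plain string formatting. The `build_prompt` helper and the example arguments are assumptions for illustration; the call to the model itself is left out.

```python
import json

# Template text copied from the prompt above; placeholders are filled with str.format.
PROMPT_TEMPLATE = """Generate code that calls statsapi.{function_name} with these parameters:
{parameters}
Function documentation: {function_docs}

Requirements:
1. Import only statsapi and json
2. Use explicit parameter values
3. No try-catch blocks
4. For multiple values, use list comprehension
5. Return results with print(json.dumps())"""

def build_prompt(function_name, parameters, function_docs):
    """Fill the template; parameters are serialized as JSON for readability."""
    return PROMPT_TEMPLATE.format(
        function_name=function_name,
        parameters=json.dumps(parameters, indent=2),
        function_docs=function_docs,
    )

# Illustrative arguments only; the docs string is a placeholder.
prompt = build_prompt("lookup_player", {"lookup_value": "Example Player"}, "(docs text here)")
print(prompt)
```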

Secure Execution Environment

The MLBPythonREPL provides a controlled environment for code execution:

  • Security Features:

    • Temporary file-based execution
    • Strict timeout enforcement
    • Controlled imports
    • Resource limits
    • Environment isolation
  • Error Management:

    • Graceful timeout handling
    • Error message sanitization
    • Automatic retries
    • Detailed logging
    • User-friendly messages
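
The sketch below illustrates temporary file-based execution with a strict timeout and sanitized error messages. It is not the MLBPythonREPL source: `run_generated_code` and its result format are assumptions, and import control, resource limits, and retries are omitted.

```python
import os
import subprocess
import sys
import tempfile

def run_generated_code(code, timeout_seconds=10.0):
    """Write code to a temporary file, run it in a child process, and return a result dict."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name
    try:
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": "", "error": "Execution timed out"}
    finally:
        os.remove(path)  # always clean up the temporary file
    if completed.returncode != 0:
        # Keep only the last traceback line so file paths are not shown to users.
        lines = completed.stderr.strip().splitlines()
        return {"ok": False, "output": "", "error": lines[-1] if lines else "Unknown error"}
    return {"ok": True, "output": completed.stdout.strip(), "error": ""}

print(run_generated_code("import json\nprint(json.dumps({'hits': 3}))"))
```

Running the code in a separate process means a hang or crash in generated code cannot take down the server handling the chat.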

Natural Language Understanding

The first stage of the pipeline is a carefully structured MLB Agent that uses Gemini's advanced capabilities:

Intent Analysis

  • Breaks down natural language queries into structured intents
  • Understands context, timing, and complex relationships
  • Identifies statistical needs and data dependencies
  • Outputs precisely structured JSON for reliable processing
  • Maintains conversation history for contextual understanding

Query Decomposition

  • Extracts entities (players, teams, dates, statistics)
  • Identifies temporal context (historical, current, predictive)
  • Recognizes comparison requests and relationship queries
  • Maps casual language to precise statistical concepts
  • Handles ambiguity through contextual resolution
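
To make the idea of a structured intent concrete, here is a minimal sketch of validating a model's JSON reply. The `QueryIntent` fields and the sample reply are hypothetical; the actual schema used by the MLB Agent is not shown in this writeup.

```python
import json
from dataclasses import dataclass, field

# Temporal contexts named in the list above.
ALLOWED_TIMEFRAMES = {"historical", "current", "predictive"}

@dataclass
class QueryIntent:
    """Hypothetical structured intent with illustrative field names."""
    intent_type: str
    timeframe: str
    entities: dict = field(default_factory=dict)
    is_comparison: bool = False

def parse_intent(raw_json):
    """Validate a model's JSON reply and convert it into a QueryIntent."""
    data = json.loads(raw_json)
    if data.get("timeframe") not in ALLOWED_TIMEFRAMES:
        raise ValueError(f"Unexpected timeframe: {data.get('timeframe')!r}")
    return QueryIntent(
        intent_type=data["intent_type"],
        timeframe=data["timeframe"],
        entities=data.get("entities", {}),
        is_comparison=bool(data.get("is_comparison", False)),
    )

# Hand-written stand-in for a reply to "How does Player A compare to Player B this season?"
reply = (
    '{"intent_type": "player_stats", "timeframe": "current", '
    '"entities": {"players": ["Player A", "Player B"]}, "is_comparison": true}'
)
intent = parse_intent(reply)
print(intent)
```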

Intelligent Data Planning

A dynamic planning system that orchestrates data retrieval:

Plan Generation

  • Gemini generates structured execution plans
  • Maps intents to optimal API calls and functions
  • Establishes clear data dependencies between steps
  • Optimizes for minimal API usage
  • Includes fallback strategies for reliability

Execution Flow

  • Step-by-step orchestration of API and function calls through dynamic code generation and URL formatting
  • Dynamic data extraction and transformation
  • Maintains context across execution steps
  • Handles errors gracefully with fallbacks
  • Aggregates results into coherent responses
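
A plan with data dependencies and fallbacks can be sketched as follows. The plan format (the `id`, `function`, `params`, and `fallback` keys, plus the `$step_id` reference syntax) is an assumption made for this example, and the two stub functions stand in for real API calls.

```python
def execute_plan(steps, functions):
    """Run plan steps in order, substituting '$step_id' references with earlier results."""
    results = {}
    for step in steps:
        params = {
            name: results[value[1:]] if isinstance(value, str) and value.startswith("$") else value
            for name, value in step["params"].items()
        }
        try:
            results[step["id"]] = functions[step["function"]](**params)
        except Exception:
            # Use the step's declared fallback value, if any, so later steps can continue.
            if "fallback" not in step:
                raise
            results[step["id"]] = step["fallback"]
    return results

# Stub functions standing in for real API calls.
def find_player_id(name):
    return {"Example Player": 42}[name]

def fetch_home_runs(player_id):
    return {42: 30}[player_id]

FUNCTIONS = {"find_player_id": find_player_id, "fetch_home_runs": fetch_home_runs}

plan = [
    {"id": "player", "function": "find_player_id", "params": {"name": "Example Player"}},
    {"id": "hr", "function": "fetch_home_runs", "params": {"player_id": "$player"}},
]
print(execute_plan(plan, FUNCTIONS))
```

Here the second step consumes the first step's result through the `$player` reference, which is how a dependency between steps is expressed in this sketch.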

Intelligent Chart Processing

The project features a sophisticated chart generation system that automatically selects and configures the most appropriate visualizations:

Chart Intelligence

  • Smart Chart Selection:

    • Analyzes data structure and relationships
    • Matches patterns to optimal chart types
    • Considers data dimensionality and scale
    • Evaluates categorical vs continuous data
    • Selects from multiple chart variants based on context
  • Dynamic Data Transformation:

    • Automatically reshapes data to match chart schemas
    • Handles multiple data series intelligently
    • Manages color schemes and styling
    • Applies appropriate scales and normalization
    • Ensures data consistency across visualizations
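
One simple way to approach chart selection is with rules over the data's shape, as in the sketch below. The rules, the chart names, and the `choose_chart_type` helper are illustrative assumptions; the project's selection logic may well be model-driven rather than rule-based.

```python
from numbers import Number

def choose_chart_type(rows, x_key, y_keys):
    """Pick a chart type from simple heuristics (illustrative rules only)."""
    x_values = [row[x_key] for row in rows]
    x_is_numeric = all(
        isinstance(value, Number) and not isinstance(value, bool) for value in x_values
    )
    if x_is_numeric:
        # Continuous or ordered x axis (for example seasons): show a trend.
        return "line"
    if len(y_keys) > 1:
        # Categorical x axis with several measures: compare them side by side.
        return "grouped_bar"
    return "bar"

by_season = [{"season": 2021, "hr": 20}, {"season": 2022, "hr": 31}]
by_team = [{"team": "A", "hr": 180, "rbi": 700}, {"team": "B", "hr": 210, "rbi": 655}]
print(choose_chart_type(by_season, "season", ["hr"]))
print(choose_chart_type(by_team, "team", ["hr", "rbi"]))
```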

Smart Media Integration

Sophisticated media matching and integration:

Context Analysis

  • Analyzes conversation flow for media opportunities
  • Identifies moments that benefit from visual enhancement
  • Matches queries to relevant media content
  • Ensures media adds value to the conversation

Media Selection

  • Curated database of spectacular plays and moments
  • Intelligent matching of home run characteristics
  • Dynamic player and team image integration
  • Statistical visualization generation
  • Real-time media relevance scoring

Intelligent User Adaptation

A sophisticated system for understanding user preferences:

Conversation Analysis

  • Triggers every 3 messages for optimal learning
  • Gemini analyzes conversation patterns
  • Extracts implicit and explicit preferences
  • Maintains preference consistency
  • Updates user profiles dynamically, without explicit user input, for a more personalized experience
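
The every-3-messages trigger and the profile update can be sketched like this. Both helpers and the preference keys are hypothetical, and the Gemini call that would extract preferences from the conversation is replaced by a hand-written dictionary.

```python
ANALYSIS_INTERVAL = 3  # analysis runs every 3 messages, as described above

def should_analyze(message_count, interval=ANALYSIS_INTERVAL):
    """Return True when the conversation has reached a multiple of the interval."""
    return message_count > 0 and message_count % interval == 0

def merge_preferences(profile, extracted):
    """Return an updated profile; newly extracted values override older ones."""
    updated = dict(profile)
    updated.update({key: value for key, value in extracted.items() if value is not None})
    return updated

profile = {"favorite_team": "Team A", "stat_detail": "basic"}
# Hand-written stand-in for preferences a model might extract from recent messages.
extracted = {"stat_detail": "advanced", "favorite_player": None}

for count in range(1, 7):
    if should_analyze(count):
        profile = merge_preferences(profile, extracted)
print(profile)
```

Skipping `None` values keeps an existing preference intact when the analysis finds nothing new for that key.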

Technical Architecture

Backend

  • FastAPI for high-performance API endpoints
  • Python-based MLB Agent powered by Gemini
  • Secure code execution environment
  • Dynamic code generation pipeline
  • Comprehensive error handling
  • MLB Stats API integration
  • User preference database
  • Media content indexing

Frontend

  • Next.js with TypeScript for reliability
  • Framer Motion animations
  • Tailwind CSS styling
  • Real-time chat interface
  • Dynamic media rendering
  • Profile management
  • Progressive loading
  • Error recovery
  • Localization support

How We Built It

Development Process

  1. Documentation Processing Pipeline:

    • Developed a Gemini-powered Markdown-to-JSON converter in Python
    • Built validation and error checking systems
    • Generated comprehensive API schemas
  2. Core Systems:

    • Implemented MLB Agent with Gemini integration
    • Built dynamic code generation system
    • Created secure execution environment
    • Developed media resolution engine
    • Built user adaptation system
  3. Integration and Testing:

    • Combined core systems into unified pipeline
    • Implemented comprehensive error handling
    • Created fallback strategies
    • Built monitoring and logging
  4. User Experience:

    • Designed intuitive chat interface
    • Implemented responsive animations
    • Created dynamic media display
    • Built user preference system

Challenges we ran into

1. Code Generation Complexity

  • Ensuring generated code safety and security
  • Managing execution timeouts and resources
  • Handling edge cases in API responses
  • Maintaining code quality and performance
  • Building robust error recovery systems

2. Data Flow Complexity

  • Managing complex data flows between multiple API endpoints
  • Maintaining context and relationships between data points
  • Handling data transformations efficiently
  • Managing state across the pipeline

3. Performance Optimization

  • Handling large datasets efficiently
  • Optimizing API calls to minimize latency
  • Managing memory usage with large response payloads
  • Balancing response time with data completeness

4. Natural Language Understanding

  • Accurately parsing baseball-specific terminology
  • Handling ambiguous queries
  • Maintaining context across conversations
  • Managing entity resolution

5. Media Integration

  • Finding and matching relevant media content
  • Optimizing media delivery
  • Handling various media formats
  • Managing media metadata

Accomplishments that we're proud of

1. Sophisticated Intent Analysis

  • Built a system that understands complex baseball queries
  • Created accurate entity extraction
  • Implemented context-aware processing
  • Developed robust error handling

2. Optimized Data Retrieval

  • Created efficient planning system
  • Minimized API calls
  • Maximized data utility
  • Implemented smart caching

3. User Experience

  • Created intuitive interface
  • Implemented smooth animations
  • Built responsive design
  • Developed error recovery

4. Secure Code Generation

  • Built robust code generation pipeline
  • Created secure execution environment
  • Implemented comprehensive error handling
  • Developed performance optimization system

What we learned

1. Code Generation & Security

  • Advanced code generation techniques
  • Secure execution environment design
  • Error handling strategies
  • Performance optimization methods

2. AI Integration

  • Deep understanding of LLM capabilities
  • Natural language processing techniques
  • Context management strategies
  • Error handling approaches

3. Data Architecture

  • Complex data flow management
  • Transformation pipeline design
  • State management techniques
  • Caching strategies

4. User Experience

  • Intuitive interface design
  • Animation implementation
  • Error handling presentation
  • User feedback integration

What's next for BallTales

1. Enhanced Code Generation

  • Machine learning-based optimization
  • Automated code review
  • Performance prediction
  • Security analysis
  • Quality assurance automation

2. Advanced Analytics

  • Advanced statistical analysis
  • Predictive modeling for game outcomes
  • Deeper historical comparisons
  • Machine learning integration

3. Rich Media Integration

  • Live game integration
  • Real-time highlights
  • Interactive visualizations
  • AR/VR experiences

4. Social Features

  • User communities
  • Shared analysis
  • Collaborative viewing
  • Social recommendations

5. Advanced Personalization

  • Learning from user interactions
  • Customized insights
  • Tailored content delivery
  • Preference prediction

6. Platform Expansion

  • Mobile applications
  • Browser extensions
  • API access for developers
  • Integration capabilities

This is BallTales: where baseball data finally speaks your language.
