BallTales: Where Baseball Data Meets Natural Conversation
Inspiration
Baseball statistics have always told incredible stories, but they've been locked behind complex interfaces and endless menus. While the MLB Stats API offers a vast ocean of data, most applications force users to learn the application's language, clicking and navigating through endless buttons and menus, instead of speaking the user's language. This sparked our transformative insight: What if we could eliminate this barrier entirely? What if exploring baseball statistics felt as natural as having a conversation with a knowledgeable friend?
The breakthrough came when we realized we could combine the sophisticated structured output capabilities of Google's Gemini with the rich MLB Stats API to create a truly intuitive baseball companion. Instead of users learning how to navigate our app, our app learns how to understand them.
What it does
BallTales transforms complex baseball queries into meaningful conversations and rich visual experiences through a sophisticated pipeline. At its core, it's powered by a series of interconnected systems that work together to understand, process, and present baseball data in an intuitive way.
Documentation Processing System
Before diving into the user-facing features, we built a crucial foundation: an intelligent MLB StatsAPI documentation processor that transforms raw API documentation into structured knowledge:
Documentation Scraping and Processing
Knowledge Structuring:
- Converted unstructured StatsAPI docs markdown files into normalized JSON through Gemini's structured output capabilities
- Generated comprehensive function signature mappings
- Created detailed parameter type systems
- Built relationship graphs between endpoints
- Maintained documentation versioning and updates
Schema Generation:
- Developed a script that calls a statsapi endpoint and extracts its response schema, so the LLM knows what shape of data to expect
- Created parameter validation rules
- Built comprehensive type mappings
- Generated parameter relationship graphs
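As a rough illustration of the schema extraction step, the sketch below walks a decoded JSON response and records the type of each field. The function name `infer_schema` and the sample payload are assumptions made for this example, not the project's actual script, and the payload is a hand-written placeholder rather than real API output.

```python
import json
from typing import Any


def infer_schema(value: Any) -> Any:
    """Recursively describe the shape of a decoded JSON value."""
    if isinstance(value, dict):
        return {key: infer_schema(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by the shape of its first element, if it has one.
        return [infer_schema(value[0])] if value else []
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


# Hypothetical placeholder payload standing in for a real endpoint response.
sample = {"teams": [{"id": 1, "name": "Example Team", "active": True}]}
print(json.dumps(infer_schema(sample), indent=2))
```

A compact schema like this can be placed in a prompt so the model knows which fields to expect without seeing a full response.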
Dynamic Code Generation & Execution System
At the heart of BallTales lies a sophisticated code generation and execution pipeline that brings user queries to life:
Code Generation:
- Uses Gemini model for code creation
- Implements proper error handling
- Ensures type safety
- Optimizes performance
- Maintains security standards
Example generation prompt:
Generate code that calls statsapi.{function_name} with these parameters:
{parameters}
Function documentation: {function_docs}
Requirements:
1. Import only statsapi and json
2. Use explicit parameter values
3. No try-catch blocks
4. For multiple values, use list comprehension
5. Return results with print(json.dumps())
Code generation uses gemini-1.5-pro exclusively for optimal code quality.
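One plausible way to fill the prompt above is plain string formatting. In this sketch the helper `build_prompt`, the function name `some_function`, and the `team_id` parameter are hypothetical placeholders; only the template text comes from the example prompt.

```python
import json

# Template text copied from the example generation prompt.
PROMPT_TEMPLATE = """Generate code that calls statsapi.{function_name} with these parameters:
{parameters}
Function documentation: {function_docs}
Requirements:
1. Import only statsapi and json
2. Use explicit parameter values
3. No try-catch blocks
4. For multiple values, use list comprehension
5. Return results with print(json.dumps())"""


def build_prompt(function_name: str, parameters: dict, function_docs: str) -> str:
    """Fill the template; parameters are serialized as JSON for readability."""
    return PROMPT_TEMPLATE.format(
        function_name=function_name,
        parameters=json.dumps(parameters, indent=2),
        function_docs=function_docs,
    )


# "some_function" and "team_id" are illustrative names, not real statsapi calls.
prompt = build_prompt("some_function", {"team_id": 1}, "Placeholder documentation text.")
print(prompt)
```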
Secure Execution Environment
The MLBPythonREPL provides a controlled environment for code execution:
Security Features:
- Temporary file-based execution
- Strict timeout enforcement
- Controlled imports
- Resource limits
- Environment isolation
Error Management:
- Graceful timeout handling
- Error message sanitization
- Automatic retries
- Detailed logging
- User-friendly messages
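The temporary-file execution and timeout handling listed above could look roughly like the following. This is a minimal sketch using only the standard library, not the actual MLBPythonREPL; the name `run_snippet` and the returned dictionary shape are assumptions, and it does not implement resource limits or import control.

```python
import os
import subprocess
import sys
import tempfile


def run_snippet(code: str, timeout_seconds: float = 5.0) -> dict:
    """Write code to a temporary file and run it in a separate interpreter."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name
    try:
        completed = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores PYTHON* env vars and user site
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        return {
            "ok": completed.returncode == 0,
            "stdout": completed.stdout,
            "stderr": completed.stderr,
        }
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return {"ok": False, "stdout": "", "stderr": "Execution timed out"}
    finally:
        os.remove(path)  # always clean up the temporary file


print(run_snippet("print(1 + 1)"))
print(run_snippet("while True: pass", timeout_seconds=0.5))
```

A separate process with a timeout keeps a runaway snippet from blocking the server, but it is not a full sandbox on its own; stronger isolation would need operating-system-level controls.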
Natural Language Understanding
The first stage of the agent is powered by a carefully structured MLB Agent that uses Gemini's advanced capabilities:
Intent Analysis
- Breaks down natural language queries into structured intents
- Understands context, timing, and complex relationships
- Identifies statistical needs and data dependencies
- Outputs precisely structured JSON for reliable processing
- Maintains conversation history for contextual understanding
Query Decomposition
- Extracts entities (players, teams, dates, statistics)
- Identifies temporal context (historical, current, predictive)
- Recognizes comparison requests and relationship queries
- Maps casual language to precise statistical concepts
- Handles ambiguity through contextual resolution
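To make the idea of a structured intent concrete, here is a small sketch that validates a JSON intent and turns it into a typed object. The `Intent` class and its field names are hypothetical, chosen to mirror the bullets above (entities, temporal context, comparison requests); the project's real schema is not shown in this write-up.

```python
import json
from dataclasses import dataclass, field

ALLOWED_TEMPORAL = {"historical", "current", "predictive"}


@dataclass
class Intent:
    """Hypothetical structured intent; field names are illustrative."""
    query_type: str
    entities: dict = field(default_factory=dict)
    temporal_context: str = "current"
    is_comparison: bool = False


def parse_intent(raw_json: str) -> Intent:
    """Validate the model's JSON output and convert it to an Intent."""
    data = json.loads(raw_json)
    if "query_type" not in data:
        raise ValueError("intent JSON must include 'query_type'")
    temporal = data.get("temporal_context", "current")
    if temporal not in ALLOWED_TEMPORAL:
        raise ValueError(f"unknown temporal_context: {temporal!r}")
    return Intent(
        query_type=data["query_type"],
        entities=data.get("entities", {}),
        temporal_context=temporal,
        is_comparison=bool(data.get("is_comparison", False)),
    )


# Hand-written stand-in for a model response, not real Gemini output.
raw = json.dumps({
    "query_type": "player_stats",
    "entities": {"players": ["Player A", "Player B"]},
    "temporal_context": "historical",
    "is_comparison": True,
})
print(parse_intent(raw))
```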
Intelligent Data Planning
A dynamic planning system that orchestrates data retrieval:
Plan Generation
- Gemini generates structured execution plans
- Maps intents to optimal API calls and functions
- Establishes clear data dependencies between steps
- Optimizes for minimal API usage
- Includes fallback strategies for reliability
Execution Flow
- Step-by-step orchestration of API/function calls through dynamic code generation and URL formatting
- Dynamic data extraction and transformation
- Maintains context across execution steps
- Handles errors gracefully with fallbacks
- Aggregates results into coherent responses
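The dependency-aware execution described above can be sketched as a loop over plan steps, where each step receives the results of the steps it depends on. The plan format (`id`, `action`, `params`, `depends_on`) and the handler functions are assumptions for illustration; the handlers return placeholder data instead of calling the MLB Stats API.

```python
from typing import Any, Callable


def execute_plan(steps: list[dict], handlers: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Run steps in listed order, passing earlier results to dependent steps."""
    results: dict[str, Any] = {}
    for step in steps:
        depends_on = step.get("depends_on", [])
        missing = [dep for dep in depends_on if dep not in results]
        if missing:
            raise ValueError(f"step {step['id']!r} is missing dependencies: {missing}")
        inputs = {dep: results[dep] for dep in depends_on}
        handler = handlers[step["action"]]
        results[step["id"]] = handler(inputs, **step.get("params", {}))
    return results


# Hypothetical handlers that return placeholder data.
handlers = {
    "find_team": lambda inputs, name: {"team_id": 1, "name": name},
    "get_roster": lambda inputs: [f"player on team {inputs['team']['team_id']}"],
}

plan = [
    {"id": "team", "action": "find_team", "params": {"name": "Example Team"}},
    {"id": "roster", "action": "get_roster", "depends_on": ["team"]},
]
print(execute_plan(plan, handlers))
```

Checking dependencies before each step means a malformed plan fails early with a clear message, which is one place a fallback strategy could hook in.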
Intelligent Chart Processing
The project features a sophisticated chart generation system that automatically selects and configures the most appropriate visualizations:
Chart Intelligence
Smart Chart Selection:
- Analyzes data structure and relationships
- Matches patterns to optimal chart types
- Considers data dimensionality and scale
- Evaluates categorical vs continuous data
- Selects from multiple chart variants based on context
Dynamic Data Transformation:
- Automatically reshapes data to match chart schemas
- Handles multiple data series intelligently
- Manages color schemes and styling
- Applies appropriate scales and normalization
- Ensures data consistency across visualizations
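A very simplified version of chart selection might inspect whether the x-axis values are categorical or numeric and pick a chart type accordingly. The rules and chart names below are assumptions for illustration only; the actual selection logic in BallTales is not described at this level of detail.

```python
from numbers import Number


def choose_chart(rows: list[dict], x_key: str) -> str:
    """Pick a chart type from simple properties of the x-axis values."""
    if not rows:
        return "empty"
    x_values = [row[x_key] for row in rows]
    x_is_numeric = all(
        isinstance(value, Number) and not isinstance(value, bool) for value in x_values
    )
    if not x_is_numeric:
        return "bar"  # categorical x axis, e.g. team or player names
    # Strictly increasing x values suggest an ordered sequence such as seasons.
    is_ordered = x_values == sorted(set(x_values))
    return "line" if is_ordered else "scatter"


# Placeholder rows, not real statistics.
by_season = [{"season": 2021, "hr": 10}, {"season": 2022, "hr": 12}, {"season": 2023, "hr": 9}]
by_team = [{"team": "Team A", "hr": 10}, {"team": "Team B", "hr": 12}]
unordered = [{"speed": 3, "hr": 1}, {"speed": 1, "hr": 2}, {"speed": 3, "hr": 4}]

print(choose_chart(by_season, "season"))
print(choose_chart(by_team, "team"))
print(choose_chart(unordered, "speed"))
```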
Smart Media Integration
Sophisticated media matching and integration:
Context Analysis
- Analyzes conversation flow for media opportunities
- Identifies moments that benefit from visual enhancement
- Matches queries to relevant media content
- Ensures media adds value to the conversation
Media Selection
- Curated database of spectacular plays and moments
- Intelligent matching of home run characteristics
- Dynamic player and team image integration
- Statistical visualization generation
- Real-time media relevance scoring
Intelligent User Adaptation
A sophisticated system for understanding user preferences:
Conversation Analysis
- Triggers every 3 messages for optimal learning
- Gemini analyzes conversation patterns
- Extracts implicit and explicit preferences
- Maintains preference consistency
- Updates user profiles dynamically and implicitly for a more personalized experience
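The every-3-messages trigger can be sketched with a small buffer that runs an analysis pass once it fills up. Everything here other than the interval of 3 is an assumption: the `PreferenceTracker` class is hypothetical, and a keyword count stands in for the Gemini analysis step.

```python
from collections import Counter
from typing import Optional

ANALYSIS_INTERVAL = 3  # the write-up says analysis triggers every 3 messages


class PreferenceTracker:
    """Hypothetical tracker; a keyword count stands in for the model call."""

    def __init__(self, known_teams: list[str]) -> None:
        self.known_teams = known_teams
        self.pending: list[str] = []
        self.team_mentions: Counter = Counter()

    def add_message(self, text: str) -> bool:
        """Buffer a message; return True when an analysis pass ran."""
        self.pending.append(text)
        if len(self.pending) < ANALYSIS_INTERVAL:
            return False
        self._analyze(self.pending)
        self.pending = []
        return True

    def _analyze(self, messages: list[str]) -> None:
        # Stand-in for the model call: count case-insensitive team mentions.
        for message in messages:
            for team in self.known_teams:
                if team.lower() in message.lower():
                    self.team_mentions[team] += 1

    def favorite_team(self) -> Optional[str]:
        top = self.team_mentions.most_common(1)
        return top[0][0] if top else None


tracker = PreferenceTracker(["Team A", "Team B"])
messages = ["How did Team A do?", "Show Team A home runs", "Compare with Team B"]
flags = [tracker.add_message(message) for message in messages]
print(flags, tracker.favorite_team())
```

Batching messages this way keeps the number of analysis calls low while still updating the profile regularly.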
Technical Architecture
Backend
- FastAPI for high-performance API endpoints
- Python-based MLB Agent powered by Gemini
- Secure code execution environment
- Dynamic code generation pipeline
- Comprehensive error handling
- MLB Stats API integration
- User preference database
- Media content indexing
Frontend
- Next.js with TypeScript for reliability
- Framer Motion animations
- Tailwind CSS styling
- Real-time chat interface
- Dynamic media rendering
- Profile management
- Progressive loading
- Error recovery
- Localization support
How We Built It
Development Process
Documentation Processing Pipeline:
- Developed a Gemini-powered Python converter from markdown to JSON
- Built validation and error checking systems
- Generated comprehensive API schemas
Core Systems:
- Implemented MLB Agent with Gemini integration
- Built dynamic code generation system
- Created secure execution environment
- Developed media resolution engine
- Built user adaptation system
Integration and Testing:
- Combined core systems into unified pipeline
- Implemented comprehensive error handling
- Created fallback strategies
- Built monitoring and logging
User Experience:
- Designed intuitive chat interface
- Implemented responsive animations
- Created dynamic media display
- Built user preference system
Challenges we ran into
1. Code Generation Complexity
- Ensuring generated code safety and security
- Managing execution timeouts and resources
- Handling edge cases in API responses
- Maintaining code quality and performance
- Building robust error recovery systems
2. Data Flow Complexity
- Managing complex data flows between multiple API endpoints
- Maintaining context and relationships between data points
- Handling data transformations efficiently
- Managing state across the pipeline
3. Performance Optimization
- Handling large datasets efficiently
- Optimizing API calls to minimize latency
- Managing memory usage with large response payloads
- Balancing response time with data completeness
4. Natural Language Understanding
- Accurately parsing baseball-specific terminology
- Handling ambiguous queries
- Maintaining context across conversations
- Managing entity resolution
5. Media Integration
- Finding and matching relevant media content
- Optimizing media delivery
- Handling various media formats
- Managing media metadata
Accomplishments that we're proud of
1. Sophisticated Intent Analysis
- Built a system that understands complex baseball queries
- Created accurate entity extraction
- Implemented context-aware processing
- Developed robust error handling
2. Optimized Data Retrieval
- Created efficient planning system
- Minimized API calls
- Maximized data utility
- Implemented smart caching
3. User Experience
- Created intuitive interface
- Implemented smooth animations
- Built responsive design
- Developed error recovery
4. Secure Code Generation
- Built robust code generation pipeline
- Created secure execution environment
- Implemented comprehensive error handling
- Developed performance optimization system
What we learned
1. Code Generation & Security
- Advanced code generation techniques
- Secure execution environment design
- Error handling strategies
- Performance optimization methods
2. AI Integration
- Deep understanding of LLM capabilities
- Natural language processing techniques
- Context management strategies
- Error handling approaches
3. Data Architecture
- Complex data flow management
- Transformation pipeline design
- State management techniques
- Caching strategies
4. User Experience
- Intuitive interface design
- Animation implementation
- Error handling presentation
- User feedback integration
What's next for BallTales
1. Enhanced Code Generation
- Machine learning-based optimization
- Automated code review
- Performance prediction
- Security analysis
- Quality assurance automation
2. Advanced Analytics
- Advanced statistical analysis
- Predictive modeling for game outcomes
- Deeper historical comparisons
- Machine learning integration
3. Rich Media Integration
- Live game integration
- Real-time highlights
- Interactive visualizations
- AR/VR experiences
4. Social Features
- User communities
- Shared analysis
- Collaborative viewing
- Social recommendations
5. Advanced Personalization
- Learning from user interactions
- Customized insights
- Tailored content delivery
- Preference prediction
6. Platform Expansion
- Mobile applications
- Browser extensions
- API access for developers
- Integration capabilities
This is BallTales: where baseball data finally speaks your language.
Built With
- fastapi
- gemini-api
- google-cloud-run
- nextjs
- postgresql
- prisma
- pydantic
- python
- typescript
