Hands-free, multimodal AI diagnostic assistant for veterinarians and pet owners
VetAI enables veterinarians to diagnose animals through voice and vision while keeping their hands on the patient — and gives pet owners plain-English explanations with cost estimates. Built for TreeHacks 2026.
Veterinarians examine 20-30+ animals daily in high-pressure, hands-on environments. Current diagnostic tools get in the way:
- They require stopping the examination to type queries
- They require navigating complex databases with dirty hands
- They use separate workflows for image analysis
- They offer no conversational back-and-forth with AI
Pet owners face a different problem: they get technical diagnoses they can't understand, don't know if it's urgent, and have no way to connect with others going through the same thing.
VetAI solves both sides.
VetAI is a voice-first, vision-enabled diagnostic assistant with two modes:
Professional mode (for veterinarians):
- Voice Queries - Ask questions hands-free via Whisper STT, get spoken responses via OpenAI TTS
- Image Analysis - Snap photos mid-exam for AI-powered visual diagnosis with Claude Sonnet
- Agentic Reasoning - Claude searches veterinary databases and builds differential diagnoses with tool use
- Evidence-Based Research - Perplexity Sonar retrieves peer-reviewed citations for every diagnosis
- Clarifying Questions - When confidence is low (<85%), the AI generates targeted follow-up questions to refine the diagnosis
Consumer mode (for pet owners):
- Plain English Diagnoses - Technical terms converted to simple language ("atopic dermatitis" -> "allergies causing skin irritation")
- Urgency Assessment - Clear guidance: emergency, high confidence, moderate confidence, or low confidence
- Cost Estimates - Estimated vet visit cost ranges (routine $80-250, specialist $200-800, emergency $500-2000)
- Emergency Detection - Automatic flagging of conditions like bloat, seizures, poisoning
- Community Support - Find other pet owners with similar conditions nearby
- Activity Feed - See community engagement, trending conditions, and success stories
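The consumer-mode features above (plain-English names, urgency levels, cost ranges, emergency flagging) can be sketched roughly as follows. This is a minimal illustration, not the project's actual `consumer_mode.py`: the lookup tables, the function names, and the 0.70 cut-off for "moderate" are assumptions; only the cost ranges, the urgency labels, and the 85% threshold come from this README.

```python
from dataclasses import dataclass

# Cost ranges (USD) as quoted in the feature list.
COST_RANGES = {
    "routine": (80, 250),
    "specialist": (200, 800),
    "emergency": (500, 2000),
}

# Hypothetical lookup tables, for illustration only.
PLAIN_ENGLISH = {"atopic dermatitis": "allergies causing skin irritation"}
EMERGENCY_KEYWORDS = ("bloat", "seizure", "poison")


@dataclass
class ConsumerSummary:
    simple_diagnosis: str
    technical_diagnosis: str
    urgency: str
    is_emergency: bool
    estimated_vet_cost: dict


def simplify(condition: str) -> str:
    """Swap a known clinical term for plain English; otherwise keep the original."""
    lowered = condition.lower()
    for term, plain in PLAIN_ENGLISH.items():
        if term in lowered:
            return plain.title()
    return condition


def urgency_for(confidence: float, is_emergency: bool) -> str:
    """Map a confidence score to an urgency label (cut-offs partly assumed)."""
    if is_emergency:
        return "emergency"
    if confidence >= 0.85:
        return "high_confidence"
    if confidence >= 0.70:  # assumed cut-off, not taken from the project
        return "moderate_confidence"
    return "low_confidence"


def summarize(condition: str, confidence: float) -> ConsumerSummary:
    """Build a consumer-facing summary from a technical diagnosis."""
    is_emergency = any(word in condition.lower() for word in EMERGENCY_KEYWORDS)
    # Picking "specialist" is not sketched: the README does not say how it is chosen.
    category = "emergency" if is_emergency else "routine"
    low, high = COST_RANGES[category]
    return ConsumerSummary(
        simple_diagnosis=simplify(condition),
        technical_diagnosis=condition,
        urgency=urgency_for(confidence, is_emergency),
        is_emergency=is_emergency,
        estimated_vet_cost={"min": low, "max": high, "category": category},
    )
```

Calling `summarize("Canine Atopic Dermatitis", 0.65)` under these assumptions yields a routine-cost, low-confidence summary with the plain-English name in place of the clinical one.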
Vet examining dog with skin rash:
1. Press mic: "What causes red patches on dog abdomens?"
2. Snap photo of the affected area
3. AI responds (voice): "Based on the image, this appears to be atopic dermatitis with 65% confidence..."
4. AI shows clarifying questions:
- "Is the rash seasonal or year-round?"
- "Are the paws and face also affected?"
- "Did symptoms start before age 3?"
5. Vet answers via voice: "Yes, it's seasonal and the paws are red too"
6. AI refines: "With seasonal presentation and paw involvement, atopic dermatitis is confirmed. Recommending allergy testing..."
Pet owner at home:
1. Takes photo of their dog's rash
2. Gets: "Allergies Causing Skin Irritation" (not "Atopic Dermatitis")
3. Urgency: "This is a likely diagnosis, but a vet should confirm."
4. Cost estimate: $80-$250 (routine visit)
5. Finds 3 other dog owners with allergies in their state
- Voice Conversation: OpenAI Whisper (STT) + TTS for hands-free interaction
- Vision Analysis: Claude Sonnet for multimodal image diagnosis
- Agentic Backend: Tool-calling architecture with disease database search, treatment protocols, and differential diagnosis
- Research Citations: Perplexity Sonar API retrieves peer-reviewed veterinary literature with inline citations
- Clarifying Questions: When diagnosis confidence < 85%, generates 2-3 targeted yes/no questions to help narrow down the diagnosis
- Consumer Mode: Plain English diagnoses with urgency levels, cost estimates, and emergency detection for pet owners
- Community Support Groups: Find relevant support groups by condition and species with fuzzy matching
- Pet Profile Matching: Connect pet owners with similar conditions in the same area (same species, overlapping conditions, same state)
- Activity Feed: Community engagement metrics, trending conditions, and statistics
- Multi-Turn Voice Sessions: Clarifying question answers carry context across follow-up voice interactions
- Mobile App: Cross-platform React Native (Expo) with camera integration
- Structured Outputs: Confidence scores, risk levels, recommendations, and cited sources
- Frontend UI for Community Features: Screens for profiles, matching, groups, and activity feed
- Consumer Mode Toggle: UI switch between professional and consumer views
- Fine-Tuned VLM: Custom vision model trained on veterinary imagery
- Multi-Image Context: Accumulate photos across conversation for progressive diagnosis
| Layer | Technology |
|---|---|
| Frontend | React Native (Expo), TypeScript, expo-av, expo-image-picker |
| Backend | FastAPI (Python 3.11+), Pydantic v2 |
| Vision AI | Anthropic Claude Sonnet (claude-sonnet-4-20250514) |
| Speech | OpenAI Whisper (STT) + OpenAI TTS (tts-1, alloy voice) |
| Research | Perplexity Sonar Pro API |
| Tunneling | ngrok (for Expo Go device testing) |
                +---------------------+
                |     Mobile App      |
                |  React Native/Expo  |
                |  - Voice Input      |
                |  - Camera Capture   |
                |  - Results Display  |
                +---------+-----------+
                          |
                +---------v-----------+
                |   FastAPI Backend   |
                |                     |
                | POST /analyze       | <- Image -> diagnosis (professional or consumer)
                | POST /chat          | <- Agentic chat with tools
                | POST /voice/query   | <- Audio + optional image
                | GET /voice/audio    | <- Cached TTS playback
                | GET /health         | <- Status check
                |                     |
                | Community & Feed    |
                | POST /api/community/profile
                | GET /api/community/groups/{condition}/{species}
                | GET /api/community/matches/{id}
                | GET /api/activity/feed
                | GET /api/activity/stats
                | GET /api/activity/trending
                +---------+-----------+
                          |
      +-----------+-------+-------+--------------+
      v           v               v              v
+-----------+ +--------+ +------------+ +----------+
| Claude | |Whisper | | Perplexity | | OpenAI |
| Sonnet | | STT | | Sonar | | TTS |
|(Vision+ | | | | (Research) | | (Voice) |
| Agent) | | | | | | |
+-----------+ +--------+ +------------+ +----------+
Photo -> /analyze -> Claude Vision -> Structured diagnosis JSON
                  -> Perplexity Sonar -> Research citations
                  -> Claude (if conf < 85%) -> Clarifying questions
                  -> Consumer mode? -> Simplified response + cost estimate
                  -> Combined AnalysisResult response
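The photo flow above can be read as a small orchestration function. The sketch below is an assumption about its shape, not the code in `routes/analyze.py`: the three callables stand in for the Claude Vision, Perplexity Sonar, and clarifying-question services, and the consumer-mode branch is left out for brevity. The only value taken from this README is the 85% threshold.

```python
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85  # clarifying questions are generated below this


def run_analysis(
    image: bytes,
    species: str,
    vision: Callable[[bytes, str], dict],
    research: Callable[[str, str], tuple],
    ask_questions: Callable[[dict], list],
) -> dict:
    """Chain the analysis steps; every service is injected as a stand-in callable."""
    # Step 1: vision model returns a structured diagnosis.
    diagnosis = vision(image, species)
    # Step 2: research lookup returns (summary, citations).
    summary, citations = research(diagnosis["condition"], species)
    # Step 3: only low-confidence results trigger follow-up questions.
    questions = []
    if diagnosis["confidence"] < CONFIDENCE_THRESHOLD:
        questions = ask_questions(diagnosis)
    # Step 4: combine everything into one response payload.
    return {
        "species": species,
        "diagnosis": diagnosis,
        "research_summary": summary,
        "clinical_citations": citations,
        "clarifying_questions": questions,
    }


if __name__ == "__main__":
    result = run_analysis(
        b"",
        "dog",
        vision=lambda img, sp: {"condition": "Canine Atopic Dermatitis", "confidence": 0.65},
        research=lambda cond, sp: ("stub summary", []),
        ask_questions=lambda d: ["Is this a seasonal or year-round problem?"],
    )
    print(result["clarifying_questions"])
```

Because the comparison is strict, a diagnosis at exactly 0.85 skips the question step in this sketch.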
Audio -> /voice/query -> Whisper STT -> Transcribed text
                      -> Claude Agent (with tool use) -> Text response
                      -> OpenAI TTS -> Audio response
                      -> Session tracking for multi-turn context
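One way to picture the voice flow and its session tracking is an in-memory history keyed by session ID, with the speech and agent services passed in as callables. Everything named here (`SessionStore`, `handle_voice_query`, the message dictionaries) is hypothetical; the actual `voice_service.py` may be organized differently.

```python
from typing import Callable


class SessionStore:
    """In-memory conversation history keyed by session ID (hackathon-style storage)."""

    def __init__(self) -> None:
        self._turns: dict = {}

    def append(self, session_id: str, role: str, text: str) -> None:
        self._turns.setdefault(session_id, []).append({"role": role, "content": text})

    def history(self, session_id: str) -> list:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._turns.get(session_id, []))


def handle_voice_query(
    audio: bytes,
    session_id: str,
    store: SessionStore,
    transcribe: Callable[[bytes], str],
    agent: Callable[[list], str],
    synthesize: Callable[[str], bytes],
) -> dict:
    """Speech-to-text, agent reply with prior context, then text-to-speech."""
    text = transcribe(audio)
    store.append(session_id, "user", text)
    # The agent sees the whole session, which is what carries multi-turn context.
    reply = agent(store.history(session_id))
    store.append(session_id, "assistant", reply)
    return {
        "session_id": session_id,
        "transcript": text,
        "message": reply,
        "audio": synthesize(reply),
    }
```

Passing the same `session_id` on a follow-up call means the agent receives the earlier question and answer along with the new transcript.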
- Node.js 18+
- Python 3.11+
- Expo CLI (npm install -g expo-cli)
- API keys: Anthropic, OpenAI, Perplexity
- ngrok account (sign up)
cd backend
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your API keys:
# ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-proj-...
# PERPLEXITY_API_KEY=pplx-...
# Start server
uvicorn main:app --reload --host 0.0.0.0
cd HealthDetect
cp .env.example .env
# Edit .env with your API URL:
# EXPO_PUBLIC_API_URL=https://your-subdomain.ngrok-free.dev
npm install
# Start Expo
npx expo start
# Or with tunnel for remote device testing:
npx expo start --tunnel
# In a separate terminal, expose backend:
ngrok http 8000
# Copy the https URL (e.g., https://abc123.ngrok-free.dev)
# Update EXPO_PUBLIC_API_URL in .env
- Phone and computer must be on the same Wi-Fi network
- Backend must run with --host 0.0.0.0
- Use your machine's LAN IP or ngrok URL (not localhost)
GET /health - Returns server status and API key configuration.
{
"status": "ok",
"anthropic_key_set": true,
"openai_key_set": true
}
POST /analyze - Analyze a pet image for health conditions. Supports professional and consumer modes.
Request (multipart form):
| Field | Type | Required | Description |
|---|---|---|---|
| image | file | Yes | JPEG/PNG image of the animal |
| species | string | No | Animal species (e.g., "dog", "cat") |
| symptoms | string | No | Reported symptoms |
| user_type | string | No | "professional" (default) or "consumer" |
Professional mode (default):
curl -X POST http://localhost:8000/analyze \
  -F "image=@<path-to-image>" \
  -F "species=dog" \
  -F "symptoms=skin rash and redness"
Consumer mode:
curl -X POST http://localhost:8000/analyze \
  -F "image=@<path-to-image>" \
  -F "species=dog" \
  -F "symptoms=skin rash" \
  -F "user_type=consumer"
Professional Response (AnalysisResult):
{
"id": "analysis-1739520000000",
"timestamp": "2026-02-14T10:00:00",
"species": "dog",
"diagnosis": {
"primary": {
"condition": "Canine Atopic Dermatitis",
"commonName": "Allergic skin disease",
"confidence": 0.65,
"riskLevel": "moderate"
},
"alternatives": [
{ "condition": "Contact Dermatitis", "confidence": 0.20 }
]
},
"recommendations": [
"Perform intradermal allergy testing",
"Consider a hypoallergenic diet trial"
],
"research_summary": "Canine atopic dermatitis is a genetically predisposed inflammatory...",
"clinical_citations": [
"https://pmc.ncbi.nlm.nih.gov/articles/PMC9204668/"
],
"clarifying_questions": [
"Is this a seasonal or year-round problem?",
"Are the feet and face primarily affected?"
]
}
Consumer Response (ConsumerAnalysisResult):
{
"id": "analysis-1739520000000",
"timestamp": "2026-02-14T10:00:00",
"simple_diagnosis": "Allergies Causing Skin Irritation",
"technical_diagnosis": "Canine Atopic Dermatitis",
"urgency": "low_confidence",
"recommended_action": "We're not certain. A vet examination is recommended.",
"is_emergency": false,
"estimated_vet_cost": { "min": 80, "max": 250, "category": "routine" },
"confidence": 0.65,
"recommendations": [
"Perform intradermal allergy testing",
"Consider a hypoallergenic diet trial"
]
}
Urgency levels: "emergency" | "high_confidence" | "moderate_confidence" | "low_confidence"
Cost categories: "emergency" ($500-2000) | "specialist" ($200-800) | "routine" ($80-250)
POST /chat - Multi-turn agentic chat with veterinary tool use.
Request:
{
"session_id": "chat-789",
"message": "What tests should I run for suspected pancreatitis?",
"image_context": { "species": "dog" }
}
Response:
{
"session_id": "chat-789",
"message": "For suspected pancreatitis in dogs, I recommend...",
"toolResults": [
{
"tool": "get_treatment_protocols",
"input": { "condition": "pancreatitis", "species": "dog" },
"output": { "protocols": ["..."] }
}
]
}
Available agent tools:
- search_disease_database - Look up diseases by symptoms or species
- get_treatment_protocols - Retrieve standard treatment plans
- get_differential_diagnoses - Ranked differential diagnosis list
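The `toolResults` entries in the chat response suggest a dispatcher that maps tool names to functions and records each call's input and output. The sketch below shows one plausible shape for that, assuming a simple registry; the decorator, the `run_tool` helper, and the placeholder protocol text are illustrative and not taken from `services/agent.py`.

```python
from typing import Callable

# Registry of tool name -> implementation (hypothetical structure).
TOOLS: dict = {}


def tool(name: str) -> Callable:
    """Register a function under the tool name the model will request."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_treatment_protocols")
def get_treatment_protocols(condition: str, species: str) -> dict:
    # Placeholder output; a real implementation would query a protocol database.
    return {"protocols": [f"placeholder protocol for {condition} in {species}"]}


def run_tool(name: str, tool_input: dict) -> dict:
    """Execute one requested tool call and return a toolResults-style entry."""
    if name not in TOOLS:
        output = {"error": f"unknown tool: {name}"}
    else:
        output = TOOLS[name](**tool_input)
    return {"tool": name, "input": tool_input, "output": output}
```

Returning an error entry for an unknown tool, rather than raising, lets the agent loop report the failure back to the model and continue; that is a design choice of this sketch, not a documented behavior.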
POST /voice/query - Voice-based query with optional image attachment.
Request (multipart form):
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | Yes | Audio recording (m4a, wav, mp3) |
| species | string | No | Animal species (default: "unknown") |
| session_id | string | No | Session ID for multi-turn context |
| image | file | No | Optional image for visual analysis |
curl -X POST http://localhost:8000/voice/query \
  -F "audio=@<path-to-audio>" \
  -F "species=cat" \
  -F "image=@<path-to-image>"
GET /voice/audio - Retrieve cached TTS audio as MP3 stream.
POST /api/community/profile - Create an anonymous pet profile for matching.
Query params: pet_name, species, breed, conditions (repeatable), location_city, location_state
curl -X POST "http://localhost:8000/api/community/profile?pet_name=Buddy&species=dog&breed=Golden&conditions=allergies&conditions=arthritis&location_city=Seattle&location_state=WA"
GET /api/community/groups/{condition}/{species} - Find support groups for a condition. Supports exact match, fuzzy/substring match, and fallback.
curl http://localhost:8000/api/community/groups/allergies/dog
GET /api/community/matches/{id} - Find nearby pets with similar conditions (same species, overlapping conditions, same state).
curl http://localhost:8000/api/community/matches/pet-12345
GET /api/activity/feed - Recent community activity feed, sorted by recency.
GET /api/activity/stats - Community statistics (total diagnoses, active groups, satisfaction rating, etc.).
GET /api/activity/trending - Top 5 trending health conditions this week, sorted by score.
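The pet matching rule described for the community endpoints (same species, overlapping conditions, same state) can be expressed as a simple filter. This is a hedged sketch: the `PetProfile` fields mirror the profile query parameters, but the class, the `find_matches` function, and the ranking by number of shared conditions are assumptions rather than the project's `community_service.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PetProfile:
    id: str
    species: str
    conditions: frozenset
    location_state: str


def find_matches(target: PetProfile, profiles: list) -> list:
    """Return (profile, shared_conditions) pairs that satisfy all three filters."""
    matches = []
    for other in profiles:
        if other.id == target.id:
            continue  # never match a pet with itself
        if other.species.lower() != target.species.lower():
            continue  # filter 1: same species
        if other.location_state.upper() != target.location_state.upper():
            continue  # filter 2: same state
        shared = target.conditions & other.conditions
        if shared:  # filter 3: at least one overlapping condition
            matches.append((other, sorted(shared)))
    # Assumed ranking: most shared conditions first.
    matches.sort(key=lambda pair: len(pair[1]), reverse=True)
    return matches
```

Comparing species and state case-insensitively is a convenience added here so that "WA" and "wa" are treated as the same state.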
treehacks2026/
├── backend/
│ ├── main.py # FastAPI app, CORS, route mounting
│ ├── requirements.txt # Python dependencies
│ ├── .env.example # Environment variable template
│ ├── test_full_suite.py # Comprehensive test suite (31 tests)
│ ├── models/
│ │ ├── schemas.py # Pydantic models (AnalysisResult, etc.)
│ │ ├── consumer_schemas.py # Consumer-friendly response model
│ │ └── community.py # PetProfile, SupportGroup models
│ ├── routes/
│ │ ├── analyze.py # POST /analyze (professional + consumer)
│ │ ├── chat.py # POST /chat
│ │ ├── voice_routes.py # POST /voice/query, GET /voice/audio
│ │ ├── community_routes.py # Community profiles, groups, matching
│ │ └── activity_routes.py # Activity feed, stats, trending
│ └── services/
│ ├── vlm.py # Claude Vision integration
│ ├── agent.py # Agentic chat + clarifying questions
│ ├── research_service.py # Perplexity Sonar research lookup
│ ├── voice_service.py # Whisper STT + OpenAI TTS
│ ├── consumer_mode.py # Diagnosis simplification + cost estimates
│ ├── community_service.py # Pet profiles, groups, matching
│ └── activity_service.py # Activity feed generation
│
└── HealthDetect/ # React Native (Expo) mobile app
├── app/
│ ├── (tabs)/
│ │ ├── index.tsx # Home screen with voice
│ │ ├── history.tsx # Analysis history
│ │ ├── learn.tsx # Veterinary articles
│ │ └── profile.tsx # Practice profile
│ ├── camera.tsx # Camera capture
│ ├── photo-review.tsx # Photo preview + species input
│ ├── processing.tsx # Analysis loading screen
│ └── results.tsx # Diagnosis results + research + questions
├── components/
│ └── VoiceButton.tsx # Animated voice input button
├── services/
│ ├── analysis-service.ts # Backend API client
│ └── voice-service.ts # Audio recording/playback
├── constants/
│ ├── types.ts # TypeScript type definitions
│ └── theme.ts # Design system (colors, typography)
└── context/
└── AnalysisContext.tsx # App-wide state management
Run the full test suite (31 tests across 8 sections):
cd backend
uvicorn main:app --host 0.0.0.0 --port 8000 &
python test_full_suite.py
| Section | Tests | What's Covered |
|---|---|---|
| 1. Core Infrastructure | 2 | Health check, all 11 routes registered |
| 2. Chat Endpoint | 4 | Basic query, tool use, multi-turn context, image context |
| 3. Clarifying Questions | 3 | High/boundary/low confidence thresholds |
| 4. Perplexity Research | 1 | Summary + peer-reviewed citations |
| 5. Community Features | 6 | Groups (exact/fuzzy/fallback), profiles, matching filters |
| 6. Activity Feed | 5 | Feed generation, limits, sorting, stats, trending |
| 7. Schema Validation | 3 | Field presence, defaults, JSON serialization |
| 8. Consumer Mode | 7 | Simplification, urgency, emergency detection, costs, schema |
| Decision | Rationale |
|---|---|
| Voice-first UX | Vets have hands on the animal during exams |
| Dual-mode analysis | Professional mode for vets, consumer mode for pet owners |
| Claude Sonnet for vision | Best multimodal reasoning for complex medical imagery |
| Perplexity Sonar for citations | Real-time access to peer-reviewed literature |
| Clarifying questions at <85% | Reduces diagnostic errors without slowing high-confidence results |
| Emergency auto-detection | Immediate flagging of life-threatening conditions (bloat, seizures, etc.) |
| Fuzzy condition matching | Claude returns clinical names; fuzzy match maps to support group templates |
| In-memory storage | Hackathon scope; swap for Redis/DB in production |
| FIFO audio cache (50 entries) | Prevents unbounded memory growth from TTS responses |
| Structured JSON output | Enables rich UI rendering with confidence bars, risk badges, etc. |
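The fuzzy condition matching decision can be illustrated with a three-stage lookup: exact key, substring, then an approximate string match, with a general fallback group. The group names and keys below are hypothetical, and using `difflib.get_close_matches` for the approximate step is this sketch's choice; the README does not say which fuzzy technique the project uses.

```python
import difflib

# Hypothetical support group templates keyed by lay condition name.
GROUPS = {
    "allergies": "Allergy Support Group",
    "dermatitis": "Skin Conditions Group",
    "arthritis": "Arthritis and Mobility Group",
    "diabetes": "Diabetes Care Group",
}
FALLBACK_GROUP = "General Pet Health Group"


def find_group(condition: str) -> tuple:
    """Return (group_name, match_kind) for a condition string."""
    key = condition.strip().lower()
    # Stage 1: exact key match.
    if key in GROUPS:
        return GROUPS[key], "exact"
    # Stage 2: substring match, e.g. a clinical name that contains a known key.
    for name, group in GROUPS.items():
        if key and (name in key or key in name):
            return group, "substring"
    # Stage 3: approximate match to tolerate misspellings.
    close = difflib.get_close_matches(key, list(GROUPS), n=1, cutoff=0.6)
    if close:
        return GROUPS[close[0]], "fuzzy"
    # Stage 4: nothing matched, so fall back to a general group.
    return FALLBACK_GROUP, "fallback"
```

Under these assumptions, a clinical name such as "Canine Atopic Dermatitis" resolves through the substring stage because it contains the key "dermatitis".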
- Audio caching: 50-entry FIFO cache reduces repeated TTS calls
- Image compression: Mobile photos compressed before upload
- Rate limiting: 3-second cooldown between voice queries
- Duration limits: 30-second max recording length
- Model selection: Uses tts-1 (not tts-1-hd) for speed
- Conditional API calls: Clarifying questions and research only when needed
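A 50-entry FIFO audio cache like the one listed above can be built on `collections.OrderedDict`, evicting the oldest insertion once the limit is reached. The class and method names are illustrative assumptions; only the FIFO policy and the 50-entry limit come from this README.

```python
from collections import OrderedDict
from typing import Optional


class FifoAudioCache:
    """Bounded cache that evicts the oldest inserted entry first."""

    def __init__(self, max_entries: int = 50) -> None:
        self._max = max_entries
        self._items: OrderedDict = OrderedDict()

    def put(self, key: str, audio: bytes) -> None:
        if key in self._items:
            # Overwriting keeps the original insertion position (FIFO, not LRU).
            self._items[key] = audio
            return
        if len(self._items) >= self._max:
            self._items.popitem(last=False)  # drop the oldest entry
        self._items[key] = audio

    def get(self, key: str) -> Optional[bytes]:
        # Reads do not reorder entries, which is what distinguishes FIFO from LRU.
        return self._items.get(key)

    def __len__(self) -> int:
        return len(self._items)
```

FIFO is simpler than LRU and is enough to cap memory; the trade-off is that a frequently replayed clip is evicted as readily as one that was never requested again.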
Estimated costs per 100 queries:
| Query Type | Cost |
|---|---|
| Voice only (no image) | ~$2-3 |
| Voice + image | ~$15-20 |
| Image analysis only | ~$10-15 |
| Chat (text only) | ~$1-2 |
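The per-100-query figures in the table can be turned into a rough budget for a given query mix with simple proportional arithmetic. The function and key names below are made up for illustration, and the result is only as reliable as the table's estimates.

```python
# Estimated (low, high) USD cost per 100 queries, copied from the table.
COST_PER_100 = {
    "voice_only": (2.0, 3.0),
    "voice_image": (15.0, 20.0),
    "image_only": (10.0, 15.0),
    "chat": (1.0, 2.0),
}


def estimate_cost(counts: dict) -> tuple:
    """Scale the per-100 ranges by the number of queries of each type."""
    low = sum(COST_PER_100[kind][0] * n / 100 for kind, n in counts.items())
    high = sum(COST_PER_100[kind][1] * n / 100 for kind, n in counts.items())
    return low, high


if __name__ == "__main__":
    # Example: 30 voice + image queries, roughly one per animal on a busy day.
    print(estimate_cost({"voice_image": 30}))  # (4.5, 6.0)
```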
- Voice conversation (Whisper STT + OpenAI TTS)
- Image analysis with Claude Sonnet
- Agentic chat with tool use
- Perplexity Sonar research citations
- Clarifying questions for low-confidence diagnoses
- Consumer mode with plain English + cost estimates
- Community support groups and pet matching
- Activity feed and community stats
- Multi-turn voice session tracking
- Mobile app with camera integration
- Comprehensive test suite (31 tests)
- Frontend UI for community features
- Consumer/professional mode toggle in app
- Fine-tuned VLM on veterinary dataset
- Multi-image context across conversation
- Real-time bounding box annotations
- Veterinary report generation (PDF export)
- Database persistence (PostgreSQL)
- HIPAA-compliant medical record storage
- Offline mode with local models
- Specialist consultation marketplace
- Analytics dashboard for clinics
Built for TreeHacks 2026
MIT License - see LICENSE file for details
- Anthropic Claude for multimodal reasoning
- OpenAI Whisper for speech recognition
- Perplexity Sonar for real-time research
- Expo for mobile development framework
- Veterinary professionals who inspired this project