
akurilin/tutorbox


TutorBox

An AI-powered conversational English practice platform for ESL (English as a Second Language) learners and their teachers. Built from December 2022 to April 2023.

Built by Alexandr Kurilin and Dmitry Stavisky.

The Problem

Language learners need frequent speaking practice to build fluency, but access to conversation partners is limited, expensive, and often anxiety-inducing. Teachers lack the bandwidth to provide one-on-one practice time for every student, and traditional homework assignments don't develop speaking skills.

The Solution

TutorBox gives students a private, always-available AI conversation partner that adapts to their level and curriculum. Teachers create assignments with specific scenarios, vocabulary, and difficulty levels. Students practice speaking through voice conversations with the AI, and both students and teachers receive detailed performance analytics afterward.

How a Session Works

  1. A teacher creates an assignment — selecting a conversation scenario (e.g. ordering at a restaurant, a job interview), a CEFR difficulty level (A1–C2), and target vocabulary words
  2. The teacher shares the assignment link with students
  3. The student opens the link, and a real-time voice conversation begins with the AI tutor
  4. The student speaks into their microphone — their speech is transcribed in real time and sent to GPT-3.5 for a contextual response
  5. The AI's response is read aloud using neural text-to-speech
  6. After the session, both student and teacher receive detailed performance reports
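Steps 4–6 form the per-turn loop. A minimal sketch of one turn, with the external services (STT, GPT-3.5, TTS) injected as stand-in functions — all names here are hypothetical, not the actual client code:

```typescript
// One conversational turn: transcribe the student, get an AI reply, play it aloud.
// The three dependencies stand in for Google Cloud STT, OpenAI, and neural TTS.
async function runTurn(deps: {
  transcribe: () => Promise<string>;          // step 4: mic → real-time transcription
  respond: (text: string) => Promise<string>; // step 4: transcript → GPT-3.5 reply
  speak: (text: string) => Promise<void>;     // step 5: reply read aloud via TTS
}): Promise<{ student: string; tutor: string }> {
  const student = await deps.transcribe();
  const tutor = await deps.respond(student);
  await deps.speak(tutor);
  return { student, tutor }; // step 6: turns accumulate into the performance report
}
```

Each completed turn's transcript pair would feed the post-session reports described below.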

Architecture

Browser (React)
  |
  |-- WebSocket ---------> Google Cloud Speech-to-Text (real-time transcription)
  |-- REST (Next.js API) -> OpenAI GPT-3.5 (conversation + analysis)
  |-- REST (Next.js API) -> Google Cloud TTS / Unreal Speech (voice synthesis)
  |
Next.js API Routes
  |-- OpenAI API (chat completion, grammar scoring, CEFR detection)
  |-- Google Cloud TTS + Unreal Speech (dual TTS engine support)
  |-- SendGrid (email notifications to teachers)
  |-- Clerk (authentication + user management)
  |-- PostgreSQL (user data, subscription tiers)
  |-- PostHog (product analytics)
  |-- Logtail (structured logging)
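The browser never calls OpenAI directly — the Next.js API routes attach the secret key server-side and forward the request. A sketch of that proxy pattern (route path, model parameters, and helper name are assumptions for illustration):

```typescript
// Shape of a chat message as sent to the OpenAI chat completions endpoint.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Pure helper: build the upstream request so the API key never reaches the client.
function buildChatRequest(messages: ChatMessage[], apiKey: string) {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model: "gpt-3.5-turbo", messages }),
    },
  };
}

// A Next.js API route (e.g. pages/api/chat.ts — path assumed) would then do:
//   const { url, init } = buildChatRequest(req.body.messages, process.env.OPENAI_API_KEY!);
//   const data = await (await fetch(url, init)).json();
//   res.status(200).json({ reply: data.choices[0].message.content });
```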

Tech Stack

  • Framework: Next.js 13, TypeScript, React 18
  • Styling: Tailwind CSS, Headless UI
  • Database: PostgreSQL, Liquibase migrations
  • Auth: Clerk (OAuth)
  • AI/ML: OpenAI GPT-3.5 (conversation + analysis), Google Cloud Speech-to-Text (real-time STT via WebSocket), Google Cloud TTS + Unreal Speech (dual TTS engines)
  • Infrastructure: Vercel, PostHog, SendGrid, Logtail
  • Payments: Stripe (subscription management)

Key Technical Challenges

Real-Time Voice Pipeline

The core technical challenge was building a low-latency voice conversation loop: microphone input → real-time transcription → AI response → speech synthesis → audio playback → automatic microphone restart. This required coordinating the Web Audio API, AudioWorklet processors, WebSocket streaming to Google Cloud STT, and careful state management to prevent race conditions between recording, playback, and UI updates.
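The central race to prevent is the microphone capturing the bot's own synthesized voice during playback. An illustrative sketch (not the actual code) of the phase-guard logic:

```typescript
// The voice loop cycles through three phases; mic frames are only streamed
// over the STT WebSocket while recording. Late or out-of-order events from
// the network are ignored rather than allowed to corrupt the state.
type PipelinePhase = "recording" | "awaiting-ai" | "playing";

class VoiceLoopController {
  private phase: PipelinePhase = "recording";

  get canSendAudioFrames(): boolean {
    return this.phase === "recording";
  }

  onFinalTranscript(): void {
    if (this.phase !== "recording") return; // drop late STT events
    this.phase = "awaiting-ai";
  }

  onAiResponse(): void {
    if (this.phase !== "awaiting-ai") return;
    this.phase = "playing"; // TTS playback begins; mic stays muted
  }

  onPlaybackEnded(): void {
    if (this.phase !== "playing") return;
    this.phase = "recording"; // automatic microphone restart
  }
}
```

In the real client these transitions would be wired to AudioWorklet, WebSocket, and HTMLAudioElement events.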

Multi-Dimensional AI Scoring

Student performance reports run multiple OpenAI API calls in parallel to evaluate different dimensions of language proficiency: fluency (words per minute), grammar accuracy (1–10), comprehension (logical coherence of responses), idiomaticity (natural language use), vocabulary usage tracking, and CEFR level detection. Each metric uses a specialized prompt with calibrated temperature and token limits. Grammar corrections are visualized using a diff algorithm that renders inline strikethrough/addition formatting.
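The fan-out can be sketched as one `Promise.all` over per-dimension scorers. The dimension names come from the description above; the scorer signature and return shapes are assumptions:

```typescript
// Each dimension gets its own specialized prompt/call; all run concurrently.
// CEFR detection returns a level string, the others return numeric scores.
type Dimension = "fluency" | "grammar" | "comprehension" | "idiomaticity" | "cefr";

async function scoreSession(
  transcript: string,
  score: (dimension: Dimension, transcript: string) => Promise<number | string>,
): Promise<Record<Dimension, number | string>> {
  const dimensions: Dimension[] = ["fluency", "grammar", "comprehension", "idiomaticity", "cefr"];
  const results = await Promise.all(dimensions.map((d) => score(d, transcript)));
  const report = {} as Record<Dimension, number | string>;
  dimensions.forEach((d, i) => { report[d] = results[i]; });
  return report;
}
```

Running the calls in parallel keeps report latency close to the slowest single OpenAI call rather than the sum of all of them.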

Prompt Engineering for Pedagogy

The system prompt dynamically assembles scenario context, target vocabulary, and CEFR level constraints to keep AI responses pedagogically appropriate. Lower CEFR levels produce simpler sentence structures and vocabulary; higher levels allow more complex language. The AI is encouraged to naturally incorporate the teacher's assigned vocabulary words into conversation, creating organic practice opportunities rather than rote drilling.
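A minimal sketch of that assembly, assuming a simple assignment config shape — the field names and per-level constraint wording here are invented for illustration:

```typescript
// Teacher-authored assignment parameters that drive the system prompt.
interface AssignmentConfig {
  scenario: string;      // e.g. "ordering at a restaurant"
  cefrLevel: "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
  vocabulary: string[];  // teacher-assigned target words
}

function buildSystemPrompt(cfg: AssignmentConfig): string {
  // Lower CEFR levels constrain sentence structure and vocabulary.
  const levelConstraint = ["A1", "A2"].includes(cfg.cefrLevel)
    ? "Use short sentences and very common words only."
    : "You may use complex sentence structures and idiomatic language.";

  return [
    `You are an English tutor role-playing this scenario: ${cfg.scenario}.`,
    `The student is at CEFR level ${cfg.cefrLevel}. ${levelConstraint}`,
    `Work these words naturally into the conversation: ${cfg.vocabulary.join(", ")}.`,
    "Do not drill the words; let them arise organically.",
  ].join("\n");
}
```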

Dual TTS Engine Support

The platform supports both Google Cloud Neural TTS and Unreal Speech, with multiple voice profiles for different speaker roles (male/female, bot/human). This provided fallback reliability and the ability to compare cost and quality tradeoffs between providers.
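The fallback idea can be sketched as trying engines in preference order — the engine interface here is an assumption, not the platform's actual abstraction:

```typescript
// Common interface over Google Cloud Neural TTS and Unreal Speech.
interface TtsEngine {
  name: string;
  synthesize(text: string, voice: string): Promise<ArrayBuffer>;
}

// Try each engine in order; fall back on failure (outage, quota, etc.).
async function synthesizeWithFallback(
  text: string,
  voice: string,
  engines: TtsEngine[], // preference order, e.g. [unrealSpeech, googleCloudTts]
): Promise<{ engine: string; audio: ArrayBuffer }> {
  let lastError: unknown;
  for (const engine of engines) {
    try {
      return { engine: engine.name, audio: await engine.synthesize(text, voice) };
    } catch (err) {
      lastError = err; // provider failed — try the next one
    }
  }
  throw lastError;
}
```

The same indirection also makes it easy to A/B the providers for cost and quality, as described above.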

WhatsApp as an AI Interface

In February 2023, we prototyped a fully WhatsApp-native version of TutorBox (see the whatsapp branch). The idea was to eliminate the web app entirely and meet students where they already were — inside WhatsApp, the dominant messaging platform in our target Latin American market. The integration used the WhatsApp Cloud API to receive text and voice messages via webhook, transcribe voice notes through Google Cloud STT, generate conversational replies with OpenAI, synthesize audio responses with Google Cloud TTS, and send them back as WhatsApp voice messages — all in a single request cycle. Conversation history was persisted in PostgreSQL to maintain context across messages. This was an early experiment in using a chat platform as the primary interface for AI interaction, predating the broader industry's move toward conversational AI inside messaging apps by roughly two years.
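The single request cycle can be sketched with the external calls injected as stand-ins. The payload shape below loosely follows the WhatsApp Cloud API but is simplified and not the branch's actual code:

```typescript
// A simplified inbound WhatsApp message (text message or voice note).
interface InboundMessage {
  from: string;             // student's WhatsApp number
  text?: string;            // present for text messages
  voiceAudio?: ArrayBuffer; // present for voice notes
}

// One webhook cycle: transcribe → respond → synthesize → send back as voice.
async function handleWebhook(
  msg: InboundMessage,
  deps: {
    transcribe: (audio: ArrayBuffer) => Promise<string>; // Google Cloud STT
    respond: (text: string) => Promise<string>;          // OpenAI, with history from PostgreSQL
    synthesize: (text: string) => Promise<ArrayBuffer>;  // Google Cloud TTS
    sendVoice: (to: string, audio: ArrayBuffer) => Promise<void>; // WhatsApp Cloud API
  },
): Promise<string> {
  // Voice notes are transcribed first; text messages pass straight through.
  const studentText = msg.voiceAudio ? await deps.transcribe(msg.voiceAudio) : msg.text ?? "";
  const reply = await deps.respond(studentText);
  await deps.sendVoice(msg.from, await deps.synthesize(reply));
  return reply;
}
```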

Pages

Route             Description
/                 Marketing landing page
/practice         Main conversation interface — real-time voice dialogue with the AI tutor
/assign           Teacher tool for creating assignments with scenario, CEFR level, and vocabulary
/student-report   Post-session analytics dashboard for students
/teacher-report   Detailed student performance report for teachers (includes full transcript with audio playback)

Project Structure

pages/           # Next.js pages and API routes
  api/           # Backend endpoints (OpenAI proxy, TTS proxy, corrections, email)
components/      # React components (chat log, transcriber, TTS, scenario selector)
logic-frontend/  # Client-side logic (prompts, TTS helpers, report calculations)
logic-backend/   # Server-side logic (database, analytics, CORS)
logic-shared/    # Shared types and data (scenarios, CEFR levels, utilities)
db/              # PostgreSQL migrations (Liquibase) and seed data

Status

This project is no longer actively developed. It was built to explore the intersection of AI, speech technology, and language education during the early wave of GPT-3.5 and neural TTS becoming accessible to individual developers. The early-stage prototype was marketed to and trialed by English language schools and academies in South America. The codebase represents roughly five months of work from ideation through a functional product with real users.
