Scaling to 4.5 Million Users: When Bigger Servers Weren't the Answer
It was November 2024. We got a call that changed everything.
Colgate-Palmolive India wanted to launch a national oral health screening campaign. There would be TV commercials. Full-page newspaper ads. The Union Minister of Health would be involved. And we had a few weeks to prepare.
Oh, and we were a 2-person engineering team.
This is the story of how we went from panic-scaling with massive servers to serving 4.5 million people across India. How we reduced our database to 1/8th its size while handling 10x more traffic. And how we did it all without rewriting our code.
The Setup
The campaign was ambitious. Users could scan a QR code or call a number to get AI-powered oral health screening via WhatsApp. Nine languages. Instant results. Free dentist consultations through a network of 50,000 dentists.
Colgate was going all in. National TV commercials. Full front-page newspaper ads in major dailies. Digital campaigns across Meta and Google. QR codes on every product pack.
We had built the platform. It worked. But we had no idea how much traffic to expect.
So a day before the first test run, we made what seemed like a safe decision. Let's upgrade everything. We moved to the biggest server we could find. Scaled up the database significantly. "Just to be safe," we told ourselves.
Day 1: When Safe Wasn't Enough

November 14, 2024. The first test traffic started hitting our system at 2 PM.
By 5 PM, response times had gone from 500 milliseconds to 8 seconds. Users were timing out. Database CPU was at 95%. Connection errors everywhere.
We had the biggest instances money could buy. And they still weren't enough.
We did what anyone would do in a crisis. We scaled the database even larger. Then larger again. The system survived the day. Barely. Response times were still 5 to 8 seconds. Users were complaining.
And we had three days until official launch.
The Late Night Discovery
That evening, we sat down with our monitoring dashboards. Something didn't make sense.
Database CPU: 95%.
Application server CPU: 40%.
Wait. If our application servers are barely working, why is everything so slow?
We started digging. Opened database performance insights. Checked slow query logs. And there it was.
One query was consuming 25% of our total database time. It was missing an index on our primary lookup column. The column we used for literally every user interaction.
We had more problems. Our Django code was completely synchronous. Every API call, every image upload, every external integration was blocking a database connection. We had no connection pooling. No caching. And we were trying to send 50,000 WhatsApp reminders every evening at 7 PM from the main application.
The truth hit us. We couldn't throw bigger servers at this. We needed to fix the architecture.
But we had three days.
The Constraint That Saved Us
Here's the thing about having no time. You can't rewrite code. Every code change is a potential bug. And we had no time to test everything properly.
This constraint became our guiding principle. Fix infrastructure. Don't touch application logic.
Fix 1: The One Query
We started with the obvious one. That query eating 25% of database time.
```sql
CREATE INDEX idx_users_phone ON users(phone);
```
One line. No code changes. Just add an index.
Result: Query time dropped from 800 milliseconds to 50 milliseconds. Database load dropped by 20%. Response times improved from 5 seconds to 3 seconds.
We added a few more indexes on foreign keys and frequently joined columns. Each one chipped away at the database load.
No application code changed. Zero risk.
Fix 2: Connection Pooling
Next problem: connections exhausting under any load spike.
We added a connection pooling proxy between our application and database. It sits in the middle, reusing connections efficiently. The application doesn't even know it's there.
No code changes needed. Just updated the database endpoint in configuration.
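In a Django app, that configuration change is a few lines in settings. A sketch assuming PgBouncer as the proxy (the host, port, and credentials below are placeholders, not our actual values):

```python
# Django settings: point at the pooling proxy instead of Postgres directly.
# Hostnames, names, and credentials here are placeholder values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "HOST": "pgbouncer.internal",  # was the Postgres host before
        "PORT": "6432",                # PgBouncer's default port
        "NAME": "appdb",
        "USER": "app",
        "PASSWORD": "changeme",
        # With transaction-level pooling, let the proxy own connection
        # reuse instead of Django's persistent connections.
        "CONN_MAX_AGE": 0,
    }
}
```

The application code never learns the proxy exists; it just connects to a different endpoint.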
Result: Connection exhaustion disappeared. Response time down to 2 seconds.
Fix 3: Redis Cache
We added Redis for caching. But carefully. Only for simple stuff. Language strings. Configuration data. Lookup tables.
The code changes were minimal. Just wrap existing database calls.
```python
import json

def get_language_strings(lang):
    cached = redis.get(f"lang:{lang}")
    if cached:
        return json.loads(cached)
    result = database.query(...)
    # Serialize for Redis and set a TTL so stale entries expire on their own
    redis.set(f"lang:{lang}", json.dumps(result), ex=3600)
    return result
```
Simple. Safe. Easy to test.
Result: 60% cache hit rate. Database queries down by 40%. Response time down to 1.5 seconds.
Fix 4: Horizontal Scaling

We were running one massive server. 48 CPUs. Always on. Always expensive.
We moved to a container-based setup with auto-scaling. Instead of one huge instance, we ran many small ones. Each container got 0.5 CPU.
Why so small? Three reasons.
First, cost. We ran 70% of our capacity on spot instances, which are about 70% cheaper than on-demand ones, and the system recovers automatically when a spot instance gets reclaimed. Our baseline ran on just two on-demand instances. During traffic spikes, we scaled up to 20 instances. After the spike, we scaled back down.
Second, TV ads create instant traffic surges. One moment you're at 100 requests per second. Next moment: 500. Small instances scale up fast. Two minutes and we're handling the surge.
Third, fault tolerance. One container fails, the other 19 keep running. No single point of failure.
Result: Response time down to 800 milliseconds. Infrastructure costs down significantly. Ready to scale.
Launch Day: November 18, 2024
The campaign went live.
Morning: Newspaper ads hit doorsteps across India. Traffic starts trickling in.
11 AM: First TV commercial airs. Traffic jumps 4x in minutes. Auto-scaling kicks in. System holds steady.
Afternoon: Sustained high traffic. Response times staying around 400 to 800 milliseconds. Database at 60% CPU. We're handling it.
4 PM: TV slot and Meta Ads. Another spike. Auto-scaling responds. No crashes. No downtime.
We survived launch day. But we could see we were near our limits. Database writes were still high. We needed a more permanent solution.
The 7 PM Problem
Let me tell you about our evening problem.
We had a WhatsApp reminder feature. Every day at 7 PM, we'd send reminders to users who started their screening but didn't complete it. About 50,000 to 100,000 messages daily.
The feature worked fine with small numbers. But at scale, it destroyed our database.
Here's what happened every evening:
- 6:55 PM: System running smoothly.
- 7:00 PM: Reminder cron job starts.
- 7:01 PM: Database connections spike.
- 7:02 PM: User response times jump from 400ms to 3 seconds.
- 7:05 PM: Database CPU hits 95%.
- 7:10 PM: Some users start timing out.
- 7:30 PM: Reminders finally complete.
- 7:35 PM: System recovers.
The problem? The reminder code was synchronous. For each user: query database, send WhatsApp message, update database. All while holding a database connection.
And 7 PM is peak user traffic time. People just got home from work. They're using the app. And our reminder system is competing with them for database connections.
We couldn't remove the feature. We couldn't change the timing. 7 PM was optimal. And we couldn't rewrite the system. Too risky with the campaign running.
We needed a different approach.
Week 1: The Strategic Decision
Then came another requirement. The marketing team needed Meta Conversions API integration. They were running expensive Facebook and Instagram ads. Without proper tracking, they couldn't optimize ad spend. They were wasting money.
Could we add Meta API calls to our code? Sure. But that means more blocking operations. More load on the system. Slower user responses.
We were at a crossroads. We needed:
- Reminders to work at scale
- Meta conversion tracking
- Fast user responses
- No major code rewrites
The answer was event-driven architecture.
The Kafka Decision
We decided to use Kafka for event streaming. Not a simple queue, but a full event streaming platform.
Why Kafka and not something simpler?
First, portability. We were working with a large enterprise client. They might want to move everything to their own infrastructure someday. Kafka is open-source. Runs anywhere.
Second, future-proofing. Today we need reminders and Meta tracking. Tomorrow? We don't know. With Kafka, we publish events once. Add as many consumers as we need later. No changes to the main application.
Third, it handles our scale easily. We're doing 1,000 to 2,000 events per second. Kafka handles millions per second. Plenty of headroom.
The Implementation Strategy

Here's the key insight. We kept our main application mostly synchronous. Safe. Tested. Working.
We just added event publishing. A small change. Then moved the heavy stuff to separate consumers.
Here's what it looked like:
```python
def complete_screening(user_id, data):
    # Existing code (unchanged)
    screening = save_to_database(user_id, data)

    # New: publish event (takes 5ms, doesn't block)
    publish_event('screening_completed', {
        'user_id': user_id,
        'screening_id': screening.id
    })
    return screening
```
That's it. Main app done. Response returned. User happy.
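For completeness, here is roughly what a publish_event helper can look like. This is a sketch, not our exact code: the producer is passed in for testability, and it can be any object with a Kafka-style .send(topic, value) method (kafka-python's KafkaProducer matches that shape, but the client choice is an assumption):

```python
import json
import time

def publish_event(producer, event_type, payload):
    # Serialize once and fire-and-forget; consumers do the heavy lifting.
    # The topic name doubles as the event type here for simplicity.
    message = {
        "type": event_type,
        "ts": time.time(),
        "data": payload,
    }
    producer.send(event_type, json.dumps(message).encode("utf-8"))
```

Because the producer buffers and sends in the background, the call adds milliseconds, not round trips, to the request path.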
In the background, consumers process events:
- Consumer 1: Write analytics to Clickhouse
- Consumer 2: Send conversion data to Meta
- Consumer 3: Handle WhatsApp reminders
Each consumer is independent. Each can fail and retry without affecting others. The main application doesn't care.
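That independence comes from each consumer owning its own poll loop and retry policy. A sketch of the pattern (the handler, event source, and backoff values are illustrative, not our production settings):

```python
import time

def run_consumer(events, handler, max_retries=3):
    # Each event is retried in isolation with exponential backoff.
    # A transient failure here never touches the main app or the
    # other consumers reading the same event stream.
    for event in events:
        for attempt in range(max_retries):
            try:
                handler(event)
                break
            except Exception:
                time.sleep(0.1 * 2 ** attempt)
```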
Total code changes in main application? Less than 100 lines.
Fixing the 7 PM Problem
Remember those reminders breaking our database every evening?
Before Kafka, the reminder job ran in the main application:
```python
def send_reminders():
    users = db.query("incomplete users")  # one query returning ~100K rows
    for user in users:
        send_whatsapp(user)  # blocks while holding a DB connection
        db.update(user)      # more load
```
This ran in the main app. Competed with user requests. Destroyed database connections.
After Kafka, we split it up:
Main app just publishes events when users interact. Already doing this.
Reminder consumer tracks who's incomplete (in memory, not database):
```python
def reminder_consumer():
    incomplete = {}  # tracked in memory, not the database
    for event in consume_kafka('user_journey'):
        track_incomplete(incomplete, event)
        if is_7pm():  # stand-in for the scheduling check
            for user in incomplete.values():
                publish_event('send_reminder', user)
            incomplete.clear()
```
WhatsApp consumer handles sending:
```python
def whatsapp_consumer():
    for reminder in consume_kafka('send_reminder'):
        send_whatsapp(reminder)  # respects rate limits
```
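The "respects rate limits" part hides the one piece of real logic in this consumer. A token-bucket throttle is enough; a sketch (the rate is a made-up placeholder — WhatsApp's actual limits depend on your Business API tier):

```python
import time

class TokenBucket:
    """Allow at most `rate` sends per second, smoothing out bursts."""

    def __init__(self, rate):
        self.rate = rate
        self.tokens = rate
        self.last = time.monotonic()

    def acquire(self):
        # Refill tokens based on elapsed time, then either spend one
        # or sleep until the next token becomes available.
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)
```

The consumer calls bucket.acquire() before each send_whatsapp(). Crucially, the sleeping happens in the consumer process, not while holding a database connection in the main app.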
The result?
7:00 PM now: System steady. No spike. Reminders sent in background. Users don't notice anything. Database CPU stays at 25%.
We moved the work. Didn't rewrite the feature. Just changed where it executes.
The Meta Integration
Adding the Meta Conversions API became trivial.
Main app was already publishing events. We just added a new consumer.
```python
def meta_consumer():
    for event in consume_kafka('screening_completed'):
        send_to_meta_api(transform(event))
```
That's it. New consumer. Separate service. Zero changes to main app. Marketing team got their tracking. Took three days to implement.
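The transform step maps internal events onto Meta's Conversions API payload shape. A sketch: the internal event keys ('phone', 'screening_id') are assumptions about this system's events, while the output fields follow Meta's documented schema, which requires user identifiers to be normalized and SHA-256 hashed:

```python
import hashlib
import time

def transform(event):
    # Meta's Conversions API expects phone numbers as digits with
    # country code, SHA-256 hashed before sending.
    phone = event["phone"].strip().lstrip("+")
    return {
        "event_name": "CompleteRegistration",
        "event_time": int(time.time()),
        "action_source": "chat",  # the screening happens over WhatsApp
        "user_data": {
            "ph": [hashlib.sha256(phone.encode("utf-8")).hexdigest()],
        },
        "custom_data": {"screening_id": event["screening_id"]},
    }
```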
One event stream. Three consumers. More can be added anytime.
The Results
By December 2024, we were running smoothly:
- 550+ requests per second sustained
- 200,000+ requests per hour during peaks
- 100,000 daily active users
- Response time: 400 milliseconds
- 99.9% uptime through every TV ad spike
Our database? Downsized to 1/8th its original size. Running at 20 to 30% CPU. Handling 10x more traffic than Day 1.
Infrastructure costs? Down 60% from crisis day.
The campaign ran through December. On June 24, 2025, Colgate announced results at the Oral Health Movement Summit in New Delhi. The Union Minister of Health and Family Welfare was there.
The numbers:
- 4.5 million Indians screened
- 18,000+ pin codes covered
- 700+ districts reached
- 700,000+ dental consultations generated
- 1 in 6 users visited a dentist after screening
The platform generated real national health intelligence. Average oral health score: 2.6 out of 5. India needs to prioritize oral care.
What We Learned
Architecture Beats Hardware
We started with the biggest server we could buy. It wasn't enough.
We ended with 1/8th the database size. Handled 10x more traffic. Cost 60% less.
The difference? Architecture. One index eliminated 20% of database load. Connection pooling fixed exhaustion. Caching reduced queries by 40%. Event streaming offloaded 60% of writes.
Total code changes in main application? Less than 100 lines.
The performance improvement? 12x.
The Power of Constraints
Having no time to test forced us to be smart. We couldn't rewrite code. So we fixed infrastructure instead.
This turned out to be a blessing. Infrastructure changes are lower risk. Database indexes don't introduce bugs. Connection pooling is transparent. Caching is isolated.
We preserved our tested code. Added infrastructure layers. Scaled 10x.
Event-Driven Without Going Async
We kept our Django app synchronous. Simple. Tested. Safe.
We just offloaded heavy work to event consumers. Analytics. Tracking. Reminders. Everything non-critical.
User requests stayed fast. Background work happened separately. Best of both worlds.
One Query Can Kill You
That one query with the missing index. It was 25% of our database load.
Always check Performance Insights. Always audit your queries. Sometimes the solution is embarrassingly simple.
Small Instances, Big Scale
Many small instances beat one huge instance. With spot instances, we saved 70% on compute costs. With auto-scaling, we handled TV ad spikes smoothly. With containerization, we got fault tolerance.
Our baseline: two small instances. Peak: twenty instances. Scale with demand. Pay for what you use.
If We Did It Again
What We'd Keep
The "minimal code changes" principle. Fix infrastructure first. Event-driven for heavy work. Managed services (we couldn't run Kafka ourselves). Comprehensive monitoring.
What We'd Change
Set up event streaming earlier. Before reminders became a problem. More aggressive query auditing from Day 1. Load test the reminder window specifically. Prepare for async patterns in advance.
For Other Teams
Don't rewrite when you can optimize. Database indexes, connection pooling, caching. These are low-hanging fruit. They work. They're safe.
The "no time to test" constraint is valuable. It forces simpler solutions. It encourages infrastructure over code changes.
You don't need fully async code for async architecture. Keep your main app simple. Offload to event consumers. Gradual migration.
Existing features can be salvaged. Our reminders worked in concept. They just didn't scale. Event-driven rescued them. Same logic. Different execution model.
Small changes compound. One index. One caching layer. One event stream. Each solved a specific problem. Combined, they transformed how we scaled.
The Takeaway
When your system is slow, the instinct is to upgrade the hardware. Get a bigger server. Get a bigger database.
Sometimes that's the wrong answer.
We started Day 1 with the biggest servers we could buy. Response times were 8 seconds. Database was dying.
We ended with 1/8th the database size. Response times of 400 milliseconds. Serving 4.5 million users with 99.9% uptime.
The secret? We barely changed the application code.
We added database indexes. We added connection pooling. We added caching. We added event streaming. We changed less than 100 lines of application logic.
And we improved performance by 12x.
The best optimization isn't always the most expensive one. It's the most thoughtful one.
The WhatsApp reminders that broke our database every evening? We didn't rewrite the feature. We just moved it to an event consumer. Problem solved.
The Meta conversion tracking that marketing desperately needed? Event consumer. Three days. Zero changes to main app.
One missing index was 25% of our database load. One architectural shift reduced writes by 60%. The right changes at the right layer beat code rewrites every time.
When you're under pressure and traffic is coming, resist the urge to rewrite everything. Look at your infrastructure first. Understand your bottlenecks. Fix them at the right layer.
Sometimes the best code change is the one you don't make.
This campaign ran from November 2024 through December, serving 4.5 million users across India. Results were announced at the Oral Health Movement Summit on June 24, 2025, attended by the Union Minister of Health and Family Welfare.