🎓 CS graduate from McGill University, passionate about high-performance distributed systems, LLM infrastructure, and full-stack development.
🚀 Recently built:
- 🤖 LLM Inference API Platform – FastAPI + vLLM with real-time streaming, Prometheus/Grafana observability, supporting 500+ concurrent users & 2.5M tokens/day.
- 🎫 High-Concurrency Ticketing System – Java Spring Boot + Redis + Kafka, optimized with sharding, distributed locks, and Sentinel/Hystrix for extreme-scale traffic.
- 💰 FinTrack – Full-stack financial tracking app with React/TypeScript + Flask + PostgreSQL, serving 500+ active users with 99.7% uptime.
🌱 Exploring cloud-native architectures (AWS, Docker, Kubernetes) and scalable AI systems.
🎯 Open to collaborating on distributed systems, AI infrastructure, and full-stack applications.
🤝 Looking for help with optimizing large-scale LLM inference and cost-efficient deployments.
💬 Ask me about FastAPI, React/TypeScript, PostgreSQL optimization, Redis/Kafka, CI/CD pipelines.
📫 Reach me: LinkedIn | GitHub | ✉️ [email protected]
😄 Pronouns: He/Him
⚡ Fun fact: I once built a promo app that handled 12K+ API requests in a single weekend with 99.9% uptime 🚀
- Designed and implemented a high-concurrency ticket booking platform, supporting user registration, login, event browsing, seat selection, order creation, and payment.
- Database Sharding: Adopted consistent hashing and hash-modulo strategies for horizontal partitioning, improving multi-dimensional query performance by 50%+.
- Concurrency Control: Implemented local, distributed, read-write, and fine-grained locking strategies; combined Redis + Lua scripts for atomic operations, improving throughput under peak load by 26%.
- Caching Optimization: Leveraged Redis data structures (hashes, sorted sets, lists) with Lua scripts, integrated Caffeine for in-memory caching, and mitigated cache penetration and breakdown issues, significantly reducing DB load.
- Asynchronous Messaging: Integrated Kafka to decouple order creation from event recording, guarding against message loss and improving overall fault tolerance.
- Fault Tolerance & Resilience: Adopted Sentinel/Hystrix for circuit breaking, rate limiting, and fallback mechanisms, maintaining high availability during extreme traffic spikes.
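The sharding bullet above can be sketched in code. This is a minimal, self-contained consistent-hashing ring for routing order keys to shards; the shard names, virtual-node count, and key format are illustrative assumptions, not the project's actual configuration (which lives in Java/Spring Boot):

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to database shards via consistent hashing, so adding or
    removing a shard only remaps a fraction of keys (unlike plain modulo)."""

    def __init__(self, shards, vnodes=100):
        # Each physical shard gets `vnodes` points on the ring to smooth
        # out the key distribution.
        self._ring = []  # sorted list of (hash, shard) points
        self._vnodes = vnodes
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_shard(self, shard: str) -> None:
        for i in range(self._vnodes):
            self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()

    def get_shard(self, key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash.
        h = self._hash(key)
        hashes = [point[0] for point in self._ring]
        idx = bisect.bisect_right(hashes, h) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical shard names for illustration.
ring = ConsistentHashRing(["orders_db_0", "orders_db_1", "orders_db_2"])
shard = ring.get_shard("order:12345")
```

Routing is deterministic, so every node in the cluster agrees on where a given order lives without coordination.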
- Built a real-time LLM inference platform with FastAPI backend and vLLM engine, supporting 500+ concurrent users and processing 2.5M tokens daily with 99.7% uptime.
- Performance Optimization: Improved token throughput by 55% and reduced infrastructure costs by 40% using PagedAttention, continuous batching, and load-aware admission control, achieving p95 latency of 1.1s.
- Observability: Integrated Prometheus for metrics (TTFT, tokens/sec, queue length), Grafana dashboards for real-time visualization, and OpenTelemetry for distributed tracing across the gateway and inference server.
- Resilience & Reliability: Deployed circuit breakers, idempotency keys, and automated health checks; load-tested with Locust and documented SLOs and a runbook.
- Deployment: Containerized with Docker Compose and deployed on AWS EC2, orchestrating multi-container architecture with zero downtime.
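The load-aware admission control mentioned above can be sketched as a bounded concurrency gate: requests wait briefly for an inference slot and are shed with a 503 when the server is saturated, keeping queued latency bounded. The slot budget, timeout, and `fake_inference` stand-in are illustrative assumptions, not the platform's real values:

```python
import asyncio

MAX_INFLIGHT = 8       # hypothetical concurrency budget
QUEUE_TIMEOUT_S = 0.5  # shed load instead of queueing forever

_slots = asyncio.Semaphore(MAX_INFLIGHT)


async def admit_and_run(coro_factory):
    """Run a request only if an inference slot frees up in time;
    otherwise reject it so the queue cannot grow without bound."""
    try:
        await asyncio.wait_for(_slots.acquire(), timeout=QUEUE_TIMEOUT_S)
    except asyncio.TimeoutError:
        return {"status": 503, "detail": "server busy, retry later"}
    try:
        return {"status": 200, "detail": await coro_factory()}
    finally:
        _slots.release()


async def fake_inference():
    await asyncio.sleep(0.01)  # stand-in for a vLLM generate call
    return "hello"


result = asyncio.run(admit_and_run(fake_inference))
```

In a FastAPI deployment the gate would wrap the generate endpoint, and the rejection count would feed the Prometheus metrics that drive the Grafana dashboards.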
- Built a full-stack financial tracking app with React/TypeScript frontend and Flask backend, serving 500+ active users.
- Designed PostgreSQL schema with indexed queries, handling 2K+ daily transactions with sub-200ms response time.
- Developed data visualization dashboard with Chart.js, showing monthly insights, income vs expenses, and category breakdown.
- Deployed with microservices architecture on AWS EC2, PostgreSQL RDS, and Nginx reverse proxy, achieving 99.7% uptime.
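The indexed-query bullet above can be illustrated with a composite index matching the dashboard's hottest access pattern: one user's transactions over a date range, grouped by category. This sketch uses in-memory SQLite as a stand-in (production targets PostgreSQL), and the table and column names are illustrative assumptions:

```python
import sqlite3

# In-memory SQLite stand-in; the production schema targets PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        category TEXT NOT NULL,
        amount REAL NOT NULL,
        created_at TEXT NOT NULL
    )
""")
# Composite index on (user_id, created_at): the planner can satisfy the
# WHERE clause below with a single range scan instead of a full table scan.
conn.execute(
    "CREATE INDEX idx_tx_user_date ON transactions (user_id, created_at)"
)

conn.executemany(
    "INSERT INTO transactions (user_id, category, amount, created_at) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "food", 12.5, "2024-05-01"),
        (1, "rent", 900.0, "2024-05-02"),
        (2, "food", 8.0, "2024-05-01"),
    ],
)

# Monthly category breakdown for one user, as the Chart.js dashboard needs.
rows = conn.execute(
    "SELECT category, SUM(amount) FROM transactions "
    "WHERE user_id = ? AND created_at BETWEEN ? AND ? "
    "GROUP BY category ORDER BY category",
    (1, "2024-05-01", "2024-05-31"),
).fetchall()
```

Keeping the index's leading column equal to the equality predicate (`user_id`) and the trailing column equal to the range predicate (`created_at`) is what keeps this query sub-200ms as the table grows.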
✍️ From Boyu-Qian

