Shahsmit075/README.md

Hi, I'm Smit Shah

Building data pipelines, distributed systems, and AI infrastructure

Data Engineering Intern @ Sigmoid Analytics · CSE @ IIIT Vadodara '26


👨‍💻 About Me

🎓 B.Tech in Computer Science at IIIT Vadodara (2022 - 2026)
💼 Data Engineering Intern @ Sigmoid Analytics · ex-Lief Care
🔭 Currently working on PySpark pipelines, Airflow orchestration, and GenAI infrastructure
💡 Interested in the stack where data engineering meets AI systems: pipelines, models, production


🛠️ My Technical Toolkit

Data Engineering


PySpark · Apache Airflow · SQL · ETL/ELT · Databricks

Backend Development

Databases

DevOps & Cloud

AI / ML


GenAI · LLM Pipelines · Oracle OCI AI · NVIDIA Deep Learning


💼 Experience

Data Engineering Intern @ Sigmoid Analytics · Jan 2025 – Present
PySpark Apache Airflow AWS S3 SQL

  • Engineering PySpark ETL pipelines for large-scale structured datasets in a production environment
  • Designing and scheduling Airflow DAGs for multi-step ingestion and transformation workflows

Software Developer Intern @ Lief Care · Apr 2025 – Jul 2025
Node.js React PostgreSQL GraphQL Docker AWS

  • Built healthcare dashboards and backend APIs used by nursing teams across UK facilities
  • Automated deployments via GitHub Actions on AWS EC2/Lambda with CloudWatch monitoring

🚀 Projects

Kinetiq: Distributed Ticket Booking System
High-throughput event-driven architecture with Kafka, polyglot persistence (PostgreSQL + MongoDB), and sub-50ms read latency.
Kafka Node.js PostgreSQL MongoDB Redis Docker

Koalayst: Real-time SaaS Event Monitoring
Webhook-based event ingestion pipeline with Discord alerting, fine-grained filtering, and RBAC for dev teams.
TypeScript Next.js Prisma PostgreSQL

View all projects →


🏆 Certifications

  • Oracle Cloud Infrastructure: AI Foundations Associate · Gen AI Professional
  • Amazon ML Summer School (MLSE'24): Machine Learning, Python
  • NVIDIA: Fundamentals of Deep Learning
  • Google Cloud: Cloud Skills Boost Pathway

📫 Let's Connect

Open to Data Engineering, ML Engineering, or AI Infrastructure roles from 2026.

LinkedIn

Pinned

  1. Kinetiq (Public)

    Forked from dkb73/Kinetiq

    A distributed and scalable ticketing system

    JavaScript

  2. Koalayst (Public)

    "Koalayst" - Your SaaS analyst and monitoring buddy....

    TypeScript 1

  3. Advanced-configured-U-NET (Public)

    Forked from dkb73/U-Net

    Python

  4. data-warehouse-medallion-casestudy (Public)

    Forked from malaikashinchan/data-warehouse-medallion-casestudy

    TSQL

  5. PodVerse (Public)

    Podcast generator built on the OpenAI API, and much more

    TypeScript 1

  6. Butterfly_Identification (Public)

    Forked from Hs3636/Butterfly

    Butterfly Identification using SAM and VGG models

    Dart