Hi, I'm Naveed Mohiuddin 👋

Data Engineer · AWS & Azure Certified · Chicago, IL
Building scalable cloud-native data platforms, ETL pipelines, and lakehouse architectures.

LinkedIn · Portfolio · Email


About

Data Engineer with experience building production data platforms across AWS and Azure. I design serverless ETL pipelines, optimize distributed processing with PySpark, model dimensional data, and automate everything from orchestration to deployment.

Currently: Data Engineer at Benda Infotech — building serverless AWS data pipelines (S3 → Lambda → Glue → Redshift) processing 1M+ records daily.
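The Lambda step in a pipeline like this is typically a thin trigger: parse the S3 event notification and hand the new object off to a Glue job. A minimal sketch (the `daily-ingest` job name and the argument keys are illustrative assumptions, and the actual `boto3` Glue call is stubbed out so the example stays self-contained):

```python
import json

def extract_s3_objects(event):
    """Pull (bucket, key) pairs from an S3 event notification payload."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

def handler(event, context):
    """Lambda entry point: queue one Glue run per newly arrived object.

    A real deployment would call boto3.client("glue").start_job_run(**run)
    for each entry; here the run specs are just returned for inspection.
    """
    runs = []
    for bucket, key in extract_s3_objects(event):
        runs.append({
            "JobName": "daily-ingest",  # hypothetical Glue job name
            "Arguments": {"--source_path": f"s3://{bucket}/{key}"},
        })
    return {"statusCode": 200, "body": json.dumps(runs)}
```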

Previously: Software Engineer (Data Engineering) at Applied Information Sciences, building Azure-based data infrastructure for GEICO — ADF, Databricks, Delta Lake, Kafka, and Synapse across 10M+ records.

Education: MS in Computer Science from Illinois Institute of Technology · BE in Computer Science from Osmania University.


Certifications

  • AWS Certified Data Engineer – Associate
  • AWS Certified Solutions Architect – Associate
  • Microsoft Certified: Azure Fundamentals

Tech Stack

| Area | Technologies |
| --- | --- |
| Data Engineering | ETL/ELT · Data Modeling · Star Schema · SCD Type 2 · Medallion Architecture · Batch & Streaming |
| Cloud | AWS (S3, Glue, Redshift, Lambda, Athena, EventBridge) · Azure (ADF, Databricks, Synapse, ADLS Gen2) |
| Big Data | Apache Spark · PySpark · Kafka · Delta Lake · Iceberg · HDFS · Hadoop · Hive |
| Orchestration | Apache Airflow · CI/CD · Azure DevOps · Docker · Git |
| Languages | Python · SQL · Java · Linux |
| Analytics | Power BI · Athena · Synapse · Redshift |
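As an illustration of the SCD Type 2 pattern listed above, here is a minimal in-memory sketch (the column names and single-attribute key are illustrative; in practice this runs as a MERGE against Redshift or Delta Lake):

```python
from datetime import date

def scd2_upsert(dimension, incoming, today=None):
    """Apply Slowly Changing Dimension Type 2 updates.

    dimension: list of row dicts with 'key', 'value', 'valid_from',
               'valid_to' (None = still current), 'is_current'.
    incoming:  list of {'key', 'value'} rows from the source system.
    Changed rows expire the current version and append a new one,
    preserving full history instead of overwriting.
    """
    today = today or date.today()
    current = {r["key"]: r for r in dimension if r["is_current"]}
    for row in incoming:
        live = current.get(row["key"])
        if live and live["value"] == row["value"]:
            continue  # unchanged: nothing to do
        if live:  # expire the outgoing version
            live["valid_to"] = today
            live["is_current"] = False
        dimension.append({"key": row["key"], "value": row["value"],
                          "valid_from": today, "valid_to": None,
                          "is_current": True})
    return dimension
```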

Featured Projects

AWS Lakehouse & Analytics Platform
Serverless data lakehouse on AWS analyzing 100K+ Chicago crime records. Glue PySpark ETL, Athena analytics with Apache Iceberg, dimensional models in Redshift, Airflow orchestration.
AWS · S3 · Glue · Athena · Redshift · Airflow · PySpark · Iceberg · Lake Formation

ETL Weather Pipeline with Airflow
End-to-end ETL pipeline ingesting weather data from Open-Meteo API, transforming and loading into PostgreSQL using Airflow DAGs in Docker.
Apache Airflow · PostgreSQL · Docker · Python · REST API
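The transform step of a pipeline like this can be sketched as a plain function that flattens the API response into a row ready for the Postgres load (the field names follow Open-Meteo's `current_weather` response block, but treat them as an assumption; the DAG wiring and database insert are omitted):

```python
import json

def transform_weather(raw: str) -> dict:
    """Flatten an Open-Meteo current-weather JSON payload into a flat
    row dict suitable for a relational INSERT."""
    payload = json.loads(raw)
    cw = payload["current_weather"]
    return {
        "latitude": payload["latitude"],
        "longitude": payload["longitude"],
        "observed_at": cw["time"],
        "temperature_c": cw["temperature"],
        "windspeed_kmh": cw["windspeed"],
        "winddirection_deg": cw["winddirection"],
    }
```

A function like this slots into a PythonOperator task between the extract and load steps of the DAG, which also keeps the transform logic unit-testable outside Airflow.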

AWS Multi-AZ Disaster Recovery
Production-style e-commerce order system on AWS with Multi-AZ RDS, custom VPC networking, and automated failover.
RDS · EC2 · VPC · NAT Gateway · Multi-AZ

Big Data Processing with Spark
Large-scale data processing on GCP Dataproc using Spark DataFrames with optimized joins, schema validation, and aggregation.
Apache Spark · HDFS · GCP Dataproc · SQL


What I'm Looking For

Actively open to Data Engineer, Cloud Data Engineer, and AWS Data Engineer roles. If my background fits your team's needs, I'd love to connect.

📧 [email protected] · LinkedIn · Portfolio

Pinned Repositories

  1. chicago-crime-lakehouse — 🚔 Serverless data lakehouse on AWS analyzing 100K+ Chicago crimes with Apache Iceberg, Lake Formation governance, and sub-second Athena queries for $6/month. (Python)

  2. bigdata-spark-dataproc — Apache Spark big data processing on Google Cloud Dataproc using Scala, Spark SQL, and HDFS. (Scala)

  3. real-time-stream-processing-kafka-spark-gcp — Real-time stream processing using Apache Kafka and Spark Streaming on Google Cloud Dataproc. Includes Python producers/consumers, a Spark DStream word count, and full deployment with screenshots. (Python)

  4. ETL_Airflow — End-to-end ETL pipeline using Apache Airflow to automate extraction of real-time weather data from the Open-Meteo API, transform it into structured form, and load it into a PostgreSQL database. (Python)

  5. Multi-AZ-Disaster-Recovery — Multi-AZ disaster recovery system with <2 second RTO and zero data loss, featuring automated failover, a real-time monitoring dashboard, and secure VPC architecture across Availability Zones. (HTML)