Data Engineer · AWS & Azure Certified · Chicago, IL
Building scalable cloud-native data platforms, ETL pipelines, and lakehouse architectures.
Data Engineer with experience building production data platforms across AWS and Azure. I design serverless ETL pipelines, optimize distributed processing with PySpark, build dimensional data models, and automate workflows from orchestration through deployment.
Currently: Data Engineer at Benda Infotech — building serverless AWS data pipelines (S3 → Lambda → Glue → Redshift) processing 1M+ records daily.
Previously: Software Engineer (Data Engineering) at Applied Information Sciences, building Azure-based data infrastructure for GEICO using ADF, Databricks, Delta Lake, Kafka, and Synapse on datasets of 10M+ records.
Education: MS in Computer Science from Illinois Institute of Technology · BE in Computer Science from Osmania University.
- AWS Certified Data Engineer – Associate
- AWS Certified Solutions Architect – Associate
- Microsoft Certified: Azure Fundamentals
| Area | Technologies |
|---|---|
| Data Engineering | ETL/ELT · Data Modeling · Star Schema · SCD Type 2 · Medallion Architecture · Batch & Streaming |
| Cloud | AWS (S3, Glue, Redshift, Lambda, Athena, EventBridge) · Azure (ADF, Databricks, Synapse, ADLS Gen2) |
| Big Data | Apache Spark · PySpark · Kafka · Delta Lake · Iceberg · HDFS · Hadoop · Hive |
| Orchestration & DevOps | Apache Airflow · CI/CD · Azure DevOps · Docker · Git |
| Languages & Platforms | Python · SQL · Java · Linux |
| Analytics | Power BI · Athena · Synapse · Redshift |
AWS Lakehouse & Analytics Platform
Serverless data lakehouse on AWS analyzing 100K+ Chicago crime records. Glue PySpark ETL, Athena analytics with Apache Iceberg, dimensional models in Redshift, Airflow orchestration.
AWS S3 · Glue · Athena · Redshift · Airflow · PySpark · Iceberg · Lake Formation
ETL Weather Pipeline with Airflow
End-to-end ETL pipeline that ingests weather data from the Open-Meteo API, transforms it, and loads it into PostgreSQL using Airflow DAGs running in Docker.
Apache Airflow · PostgreSQL · Docker · Python · REST API
AWS Multi-AZ Disaster Recovery
Production-style e-commerce order system on AWS with Multi-AZ RDS, custom VPC networking, and automated failover.
RDS · EC2 · VPC · NAT Gateway · Multi-AZ
Big Data Processing with Spark
Large-scale data processing on GCP Dataproc using Spark DataFrames with optimized joins, schema validation, and aggregation.
Apache Spark · HDFS · GCP Dataproc · SQL
Actively open to Data Engineer, Cloud Data Engineer, and AWS Data Engineer roles. If my background fits your team's needs, I'd love to connect.