bala93kumar/README.md

👋 Hi, I'm Balakumar

💼 Data Engineer | Spark & Databricks Specialist | Cloud Data Pipelines

Welcome to my GitHub! I love building scalable, optimized data pipelines that power analytics and business decisions.


🚀 Tech Stack & Expertise

🔹 Big Data & Distributed Processing

  • Apache Spark (PySpark, Spark SQL)
  • Databricks (Workflows, Delta Lake, Z-Ordering, Optimizations)
  • SAP HANA Data Extraction & Performance Tuning
  • Parallelism, Shuffle Optimization, Cluster Tuning

🔹 Data Engineering & ETL

  • Complex SQL Transformations & CTE Pipelines
  • Metadata-driven ETL Frameworks
  • Snapshot Validation, Partition Management
  • Incremental Loads & Rolling-window Logic
  • Job Monitoring Dashboards

🔹 Cloud & Storage

  • AWS Glue ETL
  • Delta Lake
  • Lakehouse Architectures
  • Snowflake (Community Edition)

🔹 Tools & Languages

  • Python (ETL frameworks, automation)
  • SQL (Analytical queries, join optimization)
  • REST APIs (Databricks Jobs API)

📊 What I Work On

  • Optimizing large-scale Spark SQL jobs
  • Improving slow ETL pipelines
  • Building data quality & monitoring frameworks
  • Snapshot comparison systems for B2B analytics
  • Designing scalable metadata-based ETL workflows

📚 Currently Learning

  • Kubernetes (K8s)
  • Docker
  • GitHub Actions
  • Spark on Kubernetes (future goal)

🛠️ Projects You'll Find Here

  • Automated Snapshot Validation System
  • Databricks Job Monitoring Dashboard
  • Metadata-driven PySpark ETL Framework
  • Delta Lake Optimization Scripts

📫 Contact

If you'd like to collaborate or discuss data engineering ideas, feel free to reach out on my LinkedIn profile: Bala!


⭐ Thanks for visiting my profile!

Popular repositories

  1. spring_flight_reserv_app (Public)

    Simple flight reservation app using Spring

    Java 1

  2. Item_Based_collaborative_filtering (Public)

    Sample code for an item-based collaborative filtering recommendation engine

    Scala

  3. Machine-learning-on-spark (Public)

    Trying machine learning modules on PySpark

    Jupyter Notebook

  4. Machine-learning-101 (Public)

    ML 101

    Jupyter Notebook 1

  5. Dice-Roll-Game (Public)

    Developed using JS

    JavaScript

  6. NodeJs (Public)

    NodeJs Tutorials

    JavaScript