Production-style data engineering projects focused on ingestion, transformation, orchestration, and analytics.
Each project models real-world data problems and operational tradeoffs rather than toy examples or isolated scripts.
Stack: Airflow, Docker, AWS S3
Modernized legacy batch jobs into orchestrated Airflow DAGs with cloud storage integration. Emphasis on scheduling, dependency management, retries, and operational reliability.
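
As a rough illustration of the dependency-management and retry ideas above, here is a plain-Python sketch. It does not use the Airflow API. The names `run_with_retries`, `run_pipeline`, and the three toy tasks are hypothetical and exist only for this example.

```python
import time
from graphlib import TopologicalSorter


def run_with_retries(task, retries=2, delay_seconds=0.0):
    """Call task(); on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            time.sleep(delay_seconds)


def run_pipeline(tasks, dependencies, retries=2):
    """Run tasks in dependency order.

    `dependencies` maps a task name to the set of its upstream task names.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        results[name] = run_with_retries(tasks[name], retries=retries)
    return order, results


# Toy tasks (assumed for illustration): extract fails once, then succeeds.
calls = {"extract": 0}


def flaky_extract():
    calls["extract"] += 1
    if calls["extract"] < 2:
        raise ConnectionError("transient failure")
    return "raw"


tasks = {
    "extract": flaky_extract,
    "transform": lambda: "clean",
    "load": lambda: "loaded",
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order, results = run_pipeline(tasks, dependencies)
```

In a real DAG the scheduler handles ordering and retries per task; the sketch only shows the two ideas in isolation.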
Stack: Python, Airflow, AWS
End-to-end ETL pipeline ingesting API data, transforming it, and loading curated datasets into cloud storage and databases. Built with modular components to support reuse and change.
Stack: Python, Pandas, SQLite, Streamlit
Processes FHIR-formatted healthcare JSON into structured analytical tables and dashboards. Focuses on schema normalization, data quality, and healthcare data complexity.
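
To give a feel for the schema normalization step, here is a minimal sketch that flattens one FHIR `Patient` resource into a flat row. The function name `flatten_patient`, the output column names, and the sample record are assumptions made for this example, not the project's actual code or data.

```python
import json


def flatten_patient(resource):
    """Flatten a FHIR Patient resource (as a dict) into one flat row."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    # `name` is a list of name objects in FHIR; use the first one if present.
    names = resource.get("name") or [{}]
    first = names[0]
    return {
        "patient_id": resource.get("id"),
        "family_name": first.get("family"),
        "given_name": " ".join(first.get("given", [])) or None,
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


# Made-up sample record; birthDate is deliberately missing.
raw = json.loads(
    '{"resourceType": "Patient", "id": "example-1",'
    ' "name": [{"family": "Doe", "given": ["Jane", "Q"]}],'
    ' "gender": "female"}'
)
row = flatten_patient(raw)
```

Missing fields come through as `None` rather than raising, so data quality checks can count gaps downstream instead of failing the load.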
Stack: Airflow, Docker, SQL
Simulates near real-time marketing events and processes them through scheduled pipelines. Demonstrates time-window aggregation, transformation logic, and analytics readiness.
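
Time-window aggregation can be sketched in plain Python as fixed-size (tumbling) windows. The helper names `window_start` and `count_by_window`, the five-minute window, and the sample events are assumptions for illustration; the project itself runs this kind of logic through scheduled pipelines and SQL.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def window_start(ts, window):
    """Floor a naive timestamp to the start of its tumbling window."""
    seconds = int((ts - EPOCH).total_seconds())
    size = int(window.total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % size)


def count_by_window(events, window=timedelta(minutes=5)):
    """Count events per (window start, event type)."""
    counts = Counter()
    for ts, event_type in events:
        counts[(window_start(ts, window), event_type)] += 1
    return dict(counts)


# Made-up marketing events: (timestamp, event type).
events = [
    (datetime(2024, 1, 1, 12, 1), "click"),
    (datetime(2024, 1, 1, 12, 4), "click"),
    (datetime(2024, 1, 1, 12, 6), "click"),
    (datetime(2024, 1, 1, 12, 7), "purchase"),
]
counts = count_by_window(events)
```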
Stack: Python, PostgreSQL, Airflow
Integrates multiple customer data sources into unified reporting tables. Emphasis on joins, data modeling, and analytics-friendly schema design.
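
The join pattern behind a unified reporting table might look like the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs without a server. The table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0);
    """
)

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL sum into 0.
rows = conn.execute(
    """
    SELECT c.customer_id,
           c.name,
           COUNT(o.order_id)          AS order_count,
           COALESCE(SUM(o.amount), 0) AS total_spent
    FROM crm_customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
    """
).fetchall()
conn.close()
```

Using a left join from the customer table is a deliberate choice here: customers without activity still appear in the report with zero counts instead of silently dropping out.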
Stack: PySpark, Delta Lake
Distributed pipeline transforming raw sales data into cleaned, analytics-ready layers. Demonstrates scalable processing patterns and layered data architecture.
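
The layered idea (raw, cleaned, aggregated) can be shown without a cluster. The sketch below is plain Python, not PySpark or Delta Lake, and the layer names "bronze", "silver", and "gold" as well as the field names and sample rows are assumptions for illustration.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean raw rows: cast amounts, normalize region, drop bad and duplicate rows."""
    seen, silver = set(), []
    for row in bronze_rows:
        order_id = row.get("order_id")
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # unparseable or missing amount
        if not order_id or order_id in seen:
            continue  # missing key or duplicate order
        seen.add(order_id)
        region = (row.get("region") or "unknown").strip().lower()
        silver.append({"order_id": order_id, "region": region, "amount": amount})
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into total sales per region."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Made-up raw sales rows.
bronze = [
    {"order_id": "A1", "region": " West ", "amount": "10.50"},
    {"order_id": "A1", "region": "West", "amount": "10.50"},  # duplicate
    {"order_id": "A2", "region": "east", "amount": "n/a"},    # bad amount
    {"order_id": "A3", "region": "West", "amount": "4.50"},
]
gold = to_gold(to_silver(bronze))
```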
Stack: AWS Lambda, Python, S3
Serverless scraper extracting job postings and storing structured outputs in cloud storage. Designed to demonstrate automation and event-driven architecture.
Stack: SQL, Pandas
ETL workflow analyzing patient engagement behavior across healthcare datasets for trend analysis and reporting.
Stack: Python, BeautifulSoup
Web scraper tracking product listings and pricing changes with structured, queryable output.
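
Once listings are scraped, tracking price changes comes down to comparing snapshots. Here is a minimal sketch, assuming each snapshot is a `{product_id: price}` dict; the function name `diff_prices` and the sample SKUs are hypothetical.

```python
def diff_prices(previous, current):
    """Compare two {product_id: price} snapshots and classify each change."""
    changes = []
    for product_id in sorted(previous.keys() | current.keys()):
        old, new = previous.get(product_id), current.get(product_id)
        if old is None:
            changes.append((product_id, "added", None, new))
        elif new is None:
            changes.append((product_id, "removed", old, None))
        elif old != new:
            changes.append((product_id, "changed", old, new))
    return changes


# Made-up snapshots from two scraper runs.
yesterday = {"sku-1": 19.99, "sku-2": 5.00}
today = {"sku-1": 17.99, "sku-3": 8.50}
changes = diff_prices(yesterday, today)
```

Emitting one tuple per change keeps the output easy to load into a table and query later.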
Stack: Python, BeautifulSoup
Scrapes public library book metadata and enriches it for local data exploration.
Each project includes:
- A clear problem definition
- Data sources and assumptions
- Transformation and modeling logic
- Notes on tradeoffs and design decisions
Projects are updated as I learn new tools and patterns.
- GitHub: https://github.com/bashoori
- Live Portfolio: https://bashoori.github.io/portfolio
- LinkedIn: https://www.linkedin.com/in/bitaashoori








