This repository contains materials, code, and projects from the Data Engineering Zoomcamp.
This repository tracks my learning journey through data engineering concepts, tools, and best practices. The zoomcamp covers the main stages of a modern data pipeline, from ingestion and processing to orchestration and analytics. Topics include:
- Data Ingestion: Batch and streaming data collection
- Data Processing: Data transformation and cleaning
- Workflow Orchestration: Automating data pipelines
- Data Warehousing: Building and managing data warehouses
- Analytics Engineering: dbt and data modeling
- Batch Processing: Large-scale data processing frameworks
- Streaming: Real-time data processing
- Infrastructure: Cloud platforms and containerization
The zoomcamp typically covers tools such as:
- Python
- Docker
- SQL
- Apache Spark
- Apache Kafka
- dbt
- Airflow/Prefect
- Google Cloud Platform / AWS
- Terraform
The repository is organized by module:

```
data-engineering-zoomcamp/
├── week-1/    # Module 1 materials
├── week-2/    # Module 2 materials
├── week-3/    # Module 3 materials
└── ...
```
To run the code locally, you will need:
- Python 3.x
- Docker and Docker Compose
- Git
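A minimal sketch for checking that the prerequisites above are installed, written in Python because Python is itself one of them. It assumes Docker and Git are on `PATH` under the names `docker` and `git`; the function name `missing_prerequisites` is hypothetical and not part of this repository. Docker Compose is not checked here, because Compose v2 ships as the `docker compose` subcommand rather than a separate executable.

```python
"""Hypothetical helper: report which listed prerequisites are missing.

Assumes the tools are on PATH as `docker` and `git`; Docker Compose is
not checked because v2 is a `docker compose` subcommand, not a binary.
"""
import shutil
import sys

# Hypothetical list of executables expected on PATH.
REQUIRED_COMMANDS = ("docker", "git")


def missing_prerequisites(commands=REQUIRED_COMMANDS, min_python=(3, 0)):
    """Return a list of human-readable problems; empty if all were found."""
    problems = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    # shutil.which returns None when an executable is not on PATH.
    for cmd in commands:
        if shutil.which(cmd) is None:
            problems.append(f"'{cmd}' not found on PATH")
    return problems


if __name__ == "__main__":
    for issue in missing_prerequisites():
        print("Missing:", issue)
```

Saved under a hypothetical name such as `check_prereqs.py` and run with `python check_prereqs.py`, it prints nothing when every listed tool is found.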
To set up the project locally:

- Clone the repository:

```bash
git clone https://github.com/LeviJesus/data-engineering-zoomcamp.git
cd data-engineering-zoomcamp
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies (when available):

```bash
pip install -r requirements.txt
```

Instructions for running specific modules and projects will be added as the course progresses.
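After activating the virtual environment and installing dependencies, a quick sanity check can confirm the setup. The sketch below is hypothetical: `in_virtualenv` and `find_missing_modules` are illustrative names, and the package names in the `__main__` block are placeholders, since `requirements.txt` has not been published yet. It relies on the fact that, inside a venv, `sys.prefix` differs from `sys.base_prefix`.

```python
"""Hypothetical post-setup check: is a virtual environment active, and can
the expected top-level modules be found? Package names are placeholders."""
import importlib.util
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter installation.
    return sys.prefix != sys.base_prefix


def find_missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # e.g. an empty name or a dotted name whose parent is missing
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    # Placeholder names: replace with the packages listed in requirements.txt.
    print("Missing modules:", find_missing_modules(["pandas", "sqlalchemy"]))
```

`find_spec` locates a module without importing it, so the check stays cheap even for heavy packages.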
Course modules:
- Module 1: Introduction & Prerequisites
- Module 2: Workflow Orchestration
- Module 3: Data Warehouse
- Module 4: Analytics Engineering
- Module 5: Batch Processing
- Module 6: Streaming
- Final Project
This repository is for educational purposes and tracks personal learning progress through the Data Engineering Zoomcamp.
For questions or feedback, please open an issue in this repository.