Tested with Python 3.10.10
This is a simple Python-based ETL pipeline with three main components:
- Extractor: Reads data from a source (a MySQL database)
- Transformer: Cleans and transforms the data, including joining multiple tables and converting columns to the correct data types (for example, dates)
- Loader: Loads the data into different destinations (a MySQL database, local CSV files)
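The three components above can be sketched as plain Python classes. This is a minimal, hypothetical sketch, not the project's etl.py: the class names, method names, and sample tables are assumptions. It uses the standard-library sqlite3 module as an in-memory stand-in for MySQL so it runs without a database server. The cursor calls follow the DB-API 2.0 interface that MySQL drivers such as PyMySQL also implement, but those drivers use `%s` instead of `?` as the parameter placeholder.

```python
import csv
import os
import sqlite3
import tempfile
from datetime import date


class Extractor:
    """Reads whole tables through a DB-API 2.0 connection (sqlite3 standing in for MySQL)."""

    def __init__(self, conn):
        self.conn = conn

    def extract(self, table):
        # Table names must come from trusted code, never user input (no injection guard here).
        cur = self.conn.cursor()
        cur.execute(f"SELECT * FROM {table}")
        columns = [desc[0] for desc in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]


class Transformer:
    """Joins tables and converts column types."""

    def join(self, left, right, key):
        # Inner join: index the right table by key and keep left rows that have a match.
        # On column-name clashes the right table's value wins.
        index = {row[key]: row for row in right}
        return [{**row, **index[row[key]]} for row in left if row[key] in index]

    def convert_dates(self, rows, column):
        # Turn ISO-8601 strings such as "2024-01-05" into datetime.date objects.
        return [{**row, column: date.fromisoformat(row[column])} for row in rows]


class Loader:
    """Writes rows to a CSV file or to a database table."""

    def to_csv(self, rows, path):
        if not rows:
            return
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)  # date objects are written via str(), i.e. ISO format

    def to_db(self, conn, table, rows):
        if not rows:
            return
        columns = list(rows[0])
        placeholders = ", ".join("?" for _ in columns)  # MySQL drivers use %s instead
        cur = conn.cursor()
        # Untyped columns are SQLite-only; a MySQL table needs explicit column types.
        cur.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
        values = [
            tuple(row[c].isoformat() if isinstance(row[c], date) else row[c] for c in columns)
            for row in rows
        ]
        cur.executemany(f"INSERT INTO {table} VALUES ({placeholders})", values)
        conn.commit()


def run_demo():
    """Extract two tables, join them, convert dates, then load to a DB table and a CSV file."""
    src = sqlite3.connect(":memory:")
    src.executescript("""
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin');
        INSERT INTO orders VALUES (10, 1, '2024-01-05'), (11, 2, '2024-02-10'), (12, 3, '2024-03-01');
    """)
    extractor, transformer, loader = Extractor(src), Transformer(), Loader()

    orders = extractor.extract("orders")
    customers = extractor.extract("customers")
    joined = transformer.join(orders, customers, "customer_id")  # order 12 has no customer
    rows = transformer.convert_dates(joined, "order_date")

    dest = sqlite3.connect(":memory:")
    loader.to_db(dest, "order_report", rows)
    csv_path = os.path.join(tempfile.mkdtemp(), "order_report.csv")
    loader.to_csv(rows, csv_path)
    return rows, dest, csv_path


if __name__ == "__main__":
    rows, _, csv_path = run_demo()
    print(rows)
    print("CSV written to", csv_path)
```

Running the file prints the two joined rows (the order whose `customer_id` has no matching customer is dropped by the inner join) and the path of the CSV file. Plain lists of dicts keep the sketch dependency-free; a pandas-based version would more likely use `DataFrame.merge` and `pd.to_datetime` for the join and the date conversion.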
Project layout:
- etl.py: Contains the ETL classes
- config.loader.py: Contains the function that loads database credentials from YAML
- main.py: Main script that runs the ETL process
- data/: Sample input/output files
- configs/db_config.yaml: Stores database connection info such as host, user, and password
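The credential loader described above might look like the following minimal sketch. It assumes PyYAML is installed (presumably via requirements.txt, though that is an assumption) and that configs/db_config.yaml holds keys named host, user, and password, as the file list suggests. The function names `parse_db_config` and `load_db_config` and the optional `section` argument are hypothetical and not taken from config.loader.py.

```python
import yaml  # PyYAML (assumed to be installed, e.g. via requirements.txt)

# Keys this sketch requires; the real file may contain more (port, database, ...).
REQUIRED_KEYS = ("host", "user", "password")


def parse_db_config(stream, section=None):
    """Parse YAML text or an open file and return a dict of connection settings.

    `section` is a hypothetical option for files that group credentials
    under named blocks, e.g. separate source and target databases.
    """
    config = yaml.safe_load(stream) or {}
    if section is not None:
        config = config[section]
    if not isinstance(config, dict):
        raise ValueError("database config must be a YAML mapping")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing database config keys: {', '.join(missing)}")
    return config


def load_db_config(path="configs/db_config.yaml", section=None):
    """Read database credentials from a YAML file on disk."""
    with open(path, encoding="utf-8") as f:
        return parse_db_config(f, section)
```

`yaml.safe_load` is used rather than `yaml.load` because it only builds plain Python types (dicts, lists, strings, numbers) and cannot construct arbitrary objects from the file. If the YAML keys match a driver's keyword arguments, the returned dict can be unpacked directly, for example `pymysql.connect(**load_db_config())`.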
Install the dependencies, then run the pipeline:
pip install -r requirements.txt
python main.py