An end-to-end Data Engineering pipeline that extracts live financial data, performs automated transformations, and loads it into a persistent SQL database.
As a 3rd-year Computer Engineering student at Istanbul Aydin University, I developed this project to demonstrate core Data Engineering principles. The system automates the flow of market data, ensuring data integrity and persistence—skills that are essential for data-driven sectors like banking and finance.
- Language: Python 3.x
- Libraries: Pandas, SQLAlchemy, yfinance
- Database: SQLite (Relational Storage)
- Workflow: ETL (Extract, Transform, Load)
Using the yfinance API, the system retrieves 1-minute interval market data (e.g., BTC-USD) for the last 24 hours.
- Error Handling: Implemented `try-except` blocks to ensure the pipeline remains robust during API connection issues.
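A minimal sketch of the extract step described above. The `extract` helper name is illustrative (the actual function lives in `data_ingestion.py`, whose internals are not shown here); the `try-except` guard keeps the pipeline alive when the API or the network is unavailable.

```python
def extract(ticker: str = "BTC-USD"):
    """Retrieve 1-minute interval data for the last 24 hours via yfinance."""
    try:
        # Imported lazily so a missing dependency is handled like any other failure.
        import yfinance as yf
        return yf.download(ticker, period="1d", interval="1m", progress=False)
    except Exception as exc:
        # Log and return None instead of crashing the whole pipeline.
        print(f"Extraction failed for {ticker}: {exc}")
        return None
```

Returning `None` on failure lets the caller skip the transform/load steps for that run and retry later.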
Raw data is processed using Pandas to meet production-ready standards:
- Cleaning: Automatically removes unnecessary columns like `Stock Splits` and `Dividends`.
- Normalization: Converts column names to lowercase for seamless SQL compatibility.
- Metadata: Adds `ticker` symbols and `ingested_at` timestamps to maintain a clear audit trail.
- Timezone Handling: Standardizes timestamps by removing UTC offsets to ensure database consistency.
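The four transformation steps above can be sketched in Pandas as follows. The `transform` function name is an assumption for illustration; the column names match yfinance's default output.

```python
import pandas as pd

def transform(df: pd.DataFrame, ticker: str) -> pd.DataFrame:
    df = df.copy()
    # Cleaning: drop columns the pipeline does not need (ignore if absent).
    df = df.drop(columns=["Stock Splits", "Dividends"], errors="ignore")
    # Normalization: lowercase column names for seamless SQL compatibility.
    df.columns = [str(c).lower() for c in df.columns]
    # Timezone handling: strip the UTC offset so SQLite stores naive timestamps.
    if isinstance(df.index, pd.DatetimeIndex) and df.index.tz is not None:
        df.index = df.index.tz_localize(None)
    # Metadata: audit-trail columns recording what was ingested and when.
    df["ticker"] = ticker
    df["ingested_at"] = pd.Timestamp.now(tz="UTC").tz_localize(None)
    return df
```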
The cleaned data is streamed into a SQLite database (market_data.db) using SQLAlchemy.
- Persistence: Utilizes `if_exists='append'` logic to build a continuous time-series dataset, preventing data loss across multiple runs.
During testing, the pipeline successfully processed and stored more than 550 records with 100% data integrity. This architecture serves as a foundation for more complex projects, such as my current work on Hybrid Energy Potential Assessment.
- Clone the repository:
git clone https://github.com/silapeksen/financial-data-pipeline.git
- Install dependencies:
pip install -r requirements.txt
- Run the pipeline:
python data_ingestion.py
- Verify the database:
python check_db.py
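If you want to inspect the database without the helper script, a quick check with the standard library's `sqlite3` looks like this (the table name `market_data` is assumed; `check_db.py` itself is not shown here):

```python
import sqlite3

con = sqlite3.connect("market_data.db")
try:
    # Count how many rows the pipeline has accumulated so far.
    count = con.execute("SELECT COUNT(*) FROM market_data").fetchone()[0]
    print(f"{count} rows stored in market_data")
except sqlite3.OperationalError:
    # The table does not exist yet if the pipeline has never run.
    print("market_data table not found - run the pipeline first")
finally:
    con.close()
```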
3rd Year Computer Engineering Student
Passionate about Data Engineering, Automation, and Backend Systems.
"Turning raw data into meaningful insights, one pipeline at a time."