Cost-Optimized Data Pipeline with Cloud-Based Infrastructure and Machine Learning in Informatica
Overview
This project focuses on developing a cost-optimized data pipeline leveraging cloud-based infrastructure and machine learning techniques. The pipeline analyzes usage patterns (such as seasonal patterns, bursty behavior, predictable workloads, and anomalous behavior) and dynamically adjusts resource allocations, with the aim of minimizing data processing and storage costs while maintaining performance and reliability.
Key Features
- Usage Pattern Analysis: Utilize machine learning techniques to analyze usage patterns of the data pipeline.
- Dynamic Resource Allocation: Automatically adjust resource allocations based on detected usage patterns to optimize costs.
- Performance Monitoring: Continuous monitoring of pipeline performance to ensure reliability and maintain performance standards.
- Cost Optimization Strategies: Implement various cost optimization strategies such as scaling, resource pooling, and workload scheduling.
- Anomaly Detection: Identify anomalous behavior in the data pipeline and take corrective actions to mitigate risks and optimize costs.
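As a minimal sketch of how the dynamic resource allocation feature above could act on a detected pattern, the function below maps a pattern label to a suggested worker count. The pattern names come from the overview; the scaling rules and worker counts are illustrative assumptions, not the project's actual policy.

```python
# Hypothetical sketch: map a detected usage pattern to a resource
# allocation decision. Scaling factors here are assumptions.

def recommend_workers(pattern: str, current_workers: int) -> int:
    """Return a suggested worker count for the detected usage pattern."""
    if pattern == "bursty":
        return current_workers * 2          # over-provision for spikes
    if pattern == "seasonal":
        return current_workers              # hold steady, schedule ahead
    if pattern == "predictable":
        return max(1, current_workers - 1)  # trim idle capacity
    if pattern == "anomalous":
        return current_workers              # hold steady, alert instead
    raise ValueError(f"unknown pattern: {pattern}")
```

In a real deployment this decision would feed an autoscaling API rather than return a number directly.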
Technologies Used
- Cloud Platform: Informatica
- Containerization and Orchestration Tools (e.g., Docker, Kubernetes)
- Python
- Machine Learning Algorithm: k-Nearest Neighbors (KNN)
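Since KNN is the listed algorithm, here is a self-contained sketch of how it could label a window of pipeline metrics. The feature vectors (mean load, load variance) and the labeled examples are made-up illustrations, not project data.

```python
# Minimal k-nearest-neighbors sketch for labeling usage-pattern windows.
# Feature vectors and labels below are illustrative placeholders.

from collections import Counter
import math

def knn_classify(samples, labels, query, k=3):
    """Label `query` by majority vote among its k nearest samples."""
    dists = sorted(
        (math.dist(s, query), lbl) for s, lbl in zip(samples, labels)
    )
    votes = Counter(lbl for _, lbl in dists[:k])
    return votes.most_common(1)[0][0]

# (mean load, load variance) per observation window
samples = [(0.2, 0.05), (0.25, 0.04), (0.9, 0.6), (0.85, 0.7), (0.5, 0.1)]
labels = ["predictable", "predictable", "bursty", "bursty", "seasonal"]
print(knn_classify(samples, labels, (0.88, 0.65)))  # -> bursty
```

A production version would more likely use `sklearn.neighbors.KNeighborsClassifier` on features extracted from pipeline telemetry.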
Installation
git clone https://github.com/gitsofaryan/Informatica.git
pip install pandas

(logging and pickle ship with Python's standard library and need no installation.)
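To show how the stdlib pieces above fit in, this sketch persists a trained model object with pickle and records pipeline events with logging. The model dictionary and file name are placeholders.

```python
# Illustrative sketch: save/restore a model with pickle, log events
# with logging. The model object and file name are placeholders.

import logging
import pickle

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

model = {"k": 3, "labels": ["predictable", "bursty"]}  # stand-in model

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
log.info("model saved")

with open("model.pkl", "rb") as f:
    restored = pickle.load(f)
log.info("model restored")
```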
Built With
- jupyter-notebook
- python
