Cost-Optimized Data Pipeline with Cloud-Based Infrastructure and Machine Learning in Informatica

Overview

This project builds a cost-optimized data pipeline on cloud-based infrastructure using machine learning techniques. It analyzes usage patterns (seasonal patterns, bursty behavior, predictable workloads, and anomalous behavior) and dynamically adjusts resource allocations, with the aim of minimizing data processing and storage costs while maintaining performance and reliability.

Key Features

Usage Pattern Analysis: Use machine learning techniques to analyze usage patterns of the data pipeline.
Dynamic Resource Allocation: Automatically adjust resource allocations based on detected usage patterns to optimize costs.
Performance Monitoring: Continuously monitor pipeline performance to maintain reliability and performance standards.
Cost Optimization Strategies: Implement various cost optimization strategies such as scaling, resource pooling, and workload scheduling.
Anomaly Detection: Identify anomalous behavior in the data pipeline and take corrective actions to mitigate risks and optimize costs.
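The features above can be sketched as a small K-Nearest Neighbors classifier that labels a usage window and maps the label to a resource recommendation. This is a minimal illustration only, not the project's actual model: the feature pair (mean CPU utilization, peak-to-mean ratio), the training points, the `SCALE_FACTOR` policy, and the function names `knn_predict` and `recommend_workers` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: (mean CPU utilization, peak-to-mean ratio) -> pattern.
# A real pipeline would derive such features from monitoring data and normalize them.
TRAINING = [
    ((0.30, 1.1), "predictable"),
    ((0.35, 1.2), "predictable"),
    ((0.32, 1.0), "predictable"),
    ((0.40, 3.5), "bursty"),
    ((0.45, 4.0), "bursty"),
    ((0.38, 3.8), "bursty"),
    ((0.90, 8.0), "anomalous"),
    ((0.95, 9.0), "anomalous"),
    ((0.92, 8.5), "anomalous"),
]

# Hypothetical policy: worker-count multiplier for each detected pattern.
# Anomalous windows keep the baseline here and would be flagged for review.
SCALE_FACTOR = {"predictable": 1.0, "bursty": 1.5, "anomalous": 1.0}


def knn_predict(sample, training=TRAINING, k=3):
    """Return the majority label among the k training points nearest to sample."""
    by_distance = sorted(training, key=lambda row: math.dist(sample, row[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def recommend_workers(sample, baseline_workers=4):
    """Classify a usage window and scale the baseline worker count accordingly."""
    pattern = knn_predict(sample)
    return pattern, math.ceil(baseline_workers * SCALE_FACTOR[pattern])


if __name__ == "__main__":
    print(recommend_workers((0.42, 3.7)))
```

Because the features are not scaled in this sketch, the peak-to-mean ratio dominates the distance; standardizing features before computing distances is the usual remedy.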

Technologies Used

Cloud Platform: Informatica
Containerization and Orchestration Tools (e.g., Docker, Kubernetes)
Python
ML Algorithm: K-Nearest Neighbors (KNN)

Installation

git clone https://github.com/gitsofaryan/Informatica.git

pip install pandas

The logging and pickle modules are part of the Python standard library and do not need to be installed with pip.
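Since the repository ships a pickled model (KNN.pkl), a loading helper built on the standard-library `pickle` and `logging` modules might look like the sketch below. The internal structure of KNN.pkl is not documented here, so the demonstration writes and reloads a stand-in dictionary; the helper name `load_model` and the file name `demo_model.pkl` are hypothetical.

```python
import logging
import pickle
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def load_model(path):
    """Load a pickled object from disk. Only unpickle files from a trusted source."""
    path = Path(path)
    if not path.is_file():
        logger.error("Model file not found: %s", path)
        raise FileNotFoundError(path)
    with path.open("rb") as fh:
        model = pickle.load(fh)
    logger.info("Loaded %s from %s", type(model).__name__, path)
    return model


# Demonstration with a stand-in object; the real KNN.pkl contents are not assumed.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_model.pkl"
    with demo_path.open("wb") as fh:
        pickle.dump({"k": 3, "labels": ["predictable", "bursty"]}, fh)
    demo_model = load_model(demo_path)
```

Unpickling can execute arbitrary code, so a helper like this should only be pointed at model files that come from the repository itself or another trusted source.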

