
gitsofaryan/Informatica


Cost-Optimized Data Pipeline with Cloud-Based Infrastructure and Machine Learning in Informatica

Overview

This project develops a cost-optimized data pipeline that combines cloud-based infrastructure with machine learning. The pipeline analyzes usage patterns (seasonal patterns, bursty behavior, predictable workloads, and anomalous behavior) and dynamically adjusts resource allocations, with the aim of minimizing data processing and storage costs while maintaining performance and reliability.
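Before usage patterns can be classified, a raw usage series has to be reduced to a few numbers that separate the pattern types. The sketch below is illustrative only and does not reproduce the repository's code. It assumes hourly utilization samples and a hypothetical helper, `usage_features`, that computes three features. The coefficient of variation measures overall spread, the peak-to-mean ratio reflects burstiness, and the autocorrelation at a seasonal lag (24 for a daily cycle in hourly data) captures seasonality.

```python
from statistics import mean, pstdev
from typing import Sequence


def usage_features(series: Sequence[float], season: int = 24) -> tuple:
    """Hypothetical feature extractor for a usage time series.

    Returns (cv, burstiness, seasonality):
      cv          -- standard deviation divided by the mean
      burstiness  -- peak value divided by the mean
      seasonality -- sample autocorrelation at lag `season`
    """
    mu = mean(series)
    sd = pstdev(series)
    cv = sd / mu if mu else 0.0
    burstiness = max(series) / mu if mu else 0.0

    if sd == 0 or len(series) <= season:
        # A flat or too-short series carries no seasonal signal.
        seasonality = 0.0
    else:
        n = len(series) - season
        # Sample autocorrelation: lagged covariance over the (population) variance.
        cov = sum((series[i] - mu) * (series[i + season] - mu) for i in range(n)) / len(series)
        seasonality = cov / (sd * sd)
    return cv, burstiness, seasonality
```

A steady workload produces a low `cv` and a burstiness near 1. A series that repeats every 24 samples produces a high `seasonality` score.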

Key Features

  • Usage Pattern Analysis: Utilize machine learning techniques to analyze usage patterns of the data pipeline.
  • Dynamic Resource Allocation: Automatically adjust resource allocations based on detected usage patterns to optimize costs.
  • Performance Monitoring: Continuous monitoring of pipeline performance to ensure reliability and maintain performance standards.
  • Cost Optimization Strategies: Implement various cost optimization strategies such as scaling, resource pooling, and workload scheduling.
  • Anomaly Detection: Identify anomalous behavior in the data pipeline and take corrective actions to mitigate risks and optimize costs.
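The README does not document how detected patterns are translated into allocations or how anomalies are detected. The sketch below illustrates one way to connect the two features and should not be read as the project's method. Every name and number in it is hypothetical: the `SCALING_POLICY` table of replica bounds per pattern, the `target_replicas` sizing rule, and the `zscore_anomalies` detector. The detector is a plain z-score threshold, a deliberately simple stand-in for whatever the project actually uses.

```python
import math
from statistics import mean, pstdev

# Hypothetical (min, max) replica bounds for each detected usage pattern.
SCALING_POLICY = {
    "predictable": (2, 4),   # narrow band: little headroom needed
    "seasonal": (2, 8),      # room to follow daily/weekly cycles
    "bursty": (1, 12),       # scale down when idle, far up on spikes
    "anomalous": (2, 6),     # capped while the anomaly is investigated
}


def zscore_anomalies(series, threshold=3.0):
    """Return indices whose z-score exceeds `threshold` (simple stand-in detector)."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) / sd > threshold]


def target_replicas(pattern, current_load, capacity_per_replica):
    """Size the pool to the current load, clamped to the pattern's bounds."""
    lo, hi = SCALING_POLICY[pattern]
    needed = math.ceil(current_load / capacity_per_replica)
    return max(lo, min(hi, needed))
```

Clamping is the cost lever in this sketch. A "predictable" workload never scales past its narrow cap, while a "bursty" one can drop to a single replica when idle. For example, a load of 950 units at 100 units per replica yields 10 replicas under the "bursty" bounds but only 4 under the "predictable" bounds.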

Technologies Used

  • Cloud platform: Informatica
  • Containerization and orchestration tools (e.g., Docker, Kubernetes)
  • Python
  • ML algorithm: k-nearest neighbors (KNN)
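The README lists KNN as the machine learning algorithm but does not describe its inputs. The following from-scratch sketch illustrates how k-nearest neighbors could label a workload. It assumes each workload is summarized as a numeric feature vector, for example (coefficient of variation, peak-to-mean ratio, seasonal autocorrelation), and that labelled examples are available. The function name `knn_predict` and the sample training data are hypothetical, and the repository's actual implementation may differ.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled neighbours.

    train: list of (feature_vector, label) pairs
    query: feature vector of the same length
    Distance is Euclidean (math.dist, Python 3.8+). Ties in the vote go to the
    label seen first, i.e. the one held by the closer neighbour.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Usage, with invented feature values (cv, burstiness, seasonality):

```python
train = [
    ((0.05, 1.1, 0.1), "predictable"),
    ((0.06, 1.2, 0.0), "predictable"),
    ((0.40, 1.8, 0.9), "seasonal"),
    ((0.35, 1.7, 0.8), "seasonal"),
    ((0.90, 6.0, 0.1), "bursty"),
    ((1.10, 7.5, 0.2), "bursty"),
]
label = knn_predict(train, (0.38, 1.75, 0.85))
```

Because KNN is distance-based, features on very different scales (here, burstiness versus autocorrelation) should normally be standardized before classification.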

Installation

git clone https://github.com/gitsofaryan/Informatica.git

pip install pandas

(logging and pickle are part of the Python standard library and do not need to be installed with pip.)

About

Cloud resource optimization model
