VechamGautham/openweather-data-pipeline

OpenWeather Data Engineering Project

This is an ongoing project; further enhancements will make the data more useful and accessible to downstream stakeholders for analysis and decision-making.

This project uses Apache Airflow on Amazon Web Services (AWS) to process weather data from the OpenWeather API.


Project Architecture

The pipeline follows a standard ETL flow:

  1. Extract weather data from the OpenWeather API using Airflow tasks.
  2. Transform the raw data into a structured, analysis-ready format.
  3. Load the transformed data into an Amazon S3 bucket.

Components

1. Data Source

  • OpenWeather API — Provides real-time weather data, forecasts, and historical information.

2. AWS Services

  • Amazon S3
    Used for storing intermediate and transformed weather data in CSV format for further analysis.

  • Apache Airflow
    Orchestrates the entire data pipeline through DAGs, operators, and the scheduler.


Airflow DAG Overview

The pipeline DAG (weather_dag) contains three main tasks:

  • is_weather_api_ready – Uses HttpSensor to check API availability.
  • extract_weather_data – Uses SimpleHttpOperator to pull data from the API.
  • transform_load_weather_data – Uses PythonOperator to clean and save data into S3.
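A minimal sketch of how these three tasks might be wired together, using Airflow 2.x provider imports. The connection ID (`weathermap_api`), the city query, the `ow_api_key` Airflow Variable, and the transform body are illustrative assumptions, not taken from the repository:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.providers.http.sensors.http import HttpSensor


def transform_load_weather_data(task_instance, **_):
    """Pull the extracted JSON via XCom, flatten it, and write a CSV to S3."""
    raw = task_instance.xcom_pull(task_ids="extract_weather_data")
    # ... transform `raw` into a CSV row and upload it to the S3 bucket ...


with DAG(
    dag_id="weather_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=2)},
) as dag:
    is_weather_api_ready = HttpSensor(
        task_id="is_weather_api_ready",
        http_conn_id="weathermap_api",  # assumed Airflow connection ID
        endpoint="/data/2.5/weather?q=Hyderabad&appid={{ var.value.ow_api_key }}",
    )
    extract_weather_data = SimpleHttpOperator(
        task_id="extract_weather_data",
        http_conn_id="weathermap_api",
        endpoint="/data/2.5/weather?q=Hyderabad&appid={{ var.value.ow_api_key }}",
        method="GET",
        response_filter=lambda response: response.json(),
        log_response=True,
    )
    transform_load = PythonOperator(
        task_id="transform_load_weather_data",
        python_callable=transform_load_weather_data,
    )

    is_weather_api_ready >> extract_weather_data >> transform_load
```

The sensor and the extract task hit the same endpoint, so a failed availability check short-circuits the run before any data is pulled.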

Airflow DAG Screenshot


Airflow Dashboard

You can monitor DAG runs, success/failure states, and task execution duration through the Airflow UI.

Airflow UI Screenshot


S3 Bucket Storage

The transformed weather data is stored in CSV format inside an S3 bucket (openweatherapidata).
Each DAG run creates a new file with a timestamped filename, making it easy to track and manage historical data.
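A timestamped object key could be generated like this; the key layout and `current_weather` prefix are assumptions, since the README only confirms that filenames are timestamped:

```python
from datetime import datetime, timezone


def timestamped_key(city: str, prefix: str = "current_weather") -> str:
    """Build an S3 object key like 'current_weather/hyderabad_2024-01-15T10-30-00Z.csv'."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"{prefix}/{city.lower()}_{stamp}.csv"


print(timestamped_key("Hyderabad"))
```

Using a UTC timestamp keeps keys sortable and avoids collisions between daily runs.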

S3 Bucket Screenshot


Pipeline Steps

  1. Extract Data

    • Airflow triggers a call to the OpenWeather API to fetch weather data for a given location.
  2. Transform Data

    • The raw JSON response is parsed and converted into structured CSV format.
  3. Load Data to S3

    • The cleaned data is saved in S3 with proper naming conventions for future analysis.
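The transform and serialization steps above can be sketched as follows. Field names follow the OpenWeather current-weather JSON schema; the chosen output columns and the Kelvin-to-Celsius conversion are illustrative choices, not confirmed by the repository:

```python
import csv
import io
from datetime import datetime, timezone


def transform_weather(record: dict) -> dict:
    """Flatten one OpenWeather current-weather JSON record into a CSV-ready row."""
    return {
        "city": record["name"],
        "description": record["weather"][0]["description"],
        # The API returns temperature in Kelvin unless `units` is set.
        "temp_celsius": round(record["main"]["temp"] - 273.15, 2),
        "pressure_hpa": record["main"]["pressure"],
        "humidity_pct": record["main"]["humidity"],
        "wind_speed_ms": record["wind"]["speed"],
        "observed_at_utc": datetime.fromtimestamp(
            record["dt"], tz=timezone.utc
        ).isoformat(),
    }


def to_csv(rows: list[dict]) -> str:
    """Serialize transformed rows to CSV text, ready for upload to S3."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = {
    "name": "Hyderabad",
    "weather": [{"description": "clear sky"}],
    "main": {"temp": 300.15, "pressure": 1010, "humidity": 40},
    "wind": {"speed": 3.2},
    "dt": 1700000000,
}
print(to_csv([transform_weather(sample)]))
```

Keeping the transform a pure function of the JSON record makes it easy to unit-test outside Airflow.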

Setup Instructions

AWS Setup

  • Create an AWS account.
  • Set up an S3 bucket to store weather data.

OpenWeather API Setup

  • Sign up for an OpenWeather account and generate an API key.
  • Store the key in an Airflow Variable or connection rather than hard-coding it in the DAG.

Airflow Setup

  • Deploy Airflow on AWS using EC2, ECS, or your local environment.
  • Define the DAG and operators to orchestrate ETL tasks.
  • Configure Airflow connections for:
    • OpenWeather API
    • Amazon S3
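Airflow can also resolve connections from `AIRFLOW_CONN_<CONN_ID>` environment variables, which avoids clicking through the UI on each deployment. The connection IDs and region below are illustrative assumptions:

```python
import os

# Airflow resolves connections from AIRFLOW_CONN_<CONN_ID> environment
# variables; the IDs below must match what the DAG's operators reference.
os.environ["AIRFLOW_CONN_WEATHERMAP_API"] = "https://api.openweathermap.org"

# AWS connection with no explicit credentials: Airflow falls back to the
# instance role / default credential chain.
os.environ["AIRFLOW_CONN_AWS_DEFAULT"] = "aws://?region_name=us-east-1"

print(os.environ["AIRFLOW_CONN_WEATHERMAP_API"])
```

In practice these would be set in the EC2/ECS environment (or a secrets manager) rather than from Python.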

Run the Pipeline

  • Trigger the DAG manually or schedule it with Airflow’s scheduler (e.g., @daily).
  • Monitor task execution in the Airflow UI.
  • Verify the generated CSV files in the S3 bucket.

Notes

  • Monitor AWS storage costs.
  • Implement proper logging and error handling for robustness.
  • Set up monitoring and alerts to catch pipeline failures early.

Contributors

  • Vecham Gautham

For questions or feedback, contact [email protected].
