DataWings

SFU BigData Lab1 Project

📌 Project Overview

This project analyzes how large-scale climate variability, the El Niño–Southern Oscillation (ENSO: El Niño, Neutral, La Niña), influences bird migration patterns across North America.

We built:

A reusable, configuration-driven ETL pipeline using PySpark and AWS EMR
Machine learning prediction models for bird observations
A real-time data streaming system using Kafka → S3 → Athena → QuickSight
Interactive ML and real-time dashboards

👥 Team

Joohyun Park – Real-Time Pipeline, Kafka, QuickSight Dashboard
Jiayi Li – ML Modeling, Shiny Dashboard
Hongrui Qu – Historical ETL, PySpark, EMR

🔗 Dashboard Links

📊 Historical Analysis & ML Prediction Dashboard

We developed a historical analysis and machine learning–based prediction dashboard.
This dashboard visualizes long-term bird migration patterns and model-based observation predictions under different ENSO phases.
Built using R, Shiny, and ggplot2.

Key Features

Exploratory Data Analysis:

Weekly and per-species visualization of historical migration patterns
Comparison of average latitude between El Niño and La Niña
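The average-latitude comparison amounts to grouping observations by ENSO phase and averaging their latitudes. The project's own analysis is done in R; the following is only a minimal Python sketch of the idea, assuming each record has already been labeled with a phase. The tuple layout and the sample values are hypothetical toy data, not project results.

```python
from collections import defaultdict
from statistics import mean


def mean_latitude_by_phase(records, phases=("El Niño", "La Niña")):
    """Return {phase: mean latitude} for the requested ENSO phases.

    Each record is assumed (hypothetically) to be a (species, enso_phase, latitude)
    tuple; records in other phases (e.g. Neutral) are skipped.
    """
    lats = defaultdict(list)
    for _species, phase, lat in records:
        if phase in phases:
            lats[phase].append(lat)
    return {phase: mean(values) for phase, values in lats.items()}


# Toy data for illustration only.
observations = [
    ("Golden-winged Warbler", "El Niño", 45.0),
    ("Golden-winged Warbler", "El Niño", 47.0),
    ("Golden-winged Warbler", "La Niña", 44.0),
    ("Golden-winged Warbler", "Neutral", 50.0),
]
result = mean_latitude_by_phase(observations)
print(result["El Niño"] - result["La Niña"])  # 2.0
```

A positive difference would mean observations sit farther north, on average, in El Niño periods than in La Niña periods for the selected records.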

Prediction Map:

Random Forest–based bird observation prediction under each ENSO phase (El Niño, Neutral, and La Niña)
Interactive map-based visualization that lets users select the month, ENSO phase, and species

🌍 Real-Time Bird Observation Dashboard

We developed a real-time bird observation dashboard using AWS QuickSight.
The dashboard provides an interactive map that allows users to explore live bird observations across Canada and the United States.

Key Features

Interactive map-based exploration
Filter by date, country (CA / US), and species
View real-time observation counts and locations
Built on a real-time data pipeline using Kafka → S3 → Athena → QuickSight

Access Note

You may need to sign in with your AWS account to view the dashboard.
Access is restricted to invited users.

🏗 System Architecture

Historical Pipeline:
eBird & NOAA ENSO datasets → S3 → ETL (EMR/PySpark) → ML (R) → Shiny app
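A central step in the historical pipeline is labeling each eBird record with the ENSO phase of the month in which it was observed. The sketch below shows that join in plain Python, assuming the ONI seasons have already been mapped to a (year, month) → phase lookup. In the EMR job this would presumably be a PySpark join on year and month columns; the field names and lookup values here are hypothetical.

```python
from datetime import date

# Hypothetical monthly ENSO lookup: (year, month) -> phase label.
enso_by_month = {
    (2015, 12): "El Niño",
    (2016, 1): "El Niño",
    (2020, 11): "La Niña",
}


def attach_enso_phase(observations, enso_lookup, default=None):
    """Yield a copy of each observation dict with an added 'enso_phase' key.

    Observations whose (year, month) is missing from the lookup get `default`.
    The input dicts are not modified.
    """
    for obs in observations:
        d = obs["observation_date"]
        yield {**obs, "enso_phase": enso_lookup.get((d.year, d.month), default)}


sample = [
    {"species": "Swainson's Thrush", "observation_date": date(2016, 1, 14), "count": 3},
    {"species": "Swainson's Thrush", "observation_date": date(2018, 5, 2), "count": 1},
]
joined = list(attach_enso_phase(sample, enso_by_month))
```

Keeping unmatched months as `None` (rather than dropping them) makes gaps in the ENSO data easy to spot before modeling.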

ETL Configuration:
Run the entire historical ETL at once with: python3 pipeline.py
config.json stores all user settings, including species, countries, year ranges, file paths, and the choice of weekly or monthly aggregation.
Raw file requirements:
eBird: raw .txt file downloaded from eBird, named using the pattern species_country.txt
NOAA ENSO: .csv file downloaded from NOAA
Overall structure for ETL
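To make the configuration-driven design concrete, here is a minimal sketch of how a pipeline might read config.json, parse raw file names of the form species_country.txt, and bucket dates for weekly or monthly aggregation. The key names (`species`, `countries`, `start_year`, `end_year`, `raw_dir`, `aggregation`) and the bucket path are hypothetical stand-ins; the real pipeline.py and config.json in this repository may use different names and logic.

```python
import json
from datetime import date
from pathlib import Path

# Hypothetical config.json contents; the real keys in this repo may differ.
EXAMPLE_CONFIG = """
{
  "species": ["Golden-winged Warbler"],
  "countries": ["CA", "US"],
  "start_year": 1995,
  "end_year": 2025,
  "raw_dir": "s3://example-bucket/raw/",
  "aggregation": "weekly"
}
"""


def load_config(text):
    """Parse the config JSON and validate the aggregation choice and year range."""
    cfg = json.loads(text)
    if cfg["aggregation"] not in ("weekly", "monthly"):
        raise ValueError(f"aggregation must be 'weekly' or 'monthly', got {cfg['aggregation']!r}")
    if cfg["start_year"] > cfg["end_year"]:
        raise ValueError("start_year must not exceed end_year")
    return cfg


def parse_raw_filename(path):
    """Split an eBird raw file name of the form species_country.txt.

    rpartition splits on the LAST underscore, so species names that
    themselves contain underscores are kept intact.
    """
    stem = Path(path).stem
    species, _, country = stem.rpartition("_")
    if not species or not country:
        raise ValueError(f"expected species_country.txt, got {path!r}")
    return species, country


def period_key(d, aggregation):
    """Return the aggregation bucket for a date: ISO (year, week) or (year, month)."""
    if aggregation == "weekly":
        iso = d.isocalendar()
        return (iso[0], iso[1])
    return (d.year, d.month)


cfg = load_config(EXAMPLE_CONFIG)
print(parse_raw_filename("raw/goldenwingedwarbler_US.txt"))  # ('goldenwingedwarbler', 'US')
print(period_key(date(2020, 5, 15), cfg["aggregation"]))      # (2020, 20)
```

Using ISO weeks keeps weekly buckets consistent across years, at the cost of a few dates near New Year falling into the neighboring ISO year.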

Real-Time Pipeline:
eBird API → Kafka → S3 → Glue → Athena → QuickSight

🛠 Tech Stack

Data Engineering

Python, PySpark, Apache Kafka, Kafka S3 Sink Connector
Amazon S3, EMR, Glue, Athena

Machine Learning

R / RStudio (randomForest, dplyr, ggplot2)

Visualization

R Shiny
gganimate
AWS QuickSight (Real-Time Dashboard)

Database

SQLite (Real-Time Deduplication)
S3 (Raw & Clean data storage)
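Because each poll of a rolling 24-hour window returns many of the same observations again, the real-time pipeline needs to remember what it has already sent. A minimal sketch of SQLite-based deduplication follows, assuming an observation is uniquely identified by its eBird checklist ID (`subId`) plus `speciesCode`; that key choice, and the table and function names, are assumptions rather than the project's confirmed schema.

```python
import sqlite3


def open_seen_db(path=":memory:"):
    """Create (if needed) a table that remembers which observations were already sent."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        " sub_id TEXT NOT NULL,"
        " species_code TEXT NOT NULL,"
        " PRIMARY KEY (sub_id, species_code))"
    )
    return conn


def filter_new(conn, observations):
    """Return only observations not seen before, recording them as seen.

    INSERT OR IGNORE skips rows that violate the primary key, so rowcount is
    1 for a new observation and 0 for a duplicate.
    """
    new = []
    with conn:  # commits the transaction if the block succeeds
        for obs in observations:
            cur = conn.execute(
                "INSERT OR IGNORE INTO seen (sub_id, species_code) VALUES (?, ?)",
                (obs["subId"], obs["speciesCode"]),
            )
            if cur.rowcount == 1:
                new.append(obs)
    return new


# Illustrative sample values only.
conn = open_seen_db()
batch1 = [{"subId": "S1", "speciesCode": "gowwar"}, {"subId": "S1", "speciesCode": "swathr"}]
batch2 = [{"subId": "S1", "speciesCode": "gowwar"}, {"subId": "S2", "speciesCode": "gowwar"}]
print(len(filter_new(conn, batch1)))  # 2
print(len(filter_new(conn, batch2)))  # 1 (S1/gowwar was already seen)
```

Passing a file path instead of `":memory:"` lets the seen-set survive restarts of the polling process.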

📂 Data Source

🗂️ Historical Data

eBird Observation Dataset (Cornell Lab of Ornithology)
NOAA Oceanic Niño Index (ONI)
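The ONI is a 3-month running mean of sea-surface temperature anomalies, and NOAA's commonly cited rule treats periods as El Niño when the ONI is at or above +0.5 °C, and La Niña when it is at or below −0.5 °C, for at least five consecutive overlapping seasons. The sketch below applies that rule to a list of ONI values. This README does not state which threshold rule the project actually uses, so treat the function and its defaults as an assumption; the input values are toy numbers, not real ONI data.

```python
def classify_oni(oni_values, threshold=0.5, min_run=5):
    """Label each ONI value as 'El Niño', 'La Niña', or 'Neutral'.

    A value counts toward an episode only if it belongs to a run of at least
    `min_run` consecutive seasons at or beyond +/- `threshold`.
    """
    n = len(oni_values)
    labels = ["Neutral"] * n
    for sign, name in ((1, "El Niño"), (-1, "La Niña")):
        i = 0
        while i < n:
            if sign * oni_values[i] >= threshold:
                # Find the end of this run of qualifying seasons.
                j = i
                while j < n and sign * oni_values[j] >= threshold:
                    j += 1
                if j - i >= min_run:
                    for k in range(i, j):
                        labels[k] = name
                i = j
            else:
                i += 1
    return labels


# Toy values: five warm seasons, then a short cool dip that is too brief to count.
toy_oni = [0.6, 0.7, 0.9, 1.1, 0.8, 0.3, -0.6, -0.7, 0.0]
labels = classify_oni(toy_oni)
```

Here the first five seasons are labeled El Niño, while the two cool seasons stay Neutral because the run is shorter than `min_run`; setting `min_run=1` reduces this to a plain per-season threshold.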

Scope for historical data

Time Range: January 1995 – September 2025
Region: Canada & United States
Species:
- Swainson’s Thrush (olive-backed subspecies)
- Golden-winged Warbler

⏱️ Real-Time Data

eBird “Recent Observations in a Region” API (24-hour rolling window, 30-min interval)
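A minimal sketch of the polling side of the real-time pipeline: it builds a request to the eBird API 2.0 recent-observations endpoint (`/v2/data/obs/{regionCode}/recent`, authenticated with the `X-eBirdApiToken` header, with `back=1` for a one-day window) and hands each observation to a publish callback every 30 minutes. The `publish` callback is hypothetical; in this project it would presumably wrap a Kafka producer, which is omitted here so the example needs only the standard library. Function names and the `max_cycles` parameter are illustrative.

```python
import json
import time
import urllib.parse
import urllib.request

EBIRD_RECENT_URL = "https://api.ebird.org/v2/data/obs/{region}/recent"
POLL_SECONDS = 30 * 60  # 30-minute polling interval


def build_recent_request(region, api_token, back_days=1):
    """Build a request for recent observations in `region` (e.g. 'CA' or 'US')."""
    query = urllib.parse.urlencode({"back": back_days})
    url = EBIRD_RECENT_URL.format(region=urllib.parse.quote(region)) + "?" + query
    return urllib.request.Request(url, headers={"X-eBirdApiToken": api_token})


def fetch_recent(region, api_token):
    """Call the API and return the decoded JSON list of observations."""
    with urllib.request.urlopen(build_recent_request(region, api_token), timeout=30) as resp:
        return json.load(resp)


def poll(regions, api_token, publish, fetch=fetch_recent, sleep=time.sleep, max_cycles=None):
    """Every POLL_SECONDS, fetch each region and pass each observation to `publish`.

    `publish(region, obs)` is a hypothetical callback (e.g. a Kafka producer wrapper).
    `max_cycles=None` polls forever; a number stops after that many cycles.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for region in regions:
            for obs in fetch(region, api_token):
                publish(region, obs)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            sleep(POLL_SECONDS)
```

Because consecutive 24-hour windows overlap heavily, each cycle re-fetches most observations, which is why a deduplication step sits between polling and publishing.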

