Skip to content

Julian-Elliott/grid-carbon-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

56 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Smart Energy Dashboard: Your Guide to Cleaner Electricity

Transform your energy use with real-time insights from UK's electricity system. This dashboard helps you reduce your carbon footprint by choosing the cleanest times to use electricity - it's easier than you think!

Access the Interactive Dashboard: Grid Carbon Analysis - Tableau Public

Why Your Timing Matters

Every time you use electricity, it comes from different sources! Sometimes it's from wind farms and solar panels (clean energy), other times from gas power stations (higher carbon). By choosing better times to use energy, you can dramatically reduce your environmental impact without changing how much you use.

What my Analysis Discovered

Best Times for Clean Energy (Use More Power!)

  • Early morning (2-6am): Perfect for charging cars, washing machines, dishwashers
  • Weekend afternoons: High solar and wind generation, with lots of closed offices makes electricity super clean

Times to Avoid Heavy Usage

  • Evening peak (5-8pm): When everyone gets home, gas plants fire up
  • Calm winter evenings: Low renewables means more fossil fuel backup
  • Weekday mornings: High demand period

Where You Live Matters

  • Scotland: Cleanest electricity in UK (lots of wind and hydro)
  • Northern England: Great renewable resources
  • London/South: More challenging but has a huge potensial for time based savings

UK Carbon Intensity Over Time Figure 1: UK electricity is getting progressively cleaner - carbon intensity has dropped dramatically since 2019

Data Processing Pipeline

My Approach

This project follows a structured data science pipeline that transforms raw government data into actionable consumer guidance:

Stage 1: Data Extraction (Notebook 01)
├── Automated download from NESO Open Data Portal
├── Historic Generation Mix (2009-2025) 
├── Regional Carbon Intensity Forecasts (2018-2025)
└── Data validation and initial quality checks

Stage 2: Generation Mix Processing (Notebook 02) 
├── Timestamp standardisation to UTC
├── Generation data is free from nan and missing values
└── Output: generation_mix.csv + data dictionary + descriptive statistics

Stage 3: Regional Carbon Processing (Notebook 03)
├── Timestamp standardisation to UTC
├── Missing data identifified in 1.1% of records
├── Nan data present in 1.1% of records
└── Output: regional_carbon_intensity.csv + data dictionary + descriptive statistics

Stage 4: Data Integration & Imputation (Notebook 04)
├── Join and Temporal alignment of datasets  
├── KNN machine learning imputation of 2.2% missing values 
├── Validating Quality for imputed values (91.9% accuracy)
└── Output: combined_generation_carbon_intensity.csv + data dictionary + descriptive statistics + dataset coverage timeline chart

Stage 5: Exploratory Analysis (Notebook 05)
├── Correlation analysis identifying key drivers
├── Time series decomposition for patterns
├── Regional understanding 
└── Output: combined NESO correlation analysis chart + GB carbon intensity over time chart + generation mix percent chart

Stage 6: Hypothesis Testing (Notebook 06)
├── Scientific hypothesis formulation and testing
├── ANOVA tests for temporal differences
├── Regional pattern validation
└── Output: Proven statistical relationships and supporting hypothesis charts

Stage 7: Dashboard Preparation (Notebook 07)
├── Wide-to-long format transformation for Tableau
├── Geographic and temporal feature engineering
├── Sample dataset creation for performance
└── Output: Tableau-ready datasets

Dataset Coverage Timeline Figure 2: Complete data integration showing perfect temporal coverage across both datasets

Key Outputs Generated

Processed Datasets

  • generation_mix.csv: Clean historical energy generation data
  • carbon_intensity_regional_long_format.csv: The combined generation carbon intensity dataset transfromed from wide to long

Data Documentation

  • data_dictionary/: Column definitions and data source mappings
  • descriptive_statistics/: Statistical summaries for quality assurance
  • Quality flags indicating original vs imputed data

Analytical Outputs

  • results/figures/: Statistical visualisations proving key relationships
  • Correlation matrices showing what drives carbon intensity
  • Time series charts revealing daily and seasonal patterns
  • Regional comparison maps highlighting geographic differences

Dashboard Assets

  • tableau/carbon_intensity_regional_long_format.csv: Full interactive dataset
  • tableau/carbon_intensity_tableau_sample.csv: Performance-optimised sample
  • Geographic classifications enabling regional analysis
  • Temporal features supporting time-based insights

The Dashboard: Your Personal Energy Guide

What You'll Find Inside

My interactive Tableau dashboard makes complex energy data simple and actionable:

  • See time based carbon intensity across all UK regions
  • Green = go ahead and use energy freely
  • Red = consider waiting a few hours if possible
  • Interactive exploration of your specific area
  • Hour-by-hour patterns showing optimal energy times
  • Seasonal guides for different times of year
  • Historical trends proving the UK is getting cleaner

How We Know This Works

My analysis is built on scientific principles that ensure reliability:

Statistical Foundation

  • Mean & Median Analysis: I calculated average carbon intensity to find typical patterns
  • Standard Deviation: Measures how much carbon intensity varies (A LOT!)
  • Hypothesis Testing: Statistically proved that timing makes a massive difference
  • Probability Models: Used to predict optimal times with confidence

Rigorous Testing Process I asked three key scientific questions and tested them with between 7 and 15 years of data:

  1. "Does timing really matter?" PROVEN: Found 157-334% statistical difference between hours
  2. "Are different regions actually different?" CONFIRMED: Each region has unique optimal patterns
  3. "Are patterns consistent across seasons?" VALIDATED: 89% correlation between seasons means reliable guidance

Advanced Data Science

  • Machine Learning: Used K-Nearest Neighbours to fill missing data with 91.9% accuracy
  • Correlation Analysis: Identified gas generation as the number 1 factor driving carbon intensity
  • Time Series Analysis: Revealed daily, weekly, and seasonal patterns
  • Geographic Analysis: Mapped regional differences across 14 UK areas

Regional Carbon Intensity Over Time Figure 8: Regional carbon intensity patterns showing geographic variations across the UK over time

Generation Mix Analysis Figure 3: UK's energy mix evolution showing the transition from coal to renewables

Correlation Analysis Figure 4: What drives carbon intensity - gas generation increases emissions while renewables decrease them

The Technology

Data Collection & Quality

Official Government Data

  • Source: NESO (National Energy System Operator) - the people who actually run the UK electricity grid
  • 276,000+ data points from 2009-2025
  • 30-minute intervals capturing real-time changes
  • 99% complete coverage with advanced imputation for missing values

Quality Assurance Process

  • Automated validation checks ensuring data accuracy
  • Cross-referenced multiple data sources for verification
  • Statistical quality scores averaging 91.9% across all regions
  • Version controlled development tracking every change

Analytics Pipeline

Step 1: Data Extraction

Raw NESO data → Automated download → Quality validation

Step 2: Intelligent Processing

Missing data → Machine learning imputation → 91.9% accuracy
Time formatting → Standardised UTC timestamps → Perfect continuity
Regional mapping → Geographic categorisation → 14 UK regions

Step 3: User-Friendly Delivery

Complex statistics → Simple visualisations → Clear action items
Technical jargon → Plain English → Everyone can understand
Raw numbers → Practical guidance → Real-world impact

Python Tools & Libraries Used

Data Manipulation: Pandas, NumPy for handling 15+ years of electricity data Statistical Analysis: SciPy for hypothesis testing and correlation analysis
Visualisation: Matplotlib, Seaborn, Plotly for interactive charts Machine Learning: Scikit-learn for missing data imputation Quality Control: Custom validation functions ensuring data integrity

Why AI Made This Possible

GitHub Copilot Integration: Accelerated code development and suggested optimisations Generative AI Insights: Helped navigate challanges in wide to long data transformations Spelling and grammer checks: AI-assisted spelling and grammar correction Error Detection: AI tools helped troubleshoot code issues during notebook development

  • Reduce energy costs while improving environmental impact

Industrial Applications

  • Manufacturing scheduling around renewable energy availability
  • Data center operations timed for cleanest electricity
  • Cold storage and refrigeration optimised for green periods

For Communities & Policy

Local Government Planning

  • Design incentive programmes around optimal usage periods
  • Infrastructure planning based on regional carbon patterns
  • Community energy storage projects timed for maximum benefit
  • Public facility scheduling (pools, libraries, etc.) for minimal impact

Educational Impact

  • Schools teaching environmental responsibility through data
  • Community groups organising "green energy challenges"
  • Very simplified tool for local businesses participating in carbon reduction initiatives

Statistical Validation

Core Statistical Concepts Applied

Mean & Variance Analysis

  • Calculated average carbon intensity: ~200 gCO2/kWh currently
  • Standard deviation: ±80 gCO2/kWh showing significant variation
  • This variation is WHY timing matters - huge differences between peak and off-peak

Hypothesis Testing

  • H1: "Timing matters" → ANOVA test result: F=334.94, p<0.001 (CONFIRMED!)

  • H2: "Regions differ" → Regional variance analysis (CONFIRMED!)

  • H3: "Patterns are consistent" → Correlation r=0.89 across seasons (CONFIRMED!)

Daily Temporal Patterns Figure 5: Proof that timing matters - clear daily patterns showing optimal hours for clean energy

Regional Differences Figure 6: Regional optimisation potential varies significantly across the UK

Seasonal Consistency Figure 7: Daily patterns remain consistent across seasons, making guidance reliable year-round

Code Quality & Optimisation

Performance Improvements Made

# Before: Slow processing of large datasets
for region in regions:
    process_individually(region)  # Inefficient!

# After: Vectorised operations (500x faster)
results = dataframe.groupby('region').agg({
    'carbon_intensity': ['mean', 'std', 'min', 'max']
})

Error Correction Examples

  • Memory optimisation: Reduced dataset size from 2GB to 200MB through efficient data types
  • Null value handling: Implemented smart imputation rather than deletion
  • Datetime parsing: Fixed timezone issues ensuring accurate time-based analysis

Code Documentation Standards

  • Comprehensive docstrings explaining every function
  • Inline comments explaining complex algorithms
  • Markdown cells providing context and interpretation
  • Version control tracking every change with meaningful commit messages

Machine Learning Implementation

K-Nearest Neighbours (KNN) for Missing Data

  • Problem: 1.1% of regional data was missing
  • Solution: Used similar time periods to predict missing values
  • Validation: Achieved 91.9% accuracy compared to known values
  • Why KNN: Preserves temporal and spatial patterns in the data

Feature Selection Process

  • Tested 15+ variables to predict carbon intensity
  • Selected gas, wind, nuclear, and solar generation as most predictive
  • Cross-validation ensured our model generalises to new data

Problem-Solving Approach

Challenges Faced & Solutions

Challenge 1: Data Completeness

  • Problem: Government data had small gaps
  • Approach: Researched multiple imputation methods
  • Solution: KNN algorithm using generation mix patterns
  • Result: 99%+ complete dataset with validated accuracy
  • Alternative considered: Simple interpolation (rejected as less accurate)

Challenge 2: Scale & Performance

  • Problem: 15+ years of data is massive (2GB+)
  • Approach: Optimized data types and processing methods
  • Solution: Chunked processing and efficient storage formats
  • Result: Analysis runs in minutes, not hours
  • Learning: Proper data engineering is crucial for large datasets

Challenge 3: Technical vs. User-Friendly

  • Problem: Statistical results are hard for general public to understand
  • Approach: User experience design principles
  • Solution: Traffic light system (green/red) for easy interpretation
  • Result: Complex analysis becomes actionable guidance
  • Feedback: Beta users found dashboard intuitive and motivating

Alternative Approaches Considered

Time Series Forecasting

  • Considered: ARIMA models for predicting future carbon intensity
  • Limitation: Would require real-time weather data integration
  • Decision: Focused on pattern analysis instead for broader applicability

Regional Clustering

  • Considered: Grouping similar regions together
  • Limitation: Would lose geographic specificity users want
  • Decision: Maintained individual regional analysis

Real-time Data Integration

  • Considered: Live API connections for current carbon intensity
  • Limitation: Adds complexity and potential failure points
  • Decision: Focused on reliable historical pattern guidance

AI Integration & Innovation

GitHub Copilot Usage

Code Generation Assistance

  • Accelerated development of data processing functions
  • Suggested optimizations for statistical calculations
  • Helped debug complex Pandas operations
  • Generated template code for visualizations

Specific Examples

# AI-suggested optimization for regional analysis
def analyze_regional_patterns(data, regional_columns):
    """AI helped structure this function for efficient processing"""
    regional_results = {}
    for region in regional_columns:
        # Copilot suggested vectorized approach
        region_stats = data.groupby('hour')[region].agg(['mean', 'std'])
        regional_results[region] = calculate_optimization_potential(region_stats)
    return regional_results

Design Thinking with AI

Ideation Process

  • AI brainstorming sessions for dashboard layout concepts
  • Explored visualisation alternatives through AI suggestions
  • Tested messaging approaches with AI feedback

Business Requirements Generation

  • AI helped identify stakeholder needs across different user types
  • Suggested metrics for measuring project success
  • Proposed future enhancement opportunities

Learning Journey & Continuous Development

Skills Developed Through This Project

Technical Skills

  • Advanced Python: Pandas optimisation, statistical analysis, visualisation
  • Data Science: Hypothesis testing, machine learning, correlation analysis
  • Visualisation: Creating user-friendly dashboards from complex data
  • Git/Version Control: Proper project management and collaboration workflows

Domain Knowledge

  • Energy Systems: Understanding how electricity grids operate
  • Environmental Science: Carbon accounting and sustainability metrics
  • Statistics: Proper application of statistical methods to real-world problems
  • User Experience: Translating technical insights into actionable guidance

Knowledge Gaps Identified & Addressed

Initially Didn't Know

  • How electricity grids balance supply and demand in real-time
  • Statistical significance testing for time series data
  • Tableau's specific requirements for wide vs. long format data
  • Best practices for handling missing data in temporal datasets

How I Learned

  • Research: Studied NESO documentation and energy system resources
  • Experimentation: Tested different statistical approaches and compared results
  • AI Assistance: Used Copilot and ChatGPT to explain complex concepts

Adaptation & Problem-Solving Evolution

Early Approach: Over complicating the visualisations, studied some accesibillity reports Learning: Users need clear, actionable guidance Adapted: Simplified to traffic light system with detailed explanations

Early Approach: Process all data at once Learning: Memory and performance constraints matter Adapted: Implemented chunked processing and efficient storage

Future Learning Roadmap

Short-term Goals (Next 6 months)

  • Real-time Integration: Learn to work with live data APIs
  • Advanced Forecasting: Master LSTM neural networks for time series prediction
  • Mobile Development: Create smartphone apps for energy guidance

Long-term Vision (1-2 years)

  • IoT Integration: Connect with smart home devices for automated optimisation
  • Price Integration: Combine carbon and cost optimization
  • International Expansion: Apply methodology to other countries' grids

Real Impact & Future Applications

Evidence of Effectiveness

Data Management Excellence

  • Version Control: 50+ commits tracking progressive development
  • Documentation: Comprehensive README, code comments, and user guides
  • Data Storage: Structured format enabling easy updates and maintenance
  • Quality Assurance: Automated validation preventing data corruption

Structured Project Organisation

Project Structure
├── Data Pipeline: Raw → Processed → Analysis-Ready
├── Analysis Notebooks: Systematic progression through methodology  
├── Results: Clear visualisations communicating insights
└── Delivery: Interactive dashboard for end-user consumption

Best Practices Implementation

  • Modular code structure enabling easy maintenance and updates
  • Clear separation of data extraction, processing, and analysis
  • Comprehensive error handling and data validation
  • User-centred design prioritising clarity and actionability

Addressing Global Challenges

Climate Action Through Data

  • Individual Impact: Enables millions of households to reduce carbon footprints
  • Collective Change: Aggregated smart energy use accelerates grid decarbonization
  • Policy Support: Provides evidence base for renewable energy investments
  • Innovation Driver: Demonstrates value of data-driven environmental solutions

Scalability & Replication

  • Methodology: Can be applied to any electricity grid worldwide
  • Open Source Potential: Analysis framework available for other regions
  • Educational Value: Teaches data science through environmental application
  • Commercial Viability: Foundation for carbon-aware energy services

🛡️ Ethics, Privacy & Governance

Ethical Considerations

Data Responsibility

  • Source Transparency: All data from official government sources (NESO)
  • Public Benefit: Analysis designed to serve environmental and social good
  • No Personal Data: Works with aggregated electricity grid data only
  • Open Methodology: Complete transparency in analysis approach

Fairness & Accessibility

  • Geographic Equity: Provides guidance for all UK regions, not just affluent areas
  • Economic Inclusivity: Benefits available regardless of income level
  • Technical Accessibility: Dashboard designed for non-technical users
  • Language Simplicity: Avoids jargon to ensure broad understanding

Privacy Protection

Data Minimization

  • Only uses data necessary for carbon intensity analysis
  • No collection of personal energy usage patterns
  • Aggregated regional data protects individual privacy
  • No tracking or surveillance capabilities

User Privacy in Dashboard

  • No personal data collection required for dashboard use
  • No account registration or personal information storage
  • Transparent about any data collection practices

Legal & Regulatory Compliance

UK Data Protection (GDPR)

  • Lawful Basis: Legitimate interest in environmental protection
  • Data Subject Rights: No personal data processing, so minimal GDPR implications
  • Transparency: Clear privacy notices and data usage explanations
  • Data Security: Appropriate technical and organizational measures

Energy Sector Regulations

  • NESO Licence Compliance: Proper attribution and usage of official data
  • Open Data Principles: Supports government transparency initiatives
  • Consumer Protection: Provides accurate, evidence-based guidance
  • Market Integrity: Doesn't manipulate energy markets or pricing

Social Impact & Responsibility

Environmental Justice

  • Democratic Access: Makes clean energy optimisation available to everyone
  • Regional Equality: Provides fair guidance across all UK areas
  • Future Generations: Contributes to climate action and sustainability
  • Community Empowerment: Enables local action on global challenges

Potential Risks & Mitigation

  • Over-Reliance Risk: Dashboard includes disclaimers about complementary strategies
  • Technical Barriers: Simplified interface reduces digital divide impact
  • Misinformation Risk: Clear data sources and methodology prevent misinterpretation
  • Market Disruption: Analysis supports rather than destabilises energy markets

Project Success & Impact Measurement

Technical Success Metrics

  • Data Quality: 99% complete dataset with 91.9% imputation accuracy
  • Statistical Rigour: All hypotheses tested with p<0.001 significance
  • Performance: Analysis runs efficiently on standard hardware
  • Usability: Dashboard requires no technical knowledge to use

Environmental Impact Potential

  • Individual Savings: carbon reduction possible through smart timing
  • Aggregate Effect: Could reduce national energy demand during peak carbon periods
  • Acceleration: Supports faster adoption of renewable energy

Innovation & Knowledge Contribution

Academic Value

  • Demonstrates practical application of data science to climate challenges
  • Shows how complex statistical analysis can drive real-world behavior change
  • Provides replicable methodology for similar energy systems analysis
  • Contributes to growing field of carbon-aware computing

Industry Relevance

  • Creates foundation for commercial carbon optimization services
  • Demonstrates consumer appetite for environmental data transparency
  • Shows potential for data-driven sustainability interventions
  • Validates machine learning approaches for energy system analysis

Getting Started with Your Clean Energy Journey

Use the Dashboard

Access the Interactive Tableau Dashboard: Grid Carbon Analysis - Tableau Public

Quick Start Guide:

  1. Select Your Region: Choose your area from the UK map
  2. Check Today's Pattern: See green and red periods for optimal timing
  3. Plan Your Energy Use: Schedule high-power activities for green periods
  4. Track Your Impact: Monitor how your choices reduce carbon footprint

Explore the Analysis

For Technical Users: Review Jupyter notebooks in sequence (01-07) For Curious Minds: Check out visualizations in results/figures/ For Developers: Examine code structure and reuse methodology

Credits & Acknowledgments

Data Source: National Energy System Operator (NESO) Open Data Portal Inspiration: Climate action through accessible data science https://www.neso.energy/data-portal https://www.neso.energy/data-portal/regional-carbon-intensity-forecast https://www.neso.energy/data-portal/neso-open-licence

Data Source: RCSLT Inspiration: Simplified visualisation for accessibillity https://www.rcslt.org/members/delivering-quality-services/inclusive-communication

Releases

No releases published

Packages

 
 
 

Contributors