This document outlines the complete data processing pipeline for generating jurisdiction-level EV charging infrastructure maps. The pipeline processes utility circuit line data, federal funding zones, environmental indicators, and demographic data to create priority and feasibility pixel grids.
Update Frequency: Utility circuit line data should be updated twice annually. Other datasets are updated as needed based on availability from source agencies.
- Data Acquisition - Download utility circuit line data from each provider
- Data Cleaning - Standardize columns, convert units, add utility identifiers
- Concatenation - Combine all utility lines into single dataset
- Pixelation - Convert utility lines to 100m x 100m pixel grid
- Attribute Joining - Add demographic, environmental, and funding attributes
- Output Generation - Create jurisdiction-specific priority and feasibility files
Source: PG&E GRIP Portal
Two acquisition methods are available. The first is a manual export from the portal:
- Navigate to the GRIP portal
- In the layer list, expand ICA > ICA Results
- Click options menu (three dots) for "ICA, Load Capacity (kW)"
- Select Export > GeoJSON
Note: This method may encounter server timeout issues with large datasets.
The second method pulls data directly from the ArcGIS Feature Server in pages of 1,000 records:
import requests
import geopandas as gpd
base_url = "https://services2.arcgis.com/mJaJSax0KPHoCNB6/arcgis/rest/services/DRPComplianceRelProd/FeatureServer/3/query"
params = {
    "where": "1=1",
    "outFields": "*",
    "f": "geojson",
    "resultOffset": 0,
    "resultRecordCount": 1000,
}
features = []
# Page through the service until an empty page is returned
while True:
    print(f"Fetching offset {params['resultOffset']}")
    response = requests.get(base_url, params=params, timeout=60)
    response.raise_for_status()  # Stop on HTTP errors instead of parsing an error page
    data = response.json()
    if "features" not in data or not data["features"]:
        break
    features.extend(data["features"])
    params["resultOffset"] += params["resultRecordCount"]
pge = gpd.GeoDataFrame.from_features(features)

Data Processing:
# Retain only necessary columns
pge = pge[['LoadCapacity_kW', 'geometry']].copy()
# Add utility identifier
pge['Utility'] = 'pge'
# Set CRS and save
pge = gpd.GeoDataFrame(pge, geometry='geometry')
pge.set_crs(epsg=4326, inplace=True)
pge.to_file('pge_load.geojson', driver='GeoJSON')

Source: SDG&E ICM API Explorer
Data Acquisition:
- Access the ICM API Explorer (account creation may be required)
- Navigate to Load Capacity Grids map
- Download as GeoJSON or Shapefile
Data Processing:
import geopandas as gpd
# Load data
sdge = gpd.read_file("path/to/sdge.geojson")
# Verify the two load columns are identical
mismatches = sdge.loc[sdge['ICAWOF_UNILOAD'] != sdge['ICAWNOF_UNILOAD']]
print(len(mismatches))  # Should print 0
# Convert MW to kW
sdge['load_kw'] = sdge['ICAWOF_UNILOAD'] * 1000
# Retain only necessary columns
sdge = sdge[['load_kw', 'geometry']].copy()
# Add utility identifier
sdge['Utility'] = 'sdge'
# Set CRS and save
sdge = gpd.GeoDataFrame(sdge, geometry='geometry')
sdge.set_crs(epsg=4326, inplace=True)
sdge.to_file('sdge_load.geojson', driver='GeoJSON')

Source: LADWP Power GIS Portal
Data Acquisition:
- Click "Download the 34.5 KV data" link
- Unzip downloaded file to extract .kmz file
- Convert the .kmz to a file geodatabase (.gdb) using the ArcGIS "KML To Layer" tool, which accepts both .kml and .kmz input
Data Processing:
import geopandas as gpd
import pandas as pd
from bs4 import BeautifulSoup
# Load geodatabase
ladwp = gpd.read_file("path/to/ladwp.gdb")
# Extract popup information
def extract_popup_info(html_content):
    """Parse the key/value rows of the second table in a popup's HTML."""
    soup = BeautifulSoup(html_content, 'html.parser')
    data = {}
    table = soup.find_all('table')[1]
    for row in table.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) == 2:
            key = cols[0].get_text(strip=True)
            value = cols[1].get_text(strip=True)
            data[key] = value
    return data

popup_info = ladwp['PopupInfo'].apply(extract_popup_info)
popup_info_expanded = pd.json_normalize(popup_info.tolist())
gdf_expanded = ladwp.drop(columns=['PopupInfo']).join(popup_info_expanded)
# Extract the minimum capacity value from the range and store it as a number
gdf_expanded['min_value'] = pd.to_numeric(
    gdf_expanded['CAPACITY_RANGE_KW'].str.extract(r'^\s*(\d+)', expand=False)
)
# Retain only necessary columns
ladwp = gdf_expanded[['min_value', 'geometry']].copy()
# Add utility identifier
ladwp['Utility'] = 'ladwp'
# Set CRS and save
ladwp = gpd.GeoDataFrame(ladwp, geometry='geometry')
ladwp.set_crs(epsg=4326, inplace=True)
ladwp.to_file('ladwp_load.geojson', driver='GeoJSON')

Source: SCE DRP Portal
Data Acquisition:
- Click "ESRI API" tab
- Navigate to "ICA Layer" > "ICA - Circuit Segments"
- Download as GeoJSON or Shapefile
- Also download "ICA - Circuit Segments, Non-3 Phase" if available
Note: SCE provides separate files for 3-phase and non-3-phase circuits. Verify whether these datasets contain unique data before concatenating. If datasets are identical, only one is needed.
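One way to perform this check is to compare the two downloads feature by feature. The sketch below is a minimal standard-library example, not part of the pipeline. The function names and the second file name are hypothetical, and it assumes two features count as "the same" when both their geometry and their `ica_overall_load` value match.

```python
import json


def feature_key(feature, value_field="ica_overall_load"):
    """Build a hashable key from a GeoJSON feature's geometry and load value."""
    geometry = json.dumps(feature.get("geometry"), sort_keys=True)
    value = (feature.get("properties") or {}).get(value_field)
    return geometry, str(value)


def compare_feature_sets(features_a, features_b):
    """Count distinct features shared by both lists and unique to each."""
    keys_a = {feature_key(f) for f in features_a}
    keys_b = {feature_key(f) for f in features_b}
    return {
        "shared": len(keys_a & keys_b),
        "only_a": len(keys_a - keys_b),
        "only_b": len(keys_b - keys_a),
    }


def load_features(path):
    """Read the feature list from a GeoJSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["features"]


# Example usage (the second file name is hypothetical):
# three_phase = load_features("path/to/socaled.geojson")
# non_three_phase = load_features("path/to/socaled_non3phase.geojson")
# print(compare_feature_sets(three_phase, non_three_phase))
```

If `only_a` and `only_b` are both zero, the files hold the same features and one is enough. Otherwise, concatenate them before processing.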
Data Processing:
import geopandas as gpd
# Load data
socaled = gpd.read_file("path/to/socaled.geojson")
# Convert MW to kW (column is stored as string)
socaled['load_kw'] = (socaled['ica_overall_load'].astype('float')) * 1000
# Retain only necessary columns
socaled = socaled[['load_kw', 'geometry']].copy()
# Add utility identifier
socaled['Utility'] = 'socaled'
# Set CRS and save
socaled = gpd.GeoDataFrame(socaled, geometry='geometry')
socaled.set_crs(epsg=4326, inplace=True)
socaled.to_file('socaled_load.geojson', driver='GeoJSON')

Combine all processed utility datasets into a single file:
import pandas as pd
import geopandas as gpd
# Load all utility files
pge = gpd.read_file('pge_load.geojson')
ladwp = gpd.read_file('ladwp_load.geojson')
sdge = gpd.read_file('sdge_load.geojson')
socaled = gpd.read_file('socaled_load.geojson')
# Concatenate
utility_lines = pd.concat([pge, ladwp, sdge, socaled], ignore_index=True)
# Set CRS and save
utility_lines = gpd.GeoDataFrame(utility_lines, geometry='geometry')
utility_lines.set_crs(epsg=4326, inplace=True)
utility_lines.to_file('utility_lines.geojson', driver='GeoJSON')

Output: Save utility_lines.geojson to jurisdiction_script/data/other/
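The per-utility scripts above write the capacity value to differently named columns: `LoadCapacity_kW` for PG&E, `load_kw` for SDG&E and SCE, and `min_value` for LADWP. A quick summary of the combined file can therefore confirm that every utility is present and that each feature carries a capacity value. The following is a minimal standard-library sketch, not part of the pipeline, and the function name is hypothetical.

```python
import json
from collections import Counter

# Capacity columns written by the per-utility scripts in this document.
CAPACITY_FIELDS = ("LoadCapacity_kW", "load_kw", "min_value")


def summarize_utility_lines(feature_collection):
    """Count features per utility and features lacking any capacity value."""
    per_utility = Counter()
    missing_capacity = Counter()
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}
        utility = props.get("Utility", "unknown")
        per_utility[utility] += 1
        # A feature is flagged when all three capacity fields are absent or null.
        if all(props.get(field) is None for field in CAPACITY_FIELDS):
            missing_capacity[utility] += 1
    return dict(per_utility), dict(missing_capacity)


# Example usage:
# with open("utility_lines.geojson", encoding="utf-8") as handle:
#     counts, missing = summarize_utility_lines(json.load(handle))
# print(counts, missing)
```

An empty second dictionary means no feature was left without a capacity value. A utility missing from the first dictionary points to a file that was not loaded before concatenation.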
Convert utility circuit lines into a 100m x 100m pixel grid covering areas within 75 meters of utility infrastructure.
Command:
cd jurisdiction_script
python create_utility_pixels.py \
    -i data/other/utility_lines.geojson \
    -o data/grids/utilities_pixels.json \
    -b 75

Process:
- Creates 100m x 100m grid covering California (~98 million grid points)
- Buffers utility lines by 75 meters
- Clips grid to areas within utility buffer (~2 million pixels)
- Converts point centroids to square polygons
- Saves output to data/grids/utilities_pixels.json
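The steps above can be sketched at toy scale as follows. This is not the code in `create_utility_pixels.py`. It is a standard-library illustration that assumes coordinates are already in a projected CRS measured in meters, that a pixel is kept when its centroid lies within the buffer distance of the line, and that the grid only needs to cover the line's bounding box rather than the whole state. The function names are hypothetical.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to segment AB, all in the same planar units."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment's endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def pixels_near_line(line, cell=100.0, buffer_m=75.0):
    """Return square pixels whose centroids lie within buffer_m of the line.

    `line` is a list of (x, y) vertices in meters.
    """
    xs = [x for x, _ in line]
    ys = [y for _, y in line]
    # Grid indices covering the line's bounding box expanded by the buffer.
    i_min = math.floor((min(xs) - buffer_m) / cell)
    i_max = math.ceil((max(xs) + buffer_m) / cell)
    j_min = math.floor((min(ys) - buffer_m) / cell)
    j_max = math.ceil((max(ys) + buffer_m) / cell)

    half = cell / 2.0
    pixels = []
    for i in range(i_min, i_max + 1):
        for j in range(j_min, j_max + 1):
            cx, cy = i * cell, j * cell  # candidate centroid
            near = any(
                point_segment_distance(cx, cy, *a, *b) <= buffer_m
                for a, b in zip(line, line[1:])
            )
            if near:
                # Expand the centroid into a closed square ring.
                pixels.append([
                    (cx - half, cy - half), (cx + half, cy - half),
                    (cx + half, cy + half), (cx - half, cy + half),
                    (cx - half, cy - half),
                ])
    return pixels


# A 300 m line along y = 0 keeps the four centroids at x = 0, 100, 200, 300.
print(len(pixels_near_line([(0, 0), (300, 0)])))  # 4
```

Checking every centroid against every segment is only practical for a handful of lines. At statewide scale, a spatial index or a vectorized buffer-and-clip operation would be needed.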
Performance Requirements:
- Memory: 16-32GB RAM
- Processing Time: 45-90 minutes
- Output Size: ~400-500MB
Output: Save utilities_pixels.json to jurisdiction_script/data/grids/
Configuration files are located in jurisdiction_script/config/ as YAML files.
Update the following paths:
- Feasibility pixels: Update to reference new utilities_pixels.json
- Utility lines: Update to reference new utility_lines.geojson
Execute the main processing script:
cd jurisdiction_script
python jscript.py config_file

Replace config_file with the appropriate configuration file name (without the .yaml extension).
Example:
python jscript.py alameda_berkeley

Output: Priority and feasibility JSON files will be generated in jurisdiction_script/out/:
- [jurisdiction]_priority.json
- [jurisdiction]_feasibility.json
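A quick post-run check can confirm that both files were written. The helper below is a hypothetical sketch and not part of `jscript.py`. It assumes the `[jurisdiction]` prefix equals the config name passed on the command line (for example `alameda_berkeley`) and that the default output directory is given relative to the current working directory.

```python
from pathlib import Path


def expected_outputs(jurisdiction, out_dir="jurisdiction_script/out"):
    """Return the two output paths expected for one jurisdiction."""
    base = Path(out_dir)
    return [
        base / f"{jurisdiction}_priority.json",
        base / f"{jurisdiction}_feasibility.json",
    ]


def missing_outputs(jurisdiction, out_dir="jurisdiction_script/out"):
    """List the expected output files that do not exist on disk."""
    return [p for p in expected_outputs(jurisdiction, out_dir) if not p.is_file()]


# Example usage:
# print(missing_outputs("alameda_berkeley"))  # an empty list means both files exist
```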
| Boundary Dataset | Source |
|---|---|
| California County Boundaries | US Census TIGER/Line |
| California Place Boundaries | US Census TIGER/Line Places |

| Utility | Source |
|---|---|
| Pacific Gas & Electric (PG&E) | PG&E DRP Integration Capacity Map |
| Southern California Edison (SCE) | SCE DRP Portal |
| San Diego Gas & Electric (SDG&E) | SDG&E ICM API Explorer |
| Los Angeles Dept. of Water & Power (LADWP) | LADWP Power GIS Portal |

| Environmental Dataset | Source |
|---|---|
| CalEnviroScreen 4.0 | OEHHA CalEnviroScreen |
| EJScreen | Harvard Dataverse |
| CEJST | Harvard Dataverse |

| Demographic Dataset | Source |
|---|---|
| Non-White Population (2021 5-yr ACS) | Census Data Portal |
| Disability Characteristics (2021 5-yr ACS) | Census Data Portal |
| Commute Time (2021 5-yr ACS) | Census Data Portal |
Current Implementation:
- EJScreen and CEJST indicators use percentile rankings across US census tracts
- CalEnviroScreen provides intra-state (California-only) percentile comparisons
- This provides both interstate and intrastate comparisons for California
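The difference between the two comparison types can be illustrated with made-up numbers. The sketch below is not how EJScreen, CEJST, or CalEnviroScreen compute their published percentiles. It uses a simple "percent of tracts scoring strictly lower" definition as an assumption, and the scores and function name are invented for illustration.

```python
from bisect import bisect_left


def percentile_rank(value, population):
    """Percent of values in `population` strictly below `value`."""
    ordered = sorted(population)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Toy indicator scores for census tracts, keyed by state (made-up numbers).
tracts = {
    "CA": [10, 20, 30, 40],
    "NV": [50, 60, 70, 80],
}

score = 40  # the highest-scoring toy tract in CA
national = [s for scores in tracts.values() for s in scores]

interstate = percentile_rank(score, national)      # compared with all tracts
intrastate = percentile_rank(score, tracts["CA"])  # compared with CA tracts only
print(interstate, intrastate)  # 37.5 75.0
```

The same tract ranks low nationally but high within its own state, which is why the pipeline keeps both kinds of indicator.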
Future Considerations: When expanding to states outside California:
- CalEnviroScreen is California-specific and unavailable for other states
- Consider using EJScreen's intrastate tract comparison option
- This would maintain both inter- and intra-state comparison capabilities using CEJST (interstate) and EJScreen (intrastate)
Common Issues:
- API URL Changes: Utility provider API endpoints may change. Check source portals for updated URLs.
- Memory Issues: Pixelation process requires significant RAM. Close other applications or use a machine with more memory.
- Timeout Errors: When downloading large datasets, use API-based methods rather than direct downloads.
- Missing Dependencies: Ensure all required Python packages are installed:
  conda install -c conda-forge geopandas numpy pandas scipy matplotlib pyyaml fiona shapely beautifulsoup4 requests
- CRS Mismatches: All output files should use EPSG:4326 (WGS84). Verify CRS after loading external datasets.
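For the CRS check in particular, a coarse sanity test is to confirm that every coordinate in a GeoJSON file is a plausible longitude/latitude pair. The sketch below is a standard-library heuristic with hypothetical function names, not a substitute for inspecting the CRS metadata. Passing it does not prove the data is EPSG:4326, but failing it strongly suggests projected coordinates.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_wgs84(feature_collection):
    """Return True if every coordinate is a plausible longitude/latitude."""
    for feature in feature_collection["features"]:
        geometry = feature.get("geometry")
        if not geometry:
            continue  # skip features without geometry
        for x, y in iter_positions(geometry.get("coordinates", [])):
            if not (-180 <= x <= 180 and -90 <= y <= 90):
                return False
    return True


# Example usage:
# import json
# with open("utility_lines.geojson", encoding="utf-8") as handle:
#     print(looks_like_wgs84(json.load(handle)))
```

In GeoPandas, `gdf.crs` reports the CRS of a loaded dataset and `gdf.to_crs(epsg=4326)` reprojects it. `set_crs`, as used in the scripts above, only labels the data and does not transform coordinates, so it is appropriate only when the coordinates are already in WGS84.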