To create a time series analysis project for weather forecasting using the ARIMA model in Python.
ARIMA (AutoRegressive Integrated Moving Average) is a popular time series forecasting technique. It combines three components to capture various aspects of time series data.
- Autoregressive (AR) Component:
  - Uses dependency between an observation and several lagged observations.
  - Order (p): Number of lagged observations used in the model.
- Integrated (I) Component:
  - Represents the differencing of raw observations to make the time series stationary.
  - Order (d): Number of times the series is differenced to remove the trend and make it stationary.
- Moving Average (MA) Component:
  - Uses dependency between an observation and residual errors from a moving average model applied to lagged observations.
  - Order (q): Number of lagged forecast errors used to correct the prediction.
- Model Notation:
  - Expressed as ARIMA(p, d, q), where:
    - p: Order of the autoregressive part.
    - d: Degree of differencing.
    - q: Order of the moving average part.
- Stationarity:
  - The AR and MA parts of the model assume a stationary series. Differencing (the d term) removes trends; seasonal differencing, as used in SARIMA, removes seasonality.
- Model Selection:
  - The values of p, d, and q can be selected using the ACF (Autocorrelation Function), the PACF (Partial Autocorrelation Function), and the AIC/BIC criteria.
- Seasonal ARIMA (SARIMA):
  - For seasonal time series, an extended version of ARIMA known as SARIMA can be used, represented as SARIMA(p,d,q)(P,D,Q,m), where m is the number of periods in a season.
- Applications:
  - ARIMA is widely used in stock price forecasting, economic data analysis, and various other time series predictions.
- Procedure:
  - Explore the weather dataset.
  - Check the time series for stationarity:
    - Time series plot
    - ACF and PACF plots
    - ADF test
    - Transform to stationary: differencing
  - Determine the ARIMA model parameters p and q.
  - Fit the ARIMA model.
  - Make time series predictions.
  - Auto-fit the ARIMA model.
  - Evaluate model predictions.
https://github.com/manojvenaram/TEMPERATUREDATA-using-API
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

# Load the weather data and index it by date
data = pd.read_csv("/content/seattle-weather.csv")
data['date'] = pd.to_datetime(data['date'])
data.set_index('date', inplace=True)

def arima_model(data, target_variable, order):
    # Chronological 80/20 split (time series must not be shuffled)
    train_size = int(len(data) * 0.8)
    train_data, test_data = data[:train_size], data[train_size:]

    # Fit ARIMA(p, d, q) on the training portion
    model = ARIMA(train_data[target_variable], order=order)
    fitted_model = model.fit()

    # Forecast one step per test observation and score with RMSE
    forecast = fitted_model.forecast(steps=len(test_data))
    rmse = np.sqrt(mean_squared_error(test_data[target_variable], forecast))

    # Plot training data, testing data, and the forecast
    plt.figure(figsize=(10, 6))
    plt.plot(train_data.index, train_data[target_variable], label='Training Data')
    plt.plot(test_data.index, test_data[target_variable], label='Testing Data')
    plt.plot(test_data.index, forecast, label='Forecasted Data')
    plt.xlabel('Date')
    plt.ylabel(target_variable)
    plt.title('ARIMA Forecasting for ' + target_variable)
    plt.legend()
    plt.show()

    print("Root Mean Squared Error (RMSE):", rmse)

# Fit ARIMA(5,1,0) to the temp_max column
arima_model(data, 'temp_max', order=(5,1,0))
ARIMA assumes the time series is stationary. Non-stationary data requires differencing or transformations, which can be complex.
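As a small worked example of what differencing does, the sketch below applies it to a made-up series whose increments grow steadily. The values are hypothetical and chosen so the arithmetic is easy to follow: one pass of differencing still leaves a trend, and a second pass yields a constant series.

```python
import pandas as pd

# Hypothetical values with an accelerating upward trend.
temps = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0])

# First difference: y[t] - y[t-1]; the first entry is NaN and is dropped.
first_diff = temps.diff().dropna()

# Second difference: difference of the first difference.
second_diff = first_diff.diff().dropna()

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0]
```

In ARIMA terms, this toy series would need d = 2, while a series with a plain linear trend would need only d = 1.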
Choosing the right values for p, d, and q can be difficult, and wrong values can lead to poor forecasting performance. It requires careful tuning using ACF and PACF plots, plus trial and error with metrics like AIC/BIC.
ARIMA cannot handle seasonality directly. Seasonal ARIMA (SARIMA) is needed, which adds complexity to the model.
Too many parameters may cause overfitting, where the model fits the noise rather than the actual signal, leading to poor generalization.
ARIMA can struggle with large datasets, because fitting the model becomes computationally intensive as the series length and the orders p and q grow.
ARIMA is sensitive to outliers, which can distort predictions and affect the overall performance.
ARIMA assumes linear relationships: it captures only linear dependence on past values and past errors. If the data has nonlinear patterns, ARIMA may not perform well.
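The linearity is visible in the model equation itself. In one common way of writing ARIMA(p, d, q), the differenced series is a weighted sum of its own past values and past errors. The symbols below are notation assumed for this illustration: B is the backshift operator, y'_t is the series after d rounds of differencing, c is a constant, the φ and θ terms are the AR and MA coefficients, and ε_t is the error at time t.

```latex
\[
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i}
         + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j}
         + \varepsilon_t,
\qquad
y'_t = (1 - B)^d \, y_t
\]
```

Every term on the right-hand side enters additively with a fixed coefficient, so effects such as thresholds or interactions between lags cannot be represented.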
Forecasting far into the future can become inaccurate, as the model relies on past values and residuals. The further out, the more errors accumulate.
