End-to-end machine learning project for bike demand prediction using EDA, feature engineering, regression modeling, and validation. Includes business insights and data-driven strategies for optimizing fleet allocation, pricing, and operations based on weather, seasonality, and user behavior.
This project analyzes a bike-sharing dataset to understand demand patterns and build a predictive model using Linear Regression. The goal is to identify key factors influencing bike rentals and derive actionable business insights.
- Perform Exploratory Data Analysis (EDA)
- Identify key factors affecting bike demand
- Build a regression model to predict bike rentals (`cnt`)
- Validate model assumptions
- Generate business insights & strategies
The dataset contains daily bike rental data with features such as:
- Temperature (`temp`, `atemp`)
- Humidity (`hum`)
- Windspeed (`windspeed`)
- Weather situation (`weathersit`)
- Date, month, weekday
- Casual & registered users
- Target: Total rentals (`cnt`)
- Distribution plots for numerical features
- Boxplots for categorical variables vs demand
- Correlation heatmap
- Pairplot for feature relationships
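
The numbers behind these plots can be sketched with pandas alone. The snippet below is a minimal illustration on a small synthetic frame: the values, the weather labels `A`/`B`/`C`, and the assumed link between `temp` and `cnt` are made up for the demo and are not taken from `data.csv`. It computes the correlation matrix a heatmap would display and the per-category quartiles a boxplot would draw.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily data; all values are illustrative only.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "temp": rng.uniform(0.1, 0.9, n),
    "hum": rng.uniform(0.3, 0.9, n),
    "windspeed": rng.uniform(0.0, 0.5, n),
    "weathersit": rng.choice(["A", "B", "C"], n),  # hypothetical labels
})
# Assumed relationship for the demo: demand rises with temperature.
df["cnt"] = 1000 + 5000 * df["temp"] + rng.normal(0, 300, n)

# Correlation matrix: the values a heatmap would colour.
corr = df[["temp", "hum", "windspeed", "cnt"]].corr()

# Per-category summary: the quartiles a boxplot would draw.
by_weather = df.groupby("weathersit")["cnt"].describe()

print(corr["cnt"].round(2))
print(by_weather[["25%", "50%", "75%"]].round(0))
```

On the real data, the same two calls would be run on the loaded DataFrame before plotting.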
- Temperature strongly influences demand
- Bad weather reduces usage significantly
- Demand varies across seasons and months
- Converted categorical variables (month, weekday, weather)
- Created dummy variables
- Removed multicollinearity (`temp`, `casual`, `registered`)
- Feature scaling using MinMaxScaler
- Train-test split (70:30)
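
The preprocessing steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the column values, the category labels, the `random_state`, the choice to scale `cnt` along with the predictors, and fitting the scaler on the training split only are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the dataset; values are illustrative only.
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "temp": rng.uniform(0, 1, n),
    "atemp": rng.uniform(0, 1, n),
    "hum": rng.uniform(0, 1, n),
    "windspeed": rng.uniform(0, 1, n),
    "mnth": rng.choice(["Jan", "Jul", "Oct"], n),
    "weathersit": rng.choice(["A", "B", "C"], n),
    "casual": rng.integers(0, 500, n),
    "registered": rng.integers(0, 3000, n),
})
# Assumption: the target is the sum of casual and registered rentals.
df["cnt"] = (df["casual"] + df["registered"]).astype(float)

# Drop the columns the README lists as sources of multicollinearity.
df = df.drop(columns=["temp", "casual", "registered"])

# One-hot encode categoricals; drop_first avoids a redundant dummy column.
df = pd.get_dummies(df, columns=["mnth", "weathersit"], drop_first=True, dtype=int)

# 70:30 train-test split.
train, test = train_test_split(df, train_size=0.7, random_state=100)
train, test = train.copy(), test.copy()

# Fit the scaler on the training rows only, then reuse it on the test rows.
num_cols = ["atemp", "hum", "windspeed", "cnt"]
scaler = MinMaxScaler()
train[num_cols] = scaler.fit_transform(train[num_cols])
test[num_cols] = scaler.transform(test[num_cols])
```

Fitting the scaler on the training split and only transforming the test split keeps test-set information out of the model inputs.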
- Recursive Feature Elimination (RFE)
- Statistical significance (p-values)
- Variance Inflation Factor (VIF)
- yr, atemp, windspeed, season_spring, mnth_Jul, weathersit_C
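
As a rough sketch of how RFE and VIF fit together, the example below runs both on synthetic data. The columns `noise_1` and `noise_2`, the coefficients, and the helper `vif` are hypothetical and exist only for the demo; the p-value step is not shown. The VIF here is computed from its definition, 1 / (1 - R²) for each feature regressed on the others, rather than with the Statsmodels helper.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic features: three real drivers plus irrelevant columns (illustrative).
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame(
    rng.normal(size=(n, 6)),
    columns=["yr", "atemp", "windspeed", "hum", "noise_1", "noise_2"],
)
# Assumed data-generating process for the demo.
y = 2.0 * X["yr"] + 3.0 * X["atemp"] - 1.5 * X["windspeed"] + rng.normal(0, 0.5, n)

# RFE repeatedly drops the feature with the smallest coefficient magnitude.
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)
selected = list(X.columns[rfe.support_])


def vif(frame: pd.DataFrame) -> pd.Series:
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    out = {}
    for col in frame.columns:
        others = frame.drop(columns=col)
        r2 = LinearRegression().fit(others, frame[col]).score(others, frame[col])
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)


vifs = vif(X[selected])
print(selected)
print(vifs.round(2))
```

A common rule of thumb treats VIF values below about 5 as acceptable, though the exact cut-off is a judgment call.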
- Linear Regression (Statsmodels & Sklearn)
- Multiple models compared:
- 15 features
- 7 features
- 6 features (final optimized)
| Model | R² Score | Remarks |
|---|---|---|
| Full Model | ~0.85 | High complexity |
| Reduced Model | ~0.81 | Best balance |
| Final Model | ~0.79 | Simple & stable |
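
A comparison like the one in the table can be sketched by fitting the same model on progressively smaller feature sets and scoring each on held-out data. The data, coefficients, and seeds below are synthetic assumptions, so the printed scores will not match the table; only the feature counts (15, 7, 6) come from this project.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: a few strong drivers and many weak ones (illustrative).
rng = np.random.default_rng(7)
n, p = 500, 15
X = rng.normal(size=(n, p))
coef = np.array([3.0, 2.5, 2.0, 1.5, 1.0, 0.8] + [0.3] * 9)
y = X @ coef + rng.normal(0, 1.0, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

scores = {}
for k in (15, 7, 6):
    # Keep the k features RFE ranks highest on the training data.
    selector = RFE(LinearRegression(), n_features_to_select=k).fit(X_train, y_train)
    model = LinearRegression().fit(selector.transform(X_train), y_train)
    # Score on the held-out 30% to measure generalization.
    scores[k] = r2_score(y_test, model.predict(selector.transform(X_test)))

for k, r2 in scores.items():
    print(f"{k:>2} features: test R^2 = {r2:.3f}")
```

The trade-off to look for is how little R² is lost as features are removed, which is the argument for preferring the smaller model.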
- Residual analysis → approximately normal
- Residual vs predicted → slight heteroscedasticity
- VIF → low multicollinearity
- Strong generalization on test data
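
The residual checks can also be approximated numerically. The sketch below is an illustration on synthetic data whose noise is deliberately made to grow with the first feature (an assumption chosen to reproduce a mild heteroscedasticity pattern). It reports the residual mean and skewness as rough normality hints, and the correlation between absolute residuals and predictions as a rough spread check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with noise that widens as the first feature increases.
rng = np.random.default_rng(3)
n = 400
X = rng.uniform(0, 1, size=(n, 2))
noise = rng.normal(0, 0.2 + 0.6 * X[:, 0])
y = 1.0 + 4.0 * X[:, 0] + 2.0 * X[:, 1] + noise

model = LinearRegression().fit(X, y)
pred = model.predict(X)
resid = y - pred

# Normality hints: mean and skewness of the residuals should be near zero.
resid_mean = resid.mean()
skew = np.mean((resid - resid_mean) ** 3) / resid.std() ** 3

# Heteroscedasticity hint: does the residual spread grow with the prediction?
spread_corr = np.corrcoef(np.abs(resid), pred)[0, 1]

print(f"residual mean = {resid_mean:.2e}")
print(f"skewness      = {skew:.2f}")
print(f"corr(|residual|, predicted) = {spread_corr:.2f}")
```

A clearly positive correlation in the last line corresponds to the funnel shape seen in a residual-versus-predicted plot; these numbers supplement the plots rather than replace them.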
- Distribution plots
- Correlation heatmap
- Residual plots
- Actual vs Predicted scatter plot
- Perceived temperature (atemp)
- Year-over-year growth
- Humidity
- Windspeed
- Bad weather conditions
- High demand: Fall, moderate weather
- Low demand: Spring, extreme summer
- Use weather-based prediction models
- Increase prices during high-demand periods
- Offer discounts in bad weather
- Allocate bikes based on demand patterns
- Boost demand in low seasons
- Schedule maintenance in low-demand periods
- Python
- Pandas, NumPy
- Matplotlib, Seaborn
- Scikit-learn
- Statsmodels
├── data.csv
├── Smart_Mobility_Prediction.ipynb
├── Data dictinonary.txt
└── README.md
git clone https://github.com/your-username/bike-demand-analysis.git
cd bike-demand-analysis
pip install -r requirements.txt
jupyter notebook