Prediction of Bike Sharing Demand using Python
- Kaggle: https://www.kaggle.com/c/bike-sharing-demand
- Dataset: https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset
- Predicted the 'demand' target using a multiple linear regression model.
- Removed irrelevant features using exploratory data analysis (EDA) and a correlation matrix.
- Addressed autocorrelation among the data points, taking the top 3 autocorrelations into account.
- Addressed the non-normality of the demand feature: demand was log-normally distributed, so it was modeled on a log scale.
- Achieved an RMSLE score of 0.356.
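For reference, the Kaggle competition scores submissions with RMSLE (root mean squared logarithmic error), where $n$ is the number of records, $p_i$ the predicted demand, $a_i$ the actual demand, and $\log$ the natural logarithm:

```latex
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(p_i + 1) - \log(a_i + 1)\right)^2}
```

Because errors are measured on a log scale, under-predicting a busy hour by 50% costs about as much as under-predicting a quiet hour by 50%.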
Dataset features
- date
- season - (1:winter, 2:spring, 3:summer, 4:fall)
- year
- month - 1 to 12
- hour
- holiday - 1: Yes, 0: No
- weekday - 0-6 (Sunday to Saturday)
- workingday - 1: Yes, 0: No
- weather - 1: Clear, 2: Mist, 3: Light rain/light snow, 4: Heavy rain + ice pellets
- temp - Normalized temperature in Celsius
- atemp - Normalized feeling temperature in Celsius
- humidity
- windspeed
- casual - Count of casual users
- registered - Count of registered users
- demand - Total rental count (casual + registered)
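The column names above look like renamed versions of the columns in the UCI hourly file (hour.csv). A minimal loading sketch, assuming that file layout and a hypothetical rename mapping (UCI_TO_README) inferred only from the names listed above; the original project's loading code may differ:

```python
import io

import pandas as pd

# Hypothetical mapping from UCI hour.csv column names to the names used in this README.
# Inferred from the feature list above; not taken from the original project.
UCI_TO_README = {
    "dteday": "date",
    "yr": "year",
    "mnth": "month",
    "hr": "hour",
    "weathersit": "weather",
    "hum": "humidity",
    "cnt": "demand",
}


def load_bike_data(source):
    """Read hourly bike sharing data and rename columns to the README names.

    `source` may be a file path or any file-like object accepted by pd.read_csv.
    """
    df = pd.read_csv(source)
    df = df.rename(columns=UCI_TO_README)
    # 'instant' is only a running record index in the UCI file, so drop it if present.
    return df.drop(columns=["instant"], errors="ignore")


if __name__ == "__main__":
    # Illustrative one-row input, not real project data.
    sample = io.StringIO("instant,dteday,yr,mnth,hr,weathersit,hum,cnt\n1,2011-01-01,0,1,0,1,0.81,16\n")
    print(load_bike_data(sample).columns.tolist())
```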
Steps
- Step 1 - Import the libraries
- Step 2 - Read the CSV file
- Step 3 - Preliminary analysis and feature selection
- Step 4 - Data visualization
- Step 5 - Check for outliers
- Step 6 - Check the multiple linear regression assumptions
- Step 7 - Create/modify the variables and solve the problem of non-normality
- Step 8 - Solve the problem of autocorrelation
- Step 9 - Create the dummy variables with pandas get_dummies, dropping the first level to avoid the dummy variable trap
- Step 10 - Create the train and test split
- Step 11 - Create, fit and score the model
- Final step - Calculate the RMSLE and compare the results
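A compact sketch of the later steps, from the normality fix through the RMSLE comparison, on an already-cleaned, time-ordered DataFrame. It rests on several assumptions not stated in the project: the "top 3 correlations" are read as lags 1 to 3 of log demand, the split is a 70/30 chronological split, and NumPy least squares stands in for whatever regression library was actually used (scikit-learn's LinearRegression would be a common choice). The helper names add_lags, rmsle and fit_and_score are hypothetical.

```python
import numpy as np
import pandas as pd


def add_lags(df, col="demand", lags=(1, 2, 3)):
    """Add lagged copies of `col` (lags 1-3 are an assumed reading of 'top 3 correlations')."""
    out = df.copy()
    for k in lags:
        out[f"{col}_lag{k}"] = out[col].shift(k)
    # The first max(lags) rows have no history, so drop them.
    return out.dropna().reset_index(drop=True)


def rmsle(actual, predicted):
    """Root mean squared logarithmic error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2)))


def fit_and_score(df, categorical, train_frac=0.7):
    """Log target, lag features, dummies, chronological split, OLS fit, RMSLE.

    `df` must be sorted by time, contain only numeric columns, and have demand > 0.
    """
    data = df.copy()
    # Normality fix: model log(demand), since demand is roughly log-normal.
    data["demand"] = np.log(data["demand"])
    # Autocorrelation fix: lagged log-demand values as extra predictors.
    data = add_lags(data, "demand")
    # Dummy variables, dropping the first level of each to avoid the dummy variable trap.
    data = pd.get_dummies(data, columns=categorical, drop_first=True, dtype=float)
    # Chronological train/test split (no shuffling) to respect time order.
    split = int(len(data) * train_frac)
    X = data.drop(columns=["demand"]).to_numpy(dtype=float)
    y = data["demand"].to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    # Ordinary least squares on the training rows.
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    pred_log = X[split:] @ beta
    # Back-transform to the original demand scale before scoring.
    return rmsle(np.exp(y[split:]), np.exp(pred_log))
```

For example, fit_and_score(df, categorical=["season", "month", "hour", "holiday", "weather"]) could be called after dropping 'date', 'casual', 'registered' and the features listed below. Dropping casual and registered matters because in the UCI data they sum to demand and would leak the target.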
Data visualization analysis results for categorical features
- Demand varies with:
  - Season - highest demand in fall and lowest in spring
  - Month - high demand from May to October
  - Holiday - lower demand on holidays
  - Hour - peak demand at 8 AM and 5 PM
  - Weather - highest demand in clear weather and lowest in heavy rain
- No significant change in demand across weekdays or working days
- The year-over-year growth pattern was not considered because the data covers only a limited number of years
Features to drop
- weekday
- year
- workingday
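The categorical comparison above boils down to mean demand per category level, which is what a bar chart of each feature would show. A minimal sketch, assuming the column names from the feature list; the helper names mean_demand_by and drop_categorical_features are hypothetical:

```python
import pandas as pd

# Categorical columns inspected during visualization (names follow the feature list).
CATEGORICAL = ["season", "month", "holiday", "hour", "weather", "weekday", "workingday"]
# Features judged uninformative by the categorical analysis.
DROP_AFTER_CATEGORICAL = ["weekday", "year", "workingday"]


def mean_demand_by(df, columns=CATEGORICAL):
    """Return {column: Series of mean demand per level} for the columns present in df."""
    return {col: df.groupby(col)["demand"].mean() for col in columns if col in df}


def drop_categorical_features(df):
    """Drop the categorical features selected for removal, ignoring any that are absent."""
    return df.drop(columns=DROP_AFTER_CATEGORICAL, errors="ignore")
```

Each returned Series can be passed to a plotting call such as Series.plot(kind="bar") if matplotlib is installed.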
Data visualization analysis results for continuous features
- The target variable 'demand' is not normally distributed
- Temperature and demand appear to be positively correlated
- The plots for temp and atemp appear almost identical, so the two features are largely redundant
- Humidity and windspeed require further statistical analysis
Features to drop
- atemp
- windspeed
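The continuous-feature findings can be checked numerically: a temp/atemp correlation close to 1 supports dropping atemp, and a demand skewness that moves much closer to 0 after taking the log is consistent with a log-normal target. A minimal sketch, assuming the column names from the feature list; continuous_summary and drop_continuous_features are hypothetical helpers:

```python
import numpy as np
import pandas as pd

CONTINUOUS = ["temp", "atemp", "humidity", "windspeed", "demand"]
# Features selected for removal after the continuous analysis.
DROP_AFTER_CONTINUOUS = ["atemp", "windspeed"]


def continuous_summary(df):
    """Return the Pearson correlation matrix of the continuous features,
    plus the sample skewness of demand and of log(demand)."""
    corr = df[CONTINUOUS].corr()
    skew_raw = df["demand"].skew()
    skew_log = np.log(df["demand"]).skew()  # requires demand > 0
    return corr, skew_raw, skew_log


def drop_continuous_features(df):
    """Drop the continuous features selected for removal, ignoring any that are absent."""
    return df.drop(columns=DROP_AFTER_CONTINUOUS, errors="ignore")
```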