This project uses Python to explore bike share data from three major cities in the United States—Chicago, New York City, and Washington. The goal is to write code that imports data and computes descriptive statistics to answer questions about bike share usage patterns. Additionally, the project includes an interactive terminal-based script that allows users to filter and explore the data interactively.
To set up the project locally, ensure you have the following software:
- Python 3, NumPy, and pandas installed (preferably using Anaconda).
- A text editor like Sublime or Atom.
- A terminal application (Terminal on Mac/Linux or Git Bash on Windows).
- Clone the repository:
git clone https://github.com/username/bikeshare-analysis.git
- Navigate to the project directory:
cd bikeshare-analysis - Install the necessary dependencies:
pip install -r requirements.txt
The bikeshare.py script creates an interactive experience in the terminal, allowing users to filter data by city, month, or day of the week. The script then computes and displays various statistics based on the filtered data. To run the script:
python bikeshare.pyThe script prompts users with the following questions:
- Would you like to see data for Chicago, New York, or Washington?
- Would you like to filter the data by month, day, or not at all?
- (If filtering by month) Which month - January, February, March, April, May, or June?
- (If filtering by day) Which day - Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, or Sunday?
Based on the answers, the script displays relevant statistics. Users also have the option to view raw data 5 rows at a time.
- Interactive Filtering: Users can filter data by city, month, and day.
- Descriptive Statistics: Provides insights into popular travel times, stations, trip durations, and user demographics.
- Raw Data Display: Users can choose to view raw data in increments of 5 rows.
The datasets used in this project contain randomly selected data for the first six months of 2017 from the following cities:
- Chicago:
chicago.csv - New York City:
new_york_city.csv - Washington:
washington.csv
These files contain the following columns:
- Start Time: The start time of the trip.
- End Time: The end time of the trip.
- Trip Duration: The duration of the trip in seconds.
- Start Station: The station where the trip started.
- End Station: The station where the trip ended.
- User Type: The type of user (Subscriber or Customer).
Additional columns for Chicago and New York City:
- Gender
- Birth Year
The script computes the following statistics:
- Popular Times of Travel:
- Most common month
- Most common day of the week
- Most common hour of the day
- Popular Stations and Trips:
- Most common start station
- Most common end station
- Most frequent trip from start to end
- Trip Duration:
- Total travel time
- Average travel time
- User Information:
- Counts of each user type
- Counts of each gender (NYC and Chicago only)
- Earliest, most recent, and most common year of birth (NYC and Chicago only)
-
Format your numbers related to time using round()(opens in a new tab) function. It does not make much sense to use decimals like 7.55738383 with seconds/minutes. Instead it should be formatted to 7.6 seconds.
-
User inputs should be made case insensitive, which means the input should accept the string of "Chicago" and its case variants, such as "chicago", "CHICAGO", or "cHicAgo".
-
You should also implement error handlings so your program does not throw any errors due to invalid inputs. For example, if the user enters "Los Angeles" for the city, the error handling should reject the user input and avoid breaking the codes.
-
To further develop your skills in python, you can practice with HackerRank(opens in a new tab), which challenges you with increasingly difficult problems.