This project analyzes product trends from Amazon's Best Sellers (amazon.com/Best-Sellers) to identify sales patterns, top-performing categories/brands, and customer preferences. The analysis reveals actionable insights for sellers, marketers, and e-commerce professionals.
Total Data Scraped: 600 rows × 14 columns
After Handling Missing Values: 597 rows × 14 columns
Explore Full Dashboard on Tableau Public
- As the number of reviews increases, the sales of a product tend to increase significantly.
- Products with the highest number of reviews tend to rank closer to #1 in the seller rankings.
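One way to check the review/rank relationship described above is a rank correlation. The snippet below is a minimal sketch on toy data, not the project's actual analysis: the column names `rank` and `review_count` are assumptions, and the numbers are made up for illustration.

```python
import pandas as pd

# Toy data standing in for data/cleaned_data.csv.
# The column names "rank" and "review_count" are assumed, not the project's schema.
df = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5, 6],
    "review_count": [98000, 120000, 45000, 30000, 12000, 15000],
})

# Spearman correlation equals the Pearson correlation of the ranked values.
spearman = df["rank"].rank().corr(df["review_count"].rank())

# A negative value means more reviews go with a better (numerically lower) rank.
print(f"Spearman correlation: {spearman:.2f}")
```

A rank-based measure is used here because best-seller position is ordinal, so only the ordering of the values matters.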
- Parent Category: Clothing, Shoes & Jewelry
- Subcategory: Men's Mules & Clogs
(Products in these categories with more reviews are purchased more frequently.)
- The top brand is Ring, followed by others identified in the analysis.
- The leading product is Crocs Unisex Adult Classic Clog.
- Products released earlier (e.g., 2004) show lower sales in recent months, while newer products (e.g., 2024 releases) demonstrate higher sales growth.
- If this trend holds, products released in 2025 may achieve the highest sales, though this is an extrapolation rather than a measured result.
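The release-year comparison above can be approximated by grouping products by release year and summarizing their sales. This is only a sketch under assumptions: `release_year` and `monthly_sales` are hypothetical column names, and the rows are invented sample values.

```python
import pandas as pd

# Invented sample rows; "release_year" and "monthly_sales" are assumed column names.
df = pd.DataFrame({
    "release_year": [2004, 2004, 2019, 2024, 2024],
    "monthly_sales": [1000, 3000, 5000, 8000, 12000],
})

# Median sales per release year, plus the number of products behind each median.
sales_by_year = df.groupby("release_year")["monthly_sales"].agg(["median", "count"])

print(sales_by_year)
```

Keeping the per-year count next to the median helps flag years represented by only one or two products, where the trend is weakly supported.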
- A geographical analysis highlights the number of distinct brands across countries.
- For instance, China has 126 distinct brands, the highest among all countries.
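A count of distinct brands per country, like the one described above, can be computed with a group-by. The following is a small illustration on placeholder data; the column names `country` and `brand` are assumptions about the dataset, not confirmed by the project.

```python
import pandas as pd

# Placeholder rows; "country" and "brand" are assumed column names.
df = pd.DataFrame({
    "country": ["China", "China", "China", "USA", "USA", "Germany"],
    "brand": ["Brand A", "Brand B", "Brand A", "Brand C", "Brand C", "Brand D"],
})

# nunique() counts each brand once per country, however many products it has.
brands_per_country = (
    df.groupby("country")["brand"].nunique().sort_values(ascending=False)
)

print(brands_per_country)
```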
- Scraped 600 products using Python (BeautifulSoup/Selenium)
- Raw dataset: 600 rows × 14 columns
- Cleaned dataset: 597 rows × 14 columns (Data Preparation Notebook)
- Python (pandas, Selenium, BeautifulSoup)
- Tableau
- Jupyter Notebook
To replicate or extend this analysis, follow the steps below:
Ensure Python is installed on your machine.
- Clone the Repository
  git clone https://github.com/mominurr/Amazon-Best-Sellers-Data-Analysis.git
  cd Amazon-Best-Sellers-Data-Analysis
- Create and Activate a Virtual Environment
  python -m venv myvenv
  source myvenv/bin/activate (macOS/Linux) or myvenv\Scripts\activate (Windows)
- Install Dependencies
  pip install -r requirements.txt
- Run the Scraper Script
  Execute the script to scrape data from Amazon:
  python scraper.py
  The scraped data will be saved as data/raw_data.csv.
- Prepare the Data
  Open and run the data_preparation.ipynb notebook to handle missing and duplicate values.
  The cleaned data will be saved as data/cleaned_data.csv.
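For readers who want a feel for the data preparation step, here is a minimal sketch of dropping duplicate and incomplete rows with pandas. It is not the notebook's actual code: the `clean` helper and the `title` and `price` columns are hypothetical, and the notebook may treat missing values differently (for example by imputing rather than dropping).

```python
import pandas as pd


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then rows with any missing value."""
    return raw.drop_duplicates().dropna().reset_index(drop=True)


# Tiny stand-in for the scraped data; "title" and "price" are assumed columns.
raw = pd.DataFrame({
    "title": ["Clog", "Clog", "Doorbell", None],
    "price": [49.99, 49.99, None, 19.99],
})

cleaned = clean(raw)
print(f"{len(raw)} rows -> {len(cleaned)} rows")
```

Against the real files, the same helper could be applied to `pd.read_csv("data/raw_data.csv")` and the result written with `to_csv("data/cleaned_data.csv", index=False)`.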
This project is licensed under the MIT License – see the LICENSE file for details.
Contributions are welcome! Feel free to fork the repo and submit a pull request.
For any inquiries or collaborations:
- Portfolio: mominur.dev
- GitHub: github.com/mominurr
- LinkedIn: linkedin.com/in/mominur--rahman
- Email: [email protected]
🚀 Star this repo ⭐ if you find it useful!